KR20100062784A

KR20100062784A - Apparatus for generating and playing object based audio contents

Info

Publication number: KR20100062784A
Application number: KR1020090020190A
Authority: KR
Inventors: 유재현; 심환; 정현주; 성굉모; 서정일; 강경옥; 홍진우; 안치득
Original assignee: Electronics and Telecommunications Research Institute (ETRI)
Priority date: 2008-12-02
Filing date: 2009-03-10
Publication date: 2010-06-10
Also published as: KR20120036329A

Abstract

PURPOSE: An object-based audio content generation/playback apparatus is provided that generates object-based audio content by encoding at least one of a plurality of object audio signals, recording space information, sound source position information, and a multi-channel audio signal. CONSTITUTION: An object audio signal acquisition unit (110) obtains a plurality of object audio signals by recording a plurality of sound source signals. A recording space information acquisition unit (130) obtains recording space information about the recording space of the sound source signals. A sound source position information acquisition unit (120) obtains sound source position information of the sound source signals. An encoder (140) generates object-based audio content by encoding at least one of the object audio signals, the recording space information, and the sound source position information.

Description

Object-based audio content generation/playback apparatus {APPARATUS FOR GENERATING AND PLAYING OBJECT BASED AUDIO CONTENTS}

The present invention relates to an object-based audio content generation/playback apparatus and, more particularly, to an object-based audio content generation/playback apparatus that can generate and play back object-based audio content regardless of the user's playback environment.

The present invention is derived from research conducted as part of the IT source technology development program of the Ministry of Knowledge Economy and the Institute for Information Technology Advancement [Project management number: 2008-F-011-01; Project title: Development of next-generation DTV core technology (linked with standardization) - Development of glasses-free personal 3D broadcasting technology (continued)].

MPEG-4 is an audio/video coding standard proposed in 1998 by the Moving Picture Experts Group (MPEG) under ISO/IEC (International Organization for Standardization/International Electrotechnical Commission). MPEG-4 extends the earlier MPEG-1 and MPEG-2 standards, adding features such as VRML (Virtual Reality Markup Language) and an object-oriented composite file format. Its main goals are to improve coding efficiency, to provide an integrated coding method for audio, video, and speech, to enable interactive audio/video playback, and to advance error-resilience techniques.

The main feature of MPEG-4 is that it can play back object-based audio/video. Whereas MPEG-1 and MPEG-2 were limited to the general structure, multiplexing, and synchronization, MPEG-4 additionally covers scene description, interactivity, content description, and programmability. In MPEG-4, the material to be encoded is divided into objects, a coding method is chosen according to the attributes of each object, and the desired scene is described and transmitted in the Audio Binary Format for Scenes (AudioBIFS). Listeners can then adjust information such as the size and position of each object through their terminal while listening.

A representative object-based audio content reproduction technique is sound field synthesis (commonly known as wave field synthesis, WFS). Sound field synthesis reproduction synthesizes the primary wavefront emitted by an arbitrary primary source from the sounds reproduced by multiple loudspeakers, so that a wavefront identical to the primary wavefront is generated within an arbitrary volume bounded by the loudspeaker array.

CARROUSO (Creating Assessing and Rendering on Real time Of high quality aUdio-viSual envirOnments in MPEG-4 context), a standardization project related to sound field synthesis, carried out research on transmitting sound sources as objects via MPEG-4, with its object-oriented and interactive features, and reproducing them by sound field synthesis.

An object of the present invention is to provide an object-based audio content generation/playback apparatus that can play back object-based audio content using at least one of sound field synthesis reproduction and multi-channel surround reproduction, regardless of the user's audio playback environment.

To achieve this object, an object-based audio content generating apparatus according to an embodiment of the present invention includes: an object audio signal acquisition unit that records a plurality of sound source signals to obtain a plurality of object audio signals; a recording space information acquisition unit that obtains recording space information about the space in which the sound source signals are recorded; a sound source position information acquisition unit that obtains sound source position information of the sound source signals; and an encoder that generates object-based audio content by encoding at least one of the object audio signals, the recording space information, and the sound source position information.

In addition, an object-based audio content playback apparatus according to an embodiment of the present invention includes: a decoder that decodes, from object-based audio content, a plurality of object audio signals for a plurality of sound source signals, the recording space information of the sound source signals, and the sound source position information of the sound source signals; a reproduction space information acquisition unit that obtains reproduction space information about the space in which the object-based audio content is played back; a signal synthesizer that synthesizes the decoded object audio signals into a plurality of speaker signals based on the recording space information, the sound source position information, and the reproduction space information; and a transmitter that transmits each speaker signal to its corresponding speaker.

According to the present invention, object-based audio content can be played back using at least one of sound field synthesis reproduction and multi-channel surround reproduction, regardless of the user's audio playback environment.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings and their contents; however, the present invention is not restricted or limited by these embodiments. Like reference numerals in the drawings denote like elements.

FIG. 1 is a block diagram showing the detailed configuration of an object-based audio content generating apparatus according to an embodiment of the present invention.

The object-based audio content generating apparatus 100 according to an embodiment of the present invention includes an object audio signal acquisition unit 110, a sound source position information acquisition unit 120, a recording space information acquisition unit 130, and an encoder 140. The apparatus 100 may further include an impulse sound source signal emitter 150 and an impulse sound source signal receiver 160. The function of each component is described in detail below.

The object audio signal acquisition unit 110 records a plurality of sound source signals to obtain a plurality of object audio signals.

The number of sound source signals may equal the number of object audio signals; that is, the object audio signal acquisition unit 110 may obtain one object audio signal per sound source signal.

According to an embodiment of the present invention, the object audio signal acquisition unit 110 may obtain the object audio signals using at least one of a plurality of spot microphones and a microphone array.

Each spot microphone is installed close to one of the sound sources and records the sound source signal of that source to obtain its object audio signal.

A microphone array is a set of microphones arranged in an array. When a microphone array is used, the sound source signals can be separated using the delay times and sound pressure levels (SPL) of the signals arriving at the array, yielding one object audio signal per sound source.

Here, the delay times of the sound source signals may include at least one of: the delay between different sound sources arriving at one microphone of the array; and, when a single sound source signal reaches each of the microphones, the delay of that signal at each individual microphone.
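The delay and level cues described above can be estimated in several ways. The following sketch assumes two sampled microphone signals of equal length and uses a plain time-domain cross-correlation peak for the delay and an RMS ratio for the level difference. The function names are illustrative and not part of the patent; a practical system would more likely use an FFT-based or GCC-PHAT correlator.

```python
import math

def estimate_delay(ref, sig, max_lag):
    """Return the lag in samples (within +/- max_lag) that maximizes the
    cross-correlation of `sig` against `ref`; positive means `sig` arrives later."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for n, x in enumerate(ref):
            m = n + lag
            if 0 <= m < len(sig):
                score += x * sig[m]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def rms(x):
    """Root-mean-square value of a sample list."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def level_difference_db(a, b):
    """Level of `b` relative to `a` in dB (negative when `b` is quieter)."""
    return 20.0 * math.log10(rms(b) / rms(a))
```

Dividing the returned lag by the sampling rate gives the delay in seconds.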

The sound source position information acquisition unit 120 obtains sound source position information of the sound source signals.

Here, the sound source position information describes the position of each sound source signal in the space where the sound source signals to be recorded are played, that is, in the recording space. In other words, the sound source position information may include sound image localization information. For each sound source signal, the sound source position information (the sound image localization information) may be expressed in Cartesian coordinates, (x, y, z), or in cylindrical coordinates, (r, θ, z).
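Converting between the two position formats is straightforward. This sketch assumes the cylindrical angle θ is measured in radians from the +x axis toward +y and that z is shared by both forms; the axis convention is an assumption, since the text does not fix one.

```python
import math

def cartesian_to_cylindrical(x, y, z):
    """(x, y, z) -> (r, theta, z); theta in radians from +x toward +y (assumed convention)."""
    return math.hypot(x, y), math.atan2(y, x), z

def cylindrical_to_cartesian(r, theta, z):
    """Inverse of cartesian_to_cylindrical."""
    return r * math.cos(theta), r * math.sin(theta), z
```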

According to an embodiment of the present invention, the sound source position information acquisition unit 120 may obtain the sound source position information using at least one of the positions of the spot microphones, the time delays of the sound source signals at the microphone array, and the sound pressure levels of the sound source signals at the microphone array.
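As one illustration of turning these cues into position estimates, a far-field two-microphone model relates the inter-microphone delay τ to the arrival angle θ (from broadside) by sin θ = c·τ/d, and a free-field point-source model relates a level change ΔL to a distance ratio by r2/r1 = 10^(-ΔL/20). Both models, the 343 m/s speed of sound, and the helper names are assumptions for this sketch, not the method specified in the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees C

def doa_from_tdoa(tau, mic_spacing, c=SPEED_OF_SOUND):
    """Far-field arrival angle (radians from broadside) for a two-microphone pair
    spaced `mic_spacing` metres apart, given inter-microphone delay `tau` in seconds."""
    s = c * tau / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp: noise can push |s| slightly above 1
    return math.asin(s)

def distance_ratio_from_spl(delta_db):
    """Free-field point source: r2 / r1 for a level change delta_db = L2 - L1
    (about 6 dB drop per doubling of distance)."""
    return 10.0 ** (-delta_db / 20.0)
```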

According to another embodiment of the present invention, the sound source position information acquisition unit 120 may obtain the sound source position information by receiving the positions of the sound sources as input from a user of the object-based audio content generating apparatus 100.

The recording space information acquisition unit 130 obtains recording space information about the recording space of the sound source signals.

Here, the recording space information is information about the space in which the sound sources to be recorded are played.

As mentioned above, according to an embodiment of the present invention, the object-based audio content generating apparatus 100 may further include an impulse sound source signal emitter 150 and an impulse sound source signal receiver 160.

The impulse sound source signal emitter 150 emits an impulse sound source signal.

An impulse sound source signal is a signal used to calculate the impulse response described below.

For example, the impulse sound source signal emitter 150 may emit a maximum-length sequence (MLS) signal.
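An MLS can be produced by a linear feedback shift register whose feedback taps correspond to a primitive polynomial. In this minimal sketch, tap t feeds back register bit t-1 (bit 0 being the output bit), so `taps=(1, 2)` with `order=4` realizes the primitive polynomial x^4 + x + 1 and yields a 15-sample period. The function name, tap convention, and ±1 mapping are assumptions.

```python
def mls(order, taps, seed=1):
    """One period (2**order - 1 samples) of a maximum-length sequence as +1/-1 floats.

    `taps` must describe a primitive feedback polynomial; tap t XORs register
    bit t - 1 into the new most significant bit on every shift.
    """
    if not 0 < seed < (1 << order):
        raise ValueError("seed must be a non-zero state of `order` bits")
    state = seed
    out = []
    for _ in range((1 << order) - 1):
        bit = state & 1
        out.append(1.0 - 2.0 * bit)  # map bit 0 -> +1, bit 1 -> -1
        fb = 0
        for t in taps:
            fb ^= (state >> (t - 1)) & 1
        state = (state >> 1) | (fb << (order - 1))
    return out
```

One period's periodic autocorrelation equals 2^n - 1 at lag 0 and -1 at every other lag, which is the property an MLS impulse-response measurement relies on.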

The impulse sound source signal receiver 160 receives the impulse sound source signal emitted by the emitter 150 and calculates an impulse response from the received signal.

The impulse sound source signal received by the receiver 160 includes both the signal that travels directly from the emitter 150 to the receiver 160 and the signals that are emitted by the emitter 150 and reflected from the walls of the recording space, from objects within it, and so on before reaching the receiver 160.
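Because an MLS has a two-valued periodic autocorrelation, the impulse response can be recovered from one steady-state period of the recording by circular cross-correlation with the excitation. This sketch assumes `recorded` is exactly one period long and time-aligned with `mls_seq` (in practice a period is played first so the room reaches steady state). Adding back the correlation sum removes the small DC bias that plain division by N + 1 would leave. `circular_convolve` is a hypothetical helper included only to simulate a recording.

```python
def circular_convolve(h, s):
    """Periodic convolution of impulse response h with one period of s (simulation helper)."""
    n = len(s)
    return [sum(h[m] * s[(k - m) % n] for m in range(len(h))) for k in range(n)]

def impulse_response_from_mls(recorded, mls_seq):
    """Recover a length-N impulse response from one steady-state period of the recording.

    Uses R_ss(0) = N and R_ss(tau != 0) = -1, so the circular cross-correlation is
    C(tau) = (N + 1) * h(tau) - sum(h), and sum(C) = sum(h).
    """
    n = len(mls_seq)
    corr = [sum(recorded[k] * mls_seq[(k - tau) % n] for k in range(n)) for tau in range(n)]
    total = sum(corr)  # equals sum(h) by the identity above
    return [(c + total) / (n + 1) for c in corr]
```

Responses longer than one MLS period alias onto the start of the result, so the sequence order is chosen so that the period exceeds the room's reverberation time.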

In this case, the recording space information acquisition unit 130 may obtain the recording space information from the calculated impulse response. According to an embodiment of the present invention, the impulse response contains a plurality of impulse signals, and the recording space information may include at least one of the reception time differences, the sound pressure level differences, and the reception angle differences between those impulse signals. That is, the recording space information acquisition unit 130 may obtain the impulse response of the recording space not only as audio in a format such as a wave file but also as data. When the recording space information includes all three of the reception time difference, the sound pressure level difference, and the reception angle difference, it may be represented as ordered triples (time, sound pressure, angle).
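One way to read such triples off a measured impulse response is sketched below. It treats the strongest peak as the direct sound and reports every local peak above a threshold as a (time, level, angle) triple relative to it. A single-channel impulse response carries no direction, so the optional `angles` mapping (sample index to arrival angle) is a hypothetical input that would have to come from a multi-microphone measurement. The threshold and function name are also assumptions.

```python
import math

def reflection_triples(ir, fs, angles=None, threshold_db=-40.0):
    """Return (time_s, level_db, angle) for each local peak of `ir`, relative to the
    strongest peak (taken as the direct sound). `angle` is None when unknown."""
    direct = max(range(len(ir)), key=lambda i: abs(ir[i]))
    ref = abs(ir[direct])
    out = []
    for i in range(len(ir)):
        a = abs(ir[i])
        if a == 0.0:
            continue
        left = abs(ir[i - 1]) if i > 0 else 0.0
        right = abs(ir[i + 1]) if i + 1 < len(ir) else 0.0
        if a >= left and a > right:  # local maximum of the magnitude
            level = 20.0 * math.log10(a / ref)
            if level >= threshold_db:
                angle = angles.get(i) if angles else None
                out.append(((i - direct) / fs, level, angle))
    return out
```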

The encoder 140 generates object-based audio content by encoding at least one of the object audio signals, the recording space information, and the sound source position information.

Each object audio signal may be encoded with a different method. For example, when an object audio signal is a music signal, the encoder 140 may apply an audio coding method optimized for music (e.g., transform-based audio coding); when it is a speech signal, the encoder 140 may apply a coding method optimized for speech (e.g., a CELP-based coder).

The encoder 140 may then multiplex the encoded object audio signals, the encoded sound source position information, and the encoded recording space information to generate the object-based audio content.
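The text does not specify a container format for the multiplexed content. As a purely hypothetical illustration, the sketch below frames each encoded stream as a 4-byte tag, a 4-byte big-endian length, and the payload, and includes the matching demultiplexing step that a decoder would perform. The tags in the example are invented.

```python
import struct

def mux(chunks):
    """Concatenate (tag, payload) pairs; hypothetical layout per chunk:
    4-byte tag, 4-byte big-endian payload length, payload bytes."""
    out = bytearray()
    for tag, payload in chunks:
        if len(tag) != 4:
            raise ValueError("tag must be exactly 4 bytes")
        out += tag + struct.pack(">I", len(payload)) + payload
    return bytes(out)

def demux(data):
    """Split a byte string produced by mux() back into (tag, payload) pairs."""
    chunks, pos = [], 0
    while pos < len(data):
        if pos + 8 > len(data):
            raise ValueError("truncated chunk header")
        tag = data[pos:pos + 4]
        (length,) = struct.unpack(">I", data[pos + 4:pos + 8])
        pos += 8
        if pos + length > len(data):
            raise ValueError("truncated chunk payload")
        chunks.append((tag, data[pos:pos + length]))
        pos += length
    return chunks
```

Tagging each chunk lets a playback device skip streams it does not need, for example using only a downmix stream on a legacy system.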

The object-based audio content generated by the encoder 140 may be transmitted over a network or stored on a separate recording medium.

In this way, rather than mixing the object audio signals and encoding them as a multi-channel audio signal, the object-based audio content generating apparatus 100 encodes each object audio signal separately and attaches additional information, such as the sound source position information and the recording space information, to the encoded signals. The resulting object-based audio content allows the user of an object-based audio content playback apparatus to play it back in the way best suited to that user's own playback apparatus. The object-based audio content playback apparatus is described with reference to FIG. 3.

FIG. 2 is a block diagram showing the detailed configuration of an object-based audio content generating apparatus according to another embodiment of the present invention.

The object-based audio content generating apparatus 200 according to another embodiment of the present invention includes an object audio signal acquisition unit 210, a sound source position information acquisition unit 220, a recording space information acquisition unit 230, a multi-channel audio mixer 240, and an encoder 250.

The object audio signal acquisition unit 210, the sound source position information acquisition unit 220, the recording space information acquisition unit 230, and the encoder 250 shown in FIG. 2 correspond to the object audio signal acquisition unit 110, the sound source position information acquisition unit 120, the recording space information acquisition unit 130, and the encoder 140 described with reference to FIG. 1. Therefore, even where it is omitted below, the description of the apparatus 100 of FIG. 1 also applies to the apparatus 200 of FIG. 2.

The object audio signal acquisition unit 210 records a plurality of sound source signals to obtain a plurality of object audio signals.

The sound source position information acquisition unit 220 obtains sound source position information of the sound source signals.

The recording space information acquisition unit 230 obtains recording space information about the recording space of the sound source signals.

The multi-channel audio mixer 240 generates a multi-channel audio signal by mixing at least one of the object audio signals, the recording space information, and the sound source position information.

That is, for backward compatibility with audio content playback apparatuses that use multi-channel surround reproduction, the multi-channel audio mixer 240 may mix at least one of the object audio signals, the sound source position information, and the recording space information to generate a multi-channel audio signal such as a 2-channel, 5.1-channel, or 7.1-channel audio signal.
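For such a backward-compatible downmix, one simple option is to pan each object into the channel layout according to its position. The sketch below covers only the 2-channel case with a sine/cosine (constant-power) pan law. It assumes loudspeakers at ±30° with positive azimuth to the listener's left, and it ignores the recording space information. A 5.1 or 7.1 layout would instead pan between adjacent loudspeaker pairs. All names are illustrative.

```python
import math

def pan_gains(azimuth_deg, spread_deg=30.0):
    """Sine/cosine (constant-power) gains (left, right) for a stereo pair at
    +spread_deg (left) and -spread_deg (right); positive azimuth is to the left."""
    p = (spread_deg - azimuth_deg) / (2.0 * spread_deg)  # 0 = hard left, 1 = hard right
    p = max(0.0, min(1.0, p))  # objects outside the pair are clamped to the nearer side
    return math.cos(p * math.pi / 2.0), math.sin(p * math.pi / 2.0)

def downmix_stereo(objects):
    """Mix (samples, azimuth_deg) objects of equal length into (left, right) lists."""
    n = len(objects[0][0])
    left, right = [0.0] * n, [0.0] * n
    for samples, az in objects:
        gl, gr = pan_gains(az)
        for i, x in enumerate(samples):
            left[i] += gl * x
            right[i] += gr * x
    return left, right
```

Because gl² + gr² = 1 for every azimuth, an object keeps roughly constant loudness as it moves between the two loudspeakers.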

The encoder 250 generates object-based audio content by encoding at least one of the object audio signals, the recording space information, the sound source position information, and the multi-channel audio signal.

FIG. 3 is a block diagram showing the detailed configuration of an object-based audio content playback apparatus according to an embodiment of the present invention.

The object-based audio content playback apparatus 300 according to an embodiment of the present invention includes a decoder 310, a reproduction space information acquisition unit 320, a signal synthesizer 330, and a transmitter 340. The function of each component is described in detail below.

The decoder 310 decodes, from the object-based audio content, the object audio signals for the sound source signals and the sound source position information of the sound source signals.

The object-based audio content may have been transmitted from an object-based audio content generating apparatus over a network, or read from a separate recording medium.

The decoder 310 may demultiplex the object-based audio content into the encoded object audio signals and the encoded sound source position information, and reconstruct from them the object audio signals, the recording space information, and the sound source position information.

The reproduction space information acquisition unit 320 obtains reproduction space information about the space in which the object audio signals are played back.

The reproduction space information describes the playback space of the user who wants to play back the object-based audio content; a plurality of speakers for playing back the content may be arranged in this space.

Accordingly, according to an embodiment of the present invention, the reproduction space information may include at least one of the number of speakers arranged in the reproduction space, the spacing between the speakers, the arrangement angles of the speakers, the speaker types, the speaker positions, and the size of the reproduction space.

According to an embodiment of the present invention, the reproduction space information acquisition unit 320 may receive the reproduction space information directly from the user, or may calculate it using a separate microphone placed in the reproduction space.

The signal synthesizer 330 synthesizes the decoded object audio signals into a plurality of speaker signals based on the sound source position information and the reproduction space information.

That is, based on the object audio signals, the sound source position information, and the reproduction space information, the signal synthesizer 330 synthesizes speaker signals so that the object-based audio content can be reproduced efficiently. In this case, the speaker signals are generated by synthesizing the object audio signals in accordance with the recording space information.

According to an embodiment of the present invention, the signal synthesizer 330 considers the size of the reproduction space and the number, types, and positions of the speakers installed in it. If an object audio signal can be reproduced by sound field synthesis, the signal synthesizer 330 renders it by sound field synthesis; if not, it renders the object audio signal by multi-channel surround reproduction to synthesize the speaker signals. When rendering by multi-channel surround reproduction in an environment where a speaker array is installed, the signal synthesizer 330 may also select the specific speakers that are to reproduce the object audio signal.

For example, suppose that, relative to the listener, a loudspeaker array is placed at the front of the reproduction space and two surround speakers are installed at the rear. If an audio object (i.e., a sound source) lies within the angle spanned by the two ends of the loudspeaker array as seen from the listener, the signal synthesizer 330 renders the object audio signal for that object by sound field synthesis. Object audio signals for audio objects at other angles are rendered over the satellite surround loudspeakers using the power panning law.
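The decision in this example can be sketched as follows. The sketch assumes a front array centred on the listener, azimuths in degrees with 0° straight ahead and positive angles to the listener's left, and two surround loudspeakers at ±110° (the 110° value is an assumption). Objects outside the array's angular span are panned across the rear between the surround pair with a sine/cosine power panning law. Objects between the array edge and a surround loudspeaker are simply clamped to that loudspeaker.

```python
import math

def array_half_span_deg(array_length_m, distance_m):
    """Half of the angle subtended at the listener by a centred linear array."""
    return math.degrees(math.atan2(array_length_m / 2.0, distance_m))

def render_plan(azimuth_deg, half_span_deg, surround_deg=110.0):
    """Choose how to render one object.

    Returns ("wfs", None) inside the array's span, otherwise
    ("pan", (gain_left_surround, gain_right_surround)) from a sine/cosine
    power panning law across the rear between +surround_deg and -surround_deg.
    Azimuth: 0 = straight ahead, positive = listener's left.
    """
    if abs(azimuth_deg) <= half_span_deg:
        return "wfs", None
    a = azimuth_deg % 360.0            # map to [0, 360)
    left, right = surround_deg, 360.0 - surround_deg
    p = (a - left) / (right - left)    # 0 at left surround, 1 at right surround
    p = max(0.0, min(1.0, p))          # side objects clamp to the nearer surround
    return "pan", (math.cos(p * math.pi / 2.0), math.sin(p * math.pi / 2.0))
```

For example, a 2 m array 1 m in front of the listener spans ±45°, so an object at 10° is rendered by sound field synthesis, while an object directly behind (180°) is split equally between the two surround loudspeakers.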

The transmitter 340 transmits each speaker signal to its corresponding speaker, through which the signal is reproduced.

According to an embodiment of the present invention, the decoder 310 further decodes the recording space information of the sound sources from the object-based audio content. The signal synthesizer 330 then uses the object audio signals, the sound source position information, and the reproduction space information to generate the direct sound of each sound source signal, and synthesizes the speaker signals by adding reflections to the generated direct sound based on the recording space information.
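A much simplified sketch of adding reflections to the direct sound follows. Each (delay, level) entry from the recording space information contributes a delayed, attenuated copy of the direct signal. This single-channel version is an assumption made for illustration only. It does not implement the grouped reflections algorithm or the per-loudspeaker assignment by angle.

```python
def add_reflections(direct, fs, reflections):
    """Return `direct` plus delayed, attenuated copies of itself.

    reflections: iterable of (delay_s, level_db), both relative to the direct sound.
    The output is lengthened so that the latest reflection fits.
    """
    taps = [(int(round(t * fs)), 10.0 ** (db / 20.0)) for t, db in reflections]
    if any(d < 0 for d, _ in taps):
        raise ValueError("reflections cannot precede the direct sound")
    length = len(direct) + max((d for d, _ in taps), default=0)
    out = [0.0] * length
    for i, x in enumerate(direct):
        out[i] += x
        for d, g in taps:
            out[i + d] += g * x
    return out
```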

일례로서, 재생 공간의 전방에 라우드스피커 어레이가 배치되어 있고, 상기 라우드스피커 어레이를 통하여 음장 합성 재생 기법을 통해 복수의 객체 오디오 신호를 재생하고자 하는 경우, 신호 합성부(330)는 하기 수학식 1 또는 수학식 2에 기초하여 복수의 객체 오디오 신호를 렌더링하여 복수의 음원 신호에 대한 직접음을 생성할 수 있다. As an example, when a loudspeaker array is disposed in front of a reproduction space, and a plurality of object audio signals are to be reproduced through a sound field synthesis reproduction technique through the loudspeaker array, the signal synthesis unit 330 may be represented by Equation 1 below. Alternatively, the plurality of object audio signals may be rendered based on Equation 2 to generate direct sounds for the plurality of sound source signals.

Here, the terms used in Equations 1 and 2 denote, respectively:

the driving function of the audio signal radiated by the n-th loudspeaker of the loudspeaker array;

the driving function of the audio signal radiated by the n-th loudspeaker of a tilted loudspeaker array;

the virtual sound source signal;

a component that weights the sound pressure according to the loudspeaker's directivity;

the coordinates of the loudspeaker;

the coordinates of the sound source;

the coordinates of the virtual sound source;

the wave number;

the angular frequency;

the angle between the n-th loudspeaker and the listener;

the distance between the sound source and the listener;

the distance between the loudspeaker and the listener;

a normalization variable; and

the angle between the tilted loudspeaker and the listener.

Further, the individual factors in Equations 1 and 2 represent, respectively:

a weight on the magnitude of the virtual sound source signal;

high-frequency boost equalization;

the propagation delay caused by the distance between the virtual sound source and the n-th loudspeaker;

the ratio of the distance between the virtual sound source and the n-th loudspeaker to the perpendicular distance; and

the spreading of a single cylindrical wave.
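The following sketch illustrates how a driving function of this kind is typically evaluated. It assumes one commonly cited 2.5D wave field synthesis form for a virtual point source behind a linear array:

Q_n(ω) = S(ω) · sqrt(jk/2π) · sqrt(Δz/(Δz + z_0)) · cos φ_n · e^(-jk r_n) / sqrt(r_n)

This form is not necessarily identical to Equations 1 and 2 of this document. The terms are:
- S(ω) is the source signal and k = ω/c.
- r_n is the distance from the virtual source to loudspeaker n.
- φ_n is the angle between that path and the array normal.
- z_0 is the source's depth behind the array.
- Δz is the distance from the array to the listening reference line.

The factors roughly match those listed above: amplitude weight, high-frequency equalization, propagation delay, distance ratio, and cylindrical spreading. The sketch computes only the frequency-independent per-loudspeaker delay and gain. It omits the sqrt(jk/2π) prefilter and any tapering window. The function name and geometry convention are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value

def wfs_point_source_weights(speaker_x, source, ref_dist, c=SPEED_OF_SOUND):
    """Per-loudspeaker (delay_s, gain) for a virtual point source behind a
    linear array lying on y = 0 and radiating toward +y (the listener side).

    speaker_x : x coordinates of the loudspeakers, in metres
    source    : (xs, ys) of the virtual source, with ys < 0 (behind the array)
    ref_dist  : distance from the array to the reference (listening) line, in metres
    """
    xs, ys = source
    if ys >= 0:
        raise ValueError("this sketch only handles sources behind the array (ys < 0)")
    depth = -ys  # perpendicular distance z_0 of the source from the array
    # Distance-ratio term sqrt(dz / (dz + z_0)) that fixes the correct level on the reference line
    ratio = math.sqrt(ref_dist / (ref_dist + depth))
    weights = []
    for x in speaker_x:
        r = math.hypot(x - xs, depth)          # virtual source -> loudspeaker distance r_n
        cos_phi = depth / r                    # cosine of the angle to the array normal
        gain = ratio * cos_phi / math.sqrt(r)  # includes cylindrical spreading 1/sqrt(r_n)
        delay = r / c                          # time-domain form of the e^(-jk r_n) term
        weights.append((delay, gain))
    return weights
```

Each loudspeaker feed would then be the prefiltered object signal, delayed and scaled by that loudspeaker's (delay, gain) pair. The centre loudspeaker, closest to the source, receives the shortest delay and the largest gain.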

Thereafter, the signal synthesizer 330 applies the grouped reflections algorithm to two inputs: the direct sound generated according to Equations 1 and 2, and the recording space information, expressed as ordered tuples of (time, sound pressure, angle). This adds the early reflections of the recording space to the direct sound. The signal synthesizer 330 assigns each reflection to a loudspeaker using the angle information in the reflection data. If no loudspeaker exists at that angle, it synthesizes the speaker signals so that the reflection is reproduced by the loudspeaker adjacent to that angle.
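The loudspeaker assignment step can be sketched as below. This is a simplified illustration and does not implement the grouped reflections algorithm itself. Each (time, level, angle) reflection is treated as a delayed, scaled copy of the direct sound. It is routed to the loudspeaker nearest in azimuth, which covers the fallback to an adjacent loudspeaker when none sits at the exact angle. The class and function names, the linear-gain representation, and the azimuth convention (0 = front, positive = left) are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Reflection:
    delay_s: float    # arrival time after the direct sound, in seconds
    gain: float       # linear pressure relative to the direct sound
    angle_deg: float  # arrival azimuth: 0 = front, positive = left (assumed)

def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def nearest_speaker(angle_deg, speaker_angles):
    """Index of the loudspeaker closest in azimuth to angle_deg."""
    return min(range(len(speaker_angles)),
               key=lambda i: angular_distance(angle_deg, speaker_angles[i]))

def reflection_feeds(direct, reflections, speaker_angles, fs):
    """Return one signal per loudspeaker holding delayed, scaled copies of `direct`.

    Each reflection is routed to the loudspeaker nearest its arrival angle, so a
    reflection whose angle has no loudspeaker falls back to an adjacent one.
    """
    offsets = [round(r.delay_s * fs) for r in reflections]
    length = len(direct) + max(offsets, default=0)
    feeds = [[0.0] * length for _ in speaker_angles]
    for r, offset in zip(reflections, offsets):
        out = feeds[nearest_speaker(r.angle_deg, speaker_angles)]
        for i, x in enumerate(direct):
            out[offset + i] += r.gain * x
    return feeds
```

With loudspeakers at 0°, ±30° and ±110°, a reflection arriving from 100° is sent to the 110° loudspeaker. These feeds would be summed with the direct-sound feeds of each loudspeaker.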

In addition, according to an embodiment of the present invention, the signal synthesizer 330 may add a reverberation effect to the speaker signals using an infinite impulse response (IIR) filter.
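The document does not specify the IIR filter structure. The sketch below therefore assumes a classic Schroeder reverberator, a well-known IIR-based design, as one possible choice. It feeds parallel feedback comb filters into series allpass filters. The delay lengths (in samples at 44.1 kHz), gains, and wet/dry mix are illustrative values, not parameters taken from the source.

```python
def feedback_comb(x, delay, g):
    """IIR feedback comb filter: y[n] = x[n] + g * y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        y[n] = x[n] + (g * y[n - delay] if n >= delay else 0.0)
    return y

def allpass(x, delay, g):
    """Schroeder allpass filter: y[n] = -g*x[n] + x[n - delay] + g*y[n - delay]."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        xd = x[n - delay] if n >= delay else 0.0
        yd = y[n - delay] if n >= delay else 0.0
        y[n] = -g * x[n] + xd + g * yd
    return y

def schroeder_reverb(x, comb_delays=(1557, 1617, 1491, 1422), comb_gain=0.84,
                     ap_delays=(225, 556), ap_gain=0.5, wet=0.3):
    """Parallel combs build the decaying tail; series allpasses increase echo density."""
    combs = [feedback_comb(x, d, comb_gain) for d in comb_delays]
    tail = [sum(vals) / len(combs) for vals in zip(*combs)]
    for d in ap_delays:
        tail = allpass(tail, d, ap_gain)
    # Mix the reverberant tail with the dry speaker signal
    return [(1.0 - wet) * a + wet * b for a, b in zip(x, tail)]
```

The comb gain sets the decay time, and the mutually non-commensurate delays keep the combs' echo patterns from lining up.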

As described with reference to FIG. 2, in an embodiment of the present invention the object audio signals may further include a multichannel audio signal. Consider a case where the audio signal to be reproduced is channel-based and the playback space is configured for sound field synthesis reproduction, but the listener wishes to hear it in multichannel surround. The signal synthesizer 330 may then select specific loudspeakers and synthesize the speaker signals so that the object-based audio content is reproduced according to the multichannel surround method. For example, suppose the multichannel audio signal is a 5.1-channel signal, a loudspeaker array is placed at the front of the playback space, and two-channel surround speakers are placed at the rear. The signal synthesizer 330 may select the loudspeakers located at 0°, ±30°, and ±110° relative to the listener's front, and synthesize the speaker signals so that the content is reproduced through them.
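The loudspeaker selection in the 5.1 example can be sketched as a nearest-azimuth lookup. This sketch assumes several things: each surround channel is mapped to the available loudspeaker closest to its nominal angle, azimuths use 0 = front with positive = left, and the LFE channel is handled separately because it has no azimuth. The layout dictionary and function names are hypothetical. The sketch also does not prevent two channels from selecting the same loudspeaker.

```python
# Nominal 5.1 azimuths from the example above (LFE omitted: it has no azimuth).
SURROUND_5_1_LAYOUT = {"C": 0.0, "L": 30.0, "R": -30.0, "Ls": 110.0, "Rs": -110.0}

def angular_distance(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def select_surround_speakers(speaker_angles, layout=SURROUND_5_1_LAYOUT):
    """Map each surround channel name to the index of the nearest available loudspeaker."""
    return {
        name: min(range(len(speaker_angles)),
                  key=lambda i, t=target: angular_distance(speaker_angles[i], t))
        for name, target in layout.items()
    }
```

Take a front array with elements every 15° from -45° to 45°, plus rear loudspeakers at ±110°. Its elements at 0° and ±30° are chosen for C, L and R, and the rear pair is chosen for Ls and Rs.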

When the audio signal to be reproduced is a multichannel audio signal and the playback space is configured for multichannel surround reproduction, the signal synthesizer 330 reproduces the object-based audio content according to the multichannel surround method.

In this way, the object-based audio content playback apparatus 300 according to an embodiment of the present invention can reproduce object-based audio content regardless of the listener's playback environment. It does so using at least one of sound field synthesis reproduction and multichannel surround reproduction.

FIG. 4 is a flowchart illustrating a method of generating object-based audio content according to an embodiment of the present invention. The operations performed in each step are described below with reference to FIG. 4.

In step S410, a plurality of sound source signals are recorded to obtain a plurality of object audio signals.

According to an embodiment of the present invention, in step S410 the plurality of object audio signals may be obtained using at least one of a plurality of spot microphones and a microphone array.

In step S420, sound source position information of the plurality of sound source signals is obtained.

According to an embodiment of the present invention, in step S420 the sound source position information may be obtained using at least one of the following:
- the positions of the plurality of spot microphones;
- the time delays of the plurality of sound source signals at the microphone array;
- the sound pressure levels of the plurality of sound source signals at the microphone array.
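The time-delay option can be illustrated for a single microphone pair, a minimal sketch that assumes a far-field source. First, find the inter-microphone lag that maximizes the cross-correlation. Then convert it to a direction of arrival using sin θ = c·τ / d, where τ is the delay and d is the microphone spacing. The function names are hypothetical. A real microphone array would combine several pairs and would usually use a more robust estimator than plain cross-correlation.

```python
import math

def estimate_delay_samples(a, b, max_lag):
    """Lag (in samples) maximizing sum(a[n] * b[n + lag]); positive means b hears the source later."""
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = sum(a[n] * b[n + lag] for n in range(len(a)) if 0 <= n + lag < len(b))
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag

def doa_from_delay(lag, fs, mic_spacing, c=343.0):
    """Far-field direction of arrival in degrees from broadside for a two-microphone pair.

    A positive angle means the source lies toward microphone `a`.
    """
    s = c * (lag / fs) / mic_spacing
    s = max(-1.0, min(1.0, s))  # clamp against noise pushing |s| slightly above 1
    return math.degrees(math.asin(s))
```

For example, a one-sample lag at 1 kHz across a 0.686 m spacing corresponds to a source about 30° off broadside.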

According to another embodiment of the present invention, in step S420 the sound source position information may be obtained by receiving the positions of the plurality of sound sources from a user.

In step S430, recording space information about the recording space of the plurality of sound source signals is obtained.

According to an embodiment of the present invention, the method of generating object-based audio content may further include two steps (not shown): emitting an impulse sound source signal, and receiving the emitted signal and calculating an impulse response from it. In this case, step S430 may obtain the recording space information based on the calculated impulse response. According to an embodiment, the impulse response includes a plurality of impulse signals. The recording space information may then include at least one of the reception time differences, the sound pressure level differences, and the reception angle differences between the plurality of impulse signals.
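A minimal sketch of deriving the time and level differences from a measured impulse response follows. It treats the strongest sample as the direct sound and every later local peak above a threshold as a reflection. It then reports each reflection's delay and level relative to the direct sound. The function name and the -30 dB threshold are assumptions. Estimating reception angles would additionally require a multi-microphone measurement, which this single-channel sketch does not cover.

```python
import math

def reflections_from_ir(ir, fs, threshold_db=-30.0):
    """Return (delay_s, level_db) for local peaks after the direct sound.

    delay_s  : reception time difference relative to the direct-sound peak
    level_db : sound pressure level difference relative to the direct-sound peak
    """
    mags = [abs(v) for v in ir]
    direct_idx = max(range(len(mags)), key=mags.__getitem__)  # strongest arrival = direct sound
    direct_mag = mags[direct_idx]
    found = []
    for n in range(direct_idx + 1, len(mags) - 1):
        m = mags[n]
        if m > 0.0 and m >= mags[n - 1] and m > mags[n + 1]:  # simple local-maximum test
            level_db = 20.0 * math.log10(m / direct_mag)
            if level_db >= threshold_db:
                found.append(((n - direct_idx) / fs, level_db))
    return found
```

The resulting pairs, together with angles from a directional measurement, would form the (time, sound pressure, angle) tuples that the playback side uses to add early reflections.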

In step S440, object-based audio content is generated by encoding at least one of the plurality of object audio signals, the recording space information, and the sound source position information.

According to an embodiment of the present invention, the method of generating object-based audio content may further include a step (not shown) of generating a multichannel audio signal. This signal is produced by mixing at least one of the plurality of object audio signals, the recording space information, and the sound source position information. In this case, step S440 may generate the object-based audio content by encoding at least one of the plurality of object audio signals, the recording space information, the sound source position information, and the multichannel audio signal.
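To make concrete what the encoded content bundles together, the sketch below defines a hypothetical container. It holds object signals, per-object positions, and recording space reflections as (time, level, angle) tuples, and serializes them as a JSON header followed by little-endian float32 PCM. This layout is purely illustrative. It is not the patent's bitstream or any standardized object audio format. The optional multichannel signal is omitted for brevity.

```python
import json
import struct
from dataclasses import dataclass, field

@dataclass
class AudioObject:
    samples: list   # mono PCM samples (floats)
    position: tuple # source position (x, y) in metres, hypothetical convention

@dataclass
class ObjectContent:
    fs: int
    objects: list
    reflections: list = field(default_factory=list)  # (time_s, level_db, angle_deg) tuples

def encode(content):
    """Serialize as: 4-byte header length, JSON header, then each object's float32 PCM."""
    header = {
        "fs": content.fs,
        "positions": [list(o.position) for o in content.objects],
        "lengths": [len(o.samples) for o in content.objects],
        "reflections": [list(r) for r in content.reflections],
    }
    meta = json.dumps(header).encode("utf-8")
    pcm = b"".join(struct.pack(f"<{len(o.samples)}f", *o.samples) for o in content.objects)
    return struct.pack("<I", len(meta)) + meta + pcm

def decode(blob):
    """Inverse of encode(); float32 storage may round samples not exactly representable."""
    (meta_len,) = struct.unpack_from("<I", blob, 0)
    header = json.loads(blob[4:4 + meta_len].decode("utf-8"))
    offset = 4 + meta_len
    objects = []
    for pos, n in zip(header["positions"], header["lengths"]):
        samples = list(struct.unpack_from(f"<{n}f", blob, offset))
        offset += 4 * n
        objects.append(AudioObject(samples, tuple(pos)))
    return ObjectContent(header["fs"], objects, [tuple(r) for r in header["reflections"]])
```

A practical system would compress the audio with a perceptual codec rather than store raw float32 samples. The point here is only the separation of signals from positional and room metadata.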

FIG. 5 is a flowchart illustrating a method of playing back object-based audio content according to an embodiment of the present invention. The operations performed in each step are described below with reference to FIG. 5.

In step S510, the following are decoded from the object-based audio content: a plurality of object audio signals for a plurality of sound source signals, and the sound source position information of those sound source signals.

In step S520, playback space information about the playback space of the plurality of object audio signals is obtained.

According to an embodiment of the present invention, the playback space information may include at least one of the following:
- the number of speakers arranged in the playback space;
- the spacing between the speakers;
- the arrangement interval of the speakers;
- the types of the speakers;
- the position information of the speakers;
- the size of the playback space.
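The listed items can be held in a simple data structure. The sketch below is one possible, hypothetical representation. Speaker positions are given in metres with the listener at the origin and +y pointing to the listener's front. The speaker count, the spacings, and each speaker's azimuth (0 = front, positive = left) are then derived from those positions rather than stored separately. The class names, the free-form speaker-type label, and the room-size tuple are assumptions.

```python
import math
from dataclasses import dataclass, field

@dataclass
class Speaker:
    x: float                  # metres; listener at the origin, +y = listener's front (assumed)
    y: float
    kind: str = "full-range"  # hypothetical free-form label for the speaker type

    def azimuth_deg(self):
        """Azimuth seen from the listener: 0 = front, positive = to the left."""
        return math.degrees(math.atan2(-self.x, self.y))

@dataclass
class PlaybackSpaceInfo:
    speakers: list = field(default_factory=list)
    room_size: tuple = (0.0, 0.0, 0.0)  # (width, depth, height) in metres

    @property
    def count(self):
        return len(self.speakers)

    def spacings(self):
        """Distances between consecutive speakers, in list order."""
        return [math.dist((a.x, a.y), (b.x, b.y))
                for a, b in zip(self.speakers, self.speakers[1:])]
```

Deriving azimuths from positions lets the same structure drive both the sound-field-synthesis path and the surround-speaker selection.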

In addition, according to an embodiment of the present invention, in step S520 the playback space information may be input directly by the user. Alternatively, it may be calculated using a separate microphone placed in the playback space.

In step S530, the decoded object audio signals are synthesized into a plurality of speaker signals based on the sound source position information and the playback space information.

According to an embodiment of the present invention, in step S530 a reverberation effect may be added to the speaker signals using an infinite impulse response (IIR) filter.

In step S540, each of the plurality of speaker signals is transmitted to its corresponding speaker, which then reproduces it.

Embodiments of the object-based audio content generation and playback methods according to the present invention have been described above. The configuration of the object-based audio content generation and playback apparatuses described with reference to FIGS. 1 to 3 applies equally to these embodiments, so a more detailed description is omitted.

In addition, the object-based audio content generation and playback methods according to the present invention may be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known to and usable by those skilled in computer software.

Examples of computer-readable recording media include:
- magnetic media such as hard disks, floppy disks, and magnetic tape;
- optical media such as CD-ROMs and DVDs;
- magneto-optical media such as floptical disks;
- hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.

Examples of program instructions include not only machine code, such as that produced by a compiler, but also high-level language code that a computer can execute using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to those embodiments. Those of ordinary skill in the art to which the present invention pertains can make various modifications and variations from this description. Therefore, the scope of the present invention should not be limited to the described embodiments, but should be defined by the claims below and their equivalents.

FIG. 4 is a flowchart illustrating a method of generating object-based audio content according to an embodiment of the present invention.

FIG. 5 is a flowchart illustrating a method of playing back object-based audio content according to an embodiment of the present invention.

<Explanation of reference numerals for the main parts of the drawings>

100: object-based audio content generation apparatus

110: object audio signal acquisition unit

120: sound source position information acquisition unit

130: recording space information acquisition unit

140: encoder

150: impulse sound source signal emitter

160: impulse sound source signal receiver

Claims

An object-based audio content generation apparatus, comprising:

an object audio signal acquisition unit configured to record a plurality of sound source signals to obtain a plurality of object audio signals;

a recording space information acquisition unit configured to obtain recording space information about a recording space of the plurality of sound source signals;

a sound source position information acquisition unit configured to obtain sound source position information of the plurality of sound source signals; and

an encoder configured to generate object-based audio content by encoding at least one of the plurality of object audio signals, the recording space information, and the sound source position information.

The apparatus of claim 1, wherein the object audio signal acquisition unit obtains the plurality of object audio signals using at least one of a plurality of spot microphones and a microphone array.

The apparatus of claim 2, wherein the sound source position information acquisition unit obtains the sound source position information using at least one of the positions of the plurality of spot microphones, the time delays of the plurality of sound source signals at the microphone array, and the sound pressure levels of the plurality of sound source signals at the microphone array.

The apparatus of claim 1, further comprising:

an impulse sound source signal emitter configured to emit an impulse sound source signal; and

an impulse sound source signal receiver configured to receive the impulse sound source signal and calculate an impulse response based on the received signal,

wherein the recording space information acquisition unit generates the recording space information based on the calculated impulse response.

The apparatus of claim 4, wherein:

the impulse response includes a plurality of impulse signals; and

the recording space information includes at least one of a reception time difference, a sound pressure level difference, and a reception angle difference between the plurality of impulse signals.

The apparatus of claim 1, further comprising:

a multichannel audio mixing unit configured to generate a multichannel audio signal by mixing at least one of the plurality of object audio signals, the recording space information, and the sound source position information,

wherein the encoder further encodes the multichannel audio signal.

An object-based audio content playback apparatus, comprising:

a decoder configured to decode, from object-based audio content, a plurality of object audio signals for a plurality of sound source signals and sound source position information of the plurality of sound source signals;

a reproduction space information acquisition unit configured to obtain reproduction space information about the reproduction space of the object-based audio content;

a signal synthesizer configured to synthesize the decoded object audio signals into a plurality of speaker signals based on the sound source position information and the reproduction space information; and

a transmitter configured to transmit each of the plurality of speaker signals to a corresponding one of a plurality of speakers.

The apparatus of claim 7, wherein the reproduction space information includes at least one of the number of the plurality of speakers, the spacing between the plurality of speakers, the arrangement angles of the plurality of speakers, the types of the plurality of speakers, the position information of the speakers, and the size of the reproduction space.

The apparatus of claim 7, wherein:

the decoder further decodes recording space information of the plurality of sound source signals from the object-based audio content; and

the signal synthesizer generates direct sound for the plurality of sound source signals from the object audio signals using the sound source position information and the reproduction space information, and synthesizes the plurality of speaker signals by adding reflected sound to the direct sound based on the recording space information.

The apparatus of claim 7, wherein the signal synthesizer adds a reverberation effect to the speaker signals using an infinite impulse response (IIR) filter.