KR19980073078A

KR19980073078A - Audio encoding / decoding apparatus and method

Info

Publication number: KR19980073078A
Application number: KR1019970008189A
Authority: KR
Inventors: 김상욱
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1997-03-12
Filing date: 1997-03-12
Publication date: 1998-11-05
Also published as: KR100316769B1

Abstract

The present invention relates to an audio encoding/decoding apparatus and method that, with a single structure, supports scale adjustment and simultaneous processing of multiple contents. The audio encoding apparatus comprises: a first filter that divides an input audio signal into a low-frequency band signal and a high-frequency band signal; a second filter that divides the separated low-frequency band signal into finer frequency bands; a T/F conversion unit that converts the high-frequency band signal separated by the first filter from the time domain to the frequency domain; an ADPCM encoder that encodes the output signal of the second filter into a digital signal by the ADPCM method; a bit allocation quantization unit that allocates bits to and quantizes the output signal of the T/F conversion unit; a psychoacoustic unit that processes the separated low-frequency and high-frequency signals according to a predetermined psychoacoustic model and provides information for controlling the quantization error arising in the ADPCM encoder and the T/F conversion unit; a control unit that controls whether the input audio signal passes through the first and second filters, controls the frequency band processed by the T/F conversion unit, and provides information indicating the multiple content processing mode or the scalable mode together with content position information; and a primary bitstream forming unit that forms a bitstream from the signal encoded by the ADPCM encoder, the bits quantized by the bit allocation quantization unit, and the content position information.

According to the present invention, an encoder and a decoder that can process multiple contents and adjust scale can be implemented with a single structure.

Description

Audio encoding / decoding apparatus and method

The present invention relates to an audio encoding/decoding apparatus and method, and more particularly to an audio encoding/decoding apparatus and method that, with a single structure, supports scale adjustment and simultaneous processing of multiple contents.

Recently, a variety of interactive services such as video conferencing and video shopping have become available. In such services, meaningful image units (hereinafter, "contents") are assembled to form a single screen. Each content is treated as one processing unit and can be moved, enlarged, reduced, or deleted individually. A system that processes a plurality of contents individually or simultaneously in this way is called a multiple concurrent system.

To use a data transmission line effectively, the bit rate used for reproduction must also be adjusted according to the importance of the information carried by the bits or according to the user's requirements. In other words, scalable processing, in which the number of bits used can be adjusted, is required.

In general, when audio and video data are encoded or decoded, the individual contents are not distinguished from one another. It is therefore difficult to extract and reproduce only a specific audio signal from among the audio signals present, or to extract only a specific part of the video signals and move, delete, or transform it. This problem can be solved if each content can be controlled independently. With independent control, a content such as a particular person's voice can be removed when the listener does not want to hear it, and when a particular person's position on the screen changes, the output audio data can be modified to reflect the new position.

In such a case, however, each person's voice is carried on an independent channel. Removing a particular person's voice is then easy, since it only requires delivering or withholding that channel's data, but assigning an independent channel to every content increases system complexity. Moreover, when the system is not used for a specific purpose that involves multiple contents, such as video conferencing, many of its components sit idle, so the system cannot be used efficiently.

Scalability is needed in the encoder and decoder for the following reasons. When both video and audio information are present, sometimes only the video matters and sometimes only the audio matters. If fixed bit rates are used for the video and audio information in such cases, the channel's transmission capacity may not be used effectively. Adjusting the transmission bit rate according to the importance of the information allows a channel of limited capacity to be used more effectively. In addition, in applications such as video channel search or audio channel search, what matters is knowing which program is being broadcast. The information used for the service is therefore reduced scalably so that, even at lower sound or picture quality, information on many channels can be sent at once, enabling effective channel search. Conventional scalable schemes, however, encode the signal, decode it to obtain the error signals, and only then build the scalable bitstream, so complexity multiplies with the number of stages required for scale adjustment.

Multiple concurrent processing is needed for the following reasons. In a video conference or a multi-party call, if each participant or each content can be processed separately, a content such as a particular person's voice can be deleted when it is unwanted, and when a particular person's position on the screen changes, the output audio data can be modified so that the sound source moves accordingly. If all of this processing does not occur simultaneously, the sound heard no longer matches the speaker's mouth movements, and the conversation stops feeling like a real-time exchange. Handling multiple contents therefore requires a multiple concurrent processing system.

Moving the position of a sound source can be improved by applying research on how humans hear and perceive sounds in three-dimensional space with both ears. Specifically, from studies of the difference in level and in arrival time between the signals at the right and left ears, the perceptual characteristics by which a person localizes a sound source at a point in space have been modeled; this characterization is called the HRTF (head-related transfer function). An HRTF is expressed as an impulse response or transfer function, at the middle ear, describing how a signal from a sound at some point in space is transmitted to each ear. By applying HRTFs, a sound can be repositioned to an arbitrary location in three-dimensional space.
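As a rough illustration of how an HRTF pair is applied, the sketch below convolves a mono signal with a left-ear and a right-ear impulse response. The impulse responses `HRIR_LEFT` and `HRIR_RIGHT` are made-up toy values, not measured HRTF data, and the function names are hypothetical; they only mimic a source on the listener's right, where the right ear receives the sound earlier and louder.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono source with a pair of head-related impulse responses."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


# Toy impulse responses (assumed values, for illustration only):
# the right ear hears the source immediately at full level,
# the left ear hears it two samples later and attenuated.
HRIR_RIGHT = [1.0, 0.0, 0.0]
HRIR_LEFT = [0.0, 0.0, 0.6]

left, right = render_binaural([1.0, 0.5], HRIR_LEFT, HRIR_RIGHT)
print(left)
print(right)
```

Moving the source then amounts to swapping in the impulse-response pair that corresponds to the new position.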

Conventionally, however, it has been difficult to select and process only the sound originating from a specific region of the screen, that is, from a meaningful part (content) such as a person or an animal. For example, the video and audio processing needed to move or remove a particular person on the screen was difficult. Consequently, independent contents could not be processed individually, and it was not easy to transform only part of the transmitted or stored data at the decoding stage. In short, conventional encoding and decoding systems gave no consideration to a scheme that is both scalable and capable of processing multiple contents.

The present invention was devised to solve the above problems. Its object is to provide an audio encoding/decoding apparatus and method that, with a single structure, supports scale adjustment and simultaneous processing of multiple contents, so as to enable scalable reproduction, usable for example in channel search, and processing of multiple contents, usable for example in multi-party calls and video conferencing.

FIG. 1 is a block diagram of an audio encoder according to the present invention that supports scale adjustment and multiple content processing.

FIG. 2 is a block diagram of an audio decoder according to the present invention that supports scale adjustment and multiple content processing.

FIGS. 3A and 3B are block diagrams of a conventional ADPCM encoder and a conventional ADPCM decoder.

FIG. 4 shows how the encoder represents content position information on the video screen.

FIGS. 5A and 5B illustrate how the decoder represents position information when a content moves on the video screen, showing the original screen and the screen after the content has moved.

FIGS. 6A and 6B are block diagrams of the ADPCM encoder and the ADPCM decoder used in the present invention.

FIG. 7 shows an example of reproduction through headphones and through speakers.

To achieve the above object, an audio encoding apparatus according to the present invention, which with a single structure supports scale adjustment and simultaneous processing of multiple contents, comprises: a first filter that divides an input audio signal into a low-frequency band signal and a high-frequency band signal; a second filter that divides the low-frequency band signal separated by the first filter into finer frequency bands; a T/F conversion unit that converts the high-frequency band signal separated by the first filter from the time domain to the frequency domain; an ADPCM encoder that encodes the output signal of the second filter into a digital signal by the ADPCM method; a bit allocation quantization unit that allocates bits to and quantizes the output signal of the T/F conversion unit; a psychoacoustic unit that processes the low-frequency and high-frequency signals separated by the first filter according to a predetermined psychoacoustic model and provides information for controlling the quantization error arising in the ADPCM encoder and the T/F conversion unit; a control unit that controls whether the input audio signal passes through the first filter and the second filter, controls the frequency band processed by the T/F conversion unit, and provides information indicating the multiple content processing mode or the scalable mode together with content position information; and a primary bitstream forming unit that forms a bitstream from the signal encoded by the ADPCM encoder, the bits quantized by the bit allocation quantization unit, and the content position information from the control unit.

To achieve another object of the present invention, an audio decoding apparatus that with a single structure supports scale adjustment and simultaneous processing of multiple contents preferably comprises: a bitstream decomposition unit that decomposes an input bitstream; an inverse quantizer that inversely quantizes the bitstream decomposed by the bitstream decomposition unit; an ADPCM decoder that decodes the bitstream decomposed by the bitstream decomposition unit; a first signal synthesis unit that synthesizes the per-band low-frequency signals decoded by the ADPCM decoder; an F/T conversion unit that converts the high-frequency band signal to the time domain; a second signal synthesis unit that combines the low-frequency band signal synthesized by the first signal synthesis unit with the output signal of the F/T conversion unit; a spatial control processing unit that extracts the spatial position information of the contents from the signal decomposed by the bitstream decomposition unit and adjusts the position of the sound source according to whether reproduction is through speakers or through headphones; a control unit that receives the signal decomposed by the bitstream decomposition unit, determines whether the bitstream is in the multiple content processing mode or the scalable mode, suppresses the output signal of the F/T conversion unit when the determined mode is the multiple content processing mode, and controls scale adjustment in the spatial control processing unit according to the user's scale adjustment command; and a buffer output unit that temporarily stores and outputs the signals output from the spatial control processing unit and the second signal synthesis unit.

To achieve yet another object, an audio encoding method according to the present invention, which with a single structure supports scale adjustment and simultaneous processing of multiple contents, comprises: a frequency band separation step of dividing an input audio signal into a low-frequency band signal and a high-frequency band signal; a low-frequency separation step of dividing the low-frequency band signal separated in the frequency band separation step into finer frequency bands; a step of determining whether to encode so that multiple contents can be processed simultaneously or so that scale adjustment is possible; an encoding step of, when encoding for simultaneous processing of multiple contents, encoding the signal separated in the low-frequency separation step into a digital signal by the ADPCM method; a T/F conversion step of, when encoding for scale adjustment, encoding the signal separated in the low-frequency separation step into a digital signal by the ADPCM method and converting the high-frequency band signal separated in the frequency band separation step from the time domain to the frequency domain; a quantization step of allocating bits to and quantizing the signal converted in the T/F conversion step; a psychoacoustic step of processing the low-frequency and high-frequency signals separated in the frequency band separation step according to a predetermined psychoacoustic model and providing information on the step size of the quantizer used in the encoding step; and a step of forming a bitstream from the encoded signal, the quantized bits, and the content position information.

To achieve yet another object of the present invention, an audio decoding method that with a single structure supports scale adjustment and simultaneous processing of multiple contents preferably comprises: a bitstream decomposition step of decomposing an input bitstream; a step of determining whether the bitstream decomposed in the bitstream decomposition step is in the multiple content processing mode or the scalable mode; an inverse quantization step of inversely quantizing the decomposed bitstream; a decoding step of decoding the bitstream decomposed in the bitstream decomposition step; a first signal synthesis step of synthesizing the decoded per-band low-frequency signals; an F/T conversion step of converting the high-frequency band signal to the time domain when the determined mode is not the multiple content processing mode; a second signal synthesis step of combining the low-frequency band signal synthesized in the first signal synthesis step with the signal converted in the F/T conversion step; a spatial processing step of adjusting the scale according to the user's scale adjustment command, extracting the spatial position information of the contents from the signal decomposed in the bitstream decomposition step, and adjusting the position of the sound source according to whether reproduction is through speakers or through headphones; and a step of buffering and outputting the signal synthesized in the second signal synthesis step and the signal processed in the spatial processing step.

The present invention is described in detail below with reference to the accompanying drawings. FIG. 1 is a block diagram of an audio encoder according to the present invention that supports scale adjustment and multiple content processing. The encoder comprises a first filter 100, a second filter 110, an ADPCM encoder 120, a T/F conversion unit 130, a bit allocation quantization unit 140, a primary bitstream forming unit 150, a psychoacoustic unit 160, a control unit 170, a prediction unit 180, and a bitstream forming unit 190.

The first filter 100 separates the input audio signal into a low-frequency band signal and a high-frequency band signal. The second filter 110 separates the low-frequency band signal separated by the first filter 100 into finer frequency bands. The ADPCM encoder 120 encodes the output signal of the second filter 110 into a digital signal by the ADPCM method.

The T/F conversion unit 130 converts the high-frequency band signal separated by the first filter 100 from the time domain to the frequency domain. The bit allocation quantization unit 140 allocates bits to and quantizes the output signal of the T/F conversion unit 130. The primary bitstream forming unit 150 forms a bitstream from the signal encoded by the ADPCM encoder 120, the bits quantized by the bit allocation quantization unit 140, the content position information, and the processing mode.

The psychoacoustic unit 160 processes the low-frequency and high-frequency signals separated by the first filter 100 according to a psychoacoustic model. It adjusts the delta value, which is the step size of the quantizer used in the ADPCM encoder 120, and provides a measure for determining the number of bits used in the bit allocation quantization unit 140.

The control unit 170 controls whether the input audio signal passes through the first filter 100 and the second filter 110, controls the frequency band processed by the T/F conversion unit 130, and provides the processing mode, which indicates either the multiple content processing mode or the scalable mode, together with the content position information.

The prediction unit 180 determines the correlation between the previous frame information and the current frame information from the primary bitstream forming unit 150. The bitstream forming unit 190 forms a bitstream in which redundant data is reduced according to the inter-frame correlation computed by the prediction unit 180.

FIG. 2 is a block diagram of an audio decoder according to the present invention that supports scale adjustment and multiple content processing. The decoder comprises a bitstream decomposition unit 200, an inverse quantizer 250, an ADPCM decoder 230, a first signal synthesis unit 240, an F/T conversion unit 260, a second signal synthesis unit 280, a spatial control processing unit 270, a control unit 290, a buffer output unit 295, a prediction unit 210, and a primary bitstream decomposition unit 220.

The bitstream decomposition unit 200 decomposes the input bitstream. The prediction unit 210 determines whether the bitstream decomposed by the bitstream decomposition unit 200 is one that uses previous frame information.

When the prediction unit 210 determines that the bitstream uses previous frame information, the primary bitstream decomposition unit 220 reconstructs the bitstream using the previous frame information.

The inverse quantizer 250 inversely quantizes the bitstream decomposed by the primary bitstream decomposition unit 220. The ADPCM decoder 230 decodes the bitstream decomposed by the primary bitstream decomposition unit 220. The first signal synthesis unit 240 synthesizes the per-band low-frequency signals decoded by the ADPCM decoder 230.

The F/T conversion unit 260 converts the high-frequency band signal to the time domain. The second signal synthesis unit 280 combines the low-frequency band signal synthesized by the first signal synthesis unit 240 with the output signal of the F/T conversion unit 260.

The spatial control processing unit 270 extracts the spatial position information of the contents from the signal decomposed by the bitstream decomposition unit 200 and adjusts the position of the sound source according to whether reproduction is through speakers or through headphones. When a video content that carries position information changes position, the spatial control processing unit 270 also recomputes the position coordinates of that content and adjusts the sound to reflect the movement of the sound source. The position information used by the spatial control processing unit 270 takes the position of the vocal organ as its reference position.

The control unit 290 receives the signal decomposed by the bitstream decomposition unit 220 and determines whether the bitstream is in the multiple content processing mode or the scalable mode. If the determined mode is the multiple content processing mode, it suppresses the output signal of the F/T conversion unit 260. It also controls scale adjustment in the spatial control processing unit 270 according to the user's scale adjustment command.

The buffer output unit 295 temporarily stores and outputs the signals output from the spatial control processing unit 270 and the second signal synthesis unit 280.

The operation of the present invention is now described on the basis of the above configuration, beginning with the encoder. The user selects one of two operating modes at the control unit 170: multiple concurrent processing or scalable coding. If multiple concurrent processing is selected, the control unit 170 disables the T/F conversion unit 130, and the filters 100 and 110 and the ADPCM encoder 120 operate to process the content assigned to them. Here the filters 100 and 110 serve as anti-aliasing filters that take the frequency characteristics of the content into account, and the band-limited signals are encoded by an ADPCM encoder such as the one shown in FIG. 6A. The step size (delta) of the quantizer (not shown) used in the ADPCM encoder 120 is controlled by the psychoacoustic unit 160, which models human psychoacoustics. Using four filter and ADPCM paths in parallel allows concurrent processing of up to four contents.

If the user instead selects scalable coding at the control unit 170, the input signal represents a single content and is processed in five frequency bands. With sampling frequency Fs, the five bands are Fs/4 to Fs/2, 0 to Fs/16, Fs/16 to Fs/8, Fs/8 to 3Fs/16, and 3Fs/16 to Fs/4. Because human hearing is less sensitive to high-frequency signals, the high-frequency side uses a wider band and is handled by T/F conversion in the T/F conversion unit 130, while the low-frequency side uses the ADPCM encoder 120, which keeps the structure simple.
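The band layout above can be made concrete with a short sketch. The band edges come from the description; the function name `band_plan`, the coder labels, and the 48 kHz sampling rate in the usage line are assumptions chosen only for illustration.

```python
from fractions import Fraction

# Band edges as fractions of the sampling frequency Fs,
# in the order the description lists them.
BAND_EDGES = [
    (Fraction(1, 4), Fraction(1, 2)),    # wide high band
    (Fraction(0), Fraction(1, 16)),
    (Fraction(1, 16), Fraction(1, 8)),
    (Fraction(1, 8), Fraction(3, 16)),
    (Fraction(3, 16), Fraction(1, 4)),
]


def band_plan(fs_hz):
    """Return (low_hz, high_hz, coder) for each of the five bands."""
    plan = []
    for low, high in BAND_EDGES:
        # Bands at or above Fs/4 go to T/F conversion, the rest to ADPCM.
        coder = "T/F" if low >= Fraction(1, 4) else "ADPCM"
        plan.append((float(low * fs_hz), float(high * fs_hz), coder))
    return plan


# Example with an assumed sampling rate of 48 kHz.
for low_hz, high_hz, coder in band_plan(48000):
    print(f"{low_hz:8.0f} Hz to {high_hz:8.0f} Hz: {coder}")
```

At 48 kHz the single T/F band spans 12 kHz to 24 kHz, while each of the four ADPCM bands is 3 kHz wide, which shows how much coarser the high-frequency side is.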

In the low-frequency bands, ADPCM with a nonlinear predictor is used to provide resilience against errors that may occur during data transmission. An example of the error resilience of the ADPCM encoder 120 with a nonlinear predictor is described in detail later. ADPCM is used rather than DPCM so that a quantizer better suited to the signal can be applied. In addition, as shown in FIG. 6a, the power of the error signal produced by ADPCM is computed and checked against the limit obtained from the human psychoacoustic model of the psychoacoustic unit 160, which makes buffer control possible.

The processed data are passed to the primary bitstream forming unit 150, after which the prediction unit 180 compares the result with the previously formed bitstream and extracts the differences. If the correlation between the previous frame and the current frame is at or above a predetermined threshold, prediction is turned on and the bitstream forming unit 190 forms and transmits the bitstream accordingly; if it is below the threshold, the bitstream is formed with prediction off.
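The prediction on/off decision can be sketched as follows. The patent does not say how frame correlation is measured or how differences are represented, so both choices here are assumptions: similarity is the fraction of identical byte positions, and the difference is a byte-wise XOR. The names `frame_similarity`, `encode_frame`, `decode_frame`, and the 0.75 threshold are hypothetical.

```python
def frame_similarity(prev, cur):
    """Fraction of byte positions at which two equal-length frames agree."""
    if len(prev) != len(cur) or not cur:
        return 0.0
    same = sum(1 for a, b in zip(prev, cur) if a == b)
    return same / len(cur)


def encode_frame(prev, cur, threshold=0.75):
    """Return (prediction_on, payload) for the current frame.

    With prediction on, the payload holds only the XOR difference from the
    previous frame (mostly zeros, hence cheap to entropy-code later).
    """
    if prev is not None and frame_similarity(prev, cur) >= threshold:
        return True, bytes(a ^ b for a, b in zip(prev, cur))
    return False, bytes(cur)


def decode_frame(prev, prediction_on, payload):
    """Invert encode_frame using the flag carried in the bitstream."""
    if prediction_on:
        return bytes(a ^ b for a, b in zip(prev, payload))
    return bytes(payload)
```

The decoder mirrors the encoder: it reads the on/off flag and either combines the payload with the previous frame or uses the payload as is.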

The decoder operates as follows. First, the bitstream is decomposed by the bitstream decomposition unit 200. The prediction unit 210 then checks the prediction on/off information in the decomposed bitstream to determine whether the previous frame's result is to be used, and the bitstream is reconstructed accordingly. If prediction is on, the primary bitstream decomposition unit 220 decomposes the bitstream using the previous frame's result, and the data are inverse-quantized by the inverse quantizer 250. If prediction is off, the bitstream decomposed by the bitstream decomposition unit 200 is used as is.

Next, the control unit 290 reads the information in the bitstream and detects whether the bitstream was produced by multiple concurrent processing or the decoder is to operate as a scalable decoder. For multiple concurrent processing, under the control of the control unit 290 the decomposed bitstream bypasses the F/T converter 260 and is decoded by the ADPCM decoder 230. The first signal synthesizer 240 then reverses the encoder's split, recombining the finely divided low-frequency subbands into a single low-frequency band. When the system is used as a scalable encoder and decoder, the signals are reconstructed by passing through the F/T converter 260 under the control of the control unit 290.

The low-frequency band signal reconstructed by the first signal synthesizer 240 and the high-frequency band signal converted by the F/T converter 260 are combined by the second signal synthesizer 280 and output to the buffer output unit 295.

The position information of each content carried in the bitstream allows more effective, position-dependent processing of each content's spatial location during decoding. When the control unit 290 moves a content, the new position is computed and the sound is compensated for the resulting movement of the sound source. Sound-source position compensation is handled in the decoder rather than the encoder for the following reason: if the encoder had already applied a transformation and the decoder then needed to apply another for a further movement, the decoder would first have to undo the encoder's transformation and then apply its own, doubling the complexity. Handling compensation only in the decoder avoids this doubling.

Processing of multiple contents is described in more detail as follows. The control unit 290 detects whether the bitstream is a multiple-content input. If it is, each of the ADPCM decoders 230 otherwise used for the low-frequency signals processes an independent content. The frequency bandwidth handled in this case differs from that handled in the scalable case, where the ADPCM encoder 120 and the T/F converter 130 are used together. Because each content is encoded with its own independent ADPCM encoder, the user can completely mute a particular content or modify its spatial distribution.

If the signal is scalable, the input signal carries a single content. It is processed in five frequency bands; with sampling frequency Fs, these are Fs/4 to Fs/2, 0 to Fs/16, Fs/16 to Fs/8, Fs/8 to 3Fs/16, and 3Fs/16 to Fs/4. Because human hearing is less sensitive to high-frequency signals, the encoder used a wide band on the high-frequency side and T/F-converted it, so the F/T converter 260 restores it by F/T conversion. The low-frequency side was encoded by the ADPCM encoder to keep the structure simple, so it is decoded by the ADPCM decoder 230. When fast search is required, only part of the bitstream is read and decoded, which improves processing efficiency.

Deriving and using new position information from a content's position information works as follows. During encoding, the position of each video content is included in the bitstream. FIG. 4 shows how the encoder represents content position information on the video screen: the position information carried in the bitstream consists of x and y coordinate values on the screen as in FIG. 4, measured with one end of the video content as the reference. Because the mouth is where the sound originates, the position of the mouth is used in the processing; when the mouth does not appear in the image, a point on the image border is assumed to be the mouth position.

FIG. 4 shows an example based on this reference point. The picture is divided into the background, the contents, and the outline of each content, and these are combined to reproduce the image. When the decoder moves a content on the screen, it can therefore use the outline information to extract that content and place it at the new position.

During decoding, each piece of video information is a content that the user can move up, down, left, or right, or resize by zooming in or out. The coordinate values that change with such movement are taken into account so that the sound appears to come from the position shown on the screen. For example, with persons A and B as in FIGS. 5a and 5b, if the user moves a person from the original position (FIG. 5a) to that of FIG. 5b, the changed position values (x, y) are used to adjust the reproduced sound to the new position of the video content. When a video content is zoomed in or out, that information is used as z information, giving a new (x, y, z) reference, so that speech is rendered as if spoken nearby or from a distance. As a result, not only up, down, left, and right movement but also forward and backward movement of video content can be handled. The sound-source spatial movement technique is described in more detail later.

The reason the ADPCM encoder 120 and decoder 230, used for the low-frequency bands and for multiple concurrent processing, employ a nonlinear predictor is as follows. The influence of an error depends on whether the predictor in a DPCM or ADPCM unit is linear or nonlinear. With a linear predictor, an error accumulates and keeps propagating to neighboring samples, whereas with a nonlinear predictor the error is isolated and does not keep propagating. Consider, for example, a linear predictor and a nonlinear predictor used in the predictor part of the ADPCM encoder/decoder of FIGS. 3a and 3b.

[Equation 1]

$$P_{out}[n] = \operatorname{mean}\{P_{in}[n],\ P_{in}[n-1],\ P_{in}[n-2]\} = \operatorname{integer}\left[\frac{P_{in}[n] + P_{in}[n-1] + P_{in}[n-2]}{3.0}\right]$$

The linear predictor, as in Equation 1, adds the three values P_in[n], P_in[n-1], and P_in[n-2], divides the sum by 3, and takes the result quantized to an integer as P_out[n].

The nonlinear predictor is a median predictor, as in Equation 2.

[Equation 2]

$$P_{out}[n] = \operatorname{median}\{P_{in}[n],\ P_{in}[n-1],\ P_{in}[n-2]\}$$

That is, the nonlinear predictor sorts the three samples P_in[n], P_in[n-1], and P_in[n-2] by magnitude and takes the middle value as P_out[n].

Examples of encoder input/output values (X_in, Cod_X) and decoder input/output values (Cod_Y, Y_out) for the linear and nonlinear predictors are given below.

[Table 1]

Encoder input/output with the linear predictor

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| X_in | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 | 15 |
| Cod_X | 10 | 10 | 10 | 10 | 0 | -7 | -10 | -10 | -10 |
| P_in | 20 | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 |
| P_out | 15 | 20 | 25 | 30 | 35 | 37 | 35 | 30 | 25 |

[Table 2]

Decoder input/output with the linear predictor

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Cod_Y | 10 | 10 | 10 | 10 | 0 | -7 | -10 | -10 | -10 |
| Y_out | 20 | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 |
| P_out | 15 | 20 | 25 | 30 | 35 | 37 | 35 | 30 | 25 |

[Table 3]

Encoder input/output with the nonlinear predictor

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| X_in | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 | 15 |
| Cod_X | 10 | 10 | 10 | 10 | 0 | -5 | -10 | -10 | -10 |
| P_in | 20 | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 |
| P_out | 15 | 20 | 25 | 30 | 35 | 35 | 35 | 30 | 25 |

[Table 4]

Decoder input and output with the nonlinear predictor

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Cod_Y | 10 | 10 | 10 | 10 | 0 | -5 | -10 | -10 | -10 |
| Y_out | 20 | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 |
| P_out | 15 | 20 | 25 | 30 | 35 | 35 | 35 | 30 | 25 |
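The encoder rows of Tables 1 and 3 can be reproduced with a short simulation. This is a sketch under assumptions inferred from the table values rather than stated in the text: the code value is the prediction residual Cod_X[n] = X_in[n] - P_out[n], the quantizer is lossless so that P_in[n+1] = X_in[n], `integer[...]` rounds to the nearest integer (36.67 becomes 37 in Table 1), and the predictor history before n = 1 is 10, 15, 20. The function names are hypothetical.

```python
from statistics import median


def mean3(a, b, c):
    """Linear predictor of Equation 1, rounded to the nearest integer."""
    return round((a + b + c) / 3.0)


def median3(a, b, c):
    """Nonlinear (median) predictor of Equation 2."""
    return median([a, b, c])


def encode(x_in, predict, history=(10, 15, 20)):
    """Return (Cod_X, P_out) lists for the input samples x_in.

    history holds the assumed values P_in[-1], P_in[0], P_in[1].
    """
    p2, p1, p0 = history              # p0 is the newest predictor input
    cod_x, p_out = [], []
    for x in x_in:
        pred = predict(p0, p1, p2)
        cod_x.append(x - pred)        # residual that is transmitted
        p_out.append(pred)
        p2, p1, p0 = p1, p0, x        # lossless: reconstructed sample equals x
    return cod_x, p_out


if __name__ == "__main__":
    x_in = [25, 30, 35, 40, 35, 30, 25, 20, 15]
    print(encode(x_in, mean3))        # Cod_X and P_out rows of Table 1
    print(encode(x_in, median3))      # Cod_X and P_out rows of Table 3
```

The two predictors differ only at n = 6, where the mean of 35, 40, 35 rounds to 37 while the median is 35.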

Now consider noise arising in the channel during transmission. Suppose that at time n = 5 the original code value 0 is changed to 100 by an error; the tables below show the difference between the outputs of decoders using the linear predictor and the nonlinear predictor.

[Table 5]

Decoder input and output with the linear predictor when channel noise occurs

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Cod_Y | 10 | 10 | 10 | 10 | 100 | -7 | -10 | -10 | -10 |
| Y_out | 20 | 25 | 30 | 35 | 40 | 135 | 63 | 69 | 79 |
| P_out | 15 | 20 | 25 | 30 | 35 | 70 | 79 | 89 | 70 |

[Table 6]

Decoder input and output with the nonlinear predictor when channel noise occurs

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| Cod_Y | 10 | 10 | 10 | 10 | 100 | -5 | -10 | -10 | -10 |
| Y_out | 20 | 25 | 30 | 35 | 40 | 135 | 35 | 30 | 25 |
| P_out | 15 | 20 | 25 | 30 | 35 | 40 | 40 | 35 | 30 |

With the nonlinear predictor, the effect of the change from 0 to 100 is isolated, whereas with the linear predictor the effect is not isolated but propagates to the following samples. In Table 6 the decoder output returns to the error-free values 35, 30, 25 from n = 7 onward, while in Table 5 it remains at 63, 69, 79.
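The error-propagation comparison can be checked with a small decoder simulation. It is a sketch under assumptions inferred from the table values: the predictor history before n = 1 is 10, 15, 20; the row labeled Y_out lists the sample entering the predictor at time n, so the reconstruction P_out[n] + Cod_Y[n] appears one step later; and `integer[...]` rounds to the nearest integer. The function names are hypothetical.

```python
from statistics import median


def mean3(a, b, c):
    """Linear predictor of Equation 1, rounded to the nearest integer."""
    return round((a + b + c) / 3.0)


def median3(a, b, c):
    """Nonlinear (median) predictor of Equation 2."""
    return median([a, b, c])


def decode(cod_y, predict, history=(10, 15, 20)):
    """Return (Y_out, P_out) lists for the received code values cod_y."""
    p2, p1, p0 = history              # p0 is the newest predictor input
    y_out, p_out = [], []
    for c in cod_y:
        y_out.append(p0)              # sample entering the predictor at time n
        pred = predict(p0, p1, p2)
        p_out.append(pred)
        p2, p1, p0 = p1, p0, pred + c # reconstruction feeds the next step
    return y_out, p_out


if __name__ == "__main__":
    # Code value at n = 5 corrupted from 0 to 100 in both streams.
    print(decode([10, 10, 10, 10, 100, -7, -10, -10, -10], mean3))
    print(decode([10, 10, 10, 10, 100, -5, -10, -10, -10], median3))
```

With the median predictor the outlier 135 is never selected as the middle value, so it drops out of the prediction after one step; with the mean predictor it contributes a third of its value to each of the next three predictions and the offset then recirculates.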

Detecting the error actually produced makes it possible to use the quantizer more effectively during audio encoding and to perform buffer control. This is done using the masked threshold that arises from human psychoacoustics. The threshold represents the power of a signal that a person cannot perceive even when it is present. If the total quantization noise produced by quantizing the signals in a band falls below this threshold, no finer quantizer and no additional bits are needed. This property is used to control the number of bits used.

As shown in FIG. 6, the amount of error produced is computed by decoding the ADPCM-encoded signals. The total error is compared with the threshold constant determined by the psychoacoustic model to check whether the limit is exceeded, and the result is used to adjust the quantization step of the quantizer, adjust the ADPCM delta, and control the frame buffer. If the limit is exceeded, a new quantizer is used to reduce the error, so that the trade-off between sound quality and bit usage can be tuned.
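One way to picture this control loop is to try quantizers from coarse to fine and keep the coarsest one whose noise power stays within the masked threshold. The sketch below is illustrative only: it uses a plain uniform quantizer rather than the ADPCM loop of FIG. 6, and the names `quantize`, `noise_power`, `choose_step` and the candidate step sizes are assumptions, not values from the patent.

```python
import math


def quantize(samples, step):
    """Uniform quantizer with round-half-up to a multiple of step."""
    return [step * math.floor(s / step + 0.5) for s in samples]


def noise_power(samples, step):
    """Mean squared error between the samples and their quantized values."""
    recon = quantize(samples, step)
    return sum((s - r) ** 2 for s, r in zip(samples, recon)) / len(samples)


def choose_step(samples, masked_threshold, steps=(16, 8, 4, 2, 1)):
    """Return the coarsest step whose noise power is within the threshold.

    Coarser steps need fewer bits, so the search runs from coarse to fine
    and stops as soon as the quantization noise becomes inaudible.
    """
    for step in steps:
        if noise_power(samples, step) <= masked_threshold:
            return step
    return steps[-1]                  # finest available quantizer as fallback
```

A higher masked threshold therefore selects a coarser step and spends fewer bits, which is the sound quality versus bit usage trade-off described above.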

The three-dimensional sound effect arises because humans gather and hear sound with two ears. With stereo reproduction, the effect can be provided by controlling the reproduced signals according to the positions of the fixed loudspeakers. Research on human sound perception falls broadly into monaural studies, which consider only the left or right ear, and binaural studies, which consider both ears together. Monaural research has made it possible to model how the presence of a sound is perceived and the characteristics of that process, yielding results such as the minimum sound pressure a human can perceive (the absolute threshold) and the interaction between simultaneous signals (masking); these results are used for efficient data representation, that is, compression. Binaural research has studied the mutual influence of the signals entering the two ears, namely the difference in level between the right and left ears and the phase difference between the sounds arriving at the two ears caused by the difference in propagation time.

From this binaural research, the way a person perceives a sound source at a point in space has been modeled; this characteristic is called the HRTF (head-related transfer function). An HRTF is expressed as an impulse response or transfer function, measured at the middle ear, that characterizes how a sound at a given point in space is transmitted to the two ears. By applying HRTFs, a sound can be moved to an arbitrary position in three-dimensional space, enabling more realistic reproduction.

Using the information for an arbitrary point A in three-dimensional space, the effect of sound being reproduced at that point is easy to achieve when listening with headphones. If the sound at point A is X_A, the signals E_r and E_l entering the right and left ears are expressed as follows, where H_ar and H_al are the transformations applied to the sound from point A as heard by the right and left ears. In matrix form:

[Equation 3]

$$\begin{bmatrix} E_r \\ E_l \end{bmatrix} = \begin{bmatrix} H_{ar} \\ H_{al} \end{bmatrix} X_A$$

A mono input signal is filtered with H_ar and H_al so that it is perceived as coming from point A. When this effect is produced with front right and left loudspeakers, the cross-talk between the two loudspeaker outputs must be compensated. With the signals emitted by the right and left loudspeakers denoted […], the signals […] that reach the ears through the two loudspeakers can be expressed as

[Equation 4]

where […] are transfer functions.

If the values given by Equations 3 and 4 are equal, the listener perceives the signal as located at point A. Equating the two gives

[Equation 5]

To solve Equation 5, the values output by the right and left loudspeakers must be adjusted. Assume that the loudspeaker outputs […] are the signal […] transformed by […], respectively:

[Equation 6]

Substituting into Equation 5 and rearranging gives

[Equation 7]

Solving by inversion for the transforming values […] gives

[Equation 8]

Here […] are values fixed once the loudspeaker positions are fixed, and […] are known values determined by the position of the sound source, so […] can be obtained.
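The steps described in the prose can be sketched in LaTeX under assumed notation. Only X_A, E_r, E_l, H_ar, and H_al come from the text; the loudspeaker signals S_r and S_l, the loudspeaker-to-ear transfer functions H_rr, H_lr, H_rl, H_ll (first index: loudspeaker, second index: ear), and the transforming values G_r and G_l are hypothetical names chosen for illustration. This is one standard cross-talk cancellation derivation consistent with the description, not necessarily the patent's exact formulation.

```latex
% Target ear signals for a source at A (headphone case):
%   E_r = H_{ar} X_A, \quad E_l = H_{al} X_A
% Ear signals produced by the two loudspeakers, including cross-talk:
\begin{bmatrix} E_r' \\ E_l' \end{bmatrix}
  = \begin{bmatrix} H_{rr} & H_{lr} \\ H_{rl} & H_{ll} \end{bmatrix}
    \begin{bmatrix} S_r \\ S_l \end{bmatrix}
% Require the loudspeaker case to match the target:
\begin{bmatrix} H_{rr} & H_{lr} \\ H_{rl} & H_{ll} \end{bmatrix}
  \begin{bmatrix} S_r \\ S_l \end{bmatrix}
  = \begin{bmatrix} H_{ar} \\ H_{al} \end{bmatrix} X_A
% Assume each loudspeaker signal is a transformed copy of X_A:
S_r = G_r X_A, \qquad S_l = G_l X_A
% Substitute, cancel X_A, and invert the 2x2 matrix:
\begin{bmatrix} G_r \\ G_l \end{bmatrix}
  = \frac{1}{H_{rr} H_{ll} - H_{lr} H_{rl}}
    \begin{bmatrix} H_{ll} & -H_{lr} \\ -H_{rl} & H_{rr} \end{bmatrix}
    \begin{bmatrix} H_{ar} \\ H_{al} \end{bmatrix}
```

The inverse exists only where H_rr H_ll - H_lr H_rl is nonzero, which a practical implementation would need to check per frequency.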

Once these values are obtained, Equation 6 can be used so that a signal located at position A in three-dimensional space, although reproduced from loudspeakers at other positions, is heard as if it were sounding at position A.

Without a conversion appropriate to the presence or absence of cross-talk, the signal sounds different when reproduced through loudspeakers and through headphones. For headphone reproduction, Equation 3 can be used directly, which makes the processing efficient. To take this difference into account, the present invention receives a speaker/headphone output control signal from the control unit as shown in FIG. 7; if its value is OFF, the output is treated as headphones and the loudspeaker output compensation is skipped, and if its value is ON, the output is treated as loudspeakers and the loudspeaker output values are compensated.
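The ON/OFF switch can be sketched for a single frequency bin as follows. Everything beyond X_A, H_ar, and H_al is an assumption: the loudspeaker-to-ear transfer values `h_rr`, `h_lr`, `h_rl`, `h_ll` (first index: loudspeaker, second index: ear) and the function names are hypothetical, and a real renderer would apply this per frequency with measured HRTFs.

```python
def speaker_gains(h_ar, h_al, h_rr, h_lr, h_rl, h_ll):
    """Solve the 2x2 cross-talk system for the two loudspeaker gains."""
    det = h_rr * h_ll - h_lr * h_rl
    if det == 0:
        raise ValueError("loudspeaker-to-ear matrix is singular")
    g_r = (h_ll * h_ar - h_lr * h_al) / det
    g_l = (-h_rl * h_ar + h_rr * h_al) / det
    return g_r, g_l


def render(x_a, h_ar, h_al, speaker_on, paths=None):
    """Return the (right, left) output signals for one frequency bin.

    speaker_on False: headphones, apply H_ar and H_al directly.
    speaker_on True:  loudspeakers, compensate cross-talk first;
                      paths = (h_rr, h_lr, h_rl, h_ll) must be given.
    """
    if not speaker_on:
        return h_ar * x_a, h_al * x_a
    if paths is None:
        raise ValueError("loudspeaker paths are required when speaker_on is True")
    g_r, g_l = speaker_gains(h_ar, h_al, *paths)
    return g_r * x_a, g_l * x_a
```

The values may be real or complex numbers; in the loudspeaker branch the outputs are chosen so that, after passing through the four acoustic paths, the ears receive the same signals as in the headphone branch.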

According to the present invention, an encoder and decoder can be implemented that process multiple contents with a single structure and are scalable. The quantization error actually produced in ADPCM is used to select the quantizer step and to perform buffer control. Content manipulation is possible, in that a specific content can be switched on or off, and the nonlinear predictor reduces error propagation.

The sound source can also be moved in accordance with the movement of a specific content. Applying different processing for loudspeaker and headphone reproduction allows processing better suited to the reproduction environment, and the quantization step of the ADPCM unit is determined with human psychoacoustic characteristics taken into account.

Even when the loudspeakers used for reproduction are moved, if the new positions are known, the processing can be adjusted using the new position information for more suitable reproduction. Even when a sound source moves as a content moves, the processing uses the human transfer functions for sound sources at specific positions, so more realistic reproduction is possible.

Claims

1. An audio encoding apparatus comprising: a first filter for dividing an input audio signal into a low-frequency band signal and a high-frequency band signal; a second filter for dividing the low-frequency band signal separated by the first filter into finer frequency bands; a T/F converter for converting the high-frequency band signal separated by the first filter from the time domain to the frequency domain; an ADPCM encoder for encoding the output signal of the second filter into a digital signal by ADPCM; a bit allocation quantizer for allocating bits to and quantizing the output signal of the T/F converter; a psychoacoustic unit for processing the low-frequency and high-frequency signals separated by the first filter according to a predetermined psychoacoustic model to provide information on the step size (delta) of the quantizer used in the ADPCM encoder; a control unit for controlling whether the input audio signal passes through the first filter and the second filter, controlling the processing frequency band of the T/F converter, and providing information indicating a multi-content processing mode or a scalable mode together with position information of the content; and a primary bitstream forming unit for forming a bitstream using the signal encoded by the ADPCM encoder, the bits quantized by the bit allocation quantizer, and the content position information from the control unit.

2. The audio encoding apparatus of claim 1, further comprising: a prediction unit for determining the data correlation between the previous frame information and the current frame information of the primary bitstream forming unit; and a bitstream forming unit for forming a bitstream with redundant data reduced according to the inter-frame correlation determined by the prediction unit.

3. The audio encoding apparatus of claim 1 or 2, wherein the ADPCM encoder uses a nonlinear predictor.

4. An audio decoding apparatus comprising: a bitstream decomposition unit for decomposing an input bitstream; an inverse quantizer for inverse-quantizing the bitstream decomposed by the bitstream decomposition unit; an ADPCM decoder for decoding the bitstream decomposed by the bitstream decomposition unit; a first signal synthesizer for synthesizing the low-frequency band signals decoded by the ADPCM decoder; an F/T converter for converting a high-frequency band signal into the time domain; a second signal synthesizer for synthesizing the low-frequency band signal synthesized by the first signal synthesizer and the output signal of the F/T converter; a space control processor for extracting the spatial position information of contents from the signal decomposed by the bitstream decomposition unit and adjusting the position of the sound source according to loudspeaker or headphone reproduction; a control unit for reading the bitstream decomposed by the bitstream decomposition unit to determine whether it is in a multi-content processing mode or a scalable mode, blocking the output signal of the F/T converter when the determined mode is the multi-content processing mode, and controlling scale adjustment in the space control processor according to a scale adjustment command from the user; and a buffer output unit for temporarily storing and outputting the signals output from the space control processor and the second signal synthesizer.

5. The audio decoding apparatus of claim 4, further comprising: a prediction unit for determining whether the bitstream decomposed by the bitstream decomposition unit uses previous frame information; and a primary bitstream decomposition unit for reconstructing the bitstream using the previous frame information when the prediction unit determines that it does.

6. The audio decoding apparatus of claim 4, wherein the space control processor adjusts the sound to account for the movement of the sound source in accordance with a change in position of video content having position information.

7. The audio decoding apparatus of claim 6, wherein the position information used by the space control processor takes the position of the vocal organ as its reference position.

8. The audio decoding apparatus of claim 4, wherein the ADPCM decoder uses a nonlinear predictor.

9. The audio decoding apparatus of claim 4, wherein information about the enlargement or reduction of video content is reflected in the reproduction of the audio signal.

10. An audio encoding method comprising: a frequency band separation step of dividing an input audio signal into a low-frequency band signal and a high-frequency band signal; a low-frequency separation step of dividing the low-frequency band signal separated in the frequency band separation step into finer frequency bands; a step of determining whether to encode so that multiple contents can be processed simultaneously or so that scale adjustment is possible; an encoding step of, when encoding so that multiple contents can be processed simultaneously, encoding the signal separated in the low-frequency separation step into a digital signal by ADPCM; a step of, when encoding so that scale adjustment is possible, encoding the signal separated in the low-frequency separation step into a digital signal by ADPCM and, in a T/F conversion step, converting the high-frequency band signal separated in the frequency band separation step from the time domain to the frequency domain; a quantization step of allocating bits to and quantizing the signal converted in the T/F conversion step; a psychoacoustic step of processing the low-frequency and high-frequency signals separated in the frequency band separation step according to a predetermined psychoacoustic model to provide information on the step size (delta) of the quantizer used in the encoding step; and a step of forming a bitstream using the encoded signal, the quantized bits, and the position information of the content.

11. An audio decoding method comprising: a bitstream decomposition step of decomposing an input bitstream; a step of determining whether the bitstream decomposed in the bitstream decomposition step is in a multi-content processing mode or a scalable mode; an inverse quantization step of inverse-quantizing the decomposed bitstream; a decoding step of decoding the decomposed bitstream; a first signal synthesis step of synthesizing the decoded low-frequency band signals; an F/T conversion step of converting a high-frequency band signal into the time domain when the determined mode is not the multi-content processing mode; a second signal synthesis step of synthesizing the low-frequency band signal synthesized in the first signal synthesis step and the signal converted in the F/T conversion step; a spatial processing step of adjusting the scale according to a scale adjustment command from the user, extracting the spatial position information of contents from the signal decomposed in the bitstream decomposition step, and adjusting the position of the sound source according to loudspeaker or headphone reproduction; and a step of buffering and outputting the signal synthesized in the second signal synthesis step and the signal processed in the spatial processing step.