KR100316769B1

KR100316769B1 - Audio encoder/decoder apparatus and method

Info

Publication number: KR100316769B1
Application number: KR1019970008189A
Authority: KR
Inventors: 김상욱
Original assignee: 윤종용; 삼성전자 주식회사
Priority date: 1997-03-12
Filing date: 1997-03-12
Publication date: 2002-01-15
Also published as: KR19980073078A

Abstract

PURPOSE: An audio encoder/decoder apparatus and method are provided that offer scale adjustment and simultaneous processing of multiple contents within a single structure. CONSTITUTION: A first filter(100) separates an input audio signal into low-frequency and high-frequency signals. A second filter(110) separates the low-frequency signal into finer frequency bands. An ADPCM encoder(120) encodes the output signal of the second filter(110) into a digital signal by the ADPCM method. A time-to-frequency conversion part(130) converts the high-frequency signal from the time domain to the frequency domain. A bit allocation and quantization part(140) allocates bits to and quantizes the output signal of the conversion part(130). A first bit stream forming part(150) forms a bit stream from the encoded signal, the quantized bits, position information of the contents, and a processing mode. A psychoacoustic part(160) processes the low-frequency and high-frequency signals according to a psychoacoustic model to adjust the delta value of the quantizer used in the encoder(120). A control part(170) provides the processing mode and the position information of the contents. A prediction part(180) obtains the correlation between the previous frame information and the present frame information of the part(150), and a bit stream forming part(190) reduces repeated data according to the calculated correlation to form a bit stream.

Description

Audio encoding / decoding apparatus and method

The present invention relates to an audio encoding/decoding apparatus and method, and more particularly to an audio encoding/decoding apparatus and method that provide scale adjustment and simultaneous processing of multiple contents within a single structure.

Recently, a variety of interactive services such as video conferencing and video shopping have become available. In such services, meaningful image units (hereinafter, "contents") are combined to form a single screen. Each content is a unit of processing and can be individually moved, enlarged, reduced, or deleted. A system that processes a plurality of contents individually or simultaneously in this way is called a multiple concurrent system.

To use a data transmission line efficiently, the bit rate used for reproduction must be adjustable according to the importance of the information carried by the bits or according to the user's request. In other words, scalable processing, in which the number of bits used can be adjusted, is required.

In general, when audio and video data are encoded or decoded, the individual contents are not distinguished from one another. It is therefore difficult to extract and reproduce only a specific audio signal from among the audio signals present, or to extract only a specific part of the video signals and move, delete, or transform it. This problem can be solved if each content can be controlled independently. With independent control, a content such as a particular person's voice can be removed when the listener does not want to hear it, and when a particular person's position on the screen changes, the output audio data can be modified to reflect the new position.

In that case, because each person's voice is carried on an independent channel, a particular person's voice can easily be removed simply by transmitting or not transmitting that channel's data. However, allocating an independent channel to each content increases system complexity. In addition, when the system is not being used for a specific purpose that involves multiple contents, such as video conferencing, many of its components go unused, so the system cannot be utilized efficiently.

Scale adjustment (scalability) is needed in the encoder and decoder for the following reasons. When both video and audio information are present, sometimes only the video matters and sometimes only the audio matters. If fixed bit rates are used for video and audio in such cases, the transmission capacity of the channel may not be used effectively. Adjusting the transmission bit rate according to the importance of the information allows a channel of limited capacity to be used more effectively. Furthermore, in applications such as video channel search or audio channel search, what matters is knowing which program each channel is carrying. The information used for the service is therefore reduced in a scalable manner so that, even though sound or picture quality is degraded, information on many channels can be sent at once and channels can be searched efficiently. However, conventional scalable devices encode, then decode, compute the error signals, and only then build the scalable bitstream, so complexity multiplies with the number of scale-adjustment stages required.

Multiple concurrent processing is needed for the following reasons. In a video conference or multi-party call, if processing can be applied per participant or per content, a content such as a particular person's voice can be deleted when the listener does not want to hear it, and when a particular person's position on the screen changes, the output audio data can be modified so that the sound source moves accordingly. If all of this processing does not happen simultaneously, the sound heard no longer matches the movement of the speaker's mouth, and the conversation feels unnatural rather than real-time. Handling several contents therefore requires a multiple concurrent processing system.

Movement of the sound source position can be improved by applying the results of research on how humans hear and perceive, with both ears, sound that exists in three-dimensional space. Studies of the level difference between the signals at the right and left ears and of the sound arrival times have been used to model how a person recognizes a sound source located at a point in space; this characteristic is called the HRTF (head related transfer function). For a sound located at a given point in space, the HRTFs are expressed as impulse responses or transfer functions at the middle ear that characterize the transmission of the signal to each ear. Applying the HRTF makes it possible to move the apparent location of a sound to an arbitrary position in three-dimensional space.
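The arrival-time cue mentioned above can be illustrated with a small sketch. It does not implement an HRTF; it only evaluates Woodworth's well-known spherical-head approximation of the interaural time difference, which is not taken from the patent. The function name `woodworth_itd` and the default head radius are assumptions chosen for illustration.

```python
import math


def woodworth_itd(azimuth_deg, head_radius_m=0.0875, speed_of_sound=343.0):
    """Interaural time difference in seconds for a distant source.

    Uses Woodworth's spherical-head approximation, valid for azimuths
    between 0 (straight ahead) and 90 degrees (directly to one side).
    The head radius is an assumed typical value, not a patent parameter.
    """
    theta = math.radians(azimuth_deg)
    return head_radius_m / speed_of_sound * (theta + math.sin(theta))


# The delay grows as the source moves from the front toward one side.
for azimuth in (0, 30, 60, 90):
    print(azimuth, round(woodworth_itd(azimuth) * 1e6), "microseconds")
```

A full HRTF additionally captures the level difference and the spectral filtering of the head and outer ear, which this sketch leaves out.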

Conventionally, however, it has been difficult to single out and process only the sound produced by a specific area of the screen, that is, a meaningful part (content) such as a person or an animal. For example, the video and audio processing needed to move or remove a particular person on the screen was difficult. Consequently, each independent content could not be processed separately, and it was not easy to transform only part of the transmitted or stored data at the decoding stage. In short, conventional encoding and decoding systems gave no consideration to a scheme that is scalable and at the same time able to process multiple contents.

The present invention was devised to solve the problems described above. Its object is to provide an audio encoding/decoding apparatus and method that provide scale adjustment and simultaneous processing of multiple contents within a single structure, supporting both scaled reproduction usable for channel search and the processing of multiple contents usable for multi-party calls and video conferencing.

FIG. 1 is a block diagram showing the configuration of an audio encoder capable of scale adjustment and multiple-content processing according to the present invention.

FIG. 2 is a block diagram showing the configuration of an audio decoder capable of scale adjustment and multiple-content processing according to the present invention.

FIGS. 3A and 3B are block diagrams of a conventional ADPCM encoder and ADPCM decoder, respectively.

FIG. 4 shows how content position information on the video screen is represented in the encoder.

FIGS. 5A and 5B illustrate how position information is represented in the decoder when a content moves on the video screen, showing the original screen and the screen after the content has moved, respectively.

FIGS. 6A and 6B are block diagrams of the ADPCM encoder and ADPCM decoder used in the present invention, respectively.

FIG. 7 shows an example of reproduction through headphones and through speakers.

To achieve the above object, an audio encoding apparatus according to the present invention, capable of scale adjustment and simultaneous processing of multiple contents within a single structure, comprises: a first filter that divides an input audio signal into a low-frequency band signal and a high-frequency band signal; a second filter that divides the low-frequency band signal separated by the first filter into finer frequency bands; a T/F conversion unit that converts the high-frequency band signal separated by the first filter from the time domain to the frequency domain; an ADPCM encoder that encodes the output signal of the second filter into a digital signal by the ADPCM method; a bit allocation and quantization unit that allocates bits to and quantizes the output signal of the T/F conversion unit; a psychoacoustic unit that processes the low-frequency and high-frequency signals separated by the first filter according to a predetermined psychoacoustic model and provides information for controlling the quantization error generated in the ADPCM encoder and the T/F conversion unit; a control unit that controls whether the input audio signal passes through the first filter and the second filter, controls the frequency band processed by the T/F conversion unit, and provides position information of the contents together with information indicating the multiple-content processing mode or the scalable mode; and a primary bitstream forming unit that forms a bitstream from the signal encoded by the ADPCM encoder, the bits quantized by the bit allocation and quantization unit, and the content position information from the control unit.

To achieve another object of the present invention, an audio decoding apparatus capable of scale adjustment and simultaneous processing of multiple contents within a single structure preferably comprises: a bitstream decomposing unit that decomposes an input bitstream; a dequantizer that dequantizes the bitstream decomposed by the bitstream decomposing unit; an ADPCM decoder that decodes the bitstream decomposed by the bitstream decomposing unit; a first signal synthesis unit that combines the low-frequency band signals decoded by the ADPCM decoder; an F/T conversion unit that converts the high-frequency band signal to the time domain; a second signal synthesis unit that combines the low-frequency band signal produced by the first signal synthesis unit with the output signal of the F/T conversion unit; a spatial control processing unit that extracts the spatial position information of the contents from the signal decomposed by the bitstream decomposing unit and adjusts the position of the sound source according to whether reproduction is through speakers or headphones; a control unit that receives the signal decomposed by the bitstream decomposing unit, determines whether the bitstream is in the multiple-content processing mode or the scalable mode, suppresses the output of the F/T conversion unit if the determined mode is the multiple-content processing mode, and controls scale adjustment in the spatial control processing unit according to the user's scale adjustment command; and a buffer output unit that temporarily stores and outputs the signals from the spatial control processing unit and the second signal synthesis unit.

To achieve yet another object, an audio encoding method according to the present invention, capable of scale adjustment and simultaneous processing of multiple contents within a single structure, comprises: a frequency band separation step of dividing an input audio signal into a low-frequency band signal and a high-frequency band signal; a low-frequency separation step of dividing the low-frequency band signal separated in the frequency band separation step into finer frequency bands; a step of determining whether to encode for simultaneous processing of multiple contents or to encode for scale adjustment; an encoding step of, when encoding for simultaneous processing of multiple contents, encoding the signals separated in the low-frequency separation step into a digital signal by the ADPCM method; a T/F conversion step of, when encoding for scale adjustment, encoding the signals separated in the low-frequency separation step into a digital signal by the ADPCM method and converting the high-frequency band signal separated in the frequency band separation step from the time domain to the frequency domain; a quantization step of allocating bits to and quantizing the signal converted in the T/F conversion step; a psychoacoustic step of processing the low-frequency and high-frequency signals separated in the frequency band separation step according to a predetermined psychoacoustic model to provide information on the step size of the quantizer used in the encoding step; and a step of forming a bitstream from the encoded signal, the quantized bits, and the position information of the contents.

To achieve still another object of the present invention, an audio decoding method capable of scale adjustment and simultaneous processing of multiple contents within a single structure preferably comprises: a bitstream decomposing step of decomposing an input bitstream; a step of determining whether the bitstream decomposed in the bitstream decomposing step is in the multiple-content processing mode or the scalable mode; a dequantization step of dequantizing the decomposed bitstream; a decoding step of decoding the bitstream decomposed in the bitstream decomposing step; a first signal synthesis step of combining the decoded low-frequency band signals; an F/T conversion step of converting the high-frequency band signal to the time domain if the determined mode is not the multiple-content processing mode; a second signal synthesis step of combining the low-frequency band signal produced in the first signal synthesis step with the signal converted in the F/T conversion step; a spatial processing step of adjusting the scale according to the user's scale adjustment command, extracting the spatial position information of the contents from the signal decomposed in the bitstream decomposing step, and adjusting the position of the sound source according to whether reproduction is through speakers or headphones; and a step of buffering and outputting the signal produced in the second signal synthesis step and the signal processed in the spatial processing step.

The present invention will now be described in detail with reference to the accompanying drawings. FIG. 1 is a block diagram showing the configuration of an audio encoder capable of scale adjustment and multiple-content processing according to the present invention, which comprises a first filter 100, a second filter 110, an ADPCM encoder 120, a T/F conversion unit 130, a bit allocation and quantization unit 140, a primary bitstream forming unit 150, a psychoacoustic unit 160, a control unit 170, a prediction unit 180, and a bitstream forming unit 190.

The first filter 100 separates the input audio signal into a low-frequency band signal and a high-frequency band signal. The second filter 110 separates the low-frequency band signal from the first filter 100 into finer frequency bands. The ADPCM encoder 120 encodes the output signal of the second filter 110 into a digital signal by the ADPCM method.

The T/F conversion unit 130 converts the high-frequency band signal separated by the first filter 100 from the time domain to the frequency domain. The bit allocation and quantization unit 140 allocates bits to and quantizes the output signal of the T/F conversion unit 130. The primary bitstream forming unit 150 forms a bitstream from the signal encoded by the ADPCM encoder 120, the bits quantized by the bit allocation and quantization unit 140, the position information of the contents, and the processing mode.

The psychoacoustic unit 160 processes the low-frequency and high-frequency signals separated by the first filter 100 according to a psychoacoustic model, adjusts the delta value (the step size of the quantizer used in the ADPCM encoder 120), and provides a measure for determining the number of bits used in the bit allocation and quantization unit 140.

The control unit 170 controls whether the input audio signal passes through the first filter 100 and the second filter 110, controls the frequency band processed by the T/F conversion unit 130, and provides the position information of the contents together with the processing mode, which indicates either the multiple-content processing mode or the scalable mode.

The prediction unit 180 obtains the correlation between the previous frame information and the current frame information of the primary bitstream forming unit 150. The bitstream forming unit 190 forms a bitstream in which redundant data is reduced according to the frame correlation calculated by the prediction unit 180.

FIG. 2 is a block diagram showing the configuration of an audio decoder capable of scale adjustment and multiple-content processing according to the present invention, which comprises a bitstream decomposing unit 200, a dequantizer 250, an ADPCM decoder 230, a first signal synthesis unit 240, an F/T conversion unit 260, a second signal synthesis unit 280, a spatial control processing unit 270, a control unit 290, a buffer output unit 295, a prediction unit 210, and a primary bitstream decomposing unit 220.

The bitstream decomposing unit 200 decomposes the input bitstream. The prediction unit 210 determines whether the bitstream decomposed by the bitstream decomposing unit 200 was formed using previous frame information.

If the prediction unit 210 determines that the bitstream was formed using previous frame information, the primary bitstream decomposing unit 220 reconstructs the bitstream using the previous frame information.

The dequantizer 250 dequantizes the bitstream decomposed by the primary bitstream decomposing unit 220. The ADPCM decoder 230 decodes the bitstream decomposed by the primary bitstream decomposing unit 220. The first signal synthesis unit 240 combines the low-frequency band signals decoded by the ADPCM decoder 230.

The F/T conversion unit 260 converts the high-frequency band signal to the time domain. The second signal synthesis unit 280 combines the low-frequency band signal produced by the first signal synthesis unit 240 with the output signal of the F/T conversion unit 260.

The spatial control processing unit 270 extracts the spatial position information of the contents from the signal decomposed by the bitstream decomposing unit 200 and adjusts the position of the sound source according to whether reproduction is through speakers or headphones. When an image content that carries position information changes position, the spatial control processing unit 270 also computes the new position coordinates of the image content and adjusts the sound to reflect the movement of the sound source. The position information used by the spatial control processing unit 270 takes the position of the vocal organ as the reference position.
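As a rough illustration of adjusting the sound to follow a content's screen position in the speaker case, the sketch below maps a horizontal coordinate to left and right gains with constant-power panning. The patent does not specify this mapping; the function `pan_gains`, the pixel coordinate convention, and the choice of constant-power panning are all assumptions made for illustration.

```python
import math


def pan_gains(x, screen_width):
    """Return (left_gain, right_gain) for a content at horizontal position x.

    x is an assumed pixel coordinate of the content's reference point,
    with 0 at the left edge. Constant-power panning keeps
    left**2 + right**2 equal to 1 wherever the content moves.
    """
    position = min(max(x / screen_width, 0.0), 1.0)  # clamp to [0, 1]
    angle = position * math.pi / 2
    return math.cos(angle), math.sin(angle)


# A content that moves from the left edge to the centre of a 640-pixel screen.
print(pan_gains(0, 640))
print(pan_gains(320, 640))
```

For headphone reproduction, a sketch along these lines would instead select or interpolate HRTF filters for the new position.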

The control unit 290 receives the signal decomposed by the bitstream decomposing unit 220, determines whether the bitstream is in the multiple-content processing mode or the scalable mode, suppresses the output of the F/T conversion unit 260 if the determined mode is the multiple-content processing mode, and controls scale adjustment in the spatial control processing unit 270 according to the user's scale adjustment command.

The buffer output unit 295 temporarily stores and outputs the signals from the spatial control processing unit 270 and the second signal synthesis unit 280.

The operation of the present invention will now be described on the basis of the configuration above, beginning with the encoder. The user selects, through the control unit 170, one of two operating modes: multiple concurrent processing or scalable coding. If multiple concurrent processing is selected, the control unit 170 disables the T/F conversion unit 130, and the filters 100 and 110 and the ADPCM encoder 120 operate to process the assigned content. Here the filters 100 and 110 are anti-aliasing filters that allow the frequency characteristics of the content to be taken into account, and the band-limited signals are encoded by an ADPCM encoder such as that shown in FIG. 6A. The step size (delta) of the quantizer (not shown) used in the ADPCM encoder 120 is controlled by the psychoacoustic unit 160, which models human psychoacoustics. Four filter and ADPCM paths are used in parallel, so that when up to four contents are present they can be processed concurrently.

If the user instead selects scalable coding through the control unit 170, the input signal belongs to a single content and is processed in five frequency bands. With sampling frequency Fs, the five bands are Fs/4 to Fs/2, 0 to Fs/16, Fs/16 to Fs/8, Fs/8 to 3Fs/16, and 3Fs/16 to Fs/4. Because human hearing is less sensitive to high-frequency signals, the high-frequency side uses a wide processing band and is converted by the T/F conversion unit 130, while the low-frequency side uses the ADPCM encoder 120, which keeps the structure simple.
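The band layout can be checked numerically with a short sketch. The band edges come from the text; the function name `scalable_bands`, the ascending ordering, and the path labels are assumptions added for illustration.

```python
from fractions import Fraction


def scalable_bands(fs):
    """List the five scalable-coding bands as (low_hz, high_hz, path).

    Edges are the fractions of the sampling frequency given in the text.
    The four bands below Fs/4 go to the ADPCM path and the single wide
    band from Fs/4 to Fs/2 goes to the T/F conversion path.
    """
    edges = [Fraction(0), Fraction(1, 16), Fraction(1, 8),
             Fraction(3, 16), Fraction(1, 4), Fraction(1, 2)]
    bands = []
    for low, high in zip(edges, edges[1:]):
        path = "T/F" if low >= Fraction(1, 4) else "ADPCM"
        bands.append((float(low * fs), float(high * fs), path))
    return bands


for band in scalable_bands(48000):
    print(band)
```

At a sampling frequency of 48 kHz, for example, the four ADPCM bands are each 3 kHz wide and the T/F band spans 12 kHz to 24 kHz.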

In the low-frequency bands, ADPCM with a nonlinear predictor is used for resilience against errors that may occur during data transmission. An example of the error resilience of the ADPCM encoder 120 with the nonlinear predictor is described in more detail later. ADPCM is used rather than DPCM so that a quantizer better suited to the signal can be used. In addition, as shown in FIG. 6A, the power of the error signal produced by ADPCM is computed and checked against the threshold obtained from the human psychoacoustic model of the psychoacoustic unit 160, which makes buffer control possible.
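The difference between DPCM and ADPCM, namely a quantizer step size that adapts to the signal, can be sketched as follows. This is a deliberately simplified illustration and not the encoder of FIG. 6A: it uses a trivial previous-sample predictor instead of the nonlinear predictor, and a fixed grow/shrink rule stands in for the psychoacoustic control of delta. All names and constants are assumptions.

```python
def adapt(delta, code):
    """Grow the step after large codes and shrink it after small ones (assumed rule)."""
    factor = 1.5 if abs(code) >= 6 else 0.9
    return min(max(delta * factor, 1.0), 4096.0)


def adpcm_encode(samples, delta=16.0):
    """Encode samples as 4-bit signed codes of the prediction error."""
    predicted, codes = 0.0, []
    for x in samples:
        code = max(-8, min(7, round((x - predicted) / delta)))
        codes.append(code)
        predicted += code * delta   # track what the decoder will reconstruct
        delta = adapt(delta, code)  # same update the decoder applies
    return codes


def adpcm_decode(codes, delta=16.0):
    """Rebuild the signal by mirroring the encoder's predictor and step update."""
    predicted, out = 0.0, []
    for code in codes:
        predicted += code * delta
        out.append(predicted)
        delta = adapt(delta, code)
    return out


samples = [0, 10, 40, 100, 90, 50]
decoded = adpcm_decode(adpcm_encode(samples))
print([round(v, 2) for v in decoded])
```

Because the decoder derives delta from the received codes alone, no step-size side information has to be transmitted in this sketch.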

The processed data are passed to the primary bitstream forming unit 150, after which the prediction unit 180 compares the result with the previously formed bitstream to find the differences. If the correlation between the previous frame and the current frame is at or above a predetermined threshold, prediction is turned on and the bitstream forming unit 190 forms and transmits the bitstream; if it is below the threshold, prediction is turned off and the bitstream is formed without it.
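The prediction on/off decision can be sketched as follows. The text does not say how the correlation is measured or what the threshold is, so the normalized correlation, the threshold of 0.9, the difference-frame payload, and all function names here are assumptions chosen only to illustrate the idea.

```python
import math


def frame_correlation(prev, curr):
    """Normalized correlation of two equal-length frames; 0.0 if either frame is all zeros."""
    dot = sum(p * c for p, c in zip(prev, curr))
    norm = math.sqrt(sum(p * p for p in prev) * sum(c * c for c in curr))
    return dot / norm if norm else 0.0


def form_frame(prev, curr, threshold=0.9):
    """Encoder side: return (prediction_on, payload).

    With prediction on, only the difference from the previous frame is sent;
    with prediction off, the current frame is sent unchanged.
    """
    if prev is not None and frame_correlation(prev, curr) >= threshold:
        return True, [c - p for p, c in zip(prev, curr)]
    return False, list(curr)


def restore_frame(prev, prediction_on, payload):
    """Decoder side: rebuild the frame from the flag and the payload."""
    if prediction_on:
        return [p + d for p, d in zip(prev, payload)]
    return list(payload)
```

When consecutive frames are similar, the payload consists of small differences that need fewer bits; the on/off flag travels with the frame so that the decoder knows whether the previous frame is needed to rebuild it.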

The decoder operates as follows. First, the bitstream is decomposed by the bitstream decomposing unit 200. The prediction unit 210 then checks the prediction on/off information in the decomposed bitstream to determine whether the previous frame result is to be used, and the bitstream is reconstructed accordingly. If prediction is on, the primary bitstream decomposing unit 220 decomposes the bitstream using the previous frame result, and the result is inverse-quantized by the inverse quantizer 250. If prediction is off, the bitstream decomposed by the bitstream decomposing unit 200 is used as is.

Next, the control unit 290 reads the information in the bitstream to detect whether the bitstream was produced in multiple concurrent processing mode or in scalable mode. In multiple concurrent processing mode, under control of the control unit 290, the decomposed bitstream bypasses the F/T converter 260 and is ADPCM-decoded by the ADPCM decoder 230. The first signal synthesizer 240 then reverses the band splitting performed at encoding, recombining the finely divided low-frequency sub-bands into a single low-frequency band. In scalable mode, under control of the control unit 290, the signals are reproduced by passing through the F/T converter 260.

The low-frequency band signal reproduced by the first signal synthesizer 240 and the high-frequency band signal converted by the F/T converter 260 are combined by the second signal synthesizer 280 and output to the buffer output unit 295.

Using the position information of each content carried in the bitstream, the decoder can process each content more effectively according to its spatial position. When the control unit 290 moves the position of a content, the new position is computed and compensation corresponding to the movement of the sound source is applied. Sound source position compensation is performed in the decoder rather than in the encoder for the following reason: if the encoder had already transformed the signal and the decoder then applied a further transformation for another movement, the decoder would first have to undo the encoder's transformation and then apply its own, doubling the complexity. Performing the compensation only in the decoder avoids this doubling.

The processing of multiple contents is described in more detail as follows. The control unit 290 detects whether the bitstream is a multiple-content input. If it is, each of the ADPCM decoders 230 used for the low-frequency signals processes an independent content. In this case the frequency bandwidth handled differs from the bandwidth handled in the scalable case, where the ADPCM encoder 120 and the T/F converter 130 are used together. Because each content is encoded with its own independent ADPCM encoder, the user can completely mute a specific content or modify the spatial distribution of a specific content.

If the signal is a scalable signal, the input signal represents a single content. It is processed in five frequency bands: with sampling frequency Fs, the bands are Fs/4 to Fs/2, 0 to Fs/16, Fs/16 to Fs/8, Fs/8 to 3Fs/16, and 3Fs/16 to Fs/4. Because human hearing is less sensitive to high-frequency signals, the high-frequency side was encoded as a wide band with T/F conversion, so it is restored by F/T conversion in the F/T converter 260. The low-frequency side was encoded by the ADPCM encoder to keep the structure simple, so it is decoded by the ADPCM decoder 230. When fast search is required, only part of the bitstream is read and decoded, which improves processing efficiency.

New position information is derived from the content position information and processed as follows. During encoding, the position of each image content is included in the bitstream. FIG. 4 shows how content position information is represented on the encoder's image screen: the position information carried in the bitstream consists of x and y coordinate values on the image screen as in FIG. 4, taken with one end of the image content as the reference. Because the mouth is where the sound originates, the position of the mouth is used in the processing; for a mouth that does not appear in the image, a point on the image border is assumed to be its position.

FIG. 4 shows an example based on this reference point. The screen is divided into the background, the contents, and the outline of each content, which are combined to reproduce the image. When a content on the screen is moved in the decoder, the outline information is used to extract that content and move it to the new position.

During decoding, the image information is handled as a content that the user can move up, down, left, or right, or resize by zooming in or out. The coordinate values that change with such movement are taken into account so that, on reproduction, the sound appears to come from the position where the content is shown on the screen. For example, when persons A and B are present as in FIGS. 5A and 5B and the user changes their positions from the original arrangement (FIG. 5A) to that of FIG. 5B, the changed position values (x, y) are used to alter the reproduced sound to match the new positions of the image contents. When an image content is zoomed in or out, that information is used as (z) information, giving a new (x, y, z) reference so that the effect of speaking nearby or from far away is produced. As a result, not only up, down, left, and right movement but also forward and backward movement of the image content can be handled. The technique for moving a sound source in space is described in more detail below.

The reason the ADPCM encoder 120 and decoder 230, used for the low-frequency bands and for multiple concurrent processing, employ a nonlinear predictor is now explained. The effect of an error signal depends on whether the predictor in a DPCM or ADPCM unit is linear or nonlinear. With a linear predictor, the error accumulates and keeps propagating to neighboring samples; with a nonlinear predictor, the error is isolated, so its influence does not continue to propagate. As an example, consider using a linear predictor and a nonlinear predictor in the predictor part of the ADPCM encoder/decoder of FIGS. 3A and 3B.

[Equation 1]

$$P_{out}[n] = \operatorname{mean}\{P_{in}[n],\, P_{in}[n-1],\, P_{in}[n-2]\} = \operatorname{integer}\!\left[\frac{P_{in}[n] + P_{in}[n-1] + P_{in}[n-2]}{3.0}\right]$$

As in Equation 1, the linear predictor adds the three values P_in[n], P_in[n-1], and P_in[n-2], divides the sum by 3, and quantizes the result to an integer, which becomes P_out[n].

The nonlinear predictor is a median predictor, as in Equation 2.

[Equation 2]

$$P_{out}[n] = \operatorname{median}\{P_{in}[n],\, P_{in}[n-1],\, P_{in}[n-2]\}$$

That is, the nonlinear predictor sorts the three samples P_in[n], P_in[n-1], and P_in[n-2] by magnitude and takes the middle value of the sorted sequence as P_out[n].
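The two predictors can be written down directly. This is a minimal sketch: the function names are illustrative, and the integer quantization of Equation 1 is assumed to round to the nearest integer, which is the behavior that matches the tabulated values in this description (for example 36.67 becomes 37).

```python
def linear_predict(p0, p1, p2):
    """Equation 1: mean of three integer samples, rounded to the nearest integer.

    (2 * total + 3) // 6 is integer arithmetic for round-half-up of total / 3
    when total is non-negative.
    """
    total = p0 + p1 + p2
    return (2 * total + 3) // 6


def median_predict(p0, p1, p2):
    """Equation 2: middle value of the three samples after sorting by magnitude."""
    return sorted((p0, p1, p2))[1]


if __name__ == "__main__":
    print(linear_predict(35, 40, 35), median_predict(35, 40, 35))    # 37 35
    # A single outlier shifts the mean strongly but barely moves the median.
    print(linear_predict(135, 40, 35), median_predict(135, 40, 35))  # 70 40
```

The second call shows why the median predictor isolates errors: one corrupted sample can only become the prediction if it happens to be the middle value of the window.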

Examples of the encoder input and output values (X_in, Cod_X) and the decoder input and output values (Cod_Y, Y_out) for the linear predictor and the nonlinear predictor are as follows.

[Table 1]

Encoder input and output with the linear predictor

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| X_in | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 | 15 |
| Cod_X | 10 | 10 | 10 | 10 | 0 | -7 | -10 | -10 | -10 |
| P_in | 20 | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 |
| P_out | 15 | 20 | 25 | 30 | 35 | 37 | 35 | 30 | 25 |

[Table 2]

Decoder input and output with the linear predictor

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cod_Y | 10 | 10 | 10 | 10 | 0 | -7 | -10 | -10 | -10 |
| Y_out | 20 | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 |
| P_out | 15 | 20 | 25 | 30 | 35 | 37 | 35 | 30 | 25 |

[Table 3]

Encoder output with the nonlinear predictor

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| X_in | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 | 15 |
| Cod_X | 10 | 10 | 10 | 10 | 0 | -5 | -10 | -10 | -10 |
| P_in | 20 | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 |
| P_out | 15 | 20 | 25 | 30 | 35 | 35 | 35 | 30 | 25 |

[Table 4]

Decoder input and output with the nonlinear predictor

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cod_Y | 10 | 10 | 10 | 10 | 0 | -5 | -10 | -10 | -10 |
| Y_out | 20 | 25 | 30 | 35 | 40 | 35 | 30 | 25 | 20 |
| P_out | 15 | 20 | 25 | 30 | 35 | 35 | 35 | 30 | 25 |

Now consider the case where noise occurs in the channel during transmission. Suppose that at time n = 5 the original code value 0 is changed to 100 by the error. The following tables show the difference in decoder output between the linear predictor and the nonlinear predictor.

[Table 5]

Decoder input and output with the linear predictor when noise occurs in the channel

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cod_Y | 10 | 10 | 10 | 10 | 100 | -7 | -10 | -10 | -10 |
| Y_out | 20 | 25 | 30 | 35 | 40 | 135 | 63 | 69 | 79 |
| P_out | 15 | 20 | 25 | 30 | 35 | 70 | 79 | 89 | 70 |

[Table 6]

Decoder input and output with the nonlinear predictor when noise occurs in the channel

| n | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Cod_Y | 10 | 10 | 10 | 10 | 100 | -5 | -10 | -10 | -10 |
| Y_out | 20 | 25 | 30 | 35 | 40 | 135 | 35 | 30 | 25 |
| P_out | 15 | 20 | 25 | 30 | 35 | 40 | 40 | 35 | 30 |

With the nonlinear predictor, the effect of the change from 0 to 100 is isolated, whereas with the linear predictor the effect is not isolated and propagates, continuing to affect later outputs.
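The two noisy-channel tables can be reproduced with a small decoder loop. This is a sketch under stated assumptions: the recursion (each new reconstructed value is the current prediction plus the received code), the initial history of 10, 15, 20, the rounding to nearest, and all names are inferred from the tabulated values rather than given in the text.

```python
def linear_predict(p0, p1, p2):
    """Equation 1 with rounding to the nearest integer (non-negative sums)."""
    return (2 * (p0 + p1 + p2) + 3) // 6


def median_predict(p0, p1, p2):
    """Equation 2: median of three samples."""
    return sorted((p0, p1, p2))[1]


def decode(codes, predictor, history=(10, 15, 20)):
    """Run the decoder recursion and return the (Y_out, P_out) rows of the tables.

    history holds the three most recent reconstructed values before the first
    code arrives; (10, 15, 20) is an assumed start that yields P_out = 15 at n = 1.
    """
    y = list(history)
    y_out, p_out = [], []
    for code in codes:
        prediction = predictor(y[-1], y[-2], y[-3])
        y_out.append(y[-1])          # value tabulated as Y_out at this n
        p_out.append(prediction)     # value tabulated as P_out at this n
        y.append(prediction + code)  # reconstructed value used at n + 1
    return y_out, p_out


# Received codes with the channel error at n = 5 (0 replaced by 100).
NOISY_LINEAR = [10, 10, 10, 10, 100, -7, -10, -10, -10]
NOISY_MEDIAN = [10, 10, 10, 10, 100, -5, -10, -10, -10]

if __name__ == "__main__":
    print(decode(NOISY_LINEAR, linear_predict))
    print(decode(NOISY_MEDIAN, median_predict))
```

With the linear predictor, the values after n = 6 stay far from the error-free outputs (63, 69, 79 instead of 30, 25, 20). With the median predictor, only the sample at n = 6 is corrupted in Y_out, and the following outputs lie within 5 of the error-free values.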

By detecting the error signal that actually occurs, the quantizer can be used more effectively during audio encoding and buffer control can be performed. This is made possible by the masked threshold that arises from human psychoacoustics. The threshold represents the power of a signal that a person cannot perceive even when it is present. If the total quantization noise produced by quantizing the signals in a band is at or below this threshold, no finer quantizer and no additional bits are needed. This property is used to control the number of bits used.

As shown in FIG. 6, the amount of error produced when the ADPCM-encoded signals are decoded is calculated. The total error is compared with the threshold constant determined by the psychoacoustic model to check whether the limit is exceeded, and the result is used to adjust the quantization step of the quantizer, the delta of the ADPCM, and the frame buffer control. If the limit is exceeded, a new quantizer is used to reduce the error, so that the trade-off between sound quality and bit usage can be adjusted.
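The threshold check can be illustrated with a simple step-size search. This is only a sketch: a plain uniform quantizer stands in for the ADPCM quantizer, the candidate step sizes are arbitrary, and the names `choose_step` and `masked_threshold` are assumptions rather than terms from the specification.

```python
def quantize(samples, step):
    """Uniform quantizer used here as a stand-in for the ADPCM quantizer."""
    return [round(s / step) * step for s in samples]


def noise_power(samples, step):
    """Total squared quantization error for the given step size."""
    return sum((s - q) ** 2 for s, q in zip(samples, quantize(samples, step)))


def choose_step(samples, masked_threshold, steps=(16, 8, 4, 2, 1)):
    """Return the coarsest step whose quantization noise is at or below the threshold.

    Steps are tried from coarse to fine; if none satisfies the threshold,
    the finest available step is returned.
    """
    for step in steps:
        if noise_power(samples, step) <= masked_threshold:
            return step
    return steps[-1]
```

A higher masked threshold allows a coarser step and therefore fewer bits; a lower threshold forces a finer step. This is the sound quality versus bit usage trade-off that the buffer control adjusts.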

The three-dimensional sound effect arises because humans hear by gathering sound with two ears. With stereo reproduction, the effect can be provided by controlling the reproduced signals according to the positions of the fixed playback speakers. Research on human sound perception can be broadly divided into monaural studies, which consider only one ear, and binaural studies, which consider both ears together. Monaural research has made it possible to model how the presence of a sound is perceived, yielding results such as the minimum sound pressure a human can perceive (the absolute threshold) and the interaction between signals when several are present (masking); these results are used for efficient data representation, that is, compression. Binaural research has studied the mutual influence of the signals arriving at the two ears, namely the level difference between the sound perceived at the right and left ears and the phase difference between them caused by the difference in arrival time.

From this binaural research, the way a person perceives a sound source located at a point in space has been modeled; this characteristic is called the HRTF (head-related transfer function). An HRTF is expressed as an impulse response or transfer function, taken at the middle ear, that characterizes how a sound at a given point in space is transmitted to each ear. By applying HRTFs, a sound can be moved to an arbitrary position in three-dimensional space, enabling more realistic reproduction.

Given the information for an arbitrary point A in three-dimensional space, the effect of sound being reproduced at that point is easily achieved when listening through headphones. If the sound emitted at point A is X_A, the signals E_r and E_l arriving at the right and left ears are expressed as follows, where H_ar and H_al are the transformations applied to the sound from point A as heard by the right and left ears. In matrix form,

[Equation 3]

$$\begin{bmatrix} E_r \\ E_l \end{bmatrix} = \begin{bmatrix} H_{ar} \\ H_{al} \end{bmatrix} X_A$$

Using H_ar and H_al, a mono input signal can be made to sound as if it were coming from point A. When this effect is produced with front right and left speakers, the cross-talk between the sound output by the two speakers must be compensated. Given the signals emitted by the right speaker and the left speaker, the signals that reach the ears through the two speakers can be expressed as

[Equation 4]

where the factor relating the speaker signals to the ear signals is the transfer function.

If the values given by Equation 3 and Equation 4 are equal, the listener perceives the signal as being located at point A. Equating the two gives

[Equation 5]

To solve Equation 5, the values output by the right speaker and the left speaker must be adjusted. Assume that the speaker output values are the input signal value transformed by a separate transformation for each speaker:

[Equation 6]

Substituting this into Equation 5 and rearranging gives

[Equation 7]

Solving for the transformation values by inversion gives the following.

[Equation 8]

Here the terms that depend on the speakers are determined once the speaker positions are fixed, and the terms that depend on the sound source are known values determined by the position of the sound source, so the transformation values can be obtained.

Once these values are obtained, Equation 6 can be used so that a signal located at position A in three-dimensional space is reproduced through speakers placed at other arbitrary positions while still sounding as if it originates from position A.
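The bodies of Equations 4 to 8 do not appear in this text. One reading consistent with the surrounding description is sketched below. The symbol names are assumptions introduced here and are not the notation of the original: S_r and S_l are the right and left speaker signals, H_rr, H_lr, H_rl, and H_ll are the speaker-to-ear transfer functions (first index the speaker, second index the ear), and G_r and G_l are the transformations applied to X_A to obtain the speaker signals.

$$
\begin{aligned}
\text{(4)}\quad & \begin{bmatrix} E_r \\ E_l \end{bmatrix} = \begin{bmatrix} H_{rr} & H_{lr} \\ H_{rl} & H_{ll} \end{bmatrix} \begin{bmatrix} S_r \\ S_l \end{bmatrix} \\
\text{(5)}\quad & \begin{bmatrix} H_{ar} \\ H_{al} \end{bmatrix} X_A = \begin{bmatrix} H_{rr} & H_{lr} \\ H_{rl} & H_{ll} \end{bmatrix} \begin{bmatrix} S_r \\ S_l \end{bmatrix} \\
\text{(6)}\quad & \begin{bmatrix} S_r \\ S_l \end{bmatrix} = \begin{bmatrix} G_r \\ G_l \end{bmatrix} X_A \\
\text{(7)}\quad & \begin{bmatrix} H_{ar} \\ H_{al} \end{bmatrix} = \begin{bmatrix} H_{rr} & H_{lr} \\ H_{rl} & H_{ll} \end{bmatrix} \begin{bmatrix} G_r \\ G_l \end{bmatrix} \\
\text{(8)}\quad & \begin{bmatrix} G_r \\ G_l \end{bmatrix} = \frac{1}{H_{rr} H_{ll} - H_{lr} H_{rl}} \begin{bmatrix} H_{ll} & -H_{lr} \\ -H_{rl} & H_{rr} \end{bmatrix} \begin{bmatrix} H_{ar} \\ H_{al} \end{bmatrix}
\end{aligned}
$$

Under this reading, Equation 5 equates the ear signals of Equation 3 with those of Equation 4, Equation 7 follows by substituting Equation 6 and cancelling X_A, and Equation 8 is the 2×2 matrix inverse applied to Equation 7, which requires H_rr H_ll − H_lr H_rl to be nonzero.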

If the processing is not adapted to the presence or absence of cross-talk, the signal sounds different when reproduced through speakers and through headphones. For headphone reproduction, Equation 3 can therefore be used directly, which is efficient to process. To take this difference into account, the present invention, as shown in FIG. 7, receives a speaker/headphone output control signal from the control unit. If its value is "OFF", the output is treated as headphones and the speaker output compensation is skipped; if its value is "ON", the output is treated as speakers and the speaker output values are compensated.
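The headphone/speaker switch can be sketched for a single frequency bin. This is a minimal illustration under assumptions: the speaker-to-ear transfer values `h_rr`, `h_lr`, `h_rl`, `h_ll` (first letter the speaker, second the ear), the filter names `g_r` and `g_l`, and the function names are all hypothetical, and the compensation shown is the standard 2×2 inversion rather than a confirmed reading of the original equations.

```python
def crosstalk_filters(h_ar, h_al, h_rr, h_lr, h_rl, h_ll):
    """Solve the 2x2 system so that the ear signals match the headphone case.

    Ear model assumed: e_r = h_rr * s_r + h_lr * s_l, e_l = h_rl * s_r + h_ll * s_l.
    Returns (g_r, g_l) such that s_r = g_r * x and s_l = g_l * x.
    """
    det = h_rr * h_ll - h_lr * h_rl
    if det == 0:
        raise ValueError("speaker-to-ear matrix is singular")
    g_r = (h_ll * h_ar - h_lr * h_al) / det
    g_l = (-h_rl * h_ar + h_rr * h_al) / det
    return g_r, g_l


def render_bin(x_a, h_ar, h_al, speaker_on, speaker_paths=None):
    """Return the (right, left) output values for one frequency bin.

    speaker_on False: headphone output, apply h_ar and h_al directly.
    speaker_on True: speaker output, apply cross-talk compensation using
    speaker_paths = (h_rr, h_lr, h_rl, h_ll).
    """
    if not speaker_on:
        return h_ar * x_a, h_al * x_a
    g_r, g_l = crosstalk_filters(h_ar, h_al, *speaker_paths)
    return g_r * x_a, g_l * x_a
```

The values may be real or complex, so the same functions can be applied bin by bin to a frequency-domain signal. In the "OFF" case no matrix solve is needed, which is the processing saving the text attributes to headphone reproduction.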

According to the present invention, an encoder and a decoder that can process multiple contents and provide scalability can be implemented in a single structure. The quantization error actually produced during ADPCM is used to select the quantizer step and to control the buffer. Content manipulation is possible: a specific content can be switched on or off, and the nonlinear predictor reduces the propagation of errors.

In addition, the position of a sound source can be moved to follow the movement of a specific content. Applying different processing for speaker reproduction and for headphone reproduction allows processing better suited to the reproduction environment, and the quantization step of the ADPCM unit is determined in consideration of human psychoacoustic characteristics.

Furthermore, when the positions of the playback speakers are changed, the reproduction can be adjusted using the new position information, provided the new positions are known. Even when the position of a sound source changes as a content is moved, the processing uses the transfer functions that humans have for sound sources at specific positions, so more realistic reproduction is possible.

Claims

An audio encoding apparatus comprising: a first filter dividing an input audio signal into a low frequency band signal and a high frequency band signal; a second filter dividing the low frequency band signal separated by the first filter into finer frequency bands; a T/F converter converting the high frequency band signal separated by the first filter from the time domain to the frequency domain; an ADPCM encoder encoding the output signal of the second filter into a digital signal by the ADPCM method; a bit allocation and quantization unit allocating bits to and quantizing the output signal of the T/F converter; a psychoacoustic unit processing the low frequency and high frequency signals separated by the first filter according to a predetermined psychoacoustic model and providing information on the step size of the quantizer used in the ADPCM encoder; a control unit controlling whether the input audio signal passes through the first filter and the second filter, controlling the processing frequency band of the T/F converter, and providing information indicating a multi-content processing mode or a scalable mode together with position information of the content; and a primary bitstream forming unit forming a bitstream from the signal encoded by the ADPCM encoder, the bits quantized by the bit allocation and quantization unit, and the position information of the content provided by the control unit.

The apparatus of claim 1, further comprising: a prediction unit configured to obtain data correlation between previous frame information and current frame information of the primary bitstream forming unit; And a bitstream forming unit configured to form a bitstream by reducing overlapping data according to the frame correlation of the prediction unit calculated by the prediction unit.

The audio encoding apparatus of claim 1 or 2, wherein the ADPCM encoder uses a nonlinear predictor.

A bitstream decomposing unit for decomposing the input bitstream; An inverse quantizer for inversely quantizing the bitstream decomposed by the bitstream decomposing unit; An ADPCM decoder for decoding the bitstream decomposed by the bitstream decomposing unit; A first signal synthesizer for synthesizing the signals of the low frequency bands decoded by the ADPCM decoder; An F / T converter for converting a high frequency band signal into a time domain; A second signal synthesizer for synthesizing the low frequency band signal synthesized by the first signal synthesizer and the output signal of the F / T converter; A space control processor configured to extract position information in the space of contents from the signal disassembled by the bitstream decomposing unit and adjust the position of the sound source according to whether the speaker or the headphone is reproduced; Receiving the signal disassembled by the bitstream decomposing unit, it is determined whether the bitstream is a multi-content processing mode or a scalable control mode, and if the determined mode is a multi-content processing mode, an output signal of the F / T converter is output. A control unit controlling the scale adjustment in the space control processing unit according to a user's scale adjustment command; And a buffer output unit which temporarily stores and outputs signals output from the spatial control processing unit and the second signal synthesizing unit.

The apparatus of claim 4, further comprising: a prediction unit that determines whether the bitstream decomposed by the bitstream decomposing unit is a bitstream using previous frame information; And a primary bitstream decomposing unit configured to reconstruct the bitstream using previous frame information if the prediction unit determines that the determination is a bitstream using previous frame information.

The audio decoding apparatus as claimed in claim 4, wherein the spatial control processing unit obtains a new position coordinate of the image content according to the positional change of the image content having the position information and adjusts the sound in consideration of the positional movement of the sound source.

The audio decoding apparatus of claim 6, wherein the position information of the spatial control processor uses the position of the speech engine as a reference position.

5. The audio decoding apparatus of claim 4, wherein the ADPCM decoder uses a nonlinear predictor.

The audio decoding apparatus as claimed in claim 4, wherein the information about the enlargement / reduction of the image content is reflected in the reproduction of the audio signal.

A frequency band separating step of dividing an input audio signal into a low frequency band signal and a high frequency band signal; A low frequency separation step of dividing the low frequency band signal separated in the frequency band separation step into finer frequency bands; Determining whether to encode multiple contents at the same time or to encode scales; An encoding step of encoding a signal separated in the low frequency separation step into a digital signal by an ADPCM method when encoding multiple content at the same time; In order to encode the scale, T which encodes the signal separated in the low frequency separation step into a digital signal by the ADPCM method and converts the high frequency band signal separated in the frequency band separation step from the time domain to the frequency domain. / F conversion step; A quantization step of bit-allocating and quantizing the signal converted in the T / F conversion step; An acoustic psychology step of processing the low frequency and high frequency signals separated in the frequency band separation step according to a predetermined acoustic psychological model to provide information on the step difference value of the quantizer used in the encoding step; And forming a bitstream using the encoded signal, the quantized bits, and position information of the content.

A bitstream decomposing step of decomposing the input bitstream; Determining whether the decoded bitstream in the bitstream decomposing step is a multi-content processing mode or a scalable control mode; An inverse quantization step of inversely quantizing the disassembled bitstream; A decoding step of decoding the bitstream decomposed in the bitstream decomposing step; A first signal synthesis step of synthesizing the decoded low frequency band-specific signals; An F / T conversion step of converting a high frequency band signal into a time domain when the determined mode is not a multiple content processing mode; A second signal synthesis step of synthesizing the low frequency band signal synthesized in the signal synthesis step and the signal converted in the F / T conversion step; Spatial processing step of adjusting the scale according to the user's scale adjustment command and adjusting the position of the sound source according to whether the speaker or headphone is reproduced by extracting the position information in the space of the contents from the signal disassembled in the bitstream decomposing step. ; And buffering and outputting the signal synthesized in the second signal synthesis step and the signal processed in the spatial processing step.