KR20100050414A

KR20100050414A - Method and apparatus for processing an audio signal

Info

Publication number: KR20100050414A
Application number: KR1020090105389A
Authority: KR
Inventors: 윤성용; 이현국; 김동수; 임재현
Original assignee: 엘지전자 주식회사
Priority date: 2008-11-04
Filing date: 2009-11-03
Publication date: 2010-05-13
Also published as: KR101259120B1

Abstract

PURPOSE: An audio signal processing method and an apparatus thereof are provided to significantly reduce the number of bits in the bitstream by omitting transmission of information on noise filling. CONSTITUTION: A multiplexer (210) extracts noise filling flag information and coding scheme information. The noise filling flag information indicates whether noise filling is used for a plurality of frames. The coding scheme information indicates whether the current frame is coded in the frequency domain or the time domain. A noise information decoding part (201) extracts noise level information for the current frame.

Description

Method and apparatus for processing audio signal {METHOD AND APPARATUS FOR PROCESSING AN AUDIO SIGNAL}

The present invention relates to an audio signal processing method and apparatus capable of encoding or decoding an audio signal.

In general, a coding scheme based on audio characteristics is applied to audio signals such as music, and a coding scheme based on speech characteristics is applied to speech signals.

When only one of these coding schemes is applied to a signal in which audio and speech characteristics are mixed, either coding efficiency drops or sound quality deteriorates.

The present invention has been devised to solve the above problem. One object is to provide an audio signal processing method and apparatus in which the decoder can apply a noise filling scheme to compensate for the signal lost during quantization in the encoder.

Another object of the present invention is to provide an audio signal processing method and apparatus that can omit transmission of information on noise filling for frames to which the noise filling scheme is not applied.

Another object of the present invention is to provide an audio signal processing method and apparatus that encodes information on noise filling (noise level or noise offset) by exploiting the fact that this information takes nearly the same value from frame to frame.

To achieve the above objects, the present invention provides an audio signal processing method comprising: extracting noise filling flag information indicating whether noise filling is used for a plurality of frames; extracting coding scheme information indicating whether a current frame included in the plurality of frames is coded in the frequency domain or in the time domain; extracting noise level information for the current frame when the noise filling flag information indicates that noise filling is used for the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain; extracting noise offset information for the current frame when a noise level value corresponding to the noise level information satisfies a predetermined level; and, when the noise offset information is extracted, performing noise filling on the current frame based on the noise level value and the noise level information.

According to the present invention, the noise filling may include: determining a loss region of the current frame using spectral data of the current frame; generating compensated spectral data by filling the loss region with a compensation signal using the noise level value; and generating a compensated scale factor based on the noise offset information.

According to the present invention, the method may further include: extracting a level pilot value representing a reference value of the noise level and an offset pilot value representing a reference value of the noise offset; obtaining the noise level value by adding the level pilot value and the noise level information; and, when the noise offset information is extracted, obtaining a noise offset value by adding the offset pilot value and the noise offset information, wherein the noise filling is performed using the noise level value and the noise offset value.

According to the present invention, the method may further include: obtaining the noise level value of the current frame using the noise level value of a previous frame and the noise level information of the current frame; and, when the noise offset information is extracted, obtaining a noise offset value of the current frame using the noise offset value of the previous frame and the noise offset information of the current frame, wherein the noise filling is performed using the noise level value and the noise offset value.

According to the present invention, the noise level information and the noise offset information may be extracted according to a variable length coding scheme.

According to another aspect of the present invention, there is provided an audio signal processing apparatus comprising: a multiplexer that extracts noise filling flag information indicating whether noise filling is used for a plurality of frames, and coding scheme information indicating whether a current frame included in the plurality of frames is coded in the frequency domain or in the time domain; a noise information decoding part that extracts noise level information for the current frame when the noise filling flag information indicates that noise filling is used for the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain, and that extracts noise offset information for the current frame when a noise level value corresponding to the noise level information satisfies a predetermined level; and a loss compensation part that, when the noise offset information is extracted, performs noise filling on the current frame based on the noise level value and the noise level information.

According to the present invention, the loss compensation part may determine a loss region of the current frame using spectral data of the current frame, generate compensated spectral data by filling the loss region with a compensation signal using the noise level value, and generate a compensated scale factor based on the noise offset information.

According to the present invention, the apparatus may further include a data decoding part that extracts a level pilot value representing a reference value of the noise level and an offset pilot value representing a reference value of the noise offset, obtains the noise level value by adding the level pilot value and the noise level information, and, when the noise offset information is extracted, obtains a noise offset value by adding the offset pilot value and the noise offset information, wherein the noise filling is performed using the noise level value and the noise offset value.

According to the present invention, the apparatus may further include a data decoding part that obtains the noise level value of the current frame using the noise level value of a previous frame and the noise level information of the current frame, and, when the noise offset information is extracted, obtains a noise offset value of the current frame using the noise offset value of the previous frame and the noise offset information of the current frame, wherein the noise filling is performed using the noise level value and the noise offset value.

According to another aspect of the present invention, there is provided an audio signal processing method comprising: generating a noise level value and a noise offset value based on a quantized signal; generating noise filling flag information indicating whether noise filling is used for a plurality of frames; generating coding scheme information indicating whether a current frame included in the plurality of frames is coded in the frequency domain or in the time domain; inserting noise level information of the current frame, corresponding to the noise level value, into a bitstream when the noise filling flag information indicates that noise filling is used for the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain; and inserting noise offset information corresponding to the noise offset value into the bitstream when the noise level value satisfies a predetermined level.

According to another aspect of the present invention, there is provided an audio signal processing apparatus comprising: a loss compensation estimation part that generates a noise level value and a noise offset value based on a quantized signal, and generates noise filling flag information indicating whether noise filling is used for a plurality of frames; a signal classifier that generates coding scheme information indicating whether a current frame included in the plurality of frames is coded in the frequency domain or in the time domain; and a noise information encoding part that inserts noise level information of the current frame, corresponding to the noise level value, into a bitstream when the noise filling flag information indicates that noise filling is used for the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain, and that inserts noise offset information corresponding to the noise offset value into the bitstream when the noise level value satisfies a predetermined level.

According to another aspect of the present invention, there is provided a computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform operations comprising: extracting noise filling flag information indicating whether noise filling is used for a plurality of frames; extracting coding scheme information indicating whether a current frame included in the plurality of frames is coded in the frequency domain or in the time domain; extracting noise level information for the current frame when the noise filling flag information indicates that noise filling is used for the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain; extracting noise offset information for the current frame when a noise level value corresponding to the noise level information satisfies a predetermined level; and, when the noise offset information is extracted, performing noise filling on the current frame based on the noise level value and the noise level information.

The present invention provides the following effects and advantages.

First, for frames to which the noise filling scheme is not applied, transmission of information on noise filling can be omitted, which significantly reduces the number of bits in the bitstream.

Second, because specific information on noise filling is extracted from the bitstream only after determining whether noise filling applies to the current frame, the required information can be obtained efficiently with almost no increase in parsing complexity.

Third, for information that takes similar values from frame to frame (e.g., noise level or noise offset), transmitting the difference from the corresponding value of the previous frame, rather than the value itself, further reduces the number of bits.

Hereinafter, preferred embodiments of the present invention are described in detail with reference to the accompanying drawings. Terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. They should be interpreted with meanings and concepts consistent with the technical idea of the present invention, on the principle that an inventor may appropriately define the concepts of terms in order to describe the invention in the best way. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical idea; it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing.

In the present invention, the following terms may be interpreted according to the following criteria, and terms not listed may be interpreted in the same spirit. Coding may be interpreted as encoding or decoding depending on context. Information is a term that covers values, parameters, coefficients, elements, and the like; its meaning may be interpreted differently depending on context, and the present invention is not limited thereto.

In a broad sense, an audio signal is distinguished from a video signal and refers to a signal that can be identified by hearing when reproduced. In a narrow sense, it is distinguished from a speech signal and refers to a signal with little or no speech characteristics. In the present invention the term audio signal should be interpreted in the broad sense, and may be understood in the narrow sense when it is used in contrast to a speech signal.

FIG. 1 is a diagram showing the configuration of an encoder in an audio signal processing apparatus according to an embodiment of the present invention, and FIG. 2 is a diagram showing the sequence of an encoding method in an audio signal processing method according to an embodiment of the present invention.

First, referring to FIG. 1, the encoder (100) of the audio signal processing apparatus includes a noise information encoding part (101), and may further include a data encoding part (102), an entropy coding part (103), a loss compensation estimation part (110), and a multiplexer (120). The audio signal processing apparatus according to the present invention encodes the noise offset depending on the noise level.

The loss compensation estimation part (110) generates information on noise filling based on the quantized signal. The information on noise filling may include noise filling flag information, a noise level, and a noise offset.

Specifically, the loss compensation estimation part (110) first receives the quantized signal and coding identification information (step S110). The coding scheme information indicates whether a frequency-domain-based scheme or a time-domain-based scheme is applied to the current frame, and may be generated by a signal classifier (not shown). The loss compensation estimation part (110) may generate information on noise filling only for a frequency-domain signal. The coding scheme information may be passed to the multiplexer (120); an example of syntax for encoding it is described later.

Quantization is the process of obtaining a scale factor and spectral data from spectral coefficients; the scale factor and spectral data together constitute the quantized signal. The spectral coefficients may be MDCT coefficients obtained through the Modified Discrete Cosine Transform (MDCT), but the present invention is not limited thereto. In other words, a spectral coefficient can be approximated using an integer scale factor and integer spectral data, as in Equation 1 below.

Here, X is the spectral coefficient, scalefactor is the scale factor, and spectral data is the spectral data.
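Equation 1 itself is not reproduced in this text. As an assumption borrowed from AAC-style quantization, and not confirmed by this document, a relation of the following form would be consistent with the symbol definitions above:

```latex
X \approx 2^{\,\mathrm{scalefactor}/4} \times \left(\mathrm{spectral\ data}\right)^{4/3}
```

The exponents 1/4 and 4/3 are part of that assumption; the only property the surrounding text relies on is that X is approximated from an integer scale factor and integer spectral data.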

FIG. 3 is a diagram for explaining the concept of quantization. FIG. 3 shows conceptually how spectral coefficients (a, b, c, etc.) are represented by scale factors (A, B, C, etc.) and spectral data (a', b', c', etc.). A scale factor (A, B, C, etc.) is a factor applied to a group (a specific band or a specific interval). Using a scale factor that represents a group (e.g., a scale factor band) to convert the magnitudes of all coefficients in that group at once improves coding efficiency. The scale factor and spectral data determined in this way may be used as they are, or may be corrected through a masking process based on a psychoacoustic model; a detailed description of that process is omitted here.

The loss compensation estimation part (110) determines, based on this spectral data, the loss region in which a loss signal exists. FIG. 4 is a diagram for explaining the concepts of a loss signal and a loss region. Referring to FIG. 4, one or more items of spectral data exist in each spectral band (sfb₁, sfb₂, ..., sfb₄). Each item of spectral data is an integer between 0 and 7. Spectral data could of course take values from -50 to 100 rather than integers from 0 to 7; FIG. 4 is only an example for explaining the concept, and the present invention is not limited thereto. When the absolute value of the spectral data in a given sample, bin, or region is less than or equal to a specific value (e.g., 0), it may be determined that the signal has been lost or that a loss region exists. If the specific value is 0, then in the case of FIG. 4 loss signals occur in the second and third spectral bands (sfb₂, sfb₃), and in the third spectral band (sfb₃) the entire band is a loss region.
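The loss-region test described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `find_loss_regions`, the list-of-lists band layout, and the sample values are all hypothetical (the values are chosen to mirror the situation described for FIG. 4, not read from the figure).

```python
from typing import List, Tuple


def find_loss_regions(
    bands: List[List[int]], threshold: int = 0
) -> Tuple[List[List[int]], List[bool]]:
    """Return, per band, the indices of lost bins and whether the whole band is lost.

    A bin counts as lost when |spectral_data| <= threshold (threshold 0 in the text's example).
    """
    lost_bins: List[List[int]] = []
    fully_lost: List[bool] = []
    for band in bands:
        idx = [i for i, value in enumerate(band) if abs(value) <= threshold]
        lost_bins.append(idx)
        # A band is a complete loss region only if every bin in it is lost.
        fully_lost.append(len(band) > 0 and len(idx) == len(band))
    return lost_bins, fully_lost


# Hypothetical spectral data for four bands (sfb1..sfb4), integers in 0..7.
bands = [[3, 1, 5], [2, 0, 0, 4], [0, 0, 0], [7, 2]]
lost_bins, fully_lost = find_loss_regions(bands)
```

With this sample input, the second band is partly lost and the third band is lost entirely, which is the distinction the later noise offset rule depends on.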

To compensate for the loss signal in the loss region, the loss compensation estimation part (110) determines whether to use the noise filling scheme for a plurality of frames or for a sequence, and generates noise filling flag information accordingly. That is, the noise filling flag information indicates whether the noise filling scheme is used to compensate for loss signals in a plurality of frames or a sequence. It does not indicate whether noise filling is used for every frame belonging to the plurality of frames or the sequence; rather, it indicates whether noise filling may be used for particular frames among them. The noise filling flag information may be included in a header carrying information common to the plurality of frames or the whole sequence. The generated noise filling flag information is passed to the multiplexer (120). FIG. 5 is an example of syntax for encoding the noise filling flag information. Referring to row (L1) of FIG. 5, the noise filling flag information (noisefilling) is included in the header (USACSpecificConfig()) that carries information applied in common to the whole sequence (e.g., frame length, whether eSBR is used). If the noise filling flag information is 0, the noise filling scheme cannot be used anywhere in the sequence. Conversely, if it is 1, the noise filling scheme may be used for one or more frames in the sequence.

Referring again to FIGS. 1 and 2, the loss compensation estimation part (110) generates a noise level and a noise offset for the loss region in which a loss signal exists (step S130). FIG. 6 is a diagram for explaining the noise level and the noise offset. Referring to FIG. 6, in a region where spectral data has been lost, a compensation signal (e.g., a random signal) can be generated in place of the loss signal; the noise level is the information that determines the level of this compensation signal. The noise level and the compensation signal (random signal) can be expressed as in the following equation. The noise level may be determined for each frame.

spectral_data = noise_val × random_signal    (Equation 2)

Here, spectral_data is the spectral data, noise_val is a value obtained using the noise level, and random_signal is a random signal.
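As a rough illustration of the relation above, the sketch below replaces lost bins with noise_val × random_signal and leaves other bins untouched. Several points are assumptions rather than statements from the text: the function name `fill_noise` is hypothetical, noise_val is passed in directly because the mapping from noise level to noise_val is not given here, and the random signal is modelled as a random sign of ±1.

```python
import random
from typing import List


def fill_noise(
    spectral_data: List[float], noise_val: float, threshold: int = 0, seed: int = 0
) -> List[float]:
    """Fill lost bins (|value| <= threshold) with noise_val * random_signal."""
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    filled: List[float] = []
    for value in spectral_data:
        if abs(value) <= threshold:
            # Assumed form of the random signal: a random sign, +1 or -1.
            random_signal = rng.choice((-1.0, 1.0))
            filled.append(noise_val * random_signal)
        else:
            filled.append(value)  # bins that survived quantization are kept
    return filled


filled = fill_noise([2, 0, 0, 4], noise_val=0.5)
```

In this example the two zero bins take magnitude 0.5 while the values 2 and 4 are unchanged.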

The noise offset, on the other hand, is information for correcting the scale factor. As mentioned above, the noise level is a factor for correcting the spectral data in Equation 2, but its range of values is limited. To ultimately give the spectral coefficients in a loss region large values, it can be more effective to correct the scale factor than to correct the spectral data through the noise level. The value used to correct the scale factor is the noise offset. The relationship between the noise offset and the scale factor can be expressed as in the following equation.

sfc_d = sfc_c - noise_offset

Here, sfc_c is the scale factor, sfc_d is the transmitted scale factor, and noise_offset is the noise offset.

Here, the noise offset may be applied only when an entire spectral band is a loss region. In the case of FIG. 6, for example, the noise offset can be applied only to the third spectral band (sfb₃). This is because, when only part of a spectral band is a loss region, applying the noise offset may actually increase the number of bits needed for the spectral data in the non-loss region.
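The scale factor relation and the whole-band rule can be sketched together. The function names and sample values below are hypothetical, and the decoder-side step (adding the offset back) is an assumed inverse of sfc_d = sfc_c - noise_offset rather than something spelled out in this passage.

```python
from typing import List


def transmitted_scalefactors(
    sfc_c: List[int], fully_lost: List[bool], noise_offset: int
) -> List[int]:
    """Encoder side: sfc_d = sfc_c - noise_offset, only for bands that are entirely lost."""
    return [s - noise_offset if lost else s for s, lost in zip(sfc_c, fully_lost)]


def compensated_scalefactors(
    sfc_d: List[int], fully_lost: List[bool], noise_offset: int
) -> List[int]:
    """Decoder side (assumed inverse): sfc_c = sfc_d + noise_offset for the same bands."""
    return [s + noise_offset if lost else s for s, lost in zip(sfc_d, fully_lost)]


# Hypothetical scale factors for four bands; only the third band is entirely lost.
sfc_c = [40, 38, 52, 36]
fully_lost = [False, False, True, False]
sfc_d = transmitted_scalefactors(sfc_c, fully_lost, noise_offset=10)
restored = compensated_scalefactors(sfc_d, fully_lost, noise_offset=10)
```

Bands that are only partly lost keep their scale factor, so the surviving spectral data in those bands is not rescaled.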

The noise information encoding part (101) encodes the noise offset based on the noise level value and the noise offset value received from the loss compensation estimation part (110). For example, the noise offset value may be encoded only when the noise level value satisfies a predetermined condition (a specific level range). For example, when the noise level value exceeds 0 ('no' in step S140), noise filling is performed, so the noise offset value is passed to the data coding part (102) and the noise offset information is included in the bitstream (step S160).

Conversely, when the noise level value is 0 ('yes' in step S140), noise filling is not performed, so only the noise level value of 0 is encoded and the noise offset value is excluded from the bitstream (step S150).

FIG. 7 is an example of syntax for encoding the noise level and the noise offset. Row (L1) of FIG. 7 shows the case in which the current frame is a frequency-domain signal. Rows (L2) and (L3) show that the noise level information (noise_level) is included in the bitstream only when the noise filling flag information (noiseFilling) is 1, because a flag value of 0 means that noise filling is not applied anywhere in the sequence to which the current frame belongs. Rows (L4) and (L5) show that the noise offset information (noise_offset) is included in the bitstream only when the noise level value is greater than 0.
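The conditional structure described for FIG. 7 can be sketched as a writer and a matching reader. This is an illustration under stated assumptions: the bit widths (3 bits for noise_level, 5 bits for noise_offset), the list-of-bits container, and all function names are hypothetical, and the fields are written as raw unsigned values rather than with the data coding described elsewhere in the text.

```python
from typing import List, Optional, Tuple

NOISE_LEVEL_BITS = 3   # assumed width, not stated in the text
NOISE_OFFSET_BITS = 5  # assumed width, not stated in the text


def put_bits(bits: List[int], value: int, width: int) -> None:
    """Append value as width bits, most significant bit first."""
    for shift in range(width - 1, -1, -1):
        bits.append((value >> shift) & 1)


def get_bits(bits: List[int], pos: int, width: int) -> Tuple[int, int]:
    """Read width bits starting at pos; return (value, new position)."""
    value = 0
    for bit in bits[pos:pos + width]:
        value = (value << 1) | bit
    return value, pos + width


def write_noise_info(bits: List[int], is_frequency_domain: bool, noise_filling: bool,
                     noise_level: int, noise_offset: int) -> None:
    if not is_frequency_domain:      # row (L1): frequency-domain frames only
        return
    if noise_filling:                # row (L2): sequence-level flag is 1
        put_bits(bits, noise_level, NOISE_LEVEL_BITS)        # row (L3)
        if noise_level > 0:          # row (L4): offset only when level > 0
            put_bits(bits, noise_offset, NOISE_OFFSET_BITS)  # row (L5)


def read_noise_info(bits: List[int], pos: int, is_frequency_domain: bool,
                    noise_filling: bool) -> Tuple[Optional[int], Optional[int], int]:
    noise_level: Optional[int] = None
    noise_offset: Optional[int] = None
    if is_frequency_domain and noise_filling:
        noise_level, pos = get_bits(bits, pos, NOISE_LEVEL_BITS)
        if noise_level > 0:
            noise_offset, pos = get_bits(bits, pos, NOISE_OFFSET_BITS)
    return noise_level, noise_offset, pos


active: List[int] = []
write_noise_info(active, True, True, noise_level=5, noise_offset=12)
silent: List[int] = []
write_noise_info(silent, True, True, noise_level=0, noise_offset=12)
disabled: List[int] = []
write_noise_info(disabled, True, False, noise_level=5, noise_offset=12)
```

Under these assumed widths, a frame with a nonzero noise level costs 8 bits, a frame with noise level 0 costs 3 bits, and a frame in a sequence whose flag is 0 costs none, which is the saving the text attributes to the conditional syntax.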

Referring again to FIGS. 1 and 2, the data coding part 102 data-codes the noise level value (and the noise offset value) using a differential coding scheme or a pilot coding scheme. The differential coding scheme transmits only the difference between the noise level value of the previous frame and that of the current frame, and may be expressed by the following equation.

noise_info_diff_cur = noise_info_cur - noise_info_prev

Here, noise_info_cur is the noise level (or offset) of the current frame, noise_info_prev is the noise level (or offset) of the previous frame, and noise_info_diff_cur is the difference value.

That is, only the difference value obtained by subtracting the noise level (or offset) of the previous frame from the noise level (or offset) of the current frame, received from the noise information encoding part 101, is passed to the entropy coding part 103.
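
As a minimal sketch of the differential coding equation above: the function name diff_encode and the starting reference initial_prev = 0 are assumptions, since this passage does not say how the first frame is handled.

```python
def diff_encode(values, initial_prev=0):
    """Differentially code per-frame noise levels (or offsets).

    initial_prev is a hypothetical starting reference for the first frame.
    """
    diffs = []
    prev = initial_prev
    for cur in values:
        # noise_info_diff_cur = noise_info_cur - noise_info_prev
        diffs.append(cur - prev)
        prev = cur
    return diffs
```

For slowly varying noise levels, most of the resulting differences are 0 or small in magnitude.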

In the pilot coding scheme, a pilot value is determined as a reference value for the noise level (or offset) values of two or more frames (e.g., the mean, median, or mode of the noise levels (or offsets) of N frames in total), and the difference between this pilot value and the noise level (or offset) of the current frame is transmitted.

noise_info_diff_cur = noise_info_cur - noise_info_pilot

Here, noise_info_cur is the noise level (or offset) of the current frame, noise_info_pilot is the pilot of the noise level (or offset), and noise_info_diff_cur is the difference value.

The pilot of the noise level (or offset) may be transmitted in a header, which may be the same header that carries the noise filling flag information.
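
A minimal sketch of pilot coding, assuming the median is chosen as the pilot (the description also allows the mean or the mode). The function name pilot_encode is hypothetical, and statistics.median_low is used here only so that the pilot stays one of the actual integer values.

```python
from statistics import median_low


def pilot_encode(values, pick_pilot=median_low):
    """Code N frames of noise levels (or offsets) against one pilot value.

    Returns (pilot, diffs); in the description the pilot would travel in a
    header and the differences in the per-frame data.
    """
    pilot = pick_pilot(values)
    # noise_info_diff_cur = noise_info_cur - noise_info_pilot
    diffs = [cur - pilot for cur in values]
    return pilot, diffs
```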

In other words, when the differential coding scheme or the pilot coding scheme is applied, the noise level value of the current frame is not itself the noise level information included in the bitstream; rather, the difference value of the noise level (the difference from DIFF coding or from pilot coding) becomes the noise level information.

In this way, noise level information is generated by applying differential coding or pilot coding to the noise level value (steps S170 and S180). When a noise offset value is generated, differential coding or pilot coding is likewise applied to it to generate noise offset information (step S180). The noise level information (and noise offset information) is passed to the entropy coding part 103.

The entropy coding part 103 performs entropy coding on the noise level information (and noise offset information). When this information has been coded by the data coding part 102 according to the differential or pilot coding scheme, the information corresponding to the difference value may be encoded with a variable-length coding scheme (e.g., Huffman coding), which is one kind of entropy coding. Because the difference value can be 0 or close to 0, variable-length coding can reduce the number of bits further than fixed-length coding.
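
The bit saving from variable-length coding can be illustrated with a small sketch. It assumes a Huffman code built from the observed difference values; the description names Huffman coding only as an example and gives no code table, so the helper names and the example sequence are hypothetical, and the cost of conveying the table to the decoder is ignored.

```python
import heapq
from collections import Counter


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over symbols."""
    counts = Counter(symbols)
    if len(counts) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {next(iter(counts)): 1}
    # Heap entries: (weight, unique tie-breaker, {symbol: depth in the tree}).
    heap = [(w, i, {s: 0}) for i, (s, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, d1 = heapq.heappop(heap)
        w2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every contained symbol one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def total_bits(symbols):
    """Total number of bits needed to send symbols with that Huffman code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)
```

For the difference sequence [0, 0, 0, 1, 0, -1, 0, 0], total_bits gives 10 bits, against 16 bits for a fixed 2-bit code over the same three symbols.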

The multiplexer 120 generates a bitstream by multiplexing the coding scheme information received from a signal classifier (not shown), the noise level information (and noise offset information) received through the entropy coding part 103, and the noise filling flag information and quantized signal (spectral data and scale factors) received through the loss compensation estimation part 110. The syntax for encoding the noise filling flag information may be as in FIG. 5, and the syntax for encoding the noise level information (and noise offset information) may be as in FIG. 7, both described above.

FIG. 8 is an example of syntax for encoding the coding scheme information. Row (L1) of FIG. 8 shows that coding scheme information (core_mode), indicating whether a frequency-domain-based or a time-domain-based scheme is applied to the current frame, is included. Rows (L2) and (L3) show that a time-domain-based channel stream is transmitted when the coding scheme information indicates the time-domain-based scheme. Rows (L4) and (L5) show that a frequency-domain-based channel stream is transmitted when the coding scheme information indicates the frequency-domain-based scheme. As described with FIG. 7, the frequency-domain-based channel stream (fd_channel_stream()) may include the information on noise filling (noise level information (and noise offset information)).

As described above, the audio signal encoding apparatus and method according to the embodiment of the present invention encode the information on noise filling (in particular, the noise offset information), or skip its encoding, according to whether noise filling is actually applied to a specific frame within a sequence in which noise filling may be used.

FIG. 9 shows the configuration of the decoder in an audio signal processing apparatus according to an embodiment of the present invention, and FIG. 10 shows the detailed configuration of the loss compensation part of FIG. 9. FIG. 11 shows the sequence of the decoding method in an audio signal processing method according to an embodiment of the present invention.

Referring first to FIGS. 9 and 11, the decoding apparatus 200 of the audio signal processing apparatus includes a noise information decoding part 201, and may further include an entropy decoding part 202, a data decoding part 203, a multiplexer 210, a loss compensation part 220, and a scaling part 230.

First, the multiplexer 210 extracts the noise filling flag information from the bitstream (in particular, from the header) (step S210). It then receives the coding scheme information and the quantized signal for the current frame (step S220). The noise filling flag information, the coding scheme information, and the quantized signal are as described above: the noise filling flag information indicates whether the noise filling scheme is used for a plurality of frames, and the coding scheme information indicates whether a frequency-domain-based or a time-domain-based scheme is applied to the current frame among the plurality of frames. When the frequency-domain-based scheme is applied, the quantized signal may be spectral data and scale factors. The noise filling flag information may be extracted according to the syntax shown in FIG. 5, and the coding scheme information according to the syntax shown in FIG. 8. The noise filling flag information and the coding scheme information extracted by the multiplexer 210 are passed to the noise information decoding part 201.

The noise information decoding part 201 extracts the information on noise filling (noise level information and noise offset information) from the bitstream based on the noise filling flag information and the coding scheme information. Specifically, when the noise filling flag information indicates that the noise filling scheme may be used for the plurality of frames ('yes' in step S230) and a frequency-domain-based scheme is applied to the current frame ('yes' in step S240), the noise information decoding part 201 extracts the noise level information from the bitstream (step S250). Step S240 may of course be performed before step S230. Steps S230 to S250 may be performed according to the syntax shown in rows (L1) to (L3) of FIG. 7. As described with FIG. 6, the noise level information is information about the level of the compensation signal (e.g., a random signal) inserted into a region (or sample, or bin) where spectral data has been lost.

In step S230, when the noise filling flag information indicates that the noise filling scheme cannot be used for any of the plurality of frames ('no' in step S230), the procedure may end without performing any step related to noise filling. Likewise, in step S240, when the current frame is a frame to which a time-domain-based scheme is applied ('no' in step S240), the noise filling procedure may not be performed.

The dequantization part (not shown) dequantizes the received spectral data to generate dequantized spectral data. As shown in Equation 1 above, dequantization raises the received spectral data to the power of 4/3.
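
The 4/3-power step can be sketched as below. Equation 1 is not reproduced in this passage, so the sign-preserving form (the power applied to the magnitude, with the sign restored afterwards) is an assumption borrowed from AAC-style codecs, and the function name dequantize is hypothetical.

```python
def dequantize(spectral_data):
    """Raise each quantized value to the 4/3 power, keeping its sign.

    Assumption: negative quantized values keep their sign, as in
    AAC-style inverse quantization.
    """
    result = []
    for q in spectral_data:
        sign = 1.0 if q >= 0 else -1.0
        result.append(sign * abs(q) ** (4.0 / 3.0))
    return result
```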

When the noise level information has been extracted in step S250 and the noise level is greater than 0 (that is, the noise filling scheme is applied to the current frame) ('yes' in step S260), the noise information decoding part 201 extracts the noise offset information from the bitstream (step S270). Steps S260 and S270 may be performed according to the syntax shown in rows (L4) and (L5) of FIG. 7. As described with FIG. 6, the noise offset information is information for correcting the scale factor corresponding to a specific scale factor band, which may be a scale factor band in which all spectral data has been lost. When the noise offset information is obtained, the dequantized spectral data and scale factors of the current frame pass through the loss compensation part 220; when it is not obtained, they bypass the loss compensation part 220 and are input directly to the scaling part 230.
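
Steps S230 to S270 can be sketched as one parsing routine. This is an illustration under assumptions: parse_noise_info and the read_field callable are hypothetical, "fd" stands for a frequency-domain frame, and the values returned by read_field are treated as already entropy-decoded and data-decoded.

```python
def parse_noise_info(noise_filling_flag, core_mode, read_field):
    """Follow steps S230-S270 for one frame.

    read_field is a hypothetical callable returning the next decoded value
    for a named field. Returns (noise_level, noise_offset); an entry is
    None when the field is absent from the bitstream.
    """
    if not noise_filling_flag:
        # S230 'no': noise filling is not used in this sequence.
        return None, None
    if core_mode != "fd":
        # S240 'no': time-domain frames carry no noise filling fields.
        return None, None
    noise_level = read_field("noise_level")        # S250
    if noise_level > 0:                            # S260 'yes'
        return noise_level, read_field("noise_offset")  # S270
    # Noise level 0: the offset was never transmitted.
    return noise_level, None
```

A caller would send the frame through loss compensation only when the second returned value is not None, and otherwise pass it straight to scaling.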

The noise level information extracted in step S250 and the noise offset information extracted in step S270 are entropy-decoded by the entropy decoding part 202. When this information was encoded with a variable-length coding scheme (e.g., Huffman coding) among entropy coding schemes, it may be entropy-decoded with the corresponding variable-length decoding scheme.

The data decoding part 203 data-decodes the entropy-decoded noise level information (and noise offset information) according to the differential scheme or the pilot scheme. In the case of the differential scheme (DIFF coding), the noise level (or offset) of the current frame may be obtained by the following equation.

noise_info_cur = noise_info_prev + noise_info_diff_cur

In the case of pilot coding, the noise level (or offset) of the current frame may be obtained by the following equation.

noise_info_cur = noise_info_pilot + noise_info_diff_cur

Here, the pilot of the noise level (or offset) may be information included in the header. The noise level (and noise offset) obtained in this way is passed to the loss compensation part 220.
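
The two reconstruction equations above can be sketched as follows. The function names are hypothetical, and initial_prev = 0 is an assumed starting reference for the first frame, which this passage does not specify.

```python
def diff_decode(diffs, initial_prev=0):
    """Rebuild noise levels (or offsets) from frame-to-frame differences."""
    values = []
    prev = initial_prev
    for d in diffs:
        # noise_info_cur = noise_info_prev + noise_info_diff_cur
        cur = prev + d
        values.append(cur)
        prev = cur
    return values


def pilot_decode(pilot, diffs):
    """Rebuild noise levels (or offsets) from differences against a pilot."""
    # noise_info_cur = noise_info_pilot + noise_info_diff_cur
    return [pilot + d for d in diffs]
```

Note that in the differential case an error in one frame propagates to later frames, whereas in the pilot case each frame depends only on the pilot.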

When both the noise level and the noise offset have been obtained, the loss compensation part 220 performs noise filling on the current frame based on them (step S280). The detailed configuration of the loss compensation part 220 is shown in FIG. 10.

Referring to FIG. 10, the loss compensation part 220 includes a spectral data filling part 222 and a scale factor correction part 224. The spectral data filling part 222 determines whether the spectral data of the current frame contains a loss region and fills any loss region with a compensation signal using the noise level. When analysis of the received spectral data shows that a sample is less than or equal to a predetermined value (e.g., 0), that sample is determined to be a loss region. The loss region may be as shown in FIG. 4. As shown in Equation 2 above, spectral data corresponding to the loss region may be generated by applying the noise level value to a compensation signal (e.g., a random signal). Filling the loss region with the compensation signal in this way produces compensated spectral data.
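
The filling step can be sketched as below. Equation 2 is not reproduced in this passage, so several choices here are assumptions: the noise level is applied as a plain multiplier, the compensation signal is uniform random noise in [-1, 1], the magnitude of each sample is compared against the threshold, and the names fill_lost_samples, threshold, and rng are hypothetical.

```python
import random


def fill_lost_samples(spectral_data, noise_level, threshold=0, rng=None):
    """Replace lost samples with a random compensation signal.

    A sample counts as lost when its magnitude is at or below threshold.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    filled = []
    for x in spectral_data:
        if abs(x) <= threshold:
            # Lost sample: noise level times a random value in [-1, 1].
            filled.append(noise_level * rng.uniform(-1.0, 1.0))
        else:
            filled.append(x)  # surviving samples are left untouched
    return filled
```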

The scale factor correction part 224 corrects the received scale factor with the noise offset. The scale factor may be corrected according to the following equation.

sfc_c = sfc_d + noise_offset

Here, sfc_c is the corrected scale factor, sfc_d is the transmitted scale factor, and noise_offset is the noise offset.

As mentioned above, the noise offset correction may be applied only to a scale factor band that falls entirely within the loss region. The compensated spectral data and corrected scale factors generated by the loss compensation part 220 are input to the scaling part 230 shown in FIG. 9.
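
A minimal sketch of the scale factor correction, restricted to bands whose spectral data is entirely lost. The names sfc_d and noise_offset follow the equation above; the function name, the per-band list layout, and the use of 0 as the loss criterion are assumptions.

```python
def correct_scalefactors(scalefactors, bands, noise_offset):
    """Apply sfc_c = sfc_d + noise_offset to fully lost bands only.

    bands holds, for each scale factor band, its quantized spectral values
    as received (before any noise filling).
    """
    corrected = []
    for sfc_d, band in zip(scalefactors, bands):
        if band and all(x == 0 for x in band):
            # Every sample in the band was lost: correct its scale factor.
            corrected.append(sfc_d + noise_offset)
        else:
            # At least one sample survived: keep the transmitted value.
            corrected.append(sfc_d)
    return corrected
```

The loss test has to run on the received data, because after noise filling the band no longer contains zeros.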

Referring again to FIGS. 9 and 11, the scaling part 230 scales either the received spectral data or the compensated spectral data using the scale factor or the corrected scale factor (step S290). Scaling obtains the spectral coefficients from the dequantized spectral data (spectral_data^(4/3) in the following equation) using the scale factor, as in the following equation.

Here, X' is the restored spectral coefficient, spectral_data is the received spectral data or the compensated spectral data, and scalefactor is the received scale factor or the corrected scale factor.
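
The scaling step can be sketched as below. The scaling equation is not reproduced in this passage, so the gain of 2^(scalefactor/4) is an assumption borrowed from AAC-style codecs (without any constant offset on the scale factor), and the function name scale is hypothetical.

```python
def scale(dequantized, scalefactor):
    """Scale dequantized spectral data (spectral_data^(4/3)) by a band gain.

    Assumption: the gain is 2 ** (scalefactor / 4), as in AAC-style codecs.
    """
    gain = 2.0 ** (scalefactor / 4.0)
    return [gain * x for x in dequantized]
```

Under this assumption, adding a noise offset of 4 to a scale factor doubles the amplitude of the noise filled into that band.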

By performing the above steps, the decoding apparatus of the audio signal processing apparatus according to the embodiment of the present invention obtains the information on noise filling and performs noise filling.

FIG. 12 is an example of an audio signal encoding apparatus to which the audio signal processing apparatus according to an embodiment of the present invention is applied, and FIG. 13 is an example of an audio signal decoding apparatus to which the audio signal processing apparatus according to an embodiment of the present invention is applied.

The audio signal processing apparatus 100 of FIG. 12 includes the noise information encoding part 101 described with FIG. 1, and may further include the data coding part 102 and the entropy coding part 103. The audio signal processing apparatus 200 of FIG. 13 includes the noise information decoding part 201 described with FIG. 9, and may further include the entropy decoding part 202 and the data decoding part 203.

Referring first to FIG. 12, the audio signal encoding apparatus 300 includes a multi-channel encoder 310, a band extension coding unit 320, an audio signal encoder 330, a speech signal encoder 340, a loss compensation estimation unit 350, an audio signal processing apparatus 100, and a multiplexer 360.

The multi-channel encoder 310 receives a plurality of channel signals (two or more channel signals; hereinafter, a multi-channel signal), generates a mono or stereo downmix signal by downmixing, and generates the spatial information needed to upmix the downmix signal back into a multi-channel signal. The spatial information may include channel level difference information, inter-channel correlation information, channel prediction coefficients, downmix gain information, and the like. When the audio signal encoding apparatus 300 receives a mono signal, the multi-channel encoder 310 may of course bypass the mono signal without downmixing it.

The band extension coding unit 320 may apply a band extension scheme to the downmix signal output from the multi-channel encoder 310 to generate spectral data corresponding to the low-frequency band and band extension information for extending the high-frequency band. That is, the spectral data of part of the band of the downmix signal (e.g., the high-frequency band) is excluded, and band extension information for restoring the excluded data may be generated.

The signal generated by the band extension coding unit 320 is input to the audio signal encoder 330 or the speech signal encoder 340 according to the coding scheme information generated by the signal classifier (not shown).

The audio signal encoder 330 encodes the downmix signal according to an audio coding scheme when a specific frame or segment of the downmix signal has predominantly audio characteristics. The audio coding scheme may follow the AAC (Advanced Audio Coding) standard or the HE-AAC (High Efficiency Advanced Audio Coding) standard, but the present invention is not limited thereto. The audio signal encoder 330 may correspond to an MDCT (Modified Discrete Cosine Transform) encoder.

The speech signal encoder 340 encodes the downmix signal according to a speech coding scheme when a specific frame or segment of the downmix signal has predominantly speech characteristics. The speech coding scheme may follow the AMR-WB (Adaptive Multi-Rate Wide-Band) standard, but the present invention is not limited thereto. The speech signal encoder 340 may further use a linear prediction coding (LPC) scheme. When a harmonic signal has high redundancy on the time axis, it can be modeled by linear prediction, which predicts the current signal from past signals; in this case, adopting linear prediction coding can increase coding efficiency. The speech signal encoder 340 may correspond to a time-domain encoder.

The loss compensation estimation unit 350 may perform the same function as the loss compensation estimation part 110 described with FIG. 1, so a detailed description is omitted.

The audio signal processing apparatus 100 includes the noise information encoding part 101 described with FIG. 1, and thereby encodes the noise level and noise offset generated by the loss compensation estimation unit 350.

The multiplexer 360 generates one or more bitstreams by multiplexing the spatial information, the band extension information, the signals encoded by the audio signal encoder 330 and the speech signal encoder 340, the noise filling flag information, and the noise level information (and noise offset information) generated by the audio signal processing apparatus 100.

Referring to FIG. 13, the audio signal decoding apparatus 400 includes a demultiplexer 410, an audio signal processing apparatus 200, a loss compensation part 420, a scaling part 430, an audio signal decoder 440, a speech signal decoder 450, a band extension decoding unit 460, and a multi-channel decoder 470.

The demultiplexer 410 extracts the noise filling flag information, the quantized signal, the coding scheme information, the band extension information, the spatial information, and the like from the audio signal bitstream.

As mentioned above, the audio signal processing apparatus 200 includes the noise information decoding part 201 described with FIG. 9, and obtains the noise level information (and noise offset information) from the bitstream based on the noise filling flag information and the coding scheme information.

The dequantization unit (not shown) dequantizes the received spectral data and passes the dequantized spectral data to the loss compensation part 420, or, when noise filling is skipped, bypasses the loss compensation part 420 and passes the data to the scaling part 430.

The loss compensation part 420 is the same component as the loss compensation part 220 described with FIG. 9. When noise filling is applied to the current frame, it performs noise filling on the current frame using the noise level and the noise offset.

The scaling part 430 is the same component as the scaling part 230 described with FIG. 9, and obtains spectral coefficients by scaling the dequantized spectral data or the compensated spectral data.

The audio signal decoder 440 decodes the audio signal (e.g., the spectral coefficients) with an audio coding scheme when the audio signal has predominantly audio characteristics. As described above, the audio coding scheme may follow the AAC standard or the HE-AAC standard. The speech signal decoder 450 decodes the downmix signal with a speech coding scheme when the audio signal has predominantly speech characteristics. As described above, the speech coding scheme may follow the AMR-WB standard, but the present invention is not limited thereto.

The band extension decoding unit 460 applies a band extension decoding scheme to the output signals of the audio signal decoder 440 and the speech signal decoder 450, thereby restoring the high-frequency band signal based on the band extension information.

When the decoded audio signal is a downmix, the multi-channel decoder 470 generates the output channel signals of a multi-channel signal (including a stereo signal) using the spatial information.

본 발명에 따른 오디오 신호 처리 장치는 다양한 제품에 포함되어 이용될 수 있다. 이러한 제품은 크게 스탠드 얼론(stand alone) 군과 포터블(portable) 군으로 나뉠 수 있는데, 스탠드 얼론군은 티비, 모니터, 셋탑 박스 등을 포함할 수 있고, 포터블군은 PMP, 휴대폰, 네비게이션 등을 포함할 수 있다.The audio signal processing apparatus according to the present invention can be included and used in various products. These products can be broadly divided into stand alone and portable groups, which can include TVs, monitors and set-top boxes, and portable groups include PMPs, mobile phones, and navigation. can do.

도 14는 본 발명의 일 실시예에 따른 오디오 신호 처리 장치가 구현된 제품들의 관계를 보여주는 도면이다. 우선 도 14를 참조하면, 유무선 통신부(510)는 유무선 통신 방식을 통해서 비트스트림을 수신한다. 구체적으로 유무선 통신부(510)는 유선통신부(510A), 적외선통신부(510B), 블루투스부(510C), 무선랜통신부(510D) 중 하나 이상을 포함할 수 있다.14 is a diagram illustrating a relationship between products in which an audio signal processing device according to an embodiment of the present invention is implemented. First, referring to FIG. 14, the wired / wireless communication unit 510 receives a bitstream through a wired / wireless communication method. Specifically, the wired / wireless communication unit 510 may include at least one of a wired communication unit 510A, an infrared communication unit 510B, a Bluetooth unit 510C, and a wireless LAN communication unit 510D.

사용자 인증부는(520)는 사용자 정보를 입력 받아서 사용자 인증을 수행하는 것으로서 지문인식부(520A), 홍채인식부(520B), 얼굴인식부(520C), 및 음성인식부(520D) 중 하나 이상을 포함할 수 있는데, 각각 지문, 홍채정보, 얼굴 윤곽 정보, 음성 정보를 입력받아서, 사용자 정보로 변환하고, 사용자 정보 및 기존 등록되어 있는 사용자 데이터와의 일치여부를 판단하여 사용자 인증을 수행할 수 있다. The user authentication unit 520 receives user information and performs user authentication. The user authentication unit 520 receives one or more of a fingerprint recognition unit 520A, an iris recognition unit 520B, a face recognition unit 520C, and a voice recognition unit 520D. The fingerprint, iris information, facial contour information, and voice information may be input, converted into user information, and the user authentication may be performed by determining whether the user information matches the existing registered user data. .

The input unit 530 is an input device through which a user enters various types of commands, and may include at least one of a keypad unit 530A, a touch pad unit 530B, and a remote controller unit 530C, but the present invention is not limited thereto.

The signal coding unit 540 encodes or decodes an audio signal and/or a video signal received through the wired/wireless communication unit 510, and outputs an audio signal in the time domain. The signal coding unit 540 includes an audio signal processing apparatus 545, which corresponds to the embodiment of the present invention described above (that is, the encoding side 100 and/or the decoding side 200). The audio signal processing apparatus 545 and the signal coding unit that includes it may be implemented by one or more processors.

The controller 550 receives input signals from the input devices and controls all processes of the signal coding unit 540 and the output unit 560. The output unit 560 is the component through which the output signal generated by the signal coding unit 540 is output, and may include a speaker unit 560A and a display unit 560B. When the output signal is an audio signal, it is output through the speaker; when it is a video signal, it is output through the display.

FIG. 15 is a relationship diagram of products in which an audio signal processing apparatus according to an embodiment of the present invention is implemented. FIG. 15 illustrates the relationship between terminals corresponding to the product shown in FIG. 14 and a server. Referring to FIG. 15(A), the first terminal 500.1 and the second terminal 500.2 can exchange data or bitstreams bidirectionally through their wired/wireless communication units. Referring to FIG. 15(B), the server 600 and the first terminal 500.1 can also perform wired or wireless communication with each other.

The audio signal processing method according to the present invention can be produced as a program to be executed on a computer and stored in a computer-readable recording medium, and multimedia data having a data structure according to the present invention can also be stored in a computer-readable recording medium. The computer-readable recording medium includes all kinds of storage devices in which data readable by a computer system is stored. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of carrier waves (for example, transmission over the Internet). In addition, the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted over a wired or wireless communication network.

As described above, although the present invention has been described with reference to limited embodiments and drawings, it is not limited thereto. Those of ordinary skill in the art to which the present invention pertains may of course make various modifications and variations within the technical spirit of the invention and the scope of equivalents of the claims set forth below.

The present invention can be applied to processing and outputting audio signals.

FIG. 1 is a block diagram of an encoder in an audio signal processing apparatus according to an embodiment of the present invention.

FIG. 2 is a flowchart of an encoding method in an audio signal processing method according to an embodiment of the present invention.

FIG. 3 is a diagram for explaining the concept of quantization.

FIG. 4 is a diagram for explaining the concepts of a loss signal and a loss region.

FIG. 5 is an example of syntax for encoding noise filling flag information.

FIG. 6 is a diagram for explaining a noise level and a noise offset.

FIG. 7 is an example of syntax for encoding a noise level and a noise offset.

FIG. 8 is an example of syntax for encoding coding scheme information.

FIG. 9 is a block diagram of a decoder in an audio signal processing apparatus according to an embodiment of the present invention.

FIG. 10 is a detailed block diagram of the loss compensation part of FIG. 9.

FIG. 11 is a flowchart of a decoding method in an audio signal processing method according to an embodiment of the present invention.

FIG. 12 is an example of an audio signal encoding apparatus to which an audio signal processing apparatus according to an embodiment of the present invention is applied.

FIG. 13 is an example of an audio signal decoding apparatus to which an audio signal processing apparatus according to an embodiment of the present invention is applied.

FIG. 14 is a schematic configuration diagram of a product in which an audio signal processing apparatus according to an embodiment of the present invention is implemented.

FIG. 15 is a relationship diagram of products in which an audio signal processing apparatus according to an embodiment of the present invention is implemented.

Claims

Extracting noise filling flag information indicating whether noise filling is used for a plurality of frames;

Extracting coding scheme information indicating whether a current frame included in the plurality of frames is coded in a frequency domain or in a time domain;

Extracting noise level information for the current frame when the noise filling flag information indicates that the noise filling is used in the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain;

Extracting noise offset information for the current frame when a noise level value corresponding to the noise level information satisfies a predetermined level; and

When the noise offset information is extracted, performing noise filling on the current frame based on the noise level value and the noise level information.
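The conditional extraction recited above can be illustrated with a minimal sketch. It is not the syntax of FIGS. 5, 7, and 8: the field widths (`LEVEL_BITS`, `OFFSET_BITS`), the threshold `MIN_LEVEL` standing in for the "predetermined level", and the helper names `BitReader` and `parse_noise_info` are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical field widths and threshold; the claims do not fix these values.
LEVEL_BITS = 3
OFFSET_BITS = 5
MIN_LEVEL = 1  # assumed meaning of "satisfies a predetermined level"


class BitReader:
    """Reads unsigned integers MSB-first from a bytes object."""

    def __init__(self, data: bytes) -> None:
        self.data = data
        self.pos = 0  # current position, in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos // 8]
            bit = (byte >> (7 - self.pos % 8)) & 1
            value = (value << 1) | bit
            self.pos += 1
        return value


@dataclass
class NoiseInfo:
    level: Optional[int] = None   # None means "not transmitted"
    offset: Optional[int] = None  # None means "not transmitted"


def parse_noise_info(reader: BitReader,
                     noise_filling_flag: bool,
                     frame_is_frequency_domain: bool) -> NoiseInfo:
    """Extract noise level, then noise offset only if the level qualifies."""
    info = NoiseInfo()
    # No noise information is read unless noise filling is enabled for the
    # group of frames AND the current frame is frequency-domain coded.
    if not (noise_filling_flag and frame_is_frequency_domain):
        return info
    info.level = reader.read(LEVEL_BITS)
    # The offset is present only when the level meets the threshold.
    if info.level >= MIN_LEVEL:
        info.offset = reader.read(OFFSET_BITS)
    return info
```

Under these assumptions, a frame that is time-domain coded, or a group of frames with noise filling disabled, consumes no bits for noise information, which is where the bit saving described in the abstract comes from.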

The method of claim 1,

The noise filling comprises:

Determining a loss region of the current frame using the spectral data of the current frame;

Generating compensated spectral data by filling a compensation signal in the loss region using the noise level value; and

Generating a compensated scale factor based on the noise offset information.
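The three steps of this claim can be sketched as follows. This is a simplified illustration under stated assumptions, not the loss compensation part of FIG. 10: it assumes the loss region is the set of coefficients quantized to zero, that the compensation signal is a random-sign value of magnitude `noise_level`, and that the compensated scale factor is obtained by adding `noise_offset` to the scale factor of bands that are entirely zero. The function name `fill_noise` and the `band_edges` layout are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def fill_noise(spectrum: Sequence[float],
               scale_factors: Sequence[int],
               band_edges: Sequence[int],
               noise_level: float,
               noise_offset: int,
               seed: int = 0) -> Tuple[List[float], List[int]]:
    """Return (compensated spectral data, compensated scale factors).

    band_edges[b]..band_edges[b+1] delimits scale factor band b.
    """
    rng = random.Random(seed)
    filled = list(spectrum)
    compensated_sf = list(scale_factors)
    for b, (start, stop) in enumerate(zip(band_edges[:-1], band_edges[1:])):
        band = spectrum[start:stop]
        # Step 1 (assumption): the loss region is where the decoded
        # spectral data is zero.
        for i in range(start, stop):
            if spectrum[i] == 0:
                # Step 2: fill a compensation signal scaled by the noise level.
                filled[i] = noise_level * rng.choice((-1.0, 1.0))
        # Step 3 (assumption): adjust the scale factor of all-zero bands
        # by the noise offset.
        if band and all(v == 0 for v in band):
            compensated_sf[b] = scale_factors[b] + noise_offset
    return filled, compensated_sf
```

Non-zero coefficients pass through unchanged, so only the part of the spectrum lost in quantization is affected.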

The method of claim 1,

Extracting a level pilot value representing a reference value of the noise level and an offset pilot value representing the reference value of the noise offset;

Obtaining the noise level value by adding the level pilot value and the noise level information; and

Obtaining the noise offset value by adding the offset pilot value and the noise offset information when the noise offset information is extracted,

Wherein the noise filling is performed using the noise level value and the noise offset value.
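A small sketch of the pilot-based reconstruction described in this claim: each frame's value is the shared pilot (reference) value plus the transmitted information, treated here as a signed difference. The function name `decode_with_pilot` and the representation of an untransmitted offset as `None` are illustrative assumptions.

```python
from typing import List, Optional, Sequence, Tuple

FrameInfo = Tuple[int, Optional[int]]  # (noise level info, noise offset info or None)


def decode_with_pilot(level_pilot: int,
                      offset_pilot: int,
                      frames: Sequence[FrameInfo]) -> List[Tuple[int, Optional[int]]]:
    """Reconstruct (noise level value, noise offset value) for each frame."""
    values = []
    for level_info, offset_info in frames:
        level = level_pilot + level_info
        # The offset value exists only when offset information was extracted.
        offset = None if offset_info is None else offset_pilot + offset_info
        values.append((level, offset))
    return values
```

Because every frame is decoded against the same pilot, a corrupted frame does not affect the values of later frames.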

The method of claim 1,

Obtaining a noise level value of the current frame by using a noise level value of a previous frame and noise level information of the current frame; and

Obtaining the noise offset value of the current frame by using the noise offset value of the previous frame and the noise offset information of the current frame when the noise offset information is extracted,

Wherein the noise filling is performed using the noise level value and the noise offset value.
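The frame-to-frame variant can be sketched in the same way. The claim says only that the previous frame's value and the current frame's information are "used"; the sketch assumes they are added (delta coding) and that the stored offset is carried forward unchanged across frames with no offset information. The name `decode_differential` and the initial values are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

FrameInfo = Tuple[int, Optional[int]]  # (noise level info, noise offset info or None)


def decode_differential(initial_level: int,
                        initial_offset: int,
                        frames: Sequence[FrameInfo]) -> List[Tuple[int, Optional[int]]]:
    """Reconstruct per-frame values from differences to the previous frame."""
    level, offset = initial_level, initial_offset
    values = []
    for level_info, offset_info in frames:
        level = level + level_info  # previous level + current difference
        if offset_info is not None:
            offset = offset + offset_info  # previous offset + current difference
            values.append((level, offset))
        else:
            # No offset transmitted for this frame; keep the stored offset
            # as the reference for the next frame that carries one.
            values.append((level, None))
    return values
```

Compared with the pilot-based scheme, differences between adjacent frames tend to be smaller, at the cost of errors propagating to later frames.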

The method of claim 1,

The noise level information and the noise offset information are extracted according to a variable length coding scheme.

A multiplexer for extracting noise filling flag information indicating whether noise filling is used in a plurality of frames and coding scheme information indicating whether a current frame included in the plurality of frames is coded in a frequency domain or a time domain;

A noise information decoding part for extracting noise level information for the current frame when the noise filling flag information indicates that the noise filling is used in the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain, and for extracting noise offset information for the current frame when a noise level value corresponding to the noise level information satisfies a predetermined level; and

A loss compensation part for performing noise filling on the current frame based on the noise level value and the noise level information when the noise offset information is extracted.

The apparatus of claim 6,

Wherein the loss compensation part

Determines the loss region of the current frame by using the spectral data of the current frame, and

Generates compensated spectral data by filling a compensation signal in the loss region using the noise level value.

The apparatus of claim 6,

Further comprising a data decoding part for extracting a level pilot value representing a reference value of the noise level and an offset pilot value representing a reference value of the noise offset,

Obtaining the noise level value by adding the level pilot value and the noise level information, and

When the noise offset information is extracted, obtaining a noise offset value by adding the offset pilot value and the noise offset information.

The apparatus of claim 6,

Further comprising a data decoding part for obtaining a noise level value of the current frame by using a noise level value of a previous frame and noise level information of the current frame, and

When the noise offset information is extracted, obtaining a noise offset value of the current frame by using the noise offset value of the previous frame and the noise offset information of the current frame.

The apparatus of claim 6,

Wherein the noise level information and the noise offset information are extracted according to a variable length coding scheme.

Generating a noise level value and a noise offset value based on the quantized signal;

Generating noise filling flag information indicating whether noise filling is used for the plurality of frames;

Generating coding scheme information indicating whether a current frame included in the plurality of frames is coded in a frequency domain or a time domain;

When the noise filling flag information indicates that noise filling is used for the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain, inserting noise level information of the current frame, corresponding to the noise level value, into the bitstream; and

Inserting noise offset information corresponding to the noise offset value into the bitstream when the noise level value satisfies a predetermined level.
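The encoder-side insertion recited above mirrors the decoder-side extraction and can be sketched as follows. As with the earlier sketches, the field widths (`LEVEL_BITS`, `OFFSET_BITS`), the threshold `MIN_LEVEL`, and the helper names `BitWriter` and `write_noise_info` are hypothetical; the actual syntax is the one shown in FIGS. 5, 7, and 8.

```python
from typing import List

# Hypothetical field widths and threshold; the claims do not fix these values.
LEVEL_BITS = 3
OFFSET_BITS = 5
MIN_LEVEL = 1  # assumed meaning of "satisfies a predetermined level"


class BitWriter:
    """Collects unsigned integers MSB-first into a bit list."""

    def __init__(self) -> None:
        self.bits: List[int] = []

    def write(self, value: int, nbits: int) -> None:
        for shift in range(nbits - 1, -1, -1):
            self.bits.append((value >> shift) & 1)

    def to_bytes(self) -> bytes:
        # Pad with zero bits up to the next byte boundary.
        padded = self.bits + [0] * (-len(self.bits) % 8)
        out = bytearray()
        for i in range(0, len(padded), 8):
            byte = 0
            for bit in padded[i:i + 8]:
                byte = (byte << 1) | bit
            out.append(byte)
        return bytes(out)


def write_noise_info(writer: BitWriter,
                     noise_filling_flag: bool,
                     frame_is_frequency_domain: bool,
                     noise_level: int,
                     noise_offset: int) -> None:
    """Insert noise level, and noise offset only if the level qualifies."""
    # Nothing is written for time-domain frames or when noise filling is off.
    if not (noise_filling_flag and frame_is_frequency_domain):
        return
    writer.write(noise_level, LEVEL_BITS)
    if noise_level >= MIN_LEVEL:
        writer.write(noise_offset, OFFSET_BITS)
```

With these assumed widths, a qualifying frame costs 8 bits of noise information, a frame whose level falls below the threshold costs 3 bits, and all other frames cost none.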

A loss compensation estimation part for generating a noise level value and a noise offset value based on the quantized signal, and generating noise filling flag information indicating whether noise filling is used in a plurality of frames;

A signal classifier configured to generate coding scheme information indicating whether a current frame included in the plurality of frames is coded in a frequency domain or a time domain;

An audio signal processing device wherein, when the noise filling flag information indicates that noise filling is used for the plurality of frames and the coding scheme information indicates that the current frame is coded in the frequency domain, noise level information of the current frame corresponding to the noise level value is inserted into the bitstream, and noise offset information corresponding to the noise offset value is inserted into the bitstream when the noise level value satisfies a predetermined level.

A computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform operations including: when the noise offset information is extracted, performing noise filling on the current frame based on the noise level value and the noise level information.