KR101108955B1

KR101108955B1 - A method and an apparatus for processing an audio signal

Info

Publication number: KR101108955B1
Application number: KR1020090090705A
Authority: KR
Inventors: 이현국; 김동수; 윤성용; 방희석; 임재현
Original assignee: 엘지전자 주식회사
Priority date: 2008-09-25
Filing date: 2009-09-24
Publication date: 2012-02-06
Also published as: KR20100035128A

Abstract

The present invention provides a method comprising: receiving, by an audio processing apparatus, spectral data of a low frequency band and type information indicating a specific band extension scheme for a current frame of an audio signal among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme; when the type information indicates the first band extension scheme for the current frame, generating spectral data of the high frequency band of the current frame from the spectral data of the low frequency band by performing the first band extension scheme; and when the type information indicates the second band extension scheme for the current frame, generating spectral data of the high frequency band of the current frame from the spectral data of the low frequency band by performing the second band extension scheme, wherein the first band extension scheme is based on a first data area of the spectral data of the low frequency band, and the second band extension scheme is based on a second data area of the spectral data of the low frequency band.

Audio, voice, band extension

Description

Audio signal processing method and apparatus {A METHOD AND AN APPARATUS FOR PROCESSING AN AUDIO SIGNAL}

The present invention relates to a signal processing method and apparatus capable of encoding or decoding an audio signal.

In general, within a single frame of an audio signal, the signal of the low frequency band and the signal of the high frequency band are correlated. Band extension techniques exploit this correlation to compress the audio signal by encoding the spectral data of the high frequency band using the spectral data of the low frequency band.

Conventionally, when the similarity between the low frequency band signal and the high frequency band signal is low, compressing the audio signal with a band extension scheme degrades its sound quality.

In particular, for sounds such as sibilants the correlation is not high, so a band extension scheme is ill-suited to the audio signal.

Meanwhile, there may be several types of band extension scheme, and the type applied to the audio signal may vary over time. In that case, sound quality may degrade momentarily in the section where the type changes.

The present invention has been devised to solve the above problems, and an object thereof is to provide an audio signal processing method and apparatus that can selectively apply a band extension scheme according to the characteristics of the audio signal.

Another object of the present invention is to provide an audio signal processing method and apparatus that can adaptively apply an appropriate scheme in place of a band extension scheme according to the characteristics of the audio signal for each frame.

Another object of the present invention is to provide an audio signal processing method and apparatus that can maintain sound quality by avoiding application of a band extension scheme when analysis of the audio signal shows that its characteristics are close to those of a sibilant.

Another object of the present invention is to provide an audio signal processing method and apparatus for applying several types of band extension scheme over time according to the characteristics of the audio signal.

Another object of the present invention is to provide an audio signal processing method and apparatus for reducing artifacts in the section where the type of band extension scheme changes when several types of band extension scheme are applied.

The present invention provides the following effects and advantages.

First, because the band extension scheme is applied selectively for each frame according to the characteristics of the signal in that frame, sound quality can be improved without greatly increasing the number of bits.

Second, for a frame judged to contain a sound with high energy in the high frequency band, such as a sibilant, loss of sound quality can be minimized by using, instead of the band extension scheme, an LPC (linear predictive coding) scheme suited to speech signals, an HBE (high band extension) scheme, or the scheme newly proposed herein (PSDD).

Third, several types of band extension scheme can be applied over time according to the characteristics of the audio signal, and artifacts can be reduced in the section where the type of band extension scheme changes, so sound quality can be improved while a band extension scheme is applied.

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings. Before that, the terms and words used in this specification and the claims should not be construed as limited to their ordinary or dictionary meanings. Rather, on the principle that an inventor may appropriately define the concept of a term in order to describe his or her own invention in the best way, they should be construed with meanings and concepts consistent with the technical idea of the present invention. Accordingly, the embodiments described in this specification and the configurations shown in the drawings are merely the most preferred embodiment of the present invention and do not represent all of its technical idea, so it should be understood that various equivalents and modifications capable of replacing them may exist at the time of filing this application.

In the present invention, the following terms may be construed according to the following criteria, and terms not listed may be construed in the same spirit. Coding may be construed as encoding or decoding depending on context. Information is a term that covers values, parameters, coefficients, elements, and the like, and its meaning may be construed differently depending on context, but the present invention is not limited thereto.

Here, an audio signal in the broad sense is a concept distinguished from a video signal and refers to a signal that can be identified by hearing when reproduced. In the narrow sense it is a concept distinguished from a speech signal and means a signal with little or no speech characteristics. The audio signal in the present invention should be construed in the broad sense, and may be understood as an audio signal in the narrow sense when it is used in distinction from a speech signal.

First, FIG. 1 is a diagram showing the configuration of an audio signal processing apparatus according to an embodiment of the present invention. Referring to FIG. 1, the encoder side 100 of the audio signal processing apparatus may include a sibilant detecting unit 110, a first encoding unit 122, a second encoding unit 124, and a multiplexing unit 130. The decoder side 200 of the audio signal processing apparatus may include a de-multiplexer 210, a first decoding unit 222, and a second decoding unit 224.

The encoder side 100 of the audio signal processing apparatus determines, according to the characteristics of the audio signal, whether to apply a band extension scheme, and generates coding scheme information according to that determination. The decoder side 200 selects, frame by frame, whether to apply the band extension scheme according to the coding scheme information.

The sibilant detecting unit 110 detects the sibilant proportion of the current frame of the audio signal and, based on that proportion, generates coding scheme information indicating whether the band extension scheme is to be applied to the current frame. Here, the sibilant proportion indicates the degree to which the current frame is a sibilant. A sibilant is, in general, a consonant produced by the friction generated as air passes through a narrow gap toward the teeth; examples are ㅅ and ㅆ in Korean and s in English. An affricate, by contrast, is a consonant produced by forming a closure that blocks the airflow and then, rather than releasing it completely, opening it little by little so that air passes through a narrow gap; examples in Korean include 'ㅈ', 'ㅉ', and 'ㅊ'. In this document, a sibilant is not limited to a particular phoneme and refers to any phoneme whose peak band, the band having the maximum energy, lies in a higher frequency band than that of other phonemes. The detailed configuration of the sibilant detecting unit 110 is described later with reference to FIG. 2.

Once the sibilant proportion has been detected, the audio signal of a frame judged to have a low sibilant proportion is encoded by the first encoding unit 122, and the audio signal of a frame judged to have a high sibilant proportion is encoded by the second encoding unit 124.

The first encoding unit 122 is a component that encodes the audio signal according to a frequency domain based band extension scheme. Here, the frequency domain based band extension scheme encodes the spectral data corresponding to the higher band, out of the spectral data of the wide band, by using all or a portion of the narrow band. This scheme can reduce the number of bits by exploiting the similarity between the high frequency band and the low frequency band. The band extension scheme is frequency domain based, and the spectral data are data that have been frequency-transformed by a QMF (quadrature mirror filter) filterbank or the like. The decoder reconstructs the spectral data of the higher band from the narrow band spectral data by using band extension information. Here, the higher band is a band at or above the boundary frequency, and the low frequency band (narrow band) is a band at or below the boundary frequency and consists of consecutive bands. Such a frequency domain based band extension scheme may conform to the SBR (Spectral Band Replication) or eSBR (enhanced Spectral Band Replication) standard, but the present invention is not limited thereto.

Meanwhile, the frequency domain based band extension scheme relies on the similarity between the high frequency band and the low frequency band, and this similarity may be strong or weak depending on the characteristics of the audio signal. For the sibilants described above in particular, the similarity is weak, so applying the band extension scheme to a frame corresponding to a sibilant may degrade sound quality. The relationship between the energy characteristics of sibilants and the application of the frequency domain based band extension scheme is described in detail later with reference to FIGS. 3 and 4. The first encoding unit 122 may be understood to include the audio signal encoder described later with reference to FIG. 8, but the present invention is not limited thereto.

The second encoding unit 124 is a unit that encodes the audio signal without using the frequency domain based band extension scheme. This does not mean that no band extension scheme of any type is used; it means that the particular frequency domain based band extension scheme applied in the first encoding unit 122 is not used. The second encoding unit 124 may, first, correspond to a speech signal encoder that applies an LPC (linear predictive coding) scheme; second, further include, in addition to the speech encoder, a module according to a time domain based band extension scheme; or third, further include a module according to the PSDD (Partial Spectral Data Duplication) scheme newly proposed herein. These are described later with reference to FIGS. 5 to 8. The time domain based band extension scheme of the second case may follow the HBE (High Band Extension) scheme applied in the AMR-WB (Adaptive Multi Rate WideBand) standard, but the present invention is not limited thereto.

The multiplexer 130 multiplexes the audio signal encoded by the first encoding unit 122 and by the second encoding unit 124 (the non-band-extension encoding unit) together with the coding scheme information generated by the sibilant detecting unit 110, and generates one or more bitstreams.

The demultiplexer 210 on the decoder side extracts the coding scheme information from the bitstream and, based on the coding scheme information, passes the audio signal of the current frame to the first decoding unit 222 or the second decoding unit 224. The first decoding unit 222 decodes the audio signal according to the band extension scheme described above, and the second decoding unit 224 decodes the audio signal according to the LPC scheme (and the HBE/PSDD scheme) described above.
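The per-frame routing performed by the multiplexer 130 and the demultiplexer 210 can be illustrated with a minimal sketch. The byte layout (one flag byte followed by a two-byte payload length), the flag values, and the function names `multiplex`, `demultiplex`, and `route` are assumptions made only for this illustration; the specification does not define a bitstream syntax here.

```python
BAND_EXTENSION = 0  # hypothetical flag: frame was coded by the first encoding unit
ALTERNATIVE = 1     # hypothetical flag: frame was coded by the second encoding unit


def multiplex(frames):
    """Pack (is_sibilant, payload) pairs into one byte string (assumed layout)."""
    out = bytearray()
    for is_sibilant, payload in frames:
        out.append(ALTERNATIVE if is_sibilant else BAND_EXTENSION)
        out += len(payload).to_bytes(2, "big")  # payload length in bytes
        out += payload
    return bytes(out)


def demultiplex(bitstream):
    """Recover (coding_scheme_info, payload) pairs from the byte string."""
    frames, pos = [], 0
    while pos < len(bitstream):
        scheme = bitstream[pos]
        size = int.from_bytes(bitstream[pos + 1:pos + 3], "big")
        frames.append((scheme, bitstream[pos + 3:pos + 3 + size]))
        pos += 3 + size
    return frames


def route(frames, first_decoder, second_decoder):
    """Send each frame to the decoder selected by its coding scheme information."""
    return [
        (first_decoder if scheme == BAND_EXTENSION else second_decoder)(payload)
        for scheme, payload in frames
    ]
```

Here `first_decoder` and `second_decoder` stand in for the first decoding unit 222 and the second decoding unit 224; any callables that accept a payload can be passed.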

FIG. 2 is a diagram showing the detailed configuration of the sibilant detecting unit of FIG. 1, FIG. 3 is a diagram for explaining the principle of sibilant detection, and FIG. 4 is an example of the energy spectra of a non-sibilant and a sibilant. First, referring to FIG. 2, the sibilant detecting unit 110 includes a transforming part 112, an energy estimating part 114, and a sibilant deciding part 116.

The transforming part 112 performs a frequency transform on the time domain audio signal to convert it into a frequency domain signal. An FFT (fast Fourier transform) or an MDCT (modified discrete cosine transform) may be used for the frequency transform, but the present invention is not limited thereto.

The energy estimating part 114 groups the frequency domain audio signal into several bands and calculates the energy of each band for the current frame. It then determines which band is the peak band (B_max), the band having the largest energy (E_max) among all bands. The sibilant deciding part 116 detects the sibilant proportion of the current frame by determining whether the band with the largest energy (B_max) is higher or lower than a threshold band (B_th). This is based on the property that voiced sounds have their maximum energy at low frequencies whereas sibilants have their maximum energy at high frequencies. The threshold band (B_th) may be a value predetermined as a default, or a value calculated according to the characteristics of the input audio.
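The three parts of the sibilant detecting unit (transform, per-band energy, comparison of the peak band with the threshold band) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the naive DFT, the equal-width bands, the default values `num_bands=8` and `threshold_band=4`, and the function names are all hypothetical, and treating a peak band equal to the threshold band as sibilant is likewise an assumption.

```python
import cmath
import math


def band_energies(frame, num_bands):
    """Return the energy of each band for one time-domain frame."""
    n = len(frame)
    half = n // 2  # keep only the non-negative frequency bins of a real signal
    power = []
    for k in range(half):
        # Naive DFT bin; a practical system would use an FFT or MDCT instead.
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        power.append(abs(acc) ** 2)
    width = half // num_bands  # bins per band; any leftover top bins are ignored
    return [sum(power[b * width:(b + 1) * width]) for b in range(num_bands)]


def detect_sibilant(frame, num_bands=8, threshold_band=4):
    """Return (B_max, is_sibilant) for one frame."""
    energies = band_energies(frame, num_bands)
    peak_band = max(range(num_bands), key=lambda b: energies[b])  # B_max
    # Assumption: a peak band at or above B_th is treated as sibilant.
    return peak_band, peak_band >= threshold_band
```

For a 64-sample frame, a sinusoid in DFT bin 2 peaks in band 0 and is classified as non-sibilant, while a sinusoid in bin 28 peaks in band 7 and is classified as sibilant.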

Referring to FIG. 3, there is a wide band that includes a low frequency band (narrow band) and a high frequency band (higher band). The peak band (B_max) having the highest energy (E_max) may be higher or lower than the threshold band (B_th). Referring to FIG. 4, the energy peak of a non-sibilant signal lies in the low frequency band, whereas the energy peak of a sibilant signal lies in a relatively high frequency band. Referring again to FIG. 3, in case (A) the energy peak lies at a relatively low frequency, so the frame is judged to be non-sibilant, and in case (B) the energy peak lies at a relatively high frequency, so the frame can be judged to be sibilant.

Meanwhile, the frequency domain based band extension scheme mentioned above encodes the higher band, above the boundary frequency, by using the narrow band, below the boundary frequency. This scheme relies on the similarity between the spectral data of the narrow band and the spectral data of the higher band. For a signal whose energy peak lies at a high frequency, however, this correlation is relatively low. Applying a frequency domain based band extension scheme, which predicts the spectral data of the higher band from the spectral data of the narrow band, may therefore degrade sound quality. Accordingly, for a current frame judged to be a sibilant, it is preferable to apply another scheme instead of the frequency domain based band extension scheme.

Referring again to FIG. 2, when the peak band (B_max) containing the energy peak is lower than the threshold band (B_th), the sibilant deciding part 116 judges the current frame to be non-sibilant and causes the audio signal to be encoded by the first encoding unit according to the frequency domain band extension scheme. In the opposite case, it judges the frame to be sibilant and causes the audio signal to be encoded by the second encoding unit according to an alternative scheme.

FIG. 5 shows examples of detailed configurations of the second encoding unit and the second decoding unit of FIG. 1. Referring to (A) of FIG. 5, the second encoding unit 124a according to a first embodiment includes an LPC encoding part 124a-1, and the second decoding unit 224a according to the first embodiment includes an LPC decoding part 224a-1. The LPC encoding part and the LPC decoding part are components that encode or decode the audio signal over the entire band using linear prediction coding (LPC). LPC predicts the current sample value as the sum of a certain number of past sample values, each multiplied by a coefficient, and is a representative example of short term prediction (STP) for processing a speech signal in the time domain. When the LPC encoding part 124a-1 generates LPC coefficients (not shown) encoded by the LPC scheme in this way, the LPC decoding part 224a-1 reconstructs the audio signal using the LPC coefficients.
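The prediction rule just described, in which the current sample is estimated as a weighted sum of past samples, can be sketched as follows. The function names and the choice to pass the first `order` samples through unpredicted are assumptions for illustration; estimating and quantizing the coefficients, as a real LPC codec would, is outside this sketch.

```python
def lpc_predict(history, coeffs):
    """Predict the next sample as sum over k of coeffs[k] * history[-1 - k]."""
    return sum(a * history[-1 - k] for k, a in enumerate(coeffs))


def lpc_residual(signal, coeffs):
    """Encoder side: prediction error for each sample after the first `order`."""
    order = len(coeffs)
    residual = list(signal[:order])  # warm-up samples are passed through as-is
    for n in range(order, len(signal)):
        residual.append(signal[n] - lpc_predict(signal[:n], coeffs))
    return residual


def lpc_reconstruct(residual, coeffs):
    """Decoder side: rebuild the signal from the residual and the coefficients."""
    order = len(coeffs)
    signal = list(residual[:order])
    for n in range(order, len(residual)):
        signal.append(residual[n] + lpc_predict(signal, coeffs))
    return signal
```

When the coefficients match the signal well, the residual is close to zero and cheap to code. For the sequence 1, 1, 2, 3, 5, 8 with coefficients [1, 1], every residual value after the two warm-up samples is exactly zero.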

Meanwhile, the second encoding unit 124b according to a second embodiment includes an HBE encoding part 124b-1 and an LPC encoding part 124b-2, and the second decoding unit 224b according to the second embodiment includes an LPC decoding part 224b-1 and an HBE decoding part 224b-2. The HBE encoding part 124b-1 and the HBE decoding part 224b-2 are components that encode and decode the audio signal according to the HBE scheme. The HBE (High Band Extension) scheme is a kind of time domain based band extension scheme. The encoder generates HBE information, namely spectral envelope modeling information and frame energy information, for the high frequency signal, and generates an excitation signal for the low frequency signal. Here, the spectral envelope modeling information may correspond to LP coefficients, generated through time domain based LP (linear prediction) analysis, that have been converted into ISPs (Immittance Spectral Pairs). The frame energy information may correspond to information determined by comparing the original energy with the synthesized energy every 64 sub-frames. The decoder generates the high frequency signal by shaping the excitation signal of the low frequency signal using the spectral envelope modeling information and the frame energy information. The HBE scheme differs from the frequency domain based band extension scheme described above in that it is time domain based. Viewed as a time-axis waveform, a sibilant is a very complex, random, noise-like signal. Applying a frequency domain based band extension scheme to it can be very inaccurate, whereas HBE, being time domain based, can handle sibilants appropriately. Furthermore, if the HBE scheme additionally includes post-processing to reduce the buzziness of the high frequency excitation signal, performance on sibilant frames can improve further.
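The decoder-side shaping step can be sketched, under stated assumptions, as an all-pole synthesis filter followed by an energy correction. The function name `shape_excitation`, the sign convention of the LP coefficients, and the use of a single target energy per frame are all hypothetical; the ISP conversion and the exact procedure of the AMR-WB standard are not reproduced here.

```python
import math


def shape_excitation(excitation, lp_coeffs, target_energy):
    """Shape an excitation with an LP envelope, then match a target frame energy.

    Assumed filter: y[n] = e[n] + sum over k of lp_coeffs[k] * y[n - 1 - k].
    """
    shaped = []
    for n, e in enumerate(excitation):
        y = e
        for k, a in enumerate(lp_coeffs):
            if n - 1 - k >= 0:
                y += a * shaped[n - 1 - k]  # feedback from earlier output samples
        shaped.append(y)
    energy = sum(v * v for v in shaped)
    # Scale so that the frame energy equals the transmitted frame energy.
    gain = math.sqrt(target_energy / energy) if energy > 0 else 0.0
    return [gain * v for v in shaped]
```

The spectral envelope modeling information plays the role of `lp_coeffs` and the frame energy information plays the role of `target_energy`; the excitation is the one derived from the low frequency signal.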

Meanwhile, the LPC encoding part 124b-2 and the LPC decoding part 224b-1 perform the same functions as the identically named components 124a-1 and 224a-1 of the first embodiment. In the first embodiment, however, linear prediction encoding and decoding are performed on the entire band of the current frame, whereas in the second embodiment linear prediction encoding is performed not on the entire band but on the narrow band (or lower band) after HBE has been performed, and HBE decoding is performed after linear prediction decoding has been performed on the narrow band.

Meanwhile, the second encoding unit 124c according to a third embodiment includes a PSDD encoding part 124c-1 and an LPC encoding part 124c-2, and the second decoding unit 224c according to the third embodiment includes an LPC decoding part 224c-1 and a PSDD decoding part 224c-2. The frequency domain based band extension scheme performed in the first encoding unit 122 of FIG. 1 uses part or all of the narrow band, which consists of the low frequency band. PSDD (Partial Spectral Data Duplication), by contrast, uses copy bands distributed discretely over the low and high frequency bands to encode the target bands adjacent to those copy bands. The details are described later with reference to FIGS. 6 to 9.

Meanwhile, the LPC encoding part and the LPC decoding part described above with reference to (A) to (C) of FIG. 5 may belong, respectively, to the speech signal encoder 440 and the speech signal decoder 630 described later with reference to FIGS. 9 to 12.

FIG. 6 is a diagram for describing first and second embodiments of the PSDD (Partial Spectral Data Duplication) scheme, which is an example of the second encoding/decoding scheme.

First, referring to (A) of FIG. 6, there are a total of n scale factor bands (sfb_0 to sfb_{n-1}), from the 0th to the (n-1)th, that is, from low frequency to high frequency, and spectral data exist corresponding to each scale factor band (sfb_0, ..., sfb_{n-1}). The spectral data sd_i belonging to a specific band may denote a set of multiple spectral data (sd_{i,0} to sd_{i,m-1}), and the number of spectral data (m_i) may be defined per spectral data unit, per band, or per a larger unit.

Here, of the entire band (sfb_0, ..., sfb_{n-1}), the bands whose data are transmitted to the decoder are the low frequency band (sfb_0, ..., sfb_{s-1}) and the copy bands (cb) (sfb_s, sfb_{n-4}, sfb_{n-2}). The copy bands are bands located from the start band (sb), or start frequency, onward, and they are used to predict the target bands (tb) (sfb_{s+1}, sfb_{n-3}, sfb_{n-1}). A target band is a band predicted from a copy band, and its spectral data are not transmitted to the decoder.

As shown in (A) of FIG. 6, the copy bands are not concentrated in the low frequency band but also exist in the high frequency band, and because each copy band is adjacent to its target band, similarity with the target band can be maintained. Meanwhile, gain information (g), which is the difference between the spectral data of the copy band and the spectral data of the target band, may be generated. Even though the target band is predicted from the copy band, degradation of sound quality can be minimized without increasing the bit rate relative to the band extension scheme.

(A) of FIG. 6 is an example in which the copy band and the target band have the same bandwidth, and (B) of FIG. 6 is an example in which their bandwidths differ. Referring to (B) of FIG. 6, the bandwidth of the target band (tb, tb') is at least twice that of the copy band, and different gains (g_s, g_{s+1}) may be applied to the left band (tb) and the right band (tb'), respectively, of the consecutive bands that make up the target band.
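The copy-band and gain mechanism of FIG. 6 can be illustrated with a minimal Python sketch. It is not the claimed implementation: the function name `reconstruct_target` is hypothetical, and the gain is assumed here to act as a multiplicative scale factor, whereas the description only calls it a difference between the copy-band and target-band spectral data.

```python
def reconstruct_target(copy_band, gains):
    """Rebuild a target band from its adjacent copy band (illustrative only).

    copy_band: transmitted spectral coefficients of the copy band.
    gains: one gain per target sub-band; each sub-band is assumed to have
           the same width as the copy band.
    """
    target = []
    for gain in gains:
        # Duplicate the copy band once per sub-band, scaled by that sub-band's gain.
        target.extend(gain * coeff for coeff in copy_band)
    return target


copy_band = [1.0, -2.0, 0.5]

# Case of FIG. 6(A): equal bandwidths, a single gain g_s.
same_width = reconstruct_target(copy_band, [0.5])

# Case of FIG. 6(B): target twice as wide, gains g_s and g_{s+1} for tb and tb'.
double_width = reconstruct_target(copy_band, [0.5, 0.25])

print(same_width)
print(double_width)
```

Only the copy band and the gains would need to be transmitted in this sketch; the target-band coefficients are regenerated at the decoder.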

FIGS. 7 and 8 are diagrams for describing the PSDD scheme when frame lengths differ. FIG. 7 covers the case where the number of spectral data of the target band (N_t) is larger than the number of spectral data of the copy band (N_c), and FIG. 8 covers the case where it is smaller.

First, referring to (A) of FIG. 7, the number of spectral data (N_t) of the target band (sfb_i) is 36, and the number of spectral data (N_c) of the copy band (sfb_s) is 24. A band with more data is drawn with a longer horizontal extent. Because the target band has more data, the data of the copy band may be used more than once. For example, as shown in (B1) of FIG. 7, the 24 data of the copy band are first filled into the target band starting from its low-frequency end. Then, as shown in (B2) of FIG. 7, either the first 12 or the last 12 data of the copy band are filled into the remaining part of the target band. The transmitted gain information may of course be applied here as well.

Meanwhile, referring to (A) of FIG. 8, the number of spectral data (N_t) of the target band (sfb_i) is 24, and the number of spectral data (N_c) of the copy band (sfb_s) is 36. Because the target band has fewer data, only part of the copy band data is used. For example, the spectral data of the target band (sfb_i) may be generated using only the first 24 spectral data of the copy band (sfb_s), as shown in (B) of FIG. 8, or using only the last 24 spectral data of the copy band (sfb_s), as shown in (C) of FIG. 8.
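The two length-mismatch cases of FIGS. 7 and 8 can be sketched as a single helper. This is an illustrative Python sketch only: the name `fit_copy_to_target` and the `use_front` flag are hypothetical, and the gain step is left out.

```python
def fit_copy_to_target(copy_band, n_target, use_front=True):
    """Stretch or trim copy-band data to n_target coefficients (illustrative).

    use_front selects whether the partial piece is taken from the front
    or from the rear of the copy band.
    """
    n_copy = len(copy_band)
    if n_copy == 0:
        raise ValueError("copy band must not be empty")

    if n_target <= n_copy:
        # FIG. 8 case: the target is shorter, so use only part of the copy band.
        return copy_band[:n_target] if use_front else copy_band[n_copy - n_target:]

    # FIG. 7 case: fill whole copies from the low-frequency end first ...
    out = []
    while n_target - len(out) >= n_copy:
        out.extend(copy_band)
    # ... then fill the remainder with the front or rear part of the copy band.
    rest = n_target - len(out)
    if rest:
        out.extend(copy_band[:rest] if use_front else copy_band[n_copy - rest:])
    return out


copy_24 = list(range(24))                      # stand-in for 24 spectral data
longer = fit_copy_to_target(copy_24, 36)       # 24 data + the front 12 again
copy_36 = list(range(36))
shorter = fit_copy_to_target(copy_36, 24, use_front=False)  # the rear 24 only
print(len(longer), len(shorter))
```

With N_c = 24 and N_t = 36 the helper reproduces the (B1)/(B2) filling of FIG. 7, and with N_c = 36 and N_t = 24 it reproduces the front or rear selection of FIG. 8.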

FIG. 9 shows a first example of an audio signal encoding apparatus to which the audio signal processing apparatus according to an embodiment of the present invention is applied, and FIG. 10 shows a second example. The first example is an encoding apparatus to which the first embodiment 124a of the second encoding unit, described with reference to (A) of FIG. 5, is applied. The second example is an encoding apparatus to which the second embodiment 124b or the third embodiment 124c of the second encoding unit, described with reference to (B) and (C) of FIG. 5, is applied.

Referring first to FIG. 9, the audio signal encoding apparatus 300 may include a multi-channel encoder 305, a sibilant detecting unit 310, a first encoding unit 322, an audio signal encoder 330, a speech signal encoder 340, and a multiplexer 350. Here, the sibilant detecting unit 310 and the first encoding unit 322 may have the same functions as the identically named components 110 and 122 described with reference to FIG. 1.

The multi-channel encoder 305 receives a plurality of channel signals (two or more channel signals; hereinafter, a multi-channel signal), generates a mono or stereo downmix signal by downmixing them, and generates the spatial information needed to upmix the downmix signal back into a multi-channel signal. The spatial information may include channel level difference information, inter-channel correlation information, channel prediction coefficients, downmix gain information, and the like. If the audio signal encoding apparatus 300 receives a mono signal, the multi-channel encoder 305 may of course bypass the mono signal without downmixing it.
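As a rough illustration of downmixing together with one spatial parameter, the following Python sketch averages two channels into a mono downmix and computes a channel level difference in decibels. The function name, the averaging rule, and the full-band energy-ratio definition are assumptions made for this example; the description does not specify how the multi-channel encoder 305 derives these values.

```python
import math


def downmix_stereo(left, right):
    """Return (mono_downmix, channel_level_difference_db) for one frame.

    Illustrative only: the downmix is a plain average and the level
    difference is the energy ratio of the two channels in dB.
    """
    if len(left) != len(right):
        raise ValueError("channels must have the same length")

    mono = [0.5 * (l + r) for l, r in zip(left, right)]

    energy_left = sum(x * x for x in left)
    energy_right = sum(x * x for x in right)
    eps = 1e-12  # avoids division by zero and log of zero for silent channels
    cld_db = 10.0 * math.log10((energy_left + eps) / (energy_right + eps))
    return mono, cld_db


mono, cld_db = downmix_stereo([1.0, 1.0], [0.5, 0.5])
print(mono, round(cld_db, 2))
```

A decoder holding the downmix and such a level difference could redistribute energy between the output channels, which is the role the description assigns to the spatial information.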

The sibilant detecting unit 310 detects the degree of sibilance of the current frame. If the frame is non-sibilant, the unit passes the audio signal to the first encoding unit 322; if the frame is sibilant, the audio signal bypasses the first encoding unit 322 and is passed to the speech signal encoder 340. Based on this result, the unit generates band extension information indicating whether the band extension scheme is applied to the current frame and passes it to the multiplexer 350.

The first encoding unit 322 applies the frequency-domain-based band extension scheme described above with reference to FIG. 1 to the wideband audio signal, thereby generating narrowband spectral data and band extension information.

The audio signal encoder 330 encodes the downmix signal according to an audio coding scheme when a specific frame or a specific segment of the downmix signal has predominantly audio characteristics. The audio coding scheme may conform to the AAC (Advanced Audio Coding) standard or the HE-AAC (High Efficiency Advanced Audio Coding) standard, but the present invention is not limited thereto. Meanwhile, the audio signal encoder 330 may correspond to an MDCT (Modified Discrete Cosine Transform) encoder.

The speech signal encoder 340 encodes the downmix signal according to a speech coding scheme when a specific frame or a specific segment of the downmix signal has predominantly speech characteristics. The speech coding scheme may conform to the AMR-WB (Adaptive Multi-Rate Wide-Band) standard, but the present invention is not limited thereto. Meanwhile, the speech signal encoder 340 may further include the LPC (Linear Prediction Coding) encoding part (124a-1, 124b-2, 124c-2) described above with reference to FIG. 5. When a harmonic signal has high redundancy on the time axis, it can be modeled by linear prediction, which predicts the current signal from past signals, and in that case adopting a linear prediction coding scheme can increase coding efficiency. Meanwhile, the speech signal encoder 340 may correspond to a time-domain encoder.

The multiplexer 350 multiplexes the spatial information, coding scheme information, band extension information, spectral data, and the like to generate an audio signal bitstream.

As mentioned above, FIG. 10 shows an encoding apparatus to which the second embodiment 124b or the third embodiment 124c of the second encoding unit, described with reference to (B) and (C) of FIG. 5, is applied. It is almost identical to the first example described with reference to FIG. 9, except that the audio signal covering the entire band is encoded by the HBE scheme or the PSDD scheme in the HBE encoding part 424 (or PSDD encoding part) before it is encoded by the speech signal encoder 440. As described above with reference to FIG. 5, the HBE encoding part 424 encodes the audio signal according to a time-domain-based band extension scheme to generate HBE information. The HBE encoding part 424 may be replaced with a PSDD encoding part 424. As described above with reference to FIGS. 6 to 8, the PSDD encoding part 424 encodes a target band using the information of a copy band and, as a result, generates the PSDD information needed to reconstruct the target band. The speech signal encoder 440 encodes the result of the HBE or PSDD encoding using the speech coding scheme. As in the first example, the speech signal encoder 440 may of course further include an LPC encoding part.

FIG. 11 shows a first example of an audio signal decoding apparatus to which the audio signal processing apparatus according to an embodiment of the present invention is applied, and FIG. 12 shows a second example. The first example is a decoding apparatus to which the first embodiment 224a of the second decoding unit, described with reference to (A) of FIG. 5, is applied. The second example is a decoding apparatus to which the second embodiment 224b or the third embodiment 224c of the second decoding unit, described with reference to (B) and (C) of FIG. 5, is applied.

Referring to FIG. 11, the audio signal decoding apparatus 500 includes a demultiplexer 510, an audio signal decoder 520, a speech signal decoder 530, a first decoding unit 540, and a multi-channel decoder 550.

The demultiplexer 510 extracts spectral data, coding scheme information, band extension information, spatial information, and the like from the audio signal bitstream. According to the coding scheme information, it passes the audio signal corresponding to the current frame to the audio signal decoder 520 or the speech signal decoder 530. Specifically, when the coding scheme information indicates that the band extension scheme was applied to the current frame, the audio signal is passed to the audio signal decoder 520; when it indicates that the band extension scheme was not applied to the current frame, the audio signal is passed to the speech signal decoder 530.

The audio signal decoder 520 decodes the spectral data using the audio coding scheme when the spectral data corresponding to the downmix signal have predominantly audio characteristics. As described above, the audio coding scheme may conform to the AAC standard or the HE-AAC standard. Meanwhile, the audio signal decoder 520 may include an inverse quantizer (not shown) and an inverse transformer (not shown). Accordingly, the audio signal decoder 520 may perform inverse quantization and inverse transformation on the spectral data and scale factors transmitted through the bitstream.

The speech signal decoder 530 decodes the downmix signal using the speech coding scheme when the spectral data have predominantly speech characteristics. As described above, the speech coding scheme may conform to the AMR-WB (Adaptive Multi-Rate Wide-Band) standard, but the present invention is not limited thereto. As mentioned above, the speech signal decoder 530 may include the LPC decoding part (224a-1, 224b-1, 224c-1) described with reference to FIG. 5.

The first decoding unit 540 decodes the band extension information bitstream and, using this information, applies the frequency-domain-based band extension scheme described above to the audio signal to generate the audio signal of the high frequency band.

When the decoded audio signal is a downmix, the multi-channel decoder 550 uses the spatial information to generate the output channel signals of a multi-channel signal (including a stereo signal).

As mentioned above, FIG. 12 shows a decoding apparatus to which the second embodiment 224b or the third embodiment 224c of the second decoding unit, described with reference to (B) and (C) of FIG. 5, is applied. It is almost identical to the first example described with reference to FIG. 11, except that the audio signal covering the entire band is decoded by the HBE scheme or the PSDD scheme in the HBE decoding part 635 (or PSDD decoding part) after it has been decoded by the speech signal decoder 630. As described above, the HBE decoding part 635 generates a high-frequency signal by shaping a low-frequency excitation signal using the HBE information. The PSDD decoding part 635, on the other hand, reconstructs the target band using the information of the copy band and the PSDD information. The speech signal decoder 630 decodes, using the speech coding scheme, the signal that was encoded by the HBE scheme or the PSDD scheme. As in the first example, the speech signal decoder 630 may of course further include the LPC decoding part (224a-1, 224b-1, 224c-1).

The audio signal processing apparatus according to the present invention may be incorporated into and used in a variety of products. These products fall broadly into a stand-alone group and a portable group. The stand-alone group may include TVs, monitors, set-top boxes, and the like, and the portable group may include PMPs, mobile phones, navigation devices, and the like.

FIG. 13 is a diagram showing the relationship between products in which the audio signal processing apparatus according to an embodiment of the present invention is implemented. Referring to FIG. 13, the wired/wireless communication unit 710 receives a bitstream through a wired or wireless communication scheme. Specifically, the wired/wireless communication unit 710 may include one or more of a wired communication unit 710A, an infrared communication unit 710B, a Bluetooth unit 710C, and a wireless LAN communication unit 710D.

The user authentication unit 720 receives user information and performs user authentication. It may include one or more of a fingerprint recognition unit 720A, an iris recognition unit 720B, a face recognition unit 720C, and a voice recognition unit 720D, which respectively receive fingerprint, iris, facial contour, and voice information, convert it into user information, and perform user authentication by determining whether the user information matches previously registered user data.

The input unit 730 is an input device through which the user enters various kinds of commands. It may include one or more of a keypad unit 730A, a touch pad unit 730B, and a remote controller unit 730C, but the present invention is not limited thereto.

The signal coding unit 740 encodes or decodes the audio signal and/or video signal received through the wired/wireless communication unit 710 and outputs a time-domain audio signal. It includes an audio signal processing apparatus 745, which corresponds to the embodiment of the present invention described above. The audio signal processing apparatus 745 and the signal coding unit that contains it may be implemented by one or more processors.

The controller 750 receives input signals from the input devices and controls all processes of the signal coding unit 740 and the output unit 760. The output unit 760 is the component through which the output signals generated by the signal coding unit 740 are output, and it may include a speaker unit 760A and a display unit 760B. When the output signal is an audio signal, it is output through the speaker; when it is a video signal, it is output through the display.

FIG. 14 is a relationship diagram of products in which the audio signal processing apparatus according to an embodiment of the present invention is implemented. FIG. 14 shows the relationship between a server and terminals corresponding to the product shown in FIG. 13. Referring to (A) of FIG. 14, a first terminal 700.1 and a second terminal 700.2 can exchange data or bitstreams bidirectionally through their wired/wireless communication units. Referring to (B) of FIG. 14, the server 800 and the first terminal 700.1 can likewise perform wired or wireless communication with each other.

FIG. 15 is a diagram showing the configuration of an audio signal processing apparatus according to another embodiment of the present invention. Referring to FIG. 15, the encoder side 1100 of the audio signal processing apparatus includes a type determination unit 1110, a first band extension encoding unit 1120, a second band extension encoding unit 1122, and a multiplexer 1130. The decoder side 1200 of the audio signal processing apparatus includes a demultiplexer 1210, a first band extension decoding unit 1220, and a second band extension decoding unit 1222.

The type determination unit 1110 analyzes the input audio signal to detect its transient proportion. The type determination unit 1110 distinguishes stationary sections from transient sections and, based on the result, determines a specific type of band extension scheme for the current frame from among two or more band extension schemes and generates type information identifying the determined scheme. The detailed configuration of the type determination unit will be described later with reference to FIG. 16.

The first band extension encoding unit 1120 encodes the frame according to the first type of band extension scheme, and the second band extension encoding unit 1122 encodes the frame according to the second type of band extension scheme. The first band extension encoding unit 1120 may perform bandpass filtering, time stretching processing, decimation processing, and the like. The first and second types of band extension scheme will be described in detail with reference to FIG. 16 and subsequent figures.

The multiplexer 1130 multiplexes the lower-band spectral data generated by the first and second band extension encoding units 1120 and 1122, the type information generated by the type determination unit 1110, and the like to generate an audio signal bitstream.

The demultiplexer 1210 on the decoder side 1200 extracts the low-frequency-band spectral data, the type information, and the like from the audio signal bitstream. The demultiplexer 1210 then passes the current frame to the first band extension decoding unit 1220 or the second band extension decoding unit 1222, depending on which type of band extension scheme the type information indicates. The first band extension decoding unit 1220 decodes the current frame by reversing the first type of band extension scheme used for encoding in the first band extension encoding unit 1120. The first band extension decoding unit 1220 may also perform bandpass filtering, time stretching processing, decimation processing, and the like. Likewise, the second band extension decoding unit 1222 decodes the current frame according to the second type of band extension scheme, thereby generating the spectral data of the high frequency band from the spectral data of the low frequency band.

FIG. 16 is a diagram showing the detailed configuration of the type determination unit 1110 of FIG. 15. The type determination unit 1110 includes a transient detection part 1112 and a type information generation part 1114 and is linked to a coding scheme determination part 1140.

The transient detection part 1112 analyzes the energy of the input audio signal to distinguish stationary sections from transient sections. A stationary section may be a section in which the energy of the audio signal is flat, and a transient section may be a section in which the energy of the audio signal changes abruptly. Because energy changes abruptly in a transient section, a listener does not easily perceive the artifacts caused by a change in the type of band extension scheme. In a stationary section, by contrast, the sound flows calmly, so a change in the type of band extension scheme there can give the impression that the sound is suddenly and momentarily interrupted. Therefore, when the type of band extension scheme has to change from the first type to the second type, making the change in a transient section rather than in a stationary section hides the artifact caused by the type change, much like the masking effect of a psychoacoustic model.
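One simple way to separate stationary from transient frames by energy is sketched below in Python. This is an assumption-laden illustration, not the detector of the transient detection part 1112: the function name, the frame-to-frame energy ratio criterion, and the threshold of 4.0 are all hypothetical choices.

```python
def classify_frames(frames, ratio_threshold=4.0):
    """Label each frame 'stationary' or 'transient' (illustrative only).

    A frame is called transient when its energy differs from the previous
    frame's energy by at least ratio_threshold, in either direction.
    """
    labels = []
    prev_energy = None
    for frame in frames:
        energy = sum(x * x for x in frame)
        if prev_energy is None:
            # No history for the first frame; treat it as stationary.
            labels.append("stationary")
        else:
            high = max(energy, prev_energy)
            low = max(min(energy, prev_energy), 1e-12)  # guard against silence
            labels.append("transient" if high / low >= ratio_threshold else "stationary")
        prev_energy = energy
    return labels


frames = [[1.0, 1.0], [1.0, 1.0], [4.0, 4.0], [4.0, 4.0]]
print(classify_frames(frames))
```

In this toy input the third frame carries a sudden energy jump and is the only one labeled transient, which is the kind of frame where a change of band extension type would be least audible according to the description.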

In this way, the type information generation part 1114 determines, for the current frame, a specific type of band extension scheme from among two or more band extension schemes and generates type information indicating the determined scheme. The two or more band extension schemes will be described later with reference to FIG. 18.

To determine the specific band extension scheme, the type of band extension scheme is first determined provisionally with reference to the coding scheme received from the coding scheme determination part 1140, and it is then finalized with reference to the information received from the transient detection part 1112. This is described in detail below with reference to FIG. 17.

FIG. 17 is a diagram for explaining the process of determining the type of the band extension scheme. Referring to FIG. 17, a number of frames (f_i, f_n, f_t) lie along the time axis. For each frame, either a frequency-domain audio coding scheme (coding scheme 1) or a time-domain speech coding scheme (coding scheme 2) may be selected, and a type of band extension scheme suited to that coding scheme may be provisionally determined accordingly. For example, the first type of band extension scheme may be provisionally assigned to the frames (f_i to f_{n-2}) that use the audio coding scheme (coding scheme 1), and the second type to the frames (f_{n-1} to f_t) that use the speech coding scheme (coding scheme 2). The type is then finalized by correcting the provisional type according to whether the audio signal is in a stationary section or a transient section. For example, if, as shown in FIG. 17, the provisionally determined type were to change at the boundary between frames f_{n-2} and f_{n-1}, the artifact caused by the type change would not be hidden, because frames f_{n-2} and f_{n-1} lie in a stationary section. The provisional type is therefore corrected so that the change takes place in the transient section (f_n, f_{n+1}). In other words, because frames f_{n-1} and f_n lie in a stationary section, the first type is retained for them, and the second type of band extension scheme is applied from frame f_{n+1} onward. In short, the provisionally determined type is kept for all frames other than f_{n-1} and f_n, and is modified in the final step only for those two frames.
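The two-step decision described for FIG. 17 can be sketched as follows. This is one possible reading, not the patented procedure itself: the mapping of the audio coding scheme to type 1 and the speech coding scheme to type 2 follows the example in the text, while the function names, the string labels, and the rule "keep the previous type until a transient frame is reached" are assumptions.

```python
from typing import List, Sequence


def provisional_types(coding_schemes: Sequence[str]) -> List[int]:
    """Step 1: provisional type from the coding scheme of each frame
    ("audio" -> type 1, anything else, e.g. "speech" -> type 2)."""
    return [1 if scheme == "audio" else 2 for scheme in coding_schemes]


def finalize_types(provisional: Sequence[int],
                   is_transient: Sequence[bool]) -> List[int]:
    """Step 2: defer each type change until a transient frame, so that the
    switch never happens inside a stationary section."""
    if len(provisional) != len(is_transient):
        raise ValueError("one transient flag is required per frame")
    final: List[int] = []
    current = None
    for wanted, transient in zip(provisional, is_transient):
        # Adopt the wanted type for the first frame, or on a transient frame.
        if current is None or (wanted != current and transient):
            current = wanted
        final.append(current)
    return final


# Frames f_{n-3} .. f_{n+3}: audio coding up to f_{n-2}, speech from f_{n-1}.
schemes = ["audio", "audio", "speech", "speech", "speech", "speech", "speech"]
# Only f_{n+1} (index 4) is marked transient in this toy example.
transient = [False, False, False, False, True, False, False]
types = finalize_types(provisional_types(schemes), transient)
```

With these inputs the provisional switch at f_{n-1} is postponed, so f_{n-1} and f_n keep the first type and the second type starts at f_{n+1}. Note that under this simple rule the type never changes if no transient frame occurs; a real encoder would need a fallback for that case.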

FIG. 18 is a diagram for explaining various types of band extension schemes. The first type of band extension scheme described here may correspond to the first band extension scheme described above with reference to FIG. 15, and the second type described here may correspond to the second band extension scheme described with reference to FIG. 15. Conversely, the first type described here may correspond to the second band extension scheme of FIG. 15, and the second type described here may correspond to the first band extension scheme of FIG. 15.

As described above, a band extension scheme generates wideband spectral data from narrowband spectral data, where the narrowband may correspond to a low frequency band (lower band) and the newly generated band may correspond to a high frequency band (higher band). FIG. 18(A) shows an example of the first type of band extension scheme. The first type of band extension scheme reconstructs the high frequency band using a first data area of the narrowband (or low frequency band) as the copy band. The first data area may be all of the received narrowband or a plurality of portions of it, where one such portion may correspond to the second data area described below, and the first data area may be larger than the second data area.

FIGS. 18(B)-1 and 18(B)-2, by contrast, show a first example (type 2-1) and a second example (type 2-2) of the second type of band extension scheme. The second type of band extension scheme uses a second data area as the copy band for reconstructing the high frequency band. The second data area may be a portion of the received narrowband and may be a narrower band than the first data area. In the first example of the second type, the copy bands (cb) used to generate the high frequency band are contiguous; in the second example, the copy bands are not contiguous but are distributed discretely.
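The difference between the copy bands of FIG. 18 can be illustrated with a toy sketch. The function `extend_band`, the representation of a copy band as a `(start, stop)` bin range, and the simple tiling of copied bins are all assumptions made for illustration; the specification also relies on band extension information such as an envelope, which is omitted here.

```python
from typing import List, Sequence, Tuple


def extend_band(low_band: Sequence[float],
                copy_bands: Sequence[Tuple[int, int]],
                high_len: int) -> List[float]:
    """Fill `high_len` high-band bins by repeating the low-band bins selected
    by `copy_bands` (half-open bin ranges into `low_band`)."""
    source = [low_band[k] for start, stop in copy_bands for k in range(start, stop)]
    if not source:
        raise ValueError("copy bands select no bins")
    return [source[k % len(source)] for k in range(high_len)]


low = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]

# Type 1: the copy band is the whole low band (the larger data area).
type_1 = extend_band(low, [(0, 8)], 8)
# Type 2-1: one contiguous portion of the low band.
type_2_1 = extend_band(low, [(4, 8)], 8)
# Type 2-2: discrete, non-contiguous portions of the low band.
type_2_2 = extend_band(low, [(1, 3), (5, 7)], 8)
```

The three calls differ only in which low-band bins are allowed to serve as source material, which is the distinction the figure draws between the first and second data areas.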

FIG. 19 is a diagram showing the configuration of an audio signal encoding apparatus to which an audio signal processing apparatus according to another embodiment of the present invention is applied. Referring to FIG. 19, the audio signal encoding apparatus 1300 includes a multichannel encoder 1305, a type determination unit 1310, a first band extension encoding unit 1320, a second band extension encoding unit 1322, an audio signal encoder 1330, a speech signal encoder 1340, and a multiplexer 1350. The type determination unit 1310, the first band extension encoding unit 1320, and the second band extension encoding unit 1322 may have the same functions as the like-named components 1110, 1120, and 1122 described with reference to FIG. 15.

The multichannel encoder 1305 receives a plurality of channel signals (two or more channel signals; hereinafter, a multichannel signal), downmixes them to generate a mono or stereo downmix signal, and generates the spatial information needed to upmix the downmix signal back into a multichannel signal. The spatial information may include channel level difference information, inter-channel correlation information, channel prediction coefficients, downmix gain information, and the like. If the audio signal encoding apparatus 1300 receives a mono signal, the multichannel encoder 1305 may of course bypass the mono signal without downmixing it.
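As a rough illustration of downmixing and of one item of spatial information, the sketch below averages two channels into a mono downmix and computes a channel level difference as an energy ratio in decibels. The averaging rule, the decibel definition, and the function names are assumptions; the specification does not say how the multichannel encoder 1305 computes these quantities.

```python
import math
from typing import List, Sequence


def downmix_to_mono(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Average two channels sample by sample (an assumed downmix rule)."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]


def channel_level_difference_db(left: Sequence[float],
                                right: Sequence[float],
                                eps: float = 1e-12) -> float:
    """Energy ratio of the two channels in dB; `eps` guards against silence."""
    energy_left = sum(x * x for x in left) + eps
    energy_right = sum(x * x for x in right) + eps
    return 10.0 * math.log10(energy_left / energy_right)


left = [1.0, -1.0, 1.0, -1.0]
right = [0.5, -0.5, 0.5, -0.5]
mono = downmix_to_mono(left, right)
cld = channel_level_difference_db(left, right)
```

Here the left channel has four times the energy of the right channel, so the level difference is about 6 dB; a decoder could use such a value to redistribute the downmix energy when upmixing.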

The type determination unit 1310 determines the type of band extension scheme to be applied to the current frame and generates type information indicating that type. The type determination unit 1310 passes the audio signal to the first band extension encoding unit 1320 when the first band extension scheme is to be applied to the current frame, and to the second band extension encoding unit 1322 when the second band extension scheme is to be applied. The first band extension encoding unit 1320 and the second band extension encoding unit 1322 each apply the band extension scheme of their type to generate band extension information for reconstructing the high frequency band from the low frequency band. The signal encoded by the band extension scheme is then encoded by the audio signal encoder 1330 or the speech signal encoder 1340 according to the characteristics of the signal, regardless of the type of the band extension scheme. The coding scheme information based on the signal characteristics may be the information generated by the coding scheme determination part 1140 described above with reference to FIG. 16, and, like the other information, it may be passed to the multiplexer 1350.

The audio signal encoder 1330 encodes the downmix signal according to an audio coding scheme when a specific frame or segment of the downmix signal has predominantly audio characteristics. The audio coding scheme may conform to the AAC (Advanced Audio Coding) standard or the HE-AAC (High Efficiency Advanced Audio Coding) standard, but the present invention is not limited thereto. The audio signal encoder 1330 may correspond to an MDCT (Modified Discrete Cosine Transform) encoder.

The speech signal encoder 1340 encodes the downmix signal according to a speech coding scheme when a specific frame or segment of the downmix signal has predominantly speech characteristics. The speech coding scheme may conform to the AMR-WB (Adaptive Multi-Rate Wideband) standard, but the present invention is not limited thereto. The speech signal encoder 1340 may further include a linear prediction coding (LPC) encoding part. When a harmonic signal has high redundancy on the time axis, it can be modeled by linear prediction, which predicts the current signal from past signals; in this case, adopting a linear prediction coding scheme can increase coding efficiency. The speech signal encoder 1340 may correspond to a time-domain encoder.

The multiplexer 1350 multiplexes the spatial information, coding scheme information, band extension information, spectral data, and the like to generate an audio signal bitstream.

FIG. 20 is a diagram showing the configuration of an audio signal decoding apparatus to which an audio signal processing apparatus according to another embodiment of the present invention is applied. Referring to FIG. 20, the audio signal decoding apparatus 1400 includes a demultiplexer 1410, an audio signal decoder 1420, a speech signal decoder 1430, a first band extension decoding unit 1440, a second band extension decoding unit 1442, and a multichannel decoder 1450.

The demultiplexer 1410 extracts spectral data, coding scheme information, type information, band extension information, spatial information, and the like from the audio signal bitstream. According to the coding scheme information, it passes the audio signal corresponding to the current frame to the audio signal decoder 1420 or the speech signal decoder 1430.

The audio signal decoder 1420 decodes the spectral data using an audio coding scheme when the spectral data corresponding to the downmix signal has predominantly audio characteristics. As described above, the audio coding scheme may conform to the AAC standard or the HE-AAC standard. The audio signal decoder 1420 may include a dequantization unit (not shown) and an inverse transform unit (not shown), and may therefore perform dequantization and inverse transformation on the spectral data and scale factors transmitted in the bitstream.

The speech signal decoder 1430 decodes the downmix signal using a speech coding scheme when the spectral data has predominantly speech characteristics. As described above, the speech coding scheme may conform to the AMR-WB (Adaptive Multi-Rate Wideband) standard, but the present invention is not limited thereto. The speech signal decoder 1430 may include an LPC decoding part.

As described above, the audio signal is passed to the first band extension decoding unit 1440 or the second band extension decoding unit 1442 according to the type information, which indicates a specific scheme among the two or more band extension schemes. The first and second band extension decoding units 1440 and 1442 reconstruct wideband spectral data from some or all of the narrowband spectral data according to the band extension scheme of the corresponding type.

When the decoded audio signal is a downmix, the multichannel decoder 1450 uses the spatial information to generate the output channel signals of a multichannel signal (including a stereo signal).

FIG. 21 is a diagram showing the schematic configuration of a product in which an audio signal processing apparatus according to another embodiment of the present invention is implemented, and FIG. 22 is a relationship diagram of products in which the audio signal processing apparatus according to another embodiment of the present invention is implemented. Referring first to FIG. 21, the product includes a wired/wireless communication unit 1510, a user authentication unit 1520, an input unit 1530, a signal coding unit 1540, a control unit 1550, and an output unit 1560. Each component other than the signal coding unit 1540 performs the same function as the like-named component described above with reference to FIG. 12. The signal coding unit 1540 encodes or decodes the audio signal and/or video signal received through the wired/wireless communication unit 1510 and outputs a time-domain audio signal. The signal coding unit includes an audio signal processing apparatus 1545, which corresponds to the other embodiment of the present invention described above with reference to FIGS. 15 to 20. The audio signal processing apparatus 1545 and the signal coding unit that includes it may be implemented by one or more processors.

FIG. 22 is a relationship diagram of products in which an audio signal processing apparatus according to an embodiment of the present invention is implemented. FIG. 22 shows the relationship between terminals and a server corresponding to the product shown in FIG. 21. Referring to FIG. 22(A), the first terminal 1500.1 and the second terminal 1500.2 can exchange data or bitstreams bidirectionally through their wired/wireless communication units. Referring to FIG. 22(B), the server 1600 and the first terminal 1500.1 can likewise perform wired or wireless communication with each other.

The audio signal processing method according to the present invention may be produced as a program to be executed on a computer and stored in a computer-readable recording medium, and multimedia data having a data structure according to the present invention may also be stored in a computer-readable recording medium. The computer-readable recording medium includes every kind of storage device in which data readable by a computer system is stored. Examples of computer-readable recording media include ROM, RAM, CD-ROM, magnetic tape, floppy disks, and optical data storage devices, and also include media implemented in the form of a carrier wave (for example, transmission over the Internet). In addition, the bitstream generated by the encoding method may be stored in a computer-readable recording medium or transmitted over a wired or wireless communication network.

Although the present invention has been described above with reference to limited embodiments and drawings, the present invention is not limited to them, and those of ordinary skill in the art to which the present invention pertains may of course make various modifications and variations within the technical idea of the present invention and the scope of equivalents of the claims set forth below.

The present invention can be applied to encoding and decoding audio signals.

FIG. 1 is a block diagram of an audio signal processing apparatus according to an embodiment of the present invention.

FIG. 2 is a detailed block diagram of the sibilant detecting unit in FIG. 1.

FIG. 3 is a diagram for explaining the principle of sibilant detection.

FIG. 4 is an example of energy spectra for a non-sibilant and for a sibilant.

FIG. 5 shows examples of detailed block diagrams of the second encoding unit and the second decoding unit in FIG. 1.

FIG. 6 is a diagram for explaining first and second embodiments of a PSDD (Partial Spectral Data Duplication) scheme, which is an example of a non-sibilant encoding/decoding scheme.

FIGS. 7 and 8 are diagrams for explaining the PSDD scheme when frame lengths differ from each other.

FIG. 9 is a first example of an audio signal encoding apparatus to which an audio signal processing apparatus according to an embodiment of the present invention is applied.

FIG. 10 is a second example of an audio signal encoding apparatus to which an audio signal processing apparatus according to an embodiment of the present invention is applied.

FIG. 11 is a first example of an audio signal decoding apparatus to which an audio signal processing apparatus according to an embodiment of the present invention is applied.

FIG. 12 is a second example of an audio signal decoding apparatus to which an audio signal processing apparatus according to an embodiment of the present invention is applied.

FIG. 13 is a schematic block diagram of a product in which an audio signal processing apparatus according to an embodiment of the present invention is implemented.

FIG. 14 is a relationship diagram of products in which an audio signal processing apparatus according to an embodiment of the present invention is implemented.

FIG. 15 is a block diagram of an audio signal processing apparatus according to another embodiment of the present invention.

FIG. 16 is a detailed block diagram of the type determination unit 1110 in FIG. 15.

FIG. 17 is a diagram for explaining the process of determining the type of the band extension scheme.

FIG. 18 is a diagram for explaining various types of band extension schemes.

FIG. 19 is a block diagram of an audio signal encoding apparatus to which an audio signal processing apparatus according to another embodiment of the present invention is applied.

FIG. 20 is a block diagram of an audio signal decoding apparatus to which an audio signal processing apparatus according to another embodiment of the present invention is applied.

FIG. 21 is a schematic block diagram of a product in which an audio signal processing apparatus according to another embodiment of the present invention is implemented.

FIG. 22 is a relationship diagram of products in which an audio signal processing apparatus according to another embodiment of the present invention is implemented.

Claims

Receiving, by an audio processing apparatus, type information indicating a specific band extension scheme for a current frame of an audio signal among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme, and spectral data of a low frequency band;

When the type information indicates the first band extension scheme for the current frame, generating spectral data of a high frequency band of the current frame from the spectral data of the low frequency band by performing the first band extension scheme; and

When the type information indicates the second band extension scheme for the current frame, generating the spectral data of the high frequency band of the current frame from the spectral data of the low frequency band by performing the second band extension scheme,

wherein the first band extension scheme is based on a first data area of the spectral data of the low frequency band, and the second band extension scheme is based on a second data area of the spectral data of the low frequency band, and

wherein the second data area is larger than the first data area.

The method according to claim 1,

wherein the first data area is a portion of the spectral data of the low frequency band, and the second data area is a plurality of portions of the spectral data of the low frequency band, including that portion.

The method of claim 1,

wherein the first data area is a portion of the spectral data of the low frequency band, and the second data area is all of the spectral data of the low frequency band.

delete

The method of claim 1,

wherein the high frequency band includes one or more bands at or above a boundary frequency, and the low frequency band includes one or more bands at or below the boundary frequency.

The method according to claim 1,

wherein the first band extension scheme is performed using at least one of bandpass filtering, time stretching, and decimation.

The method of claim 1,

Further comprising receiving band extension information including envelope information,

wherein the first band extension scheme or the second band extension scheme is performed using the band extension information.

The method of claim 1,

Further comprising decoding the spectral data of the low frequency band using an audio coding scheme in a frequency domain or a speech coding scheme in a time domain,

wherein the spectral data of the high frequency band is generated using the decoded spectral data of the low frequency band.

A demultiplexer for receiving type information indicating a specific band extension scheme for a current frame of an audio signal among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme, and spectral data of a low frequency band;

A first band extension decoding unit for generating, when the type information indicates the first band extension scheme for the current frame, spectral data of a high frequency band of the current frame from the spectral data of the low frequency band by performing the first band extension scheme; and

A second band extension decoding unit for generating, when the type information indicates the second band extension scheme for the current frame, the spectral data of the high frequency band of the current frame from the spectral data of the low frequency band by performing the second band extension scheme,

wherein the second data area is larger than the first data area.

The method of claim 9,

delete

The apparatus of claim 9,

wherein the high frequency band includes one or more bands at or above a boundary frequency, and the low frequency band includes one or more bands at or below the boundary frequency.

The apparatus of claim 9,

wherein the demultiplexer further receives band extension information including envelope information,

The apparatus of claim 9, further comprising:

An audio signal decoder for decoding the spectral data of the low frequency band using an audio coding scheme in a frequency domain; and

A speech signal decoder for decoding the spectral data of the low frequency band using a speech coding scheme in a time domain,

wherein the spectral data of the high frequency band is generated using the decoded spectral data of the low frequency band.

An audio signal processing method comprising:

Detecting, by an audio processing apparatus, a transient degree of a current frame of an audio signal;

Determining, based on the transient degree, a specific band extension scheme for the current frame among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme;

Generating type information indicating the specific band extension scheme;

When the specific band extension scheme for the current frame is the first band extension scheme, generating spectral data of a high frequency band from spectral data of a low frequency band by performing the first band extension scheme;

When the specific band extension scheme for the current frame is the second band extension scheme, generating the spectral data of the high frequency band from the spectral data of the low frequency band by performing the second band extension scheme; and

Transmitting the type information and the spectral data of the low frequency band,

wherein the first band extension scheme is based on a first data area of the spectral data of the low frequency band, and the second band extension scheme is based on a second data area of the spectral data of the low frequency band.

An audio signal processing apparatus comprising:

A transient detection part for detecting a transient degree of a current frame of an audio signal;

A type information generation part for determining, based on the transient degree, a specific band extension scheme for the current frame among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme, and for generating type information indicating the specific band extension scheme;

A first band extension encoding unit for generating, when the specific band extension scheme for the current frame is the first band extension scheme, spectral data of a high frequency band from spectral data of a low frequency band by performing the first band extension scheme;

A second band extension encoding unit for generating, when the specific band extension scheme for the current frame is the second band extension scheme, the spectral data of the high frequency band from the spectral data of the low frequency band by performing the second band extension scheme; and

A multiplexer for transmitting the type information and the spectral data of the low frequency band,

wherein the first band extension scheme is based on a first data area of the spectral data of the low frequency band, and the second band extension scheme is based on a second data area of the spectral data of the low frequency band.

Receiving, among a plurality of band extension schemes including a first band extension scheme and a second band extension scheme, type information indicating a specific band extension scheme for a current frame of an audio signal, and spectral data of a low frequency band;

The second data area is larger than the first data area,

Instructions are stored, wherein the instructions, when executed by a processor, cause the processor to perform an operation.