KR100911994B1

KR100911994B1 - Method and apparatus for encoding/decoding signal having strong non-stationary properties using hilbert-huang transform

Info

Publication number: KR100911994B1
Application number: KR1020070080901A
Authority: KR
Inventors: 장인선; 백승권; 장대영; 강경옥; 유정주; 강홍구; 이창헌; 이동금
Original assignee: 한국전자통신연구원
Priority date: 2007-08-10
Filing date: 2007-08-10
Publication date: 2009-08-13
Also published as: KR20090016343A

Abstract

The present invention relates to the coding and decoding of speech and audio signals, and more particularly to an apparatus and method that use the Hilbert-Huang Transform (HHT) to encode and decode, within a single scheme, input signals with strongly non-stationary characteristics, such as speech and audio signals.

The present invention provides an encoding method comprising: applying the HHT to an input signal to separate it into signals having different frequency bands; independently encoding each of the separated signals; and multiplexing the independently encoded signals.

Hilbert transform, EMD, HHT, intrinsic mode function

Description

METHOD AND APPARATUS FOR ENCODING/DECODING SIGNAL HAVING STRONG NON-STATIONARY PROPERTIES USING HILBERT-HUANG TRANSFORM

The present invention relates to the coding and decoding of speech and audio signals, and more particularly to an apparatus and method that use the Hilbert-Huang Transform (HHT) to efficiently encode and decode, within a single scheme, input signals with strongly non-stationary characteristics, such as speech and audio signals.

In general, speech (voice) codecs and audio (music) codecs have been developed independently.

Encoders that handle a single type of content, either speech or audio, are used very effectively for voice calls in mobile communication and in music players such as portable MP3 players. However, the convergence of broadcasting and communication will require the transmission of more diverse content, and the fundamental capacity limit of the air channel makes low bit rate transmission unavoidable. A low bit rate coding technique that can process speech and audio signals efficiently in an integrated manner is therefore needed.

Existing coding techniques were developed separately for speech and for audio: the CELP (Code Excited Linear Prediction) structure became the established technique for speech signals, and AAC+ was completed for audio signals. Each technique, however, degrades considerably when the input signal has characteristics from the other domain. When an audio (music) signal is input to a CELP coder, the mismatched model cannot represent the frequency information accurately, and performance drops sharply. When a speech signal is input to low bit rate AAC+, the limited number of bits prevents the harmonic structure from being represented accurately, and performance also drops.

Recently, 3GPP standardized AMR-WB+, which has a multi-mode structure, to handle signals with more diverse characteristics. AMR-WB+ is a dual structure consisting of the CELP-based AMR-WB and a transform-based TCX (Transform Coded Excitation) module, and it selects between CELP and TCX according to the input signal. It thus provides the basic arrangement in which speech is handled by CELP and audio by a transform coder, and a single encoder performs well for both speech and audio at low bit rates of 32 kbps or less. The main weakness of AMR-WB+ is that its TCX module performs worse than AAC+, so its performance on audio signals is lower than that of AAC+.

[Table 1] below shows the results of formal listening tests of the two coding techniques on speech and audio signals.

According to [Table 1], AMR-WB+ performs better on speech content and AAC+ performs better on audio content at all bit rates. The two coding techniques therefore have opposite strengths, and neither is an optimal coding technique for applications that must handle inputs with diverse characteristics.

In other words, in the prior art AMR-WB+ and AAC+ have opposing strengths and weaknesses, and neither yet provides the best performance as an integrated coder. An integrated coding technique that performs well on both speech and audio signals is therefore required.

The present invention has been made to solve the problems of the prior art described above, and an object of the present invention is to provide an apparatus and method capable of efficiently encoding and decoding both speech and audio signals.

Another object of the present invention is to provide an apparatus and method capable of efficiently encoding and decoding input signals with strongly non-stationary characteristics.

To achieve the above objects and solve the problems of the prior art, the present invention provides an encoding apparatus comprising: a signal decomposition unit that decomposes an input signal into signals having different frequency bands; a phase and amplitude information extraction unit that applies a Hilbert transform to the decomposed signals to extract the instantaneous phase and instantaneous amplitude information of each decomposed signal; and an encoding unit that encodes each of the decomposed signals using the extracted instantaneous phase and instantaneous amplitude information.

An encoding apparatus according to another aspect of the present invention comprises: a signal decomposition unit that decomposes an input signal into signals having different frequency bands; a first encoding unit that encodes those decomposed signals belonging to the high frequency region by applying CELP-based coding or an audio signal coding algorithm; and a second encoding unit that applies a Hilbert transform to the remaining decomposed signals, which do not belong to the high frequency region, to extract their instantaneous phase and instantaneous amplitude information, and encodes the extracted instantaneous phase and instantaneous amplitude information.

The present invention also provides an encoding method comprising: applying the HHT to an input signal to decompose it into signals having different frequency bands; independently encoding each of the decomposed signals; and multiplexing the independently encoded signals.

Additional features and advantages of the present invention will become clearer from the detailed description of embodiments that follows. Although the present invention is described with reference to limited embodiments and drawings, its scope is not limited to these embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description. Accordingly, the idea of the invention should be understood only from the claims set forth below, and all equal or equivalent modifications thereof should be construed as falling within the scope of the idea of the invention.

According to the present invention, an efficient method can be provided for encoding strongly non-stationary speech and audio signals with a single integrated codec. It can be applied effectively to the speech and audio encoders among the multimedia encoders to be used in future mobile communication and broadcasting. Because the coding technique according to the present invention is a new approach that differs from existing speech and audio coding techniques, it can also save the large licensing fees paid for the use of existing coding techniques.

Hereinafter, an apparatus and method for encoding/decoding speech and audio signals using the HHT according to embodiments of the present invention are described in detail with reference to the accompanying drawings.

The basic principle of the present invention is to decompose the input signal, using the HHT (Hilbert-Huang Transform), into intrinsic mode function (IMF) signals that occupy different frequency band ranges, and then to extract and quantize the instantaneous phase and amplitude information of each of these signals, so that signals with strongly non-stationary characteristics are encoded more efficiently. HHT is the collective name for the combination of the empirical mode decomposition (EMD) method and the Hilbert transform.
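
The Hilbert-transform half of this principle can be illustrated with a minimal sketch that derives instantaneous amplitude and phase from the analytic signal. The function names are hypothetical, and a direct O(N^2) DFT is used only to keep the example dependency-free; a practical implementation would presumably use an FFT.

```python
import cmath
import math

def analytic_signal(x):
    """Return the analytic signal of a real sequence using a direct DFT."""
    n = len(x)
    # Forward DFT of the real input.
    spectrum = [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
                for k in range(n)]
    # Keep DC (and Nyquist when n is even), double positive bins, zero negative bins.
    gain = [0.0] * n
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        gain[k] = 2.0
    # Inverse DFT of the one-sided spectrum.
    return [sum(gain[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)) / n
            for t in range(n)]

def instantaneous_amplitude_phase(x):
    """Magnitude and wrapped angle of the analytic signal."""
    z = analytic_signal(x)
    amplitude = [abs(v) for v in z]
    phase = [cmath.phase(v) for v in z]  # wrapped to (-pi, pi]
    return amplitude, phase

# Example: a constant-amplitude cosine with an integer number of cycles.
N = 64
tone = [0.5 * math.cos(2 * math.pi * 4 * t / N) for t in range(N)]
amp, ph = instantaneous_amplitude_phase(tone)
```

For this test tone the amplitude track is flat at 0.5 and the phase advances linearly before wrapping, which is the kind of slowly varying pair the description goes on to quantize.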

In the present invention, the instantaneous phase information is encoded using polynomial phase modeling over short time intervals. For efficient phase coding, the phase information in each short interval is modeled as a polynomial, and the polynomial coefficients are then quantized. Because the variation of frequency over time within a short interval decreases as the index of the extracted IMF signal increases, this phase coding scheme becomes increasingly efficient for IMF signals with higher indices. For reference, the phase is obtained as the cumulative sum of the frequency components over time.

The index of an IMF signal refers to the order of the signals decomposed by the HHT method, from the high frequency band down to the low frequency band.

In the present invention, the instantaneous amplitude information is encoded using vector quantization. The instantaneous amplitude is a positive-valued signal that represents the overall envelope, and because its samples are relatively highly correlated with one another in the time domain, it can be encoded efficiently by vector quantization.
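
As a rough illustration of vector quantization of an envelope, the sketch below picks the nearest codeword by squared Euclidean distance. The codebook values and the function name are made up for the example; the source does not specify a codebook design or search method.

```python
def nearest_codeword(vector, codebook):
    """Return the index of the codeword with the smallest squared error."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index

# Hypothetical 2-bit codebook of 4-sample envelope shapes.
codebook = [
    [0.2, 0.2, 0.2, 0.2],  # flat, low
    [0.2, 0.5, 0.8, 1.0],  # rising
    [1.0, 0.8, 0.5, 0.2],  # falling
    [1.0, 1.0, 1.0, 1.0],  # flat, high
]
envelope = [0.25, 0.45, 0.85, 0.95]
index = nearest_codeword(envelope, codebook)
decoded = codebook[index]
```

Only `index` would need to be transmitted; the smooth, correlated shape of an envelope is what lets a small codebook cover it reasonably well.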

FIG. 1 is a block diagram showing a schematic configuration of an encoding apparatus according to the present invention.

Referring to FIG. 1, the encoding apparatus includes a signal decomposition unit 110 that decomposes an input signal into signals having different frequency bands, a phase and amplitude information extraction unit 120 that applies a Hilbert transform to the decomposed signals to extract the instantaneous phase and instantaneous amplitude information of each decomposed signal, and an encoding unit 130 that encodes each of the decomposed signals using the extracted instantaneous phase and instantaneous amplitude information.

The signal decomposition unit 110 decomposes the input signal by applying EMD (empirical mode decomposition) to it. The input signal is typically a strongly non-stationary signal; for example, it may be a mixture of speech and audio signals, or either a speech signal or an audio signal alone.
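
The source does not spell out the EMD procedure, so the following is only a simplified sketch of one sifting iteration as it is commonly described: find local extrema, build upper and lower envelopes, and subtract their mean. Piecewise-linear envelopes with flat ends are used here for brevity, whereas EMD implementations usually fit cubic splines; all function names are hypothetical.

```python
import math

def local_extrema(x):
    """Indices of interior local maxima and minima."""
    maxima, minima = [], []
    for i in range(1, len(x) - 1):
        if x[i - 1] < x[i] >= x[i + 1]:
            maxima.append(i)
        elif x[i - 1] > x[i] <= x[i + 1]:
            minima.append(i)
    return maxima, minima

def linear_envelope(x, knots):
    """Piecewise-linear curve through x at the knot indices, held flat at both ends."""
    env = [0.0] * len(x)
    for i in range(len(x)):
        if i <= knots[0]:
            env[i] = x[knots[0]]
        elif i >= knots[-1]:
            env[i] = x[knots[-1]]
    for a, b in zip(knots, knots[1:]):
        for i in range(a, b + 1):
            w = (i - a) / (b - a)
            env[i] = (1 - w) * x[a] + w * x[b]
    return env

def sift_once(x):
    """One sifting step: subtract the mean of the upper and lower envelopes."""
    maxima, minima = local_extrema(x)
    if len(maxima) < 2 or len(minima) < 2:
        return None  # too few extrema: treat x as the residue
    upper = linear_envelope(x, maxima)
    lower = linear_envelope(x, minima)
    return [v - 0.5 * (u + l) for v, u, l in zip(x, upper, lower)]

# Example: a fast oscillation riding on a slow linear trend.
fast = [math.sin(2 * math.pi * t / 20) for t in range(200)]
mixed = [f + 2.0 + 0.01 * t for t, f in enumerate(fast)]
candidate = sift_once(mixed)
```

A full decomposition would repeat the sifting until the candidate meets an IMF criterion, subtract that IMF from the signal, and continue on the remainder, which yields components ordered from high to low frequency.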

The phase and amplitude information extraction unit 120 corresponds to a Hilbert transformer that performs the Hilbert transform.

The encoding unit 130 includes a plurality of encoders so that each of the decomposed signals is encoded independently.

FIG. 2 is a block diagram showing a detailed configuration of an encoding apparatus according to the present invention.

Referring to FIG. 2, the encoding apparatus according to the present invention includes a signal decomposition unit 210 that applies EMD to the input signal and decomposes each frame of the input signal into IMF signals having different frequency bands, a plurality of Hilbert transform units 220 that extract the phase and amplitude information of each IMF signal, a plurality of encoding units 230 (encoding unit 1 to encoding unit N) that encode each IMF signal using phase and amplitude coding algorithms, and a multiplexing unit 250 that multiplexes (bit-packs) the encoded IMF signals.
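
The bit-packing step can be pictured with a small sketch that concatenates quantizer indices of different widths into bytes. The field layout and bit widths below are hypothetical; the source does not define a bitstream format.

```python
def pack_fields(fields):
    """Pack (value, bit_width) pairs MSB-first into bytes, zero-padding the tail."""
    bits = []
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError("value does not fit in the given width")
        bits.extend((value >> shift) & 1 for shift in range(width - 1, -1, -1))
    bits.extend([0] * (-len(bits) % 8))
    return bytes(
        sum(bit << (7 - j) for j, bit in enumerate(bits[i:i + 8]))
        for i in range(0, len(bits), 8)
    )

def unpack_fields(data, widths):
    """Inverse of pack_fields when the same widths are known to the decoder."""
    bits = [(byte >> (7 - j)) & 1 for byte in data for j in range(8)]
    values, pos = [], 0
    for width in widths:
        value = 0
        for bit in bits[pos:pos + width]:
            value = (value << 1) | bit
        values.append(value)
        pos += width
    return values

# Hypothetical frame: four indices with widths of 3, 9, 1 and 6 bits.
frame = [(5, 3), (300, 9), (1, 1), (17, 6)]
packed = pack_fields(frame)
restored = unpack_fields(packed, [width for _, width in frame])
```

Because the decoder must know each field width in advance, per-IMF bit allocations would have to be fixed or signalled separately.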

FIG. 3 is a block diagram showing the detailed configuration of an encoding unit that encodes the phase and amplitude of each IMF signal according to an embodiment of the present invention.

Referring to FIG. 3, each encoding unit 230 shown in FIG. 2 includes a phase encoding unit 330 and an amplitude encoding unit 340.

Three vectors are labeled in FIG. 3: the instantaneous phase vector within a frame for the i-th IMF signal, the instantaneous amplitude vector, and the coefficient vector obtained when the instantaneous phase vector is modeled as a polynomial.

The phase encoding unit 330 includes a phase unwrapping unit 331 that unwraps the instantaneous phase information extracted by the Hilbert transform unit 320, a polynomial modeling unit 332 that expresses the unwrapped instantaneous phase information as a linear model in matrix form and computes the polynomial coefficients from that model by the least squares method, and a quantization unit 333 that quantizes the computed polynomial coefficients.

The phase unwrapping unit 331 performs phase unwrapping on the extracted phase information.
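
Phase unwrapping removes the 2π jumps that appear when the phase is read from an arctangent, so that a polynomial can be fitted to a smooth curve. A minimal sketch, assuming the true phase changes by less than π between consecutive samples (the function name is hypothetical):

```python
import math

def unwrap_phase(wrapped):
    """Add multiples of 2*pi so that consecutive samples never jump by more than pi."""
    if not wrapped:
        return []
    unwrapped = [wrapped[0]]
    offset = 0.0
    for prev, cur in zip(wrapped, wrapped[1:]):
        delta = cur - prev
        if delta > math.pi:
            offset -= 2 * math.pi
        elif delta < -math.pi:
            offset += 2 * math.pi
        unwrapped.append(cur + offset)
    return unwrapped

# Example: a linearly growing phase that wraps several times.
true_phase = [0.9 * n for n in range(20)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]
recovered = unwrap_phase(wrapped)
```

The per-sample step of 0.9 rad is below π, so the original ramp is recovered exactly up to rounding.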

As in [Equation 1] below, the polynomial modeling unit 332 models the instantaneous phase information at time n as a polynomial of order p.

Next, the polynomial modeling unit 332 rewrites [Equation 1] as a linear model in matrix form, as in [Equation 2] below, using a phase vector of length N and a polynomial coefficient vector of length p+1.

From the linear model of [Equation 2], the polynomial coefficients can be obtained by the least squares method, as in [Equation 3] below.

[Equation 3]
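
One plausible way to write Equations 1 to 3, consistent with the description, is given below. The phase symbol φ, the index range n = 0, ..., N-1 and the exact layout of X are assumptions; the coefficient names a_0, ..., a_p, the matrix name X, the length N and the order p follow the text.

```latex
% Equation 1 (assumed form): p-th order polynomial model of the instantaneous phase
\phi(n) = \sum_{k=0}^{p} a_k\, n^{k}, \qquad n = 0, 1, \dots, N-1

% Equation 2 (assumed form): the same model as a linear system
\boldsymbol{\phi} = X\,\mathbf{a}, \qquad
\boldsymbol{\phi} = \begin{bmatrix} \phi(0) \\ \vdots \\ \phi(N-1) \end{bmatrix}, \quad
X = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
1 & 1 & \cdots & 1 \\
\vdots & \vdots & & \vdots \\
1 & (N-1) & \cdots & (N-1)^{p}
\end{bmatrix}, \quad
\mathbf{a} = \begin{bmatrix} a_0 \\ \vdots \\ a_p \end{bmatrix}

% Equation 3 (assumed form): least squares solution
\hat{\mathbf{a}} = \left( X^{\mathsf T} X \right)^{-1} X^{\mathsf T} \boldsymbol{\phi}
```

Here φ is N×1, X is N×(p+1) and a is (p+1)×1, which matches the stated vector lengths.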

In practice, the matrix operation in [Equation 3] is computationally expensive because it involves a matrix inversion. However, once the length N and the polynomial order p are fixed, the elements of the matrix X are constants, so the result can be computed in advance and stored as a table to improve computational efficiency. In addition, to account for the different phase variation characteristics of each IMF signal, a different modeling length N and polynomial order p can be applied to each IMF signal, and a different number of quantization bits can be used as well.
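
The table idea can be sketched as follows: the matrix that maps a phase vector to least squares coefficients depends only on N and p, so it is built once and each frame then costs a single matrix-vector product. This sketch uses normalized time t = n/N in the design matrix purely for numerical conditioning, which is an assumption that only rescales the coefficients; the function names are hypothetical.

```python
def design_matrix(N, p):
    """Rows [1, t, t^2, ..., t^p] with normalized time t = n / N."""
    return [[(n / N) ** k for k in range(p + 1)] for n in range(N)]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    size = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(size)] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * c for v, c in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]

def fit_table(N, p):
    """Precompute T = (X^T X)^{-1} X^T, which depends only on N and p."""
    X = design_matrix(N, p)
    XtX = [[sum(X[n][i] * X[n][j] for n in range(N)) for j in range(p + 1)]
           for i in range(p + 1)]
    inv = invert(XtX)
    return [[sum(inv[i][k] * X[n][k] for k in range(p + 1)) for n in range(N)]
            for i in range(p + 1)]

def fit_phase(table, phase):
    """Per-frame work: one matrix-vector product with the stored table."""
    return [sum(t * v for t, v in zip(row, phase)) for row in table]

TABLE = fit_table(N=40, p=2)  # built once, reused for every frame
phase = [0.3 + 5.0 * (n / 40) + 2.0 * (n / 40) ** 2 for n in range(40)]
coeffs = fit_phase(TABLE, phase)
```

Using a separate table per (N, p) pair would also accommodate the per-IMF choice of modeling length and order mentioned in the text.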

The quantization unit 333 applies split scalar quantization to the polynomial coefficients obtained by modeling the instantaneous phase vector. The split scalar quantization is performed so as to minimize the mean squared error in the synthesized signal domain defined by [Equation 4] below.

However, because all of the polynomial coefficients are needed to synthesize the actual signal, the coefficients a_0 through a_{p-1} are quantized in the coefficient domain rather than in the synthesized signal domain, and the minimization of the mean squared error of [Equation 4] is applied when quantizing a_p. In [Equation 4], the target signal is the sum of the unquantized IMF signals up to the i-th, and the synthesized signal is the signal obtained through quantization up to the (i-1)-th IMF signal plus the i-th IMF signal whose phase information has been quantized.
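
The search over a_p in the synthesized signal domain might look like the sketch below: each candidate level is used to synthesize the i-th IMF, and the level with the lowest mean squared error against the target is kept. The candidate level set, the normalized-time phase polynomial, and the function names are assumptions made for illustration; the error measure follows the verbal description of Equation 4.

```python
import math

def synthesize(amplitude, coeffs):
    """A(n) * cos(phase(n)) with a polynomial phase in normalized time t = n / N."""
    N = len(amplitude)
    return [amplitude[n] * math.cos(sum(a * (n / N) ** k for k, a in enumerate(coeffs)))
            for n in range(N)]

def quantize_last_coefficient(target, previous_sum, amplitude, lower_coeffs, levels):
    """Pick the level for a_p that minimizes the MSE between target and synthesis."""
    best_level, best_error = None, float("inf")
    for level in levels:
        imf = synthesize(amplitude, lower_coeffs + [level])
        error = sum((t - (s + v)) ** 2
                    for t, s, v in zip(target, previous_sum, imf)) / len(target)
        if error < best_error:
            best_level, best_error = level, error
    return best_level, best_error

# Example with a single IMF, so the previously quantized sum is zero.
N = 40
amplitude = [1.0] * N
target = synthesize(amplitude, [0.0, 6.0, 3.1])   # unquantized signal
previous_sum = [0.0] * N
lower_coeffs = [0.0, 6.0]                          # a_0, a_1 already quantized
levels = [2.0, 2.5, 3.0, 3.5, 4.0]                 # hypothetical levels for a_2
best_level, best_error = quantize_last_coefficient(
    target, previous_sum, amplitude, lower_coeffs, levels)
```

Measuring the error against the running sum of IMFs, rather than against the i-th IMF alone, gives the choice of a_p a chance to partly compensate for errors left by earlier stages.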

The amplitude encoding unit 340 performs vector quantization in the amplitude signal domain. To increase quantization efficiency, it first computes the frame energy, uses the result to normalize the signal, and then quantizes the normalized signal.

The amplitude encoding unit 340 includes an energy calculation unit 341 that calculates the frame energy of the extracted instantaneous amplitude, vector quantization units 344 and 345 that perform normalization with the calculated frame energy value followed by vector quantization, and scalar quantization units 342 and 343 that perform scalar quantization of the extracted instantaneous amplitude in the logarithmic domain.
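
A gain-shape arrangement of this kind could be sketched as below. This sketch assumes that the frame energy is the quantity quantized in the logarithmic domain and that the envelope is divided by the decoded gain before vector quantization; the 1.5 dB step size and the function names are hypothetical.

```python
import math

def encode_gain(amplitude, step_db=1.5):
    """Scalar-quantize the frame energy on a uniform grid in dB."""
    energy = sum(a * a for a in amplitude) / len(amplitude)
    level_db = 10.0 * math.log10(max(energy, 1e-12))  # floor avoids log(0)
    return round(level_db / step_db)

def decode_gain(index, step_db=1.5):
    """Map the index back to a linear RMS gain."""
    return math.sqrt(10.0 ** (index * step_db / 10.0))

def normalize(amplitude, gain):
    """Divide the envelope by the gain so that only its shape remains."""
    return [a / gain for a in amplitude]

# Example: a flat envelope of 0.5.
frame_amplitude = [0.5] * 8
gain_index = encode_gain(frame_amplitude)
gain = decode_gain(gain_index)
shape = normalize(frame_amplitude, gain)  # this vector would go to the vector quantizer
```

Normalizing by the decoded gain rather than the exact one keeps the encoder and decoder working with the same scale factor.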

FIG. 4 is a block diagram showing a schematic configuration of a decoding apparatus according to the present invention.

Referring to FIG. 4, the decoding apparatus according to the present invention includes a demultiplexing unit 410 that receives and demultiplexes a bitstream encoded separately for the signals of each frequency band, a plurality of decoding units 420 (decoding unit 1 to decoding unit N) that independently decode each of the demultiplexed signals, and a summing unit 430 that adds the output signals of the decoding units to restore the original signal.

FIG. 5 is a block diagram showing the detailed configuration of the decoding unit shown in FIG. 4.

Referring to FIG. 5, each decoding unit 420 shown in FIG. 4 includes a phase decoding unit 421, 422, 423, an amplitude decoding unit 424, 425, 426, 427, and a multiplication unit 428.

In the phase decoding unit 421, 422, 423, the inverse quantizer 421 recovers the polynomial coefficient vector that models the instantaneous phase vector, and from it the inverse polynomial modeling unit (PPM^-1) 422 extracts the phase vector as in [Equation 5] below.

Because the IMF signals actually synthesized are real-valued, the cosine function application unit 423 applies the cosine function to the phase information.

The amplitude decoding unit 424, 425, 426, 427 is implemented by a normalized amplitude vector inverse quantization unit 424 and a frame energy inverse quantization unit 425.

The amplitude vector inverse quantization unit 424 outputs the normalized amplitude vector.

The output of the frame energy inverse quantization unit 425 passes through an exponential function application unit 426, which restores the frame energy that was quantized in the logarithmic domain, and is then multiplied by the decoded normalized amplitude vector in the multiplier 427.

As a result, the outputs of the phase decoding unit 421, 422, 423 and the amplitude decoding unit 424, 425, 426, 427 are multiplied in the multiplication unit 428, and the i-th intrinsic mode function (IMF) signal is restored in the form of [Equation 6] below.
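
Read together, the decoder steps suggest that each IMF is rebuilt as an amplitude envelope times the cosine of a polynomial phase, roughly IMF_i(n) = A_i(n) · cos(φ_i(n)), and that the IMFs are then summed. The sketch below follows that reading; the normalized-time polynomial, the parameter layout, and the function names are assumptions.

```python
import math

def decode_imf(coeffs, shape, gain):
    """Rebuild one IMF: gain * normalized envelope * cos(polynomial phase)."""
    N = len(shape)
    imf = []
    for n in range(N):
        t = n / N  # normalized time, an assumption of this sketch
        phase = sum(a * t ** k for k, a in enumerate(coeffs))
        imf.append(gain * shape[n] * math.cos(phase))
    return imf

def decode_frame(imf_params):
    """Decode every IMF independently and add the results sample by sample."""
    decoded = [decode_imf(*params) for params in imf_params]
    return [sum(samples) for samples in zip(*decoded)]

# Example: two IMFs with constant phases 0 and pi, so cos() is +1 and -1.
params = [
    ([0.0], [1.0, 1.0, 1.0, 1.0], 0.5),
    ([math.pi], [1.0, 2.0, 1.0, 2.0], 0.1),
]
output = decode_frame(params)
```

With these toy parameters the first IMF contributes +0.5 per sample and the second subtracts 0.1 or 0.2, which makes the summation step easy to check by hand.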

FIG. 6 is a block diagram showing the configuration of an encoding apparatus according to another embodiment of the present invention.

Referring to FIG. 6, the encoding apparatus according to another embodiment of the present invention includes a signal decomposition unit (EMD) that decomposes an input signal into signals having different frequency bands, a first encoding unit 620 that encodes those decomposed signals belonging to the high frequency region by applying CELP-based coding or an audio signal coding algorithm, and a second encoding unit 630 that applies a Hilbert transform to the remaining decomposed signals, which do not belong to the high frequency region, to extract their instantaneous phase and instantaneous amplitude information and encodes the extracted information.

That is, this embodiment is a hybrid coding algorithm that combines IMF signal coding in the Hilbert domain with CELP-based coding or an existing audio coding scheme in the time domain. In practice, the first and second IMF signals lie mostly in the high frequency region and their instantaneous phase changes rapidly, so modeling the instantaneous phase with a polynomial is less efficient for them. Coding efficiency can therefore be improved by applying to these signals the CELP (Code-Excited Linear Prediction) based scheme used in conventional speech coding, or an audio coding algorithm.
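
The routing in this hybrid arrangement can be sketched as a simple dispatch on the IMF index. The two coder callables are stand-ins rather than real CELP or Hilbert-domain coders, and sending exactly the first two IMFs to the waveform coder is an assumption drawn from the remark about the first and second IMF signals.

```python
def route_imfs(imfs, waveform_coder, hilbert_coder, num_waveform=2):
    """Send the first `num_waveform` IMFs to a waveform coder and the rest to the Hilbert path."""
    payloads = []
    for index, imf in enumerate(imfs, start=1):
        if index <= num_waveform:
            payloads.append(("waveform", waveform_coder(imf)))
        else:
            payloads.append(("hilbert", hilbert_coder(imf)))
    return payloads

# Hypothetical stand-in coders, used only to make the routing visible.
def fake_waveform_coder(imf):
    return [round(v, 1) for v in imf]

def fake_hilbert_coder(imf):
    return {"num_samples": len(imf)}

imfs = [[0.11, -0.12], [0.2, 0.1], [0.3, 0.3], [0.05, 0.04]]
payloads = route_imfs(imfs, fake_waveform_coder, fake_hilbert_coder)
```

A decoder would need the same split point, so `num_waveform` would have to be fixed by convention or carried in the bitstream.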

In FIG. 6, the psychoacoustic modeling unit 610 applies a psychoacoustic model (PAM) to the input signal to remove components that are perceptually unimportant.

The psychoacoustic model may be applied to the input signal or, as shown in FIG. 6, after the Hilbert transform of each IMF signal; either placement can increase coding efficiency.

The encoding and decoding methods according to the present invention can be implemented as program instructions executable by various computer means and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and configured for the present invention, or may be known and available to those skilled in computer software. Examples of computer-readable recording media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory. The medium may also be a transmission medium, such as an optical or metal line or a waveguide, that includes a carrier wave transmitting a signal specifying program instructions, data structures, and the like. Examples of program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices described above may be configured to operate as one or more software modules to perform the operations of the present invention, and vice versa.

Although the present invention has been described above with reference to limited embodiments and drawings, it is not limited to those embodiments, and those of ordinary skill in the art to which the invention pertains can make various modifications and variations from this description.

Therefore, the scope of the present invention should not be limited to the described embodiments, but should be determined by the claims below and their equivalents.

Claims

A signal decomposition unit for decomposing the input signal into signals having different frequency bands;

A phase and amplitude information extracting unit extracting instantaneous phase and instantaneous amplitude information of each of the separated signals by Hilbert transforming the decomposed signal; And

And an encoder which encodes each of the decomposed signals using the extracted instantaneous phase and instantaneous amplitude information.

The encoding apparatus of claim 1, wherein the encoder comprises:

a phase encoder for modeling the extracted instantaneous phase information as a polynomial and encoding the coefficients of the polynomial; and

an amplitude encoder configured to vector-quantize the extracted instantaneous amplitude information.

The encoding apparatus of claim 2, wherein the phase encoder comprises:

a phase continuation unit configured to perform a phase continuation process on the extracted instantaneous phase information;

a phase modeling unit configured to build a matrix-form linear model of the instantaneous phase information subjected to the continuation process and to calculate the coefficients of the polynomial from the matrix-form linear model using the least squares method; and

a quantization unit configured to quantize the calculated coefficients of the polynomial.

The encoding apparatus of claim 3, wherein the phase modeling unit

encodes the coefficients of the polynomial using a pre-stored transform matrix selected according to the modeling length and the polynomial order.
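The phase path described in the two preceding claims can be sketched as below. The sketch rests on several assumptions: the phase continuation process is read as conventional phase unwrapping, the matrix-form linear model is a Vandermonde matrix A over a time axis normalized to [0, 1], and the pre-stored transform matrix is the least-squares pseudo-inverse (AᵀA)⁻¹Aᵀ, which depends only on the modeling length and the polynomial order. All function names and the test phase are hypothetical.

```python
import math


def unwrap(phase):
    """Remove 2*pi jumps so that consecutive phase samples differ by at most pi."""
    out = [phase[0]]
    for p in phase[1:]:
        d = p - out[-1]
        d -= 2 * math.pi * round(d / (2 * math.pi))
        out.append(out[-1] + d)
    return out


def transform_matrix(length, order):
    """Return T = (A^T A)^-1 A^T; it can be computed once per (length, order) and stored."""
    ts = [i / (length - 1) for i in range(length)]      # normalized time (assumption)
    a = [[t ** k for k in range(order + 1)] for t in ts]
    m = order + 1
    # Augmented system [A^T A | A^T], reduced by Gauss-Jordan elimination to [I | T].
    aug = [[sum(a[i][r] * a[i][c] for i in range(length)) for c in range(m)]
           + [a[i][r] for i in range(length)] for r in range(m)]
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(m):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[m:] for row in aug]


def fit_phase(phase, t_matrix):
    """Least-squares polynomial coefficients as a single matrix-vector product."""
    return [sum(w * p for w, p in zip(row, phase)) for row in t_matrix]


# Hypothetical quadratic phase 0.3 + 20 t + 15 t^2, observed wrapped to (-pi, pi].
length, order = 40, 2
true_phase = [0.3 + 20 * (i / (length - 1)) + 15 * (i / (length - 1)) ** 2
              for i in range(length)]
wrapped = [math.atan2(math.sin(p), math.cos(p)) for p in true_phase]

T = transform_matrix(length, order)          # would be pre-stored in practice
coeffs = fit_phase(unwrap(wrapped), T)       # approximately [0.3, 20.0, 15.0]
```

Because `T` does not depend on the signal, the per-frame cost of the fit reduces to one matrix-vector product; only the resulting coefficients would then be quantized.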

The encoding apparatus of claim 2, wherein the amplitude encoder comprises:

an energy calculator configured to calculate the energy of the extracted instantaneous amplitude;

a vector quantizer which normalizes the calculated energy value and performs vector quantization; and

a scalar quantizer which scalar-quantizes the extracted instantaneous amplitude in the logarithmic domain.
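One possible reading of the amplitude path is a gain-shape scheme, sketched below. This reading is an assumption: the claim as translated does not state unambiguously which quantity each quantizer receives. Here the frame energy is scalar-quantized in the log (dB) domain, and the energy-normalized amplitude vector is vector-quantized by nearest-codeword search. The step size, the three-entry `CODEBOOK`, and all function names are hypothetical.

```python
import math


def frame_energy(amplitude):
    """Sum of squared instantaneous amplitudes over the frame."""
    return sum(a * a for a in amplitude)


def quantize_log_energy(energy, step_db=1.5):
    """Uniform scalar quantizer applied to 10*log10(energy); returns an integer index."""
    return round(10.0 * math.log10(max(energy, 1e-12)) / step_db)


def dequantize_log_energy(index, step_db=1.5):
    return 10.0 ** (index * step_db / 10.0)


def normalize(amplitude, energy):
    """Scale the amplitude vector to unit energy (the 'shape')."""
    g = math.sqrt(max(energy, 1e-12))
    return [a / g for a in amplitude]


def nearest_codeword(vector, codebook):
    """Vector quantization: index of the codeword with minimum squared error."""
    def dist(cw):
        return sum((v - c) ** 2 for v, c in zip(vector, cw))
    return min(range(len(codebook)), key=lambda i: dist(codebook[i]))


# Hypothetical unit-energy shape codebook: flat, rising, falling.
CODEBOOK = [
    [0.5, 0.5, 0.5, 0.5],
    [v / math.sqrt(30) for v in (1, 2, 3, 4)],
    [v / math.sqrt(30) for v in (4, 3, 2, 1)],
]

amplitude = [0.9, 2.1, 3.0, 4.2]
energy = frame_energy(amplitude)
energy_index = quantize_log_energy(energy)
shape_index = nearest_codeword(normalize(amplitude, energy), CODEBOOK)

# Decoder side: rescale the chosen codeword by the dequantized energy.
gain = math.sqrt(dequantize_log_energy(energy_index))
decoded = [gain * c for c in CODEBOOK[shape_index]]
```

A real codebook would be trained on amplitude envelopes rather than written by hand; the split into a scalar gain and a unit-energy shape is what lets one codebook serve frames of very different loudness.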

An encoding apparatus comprising:

a first encoder which encodes a signal belonging to a high frequency region among the decomposed signals by applying a CELP-based encoding algorithm or an audio signal encoding algorithm; and

a second encoder which extracts instantaneous phase and instantaneous amplitude information by applying a Hilbert transform to each of the remaining decomposed signals that do not belong to the high frequency region, and encodes the extracted instantaneous phase and instantaneous amplitude information.

The encoding apparatus of claim 6, further comprising a psychoacoustic modeling unit which applies a psychoacoustic model to the input signal and provides the psychoacoustic model to the signal decomposition unit.

The encoding apparatus of claim 6, wherein the second encoder removes unnecessary components by applying a psychoacoustic model to the signal to which the Hilbert transform has been applied.

A decoding apparatus comprising:

a demultiplexer configured to receive and demultiplex an encoded bit stream for signals of different frequency bands;

a phase dequantization unit configured to dequantize each of the demultiplexed signals and extract phase information from a polynomial coefficient vector that models a phase vector;

an amplitude dequantization unit configured to dequantize each of the demultiplexed signals and extract amplitude information from an amplitude vector and the energy of the amplitude vector;

a multiplier for multiplying the extracted phase information and amplitude information to restore the signals of the different frequency bands; and

a summation unit configured to sum the restored signals of the different frequency bands to restore the original signal.

The decoding apparatus of claim 9, wherein the original signal is a speech signal, an audio signal, or a mixture of speech and audio signals.

An encoding method comprising:

applying the HHT to an input signal to decompose it into signals having different frequency bands;

independently encoding each of the decomposed signals; and

multiplexing the independently encoded signals.
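The multiplexing step, and the matching demultiplexing at the decoder, can be illustrated with a simple length-prefixed framing. The bit-stream layout is not specified in this section, so the format below (a 1-byte band count followed by a 2-byte big-endian length before each payload) is purely an assumption, as are the function names.

```python
import struct


def multiplex(band_payloads):
    """Pack independently encoded band payloads into one byte stream."""
    out = bytearray([len(band_payloads)])          # 1-byte band count
    for payload in band_payloads:
        out += struct.pack(">H", len(payload))     # 2-byte big-endian length
        out += payload
    return bytes(out)


def demultiplex(stream):
    """Recover the per-band payloads from a stream produced by multiplex()."""
    count = stream[0]
    pos = 1
    bands = []
    for _ in range(count):
        (length,) = struct.unpack_from(">H", stream, pos)
        pos += 2
        bands.append(stream[pos:pos + length])
        pos += length
    return bands


# Hypothetical payloads for three bands; the second band is empty.
payloads = [b"\x01\x02\x03", b"", b"\xff" * 5]
stream = multiplex(payloads)
recovered = demultiplex(stream)
```

Keeping each band in its own length-delimited field mirrors the independence of the per-band encoders: a decoder can skip or separately decode any band without parsing the others.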

The method of claim 11, wherein the independently encoding each of the decomposed signals comprises:

extracting instantaneous phase information and instantaneous amplitude information for each of the decomposed signals; and

encoding the instantaneous phase and amplitude information by applying different encoding methods according to the characteristics of the extracted instantaneous phase and amplitude information.

The method of claim 12, wherein at least one of the different encoding methods comprises

modeling the extracted instantaneous phase information as a polynomial and encoding the coefficients of the polynomial, and performing vector quantization on the extracted instantaneous amplitude information.

A decoding method comprising:

receiving and demultiplexing an encoded bit stream for signals of different frequency bands;

extracting phase information from a polynomial coefficient vector that models a phase vector of each of the demultiplexed signals, and extracting amplitude information from an amplitude vector and the energy of the amplitude vector of each of the demultiplexed signals;

multiplying the extracted phase information and amplitude information of each of the demultiplexed signals; and

summing the respective signals obtained by multiplying the extracted phase information and amplitude information.
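The final multiply-and-sum steps of the decoding method can be sketched as follows, assuming that "multiplying the phase information and amplitude information" means taking the real part of a(t)·exp(jφ(t)), that is, a(t)·cos φ(t), for each band. The function names and the two example bands are hypothetical.

```python
import math


def synthesize_band(amplitude, phase):
    """Restore one band as amplitude * cos(phase), sample by sample."""
    return [a * math.cos(p) for a, p in zip(amplitude, phase)]


def sum_bands(bands):
    """Sum the restored bands sample-wise to rebuild the full signal."""
    return [sum(samples) for samples in zip(*bands)]


# Two hypothetical decoded bands with constant amplitude and linear phase.
n = 8
amp_low, phase_low = [0.5] * n, [0.2 * t for t in range(n)]
amp_high, phase_high = [0.25] * n, [1.0 + 0.9 * t for t in range(n)]

restored = sum_bands([
    synthesize_band(amp_low, phase_low),
    synthesize_band(amp_high, phase_high),
])
```

In a full decoder the phase vectors would come from evaluating the dequantized polynomial coefficients and the amplitude vectors from the dequantized energy and codeword, rather than being given directly as here.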