KR20130014521A

KR20130014521A - Decoding apparatus, decoding method, encoding apparatus, encoding method, and program

Info

Publication number: KR20130014521A
Application number: KR1020127024669A
Authority: KR
Inventors: 시로 스즈키; 유우키 마츠무라; 준 마츠모토; 유우지 마에다; 야스히로 도구리
Original assignee: Sony Corporation
Priority date: 2010-03-31
Filing date: 2011-03-15
Publication date: 2013-02-07
Also published as: JP2011215198A; US20130013325A1; EP3096320A1; CN102812513A; EP3096320B1; US8972249B2; CN102812513B; WO2011125430A1; EP2555193B1; EP2555193A4; JP5651980B2; EP2555193A1

Abstract

The present invention relates to a decoding device and decoding method, an encoding device and encoding method, and a program that reduce the delay caused by band extension during decoding while suppressing an increase in decoder-side resources. A high-band component generation unit 73 generates a pseudo high-band spectrum from the low-band spectrum SP-L and the high-band envelope ENV-H. A phase randomization unit 74 randomizes the phase of the pseudo high-band spectrum according to a random flag RND. An inverse MDCT unit 75 denormalizes the low-band spectrum SP-L using the low-band envelope ENV-L, combines the pseudo high-band spectrum supplied from the phase randomization unit 74 with the denormalized low-band spectrum SP-L, and uses the result as the full-band spectrum. The present invention is applicable to, for example, a decoding device that performs band extension decoding.

Description

Decoding device and decoding method, encoding device and encoding method, and program {DECODING APPARATUS, DECODING METHOD, ENCODING APPARATUS, ENCODING METHOD, AND PROGRAM}

The present invention relates to a decoding device and decoding method, an encoding device and encoding method, and a program, and in particular to those that reduce the delay caused by band extension during decoding while suppressing an increase in resources on the decoding side.

Transform coding methods such as MP3 (Moving Picture Experts Group Audio Layer-3), AAC (Advanced Audio Coding), and ATRAC (Adaptive Transform Acoustic Coding) are widely known methods for encoding audio signals.

In such coding methods, it has been proposed to improve coding efficiency by omitting the information-heavy high-band spectrum from the encoded result and including only the envelope of the high-band spectrum. In this case, the decoder generates a high-band spectrum by copying the low-band spectrum, for example by shifting it upward in frequency or by folding (mirroring) it. Only the envelope of the generated high-band spectrum is then made to approximate the envelope of the original high-band spectrum contained in the encoded result, which improves perceived sound quality. This decoding technique is called band extension and is already widely known.
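The copy-and-reshape idea can be illustrated with a minimal Python sketch. It assumes a real-valued spectrum stored as a list, band edges given as (start, stop) bin pairs, and a per-band RMS envelope; the function names `replicate_high_band` and `match_envelope` are hypothetical and are not taken from any standard or from this patent.

```python
import math


def replicate_high_band(low, num_high, mode="shift"):
    """Fill num_high high-band bins by copying the low-band spectrum.

    Hypothetical helper. mode="shift" translates the low band upward;
    mode="fold" mirrors it about its upper edge (the mirrored pattern
    repeats if more bins are needed than the low band holds).
    """
    n = len(low)
    out = []
    for k in range(num_high):
        if mode == "shift":
            out.append(low[k % n])
        elif mode == "fold":
            # First high bin copies the last low bin, and so on downward.
            out.append(low[n - 1 - (k % n)])
        else:
            raise ValueError("mode must be 'shift' or 'fold'")
    return out


def match_envelope(spec, band_edges, target_env):
    """Scale each band so that its RMS equals target_env[b] (assumed envelope)."""
    out = list(spec)
    for b, (lo, hi) in enumerate(band_edges):
        seg = spec[lo:hi]
        rms = math.sqrt(sum(x * x for x in seg) / len(seg))
        gain = target_env[b] / rms if rms > 0 else 0.0
        for k in range(lo, hi):
            out[k] = spec[k] * gain
    return out
```

Only the band levels are changed by `match_envelope`; the fine structure inside each band still comes from the copied low band, which mirrors the "envelope only" idea described above.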

FIG. 1 is a block diagram showing an example configuration of an encoding device whose encoded result includes only the envelope of the high-band spectrum.

The encoding device 10 of FIG. 1 comprises an MDCT (Modified Discrete Cosine Transform) unit 11, a quantization unit 12, and a multiplexing unit 13. The encoding device 10 is the same as a well-known transform coding device except that it does not include the high-band spectrum SP-H in the encoded result. To simplify the figure, the quantization unit 12 is assumed to perform not only quantization but also extraction and normalization of the quantization targets.

Specifically, the MDCT unit 11 of the encoding device 10 performs an MDCT on the PCM (Pulse Code Modulation) signal, the time-domain audio signal input to the encoding device 10, to generate a spectrum SP, which is a frequency-domain signal. The MDCT unit 11 supplies the generated spectrum SP to the quantization unit 12.
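As a rough illustration of the MDCT performed by the MDCT unit 11 (and inverted later by the decoder), the following direct O(N^2) Python sketch applies a sine window to 50%-overlapping frames and shows that overlap-add after the inverse transform recovers the input through time-domain aliasing cancellation. The helper names and the choice of a sine window are assumptions; practical codecs use FFT-based fast algorithms and their own window shapes.

```python
import math


def sine_window(two_n):
    """Sine window; satisfies w[n]^2 + w[n+N]^2 = 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / two_n) for n in range(two_n)]


def mdct(frame):
    """Direct MDCT: 2N windowed samples -> N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    return [
        sum(frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples.

    Scaled by 2/N so that windowed overlap-add with a Princen-Bradley
    window reconstructs the input at unit gain.
    """
    n_half = len(coeffs)
    return [
        sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for k in range(n_half)) * 2.0 / n_half
        for n in range(2 * n_half)
    ]


def analyze(pcm, n_half):
    """Cut pcm into 50%-overlapping windowed frames and MDCT each one."""
    w = sine_window(2 * n_half)
    frames = []
    for start in range(0, len(pcm) - 2 * n_half + 1, n_half):
        seg = pcm[start:start + 2 * n_half]
        frames.append(mdct([s * wi for s, wi in zip(seg, w)]))
    return frames


def synthesize(frames, n_half):
    """IMDCT each frame, window it again, and overlap-add with hop n_half."""
    w = sine_window(2 * n_half)
    out = [0.0] * (n_half * (len(frames) + 1))
    for i, coeffs in enumerate(frames):
        y = imdct(coeffs)
        for n in range(2 * n_half):
            out[i * n_half + n] += y[n] * w[n]
    return out
```

Only samples covered by two overlapping frames are reconstructed exactly; the first and last half-frames lack an overlap partner.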

The quantization unit 12 extracts an envelope from each of the high-band spectrum SP-H (the high-frequency component of the spectrum SP supplied from the MDCT unit 11) and the low-band spectrum SP-L (its low-frequency component). The quantization unit 12 quantizes the high-band envelope ENV-H, which is the envelope of the extracted high-band spectrum SP-H, and the low-band envelope ENV-L, which is the envelope of the low-band spectrum SP-L, and supplies the quantized ENV-H and ENV-L to the multiplexing unit 13. For simplicity, this specification uses the same names (SP-L, SP-H, and so on) for signals before and after quantization and encoding.

The quantization unit 12 also normalizes the low-band spectrum SP-L using the low-band envelope ENV-L, quantizes the normalized low-band spectrum SP-L, and supplies the result to the multiplexing unit 13.
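The extraction, normalization, and quantization performed by the quantization unit 12 might look like the following sketch. The envelope definition (per-band RMS), the 3 dB envelope grid, and the uniform step size `step` are illustrative assumptions, not the parameters of any particular codec; the function names are hypothetical.

```python
import math


def band_envelope(spec, edges):
    """Per-band RMS of the spectrum (assumed envelope definition)."""
    env = []
    for lo, hi in edges:
        seg = spec[lo:hi]
        env.append(math.sqrt(sum(x * x for x in seg) / len(seg)))
    return env


def quantize_envelope(env):
    """Quantize each band RMS on an assumed 3 dB grid: index = round(2 * log2(rms))."""
    return [round(2 * math.log2(max(e, 1e-12))) for e in env]


def dequantize_envelope(idx):
    return [2.0 ** (i / 2) for i in idx]


def normalize_and_quantize(spec, edges, env_idx, step=0.25):
    """Divide each band by its dequantized envelope, then round to a uniform grid.

    Normalizing by the *dequantized* envelope lets a decoder invert the
    scaling exactly from the transmitted indices. Assumes edges are
    contiguous and start at bin 0.
    """
    env = dequantize_envelope(env_idx)
    q = []
    for b, (lo, hi) in enumerate(edges):
        for k in range(lo, hi):
            q.append(round(spec[k] / env[b] / step))
    return q
```

For example, a low band `[4.0, -4.0, 0.5, 0.5]` split into two 2-bin bands has envelopes 4.0 and 0.5, envelope indices `[4, -2]`, and normalized integer values `[4, -4, 4, 4]` with `step=0.25`.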

Thus, for the low-band component of the spectrum SP, the quantization unit 12 includes the envelope-normalized spectrum in the encoded result, whereas for the high-band component it includes only the envelope. This improves coding efficiency.

The multiplexing unit 13 multiplexes the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H supplied from the quantization unit 12 and outputs the resulting bit stream. The bit stream is recorded on a recording medium (not shown) or transmitted to a decoding device.
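A hypothetical frame layout makes the multiplexing step concrete. The sketch below packs quantized index lists with Python's standard `struct` module and unpacks them again, as a decoder-side decomposition step would; the field order and integer widths are assumptions chosen only for illustration and do not describe any real bit stream syntax.

```python
import struct


def multiplex(env_l, sp_l, env_h):
    """Pack ENV-L, SP-L and ENV-H indices into one frame (hypothetical layout).

    Layout: three big-endian uint16 counts, then ENV-L and ENV-H as int8,
    then SP-L as int16.
    """
    header = struct.pack(">HHH", len(env_l), len(sp_l), len(env_h))
    body = struct.pack(f">{len(env_l)}b{len(env_h)}b{len(sp_l)}h", *env_l, *env_h, *sp_l)
    return header + body


def demultiplex(frame):
    """Inverse of multiplex(): return (env_l, sp_l, env_h)."""
    n_env_l, n_sp_l, n_env_h = struct.unpack_from(">HHH", frame, 0)
    values = struct.unpack_from(f">{n_env_l}b{n_env_h}b{n_sp_l}h", frame, 6)
    env_l = list(values[:n_env_l])
    env_h = list(values[n_env_l:n_env_l + n_env_h])
    sp_l = list(values[n_env_l + n_env_h:])
    return env_l, sp_l, env_h
```

A real codec would use entropy coding and bit-level packing rather than byte-aligned fields; the byte-aligned layout just keeps the round trip easy to follow.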

FIG. 2 is a flowchart describing the encoding process performed by the encoding device 10 of FIG. 1. This process starts, for example, when an audio PCM signal is input to the encoding device 10.

In step S11 of FIG. 2, the MDCT unit 11 performs an MDCT on the PCM signal, the time-domain audio signal input to the encoding device 10, to generate the frequency-domain spectrum SP, and supplies the spectrum SP to the quantization unit 12.

In step S12, the quantization unit 12 extracts an envelope from each of the high-band spectrum SP-H and the low-band spectrum SP-L, the high- and low-frequency components of the spectrum SP supplied from the MDCT unit 11.

In step S13, the quantization unit 12 normalizes the low-band spectrum SP-L using the low-band envelope ENV-L.

In step S14, the quantization unit 12 quantizes the extracted high-band envelope ENV-H, the low-band envelope ENV-L, and the normalized low-band spectrum SP-L, and supplies them to the multiplexing unit 13.

In step S15, the multiplexing unit 13 multiplexes the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H supplied from the quantization unit 12 and outputs the resulting bit stream. The process then ends.

FIG. 3 is a block diagram showing an example configuration of a decoding device that decodes a bit stream encoded by the encoding device 10 of FIG. 1.

The decoding device 30 of FIG. 3 comprises a decomposition unit 31, an inverse quantization unit 32, an inverse MDCT unit 33, and a band extension unit 34.

The decomposition unit 31, the inverse quantization unit 32, and the inverse MDCT unit 33 of the decoding device 30 restore only the low-band component of the PCM signal, as in an ordinary transform decoding device.

Specifically, the decomposition unit 31 acquires the bit stream encoded by the encoding device 10, separates it into the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H, and supplies them to the inverse quantization unit 32.

The inverse quantization unit 32 inversely quantizes each of the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H supplied from the decomposition unit 31. It then supplies the inversely quantized ENV-L and SP-L to the inverse MDCT unit 33 and ENV-H to the band extension unit 34.

The inverse MDCT unit 33 denormalizes the low-band spectrum SP-L using the low-band envelope ENV-L supplied from the inverse quantization unit 32. It then performs an inverse MDCT on the denormalized low-band spectrum SP-L, a frequency-domain signal, to obtain a time-domain PCM signal. This PCM signal has no high-band component, so the audio it represents sounds muffled. The inverse MDCT unit 33 supplies this PCM signal to the band extension unit 34.
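Denormalization simply undoes the encoder-side normalization before the inverse MDCT. The sketch below assumes a hypothetical scheme in which each band envelope is transmitted as an index on a 3 dB grid (value `2 ** (index / 2)`) and each normalized coefficient as an integer multiple of a step size `step`; the function name and both assumptions are illustrative only.

```python
def denormalize(q, edges, env_idx, step=0.25):
    """Rebuild the low-band spectrum SP-L from integer values and envelope indices.

    Assumes (hypothetically) a 3 dB envelope grid and uniform quantization
    with step size `step`; edges are contiguous (start, stop) bin pairs.
    """
    env = [2.0 ** (i / 2) for i in env_idx]  # inverse-quantize ENV-L
    out = []
    for b, (lo, hi) in enumerate(edges):
        for k in range(lo, hi):
            out.append(q[k] * step * env[b])  # inverse-quantize, then denormalize
    return out
```

The returned list would then be passed to an inverse MDCT to obtain the band-limited PCM signal.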

The band extension unit 34 comprises a band division filter 41, a high-band component generation unit 42, and a band synthesis filter 43. It performs band extension processing that improves the sound quality of the PCM signal output by the inverse MDCT unit 33, which lacks a high-band component, by extending the signal's frequency band.

Specifically, the band division filter 41 of the band extension unit 34 divides the PCM signal supplied from the inverse MDCT unit 33 into a high-band component and a low-band component. Because this PCM signal has no high-band component, the band division filter 41 discards the high-band component of the divided signal. The band division filter 41 supplies the low-band component, the low-band PCM signal BS-L, to the high-band component generation unit 42 and the band synthesis filter 43.

The high-band component generation unit 42 generates a high-band PCM signal, the pseudo high-band PCM signal BS-H, from the low-band PCM signal BS-L supplied from the band division filter 41 and the high-band envelope ENV-H supplied from the inverse quantization unit 32. A method of generating the pseudo high-band PCM signal BS-H is described, for example, in Patent Document 1, previously filed by the present applicant. The high-band component generation unit 42 supplies the pseudo high-band PCM signal BS-H to the band synthesis filter 43.

The band synthesis filter 43 combines the low-band PCM signal BS-L supplied from the band division filter 41 with the pseudo high-band PCM signal BS-H supplied from the high-band component generation unit 42 and outputs the full-band PCM signal as the decoding result.
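The division/synthesis pair can be sketched with a complementary filter: a centered moving-average low-pass produces the low band, and the residual is taken as the high band, so adding the two restores the input exactly. This is a deliberately simple stand-in (HE-AAC's SBR tool, for instance, operates on a QMF filter bank), and the function names and `half_len` parameter are hypothetical.

```python
def band_split(x, half_len=2):
    """Split x into (low, high) with a centered moving-average low-pass.

    Stand-in for a real band division filter. The high part is the
    residual x - low, so the two outputs are complementary by construction.
    Samples outside the signal are treated as zero.
    """
    n = len(x)
    width = 2 * half_len + 1
    low = []
    for i in range(n):
        acc = 0.0
        for j in range(i - half_len, i + half_len + 1):
            if 0 <= j < n:
                acc += x[j]
        low.append(acc / width)
    high = [xi - li for xi, li in zip(x, low)]
    return low, high


def band_synthesize(low, high):
    """Stand-in band synthesis filter: sum of the two complementary bands."""
    return [lo + hi for lo, hi in zip(low, high)]
```

In the decoder of FIG. 3, `high` would be discarded and replaced by the pseudo high-band signal BS-H before `band_synthesize` is called. Run causally, a centered filter of length `2 * half_len + 1` delays its output by `half_len` samples, which illustrates how a post-processing filter stage adds latency of its own.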

Audio reproduced from the full-band PCM signal output in this way sounds less muffled, clearer, and more pleasant than audio reproduced from the PCM signal without a high-band component.

FIG. 4 illustrates the signals output from the inverse MDCT unit 33 and the band synthesis filter 43. In FIG. 4, the horizontal axis represents frequency and the vertical axis represents signal level; the same applies to FIGS. 7, 10, and 12 to 16 described later.

As shown in A of FIG. 4, the signal output from the inverse MDCT unit 33 is the PCM signal of the low-band spectrum SP-L denormalized with the low-band envelope ENV-L. As shown in B of FIG. 4, the signal output from the band synthesis filter 43 is a PCM signal whose low-band component is the PCM signal of the low-band spectrum SP-L denormalized with the low-band envelope ENV-L, and whose high-band component is the pseudo high-band PCM signal BS-H generated from the high-band envelope ENV-H and the low-band PCM signal BS-L.

FIG. 5 is a flowchart describing the decoding process performed by the decoding device 30 of FIG. 3. This process starts, for example, when a bit stream encoded by the encoding device 10 is input to the decoding device 30.

In step S31 of FIG. 5, the decomposition unit 31 separates the bit stream input to the decoding device 30 into the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H and supplies them to the inverse quantization unit 32.

In step S32, the inverse quantization unit 32 inversely quantizes each of the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H supplied from the decomposition unit 31. It supplies the inversely quantized ENV-L and SP-L to the inverse MDCT unit 33 and ENV-H to the band extension unit 34.

In step S33, the inverse MDCT unit 33 denormalizes the low-band spectrum SP-L using the low-band envelope ENV-L supplied from the inverse quantization unit 32.

In step S34, the inverse MDCT unit 33 performs an inverse MDCT on the denormalized low-band spectrum SP-L, a frequency-domain signal, to obtain a time-domain PCM signal, which it supplies to the band extension unit 34.

In step S35, the band division filter 41 of the band extension unit 34 divides the PCM signal supplied from the inverse MDCT unit 33 into a high-band component and a low-band component. The band division filter 41 discards the high-band component and supplies the low-band component, the low-band PCM signal BS-L, to the high-band component generation unit 42 and the band synthesis filter 43.

In step S36, the high-band component generation unit 42 generates the pseudo high-band PCM signal BS-H from the low-band PCM signal BS-L supplied from the band division filter 41 and the high-band envelope ENV-H supplied from the inverse quantization unit 32, and supplies BS-H to the band synthesis filter 43.

In step S37, the band synthesis filter 43 combines the low-band PCM signal BS-L supplied from the band division filter 41 with the pseudo high-band PCM signal BS-H supplied from the high-band component generation unit 42 to obtain a full-band PCM signal. The band synthesis filter 43 outputs the full-band PCM signal, and the process ends.

Band extension techniques of this kind are already used in the international standard HE-AAC (High-Efficiency Advanced Audio Coding) and in the stereo high-quality mode of LPEC (trademark).

As described above, in conventional band extension techniques, band extension is performed as post-processing after decoding of the low-band spectrum SP-L. This increases the freedom in generating the pseudo high-band PCM signal BS-H: it can be generated from the time-domain low-band PCM signal BS-L rather than from the frequency-domain low-band spectrum SP-L.

In addition, because the processing block size of encoding and decoding and the processing block size of band extension can be set independently, frequency resolution and time resolution can each be optimized.
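The trade-off between frequency and time resolution for a given block size can be made concrete with simple arithmetic. The helper below assumes an MDCT that produces `n_coeffs` coefficients per frame from `2 * n_coeffs` samples with a hop of `n_coeffs` samples; the function name, sample rates, and block sizes are examples only.

```python
def mdct_resolution(sample_rate_hz, n_coeffs):
    """Return (bin spacing in Hz, hop in ms) for an MDCT with n_coeffs per frame.

    Assumes each frame spans 2 * n_coeffs samples, advances by n_coeffs
    samples, and its n_coeffs bins evenly cover 0 .. sample_rate / 2.
    """
    bin_hz = sample_rate_hz / (2 * n_coeffs)
    hop_ms = 1000.0 * n_coeffs / sample_rate_hz
    return bin_hz, hop_ms


# Larger blocks give finer frequency bins but coarser time steps.
for size in (1024, 128):
    print(size, mdct_resolution(48000, size))
```

At 48 kHz, 1024 coefficients give 23.4375 Hz bins with a hop of about 21.3 ms, while 128 coefficients give 187.5 Hz bins with a hop of about 2.7 ms, which is why a codec core and a separate band extension stage may prefer different block sizes.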

When the pseudo high-band PCM signal is generated by the method described in Patent Document 1, complex processing is required: a noise-like spectrum is generated from the high-band envelope ENV-H, a tonal spectrum is generated from the high-band envelope ENV-H and the low-band PCM signal BS-L, and the two spectra are compared.

Generating these noise-like and tonal spectra is essential for improving how well the low-band and high-band spectra match, which is necessary to produce perceptually high-quality audio. The same processing is also performed in the decoding devices described in Patent Documents 2 and 3.

Patent Document 1: Japanese Patent No. 3861770
Patent Document 2: Japanese Patent No. 3646938
Patent Document 3: Japanese Patent No. 3646939

As described above, conventional band extension techniques have been researched, developed, and put into practice on the premise that band extension is performed as post-processing after decoding of the low-band spectrum SP-L. Consequently, the full-band PCM signal is output only after the band extension unit 34 finishes its processing (time T1 in the example of FIG. 3), which is later than the end of the ordinary decoding performed by the decomposition unit 31, the inverse quantization unit 32, and the inverse MDCT unit 33 (time T0 in the example of FIG. 3).

This is not a major problem when the decoding device 30 is installed in a playback device that reproduces audio only. However, when the decoding device 30 is installed in a playback device that also reproduces video in synchronization with the audio, the output time of the full-band PCM signal differs depending on whether only ordinary decoding is performed or band extension is also performed, which makes it difficult to output video and audio in sync.

One remedy is to delay video playback, but buffering video requires far more memory than buffering audio, which increases resource usage. Another is to offset the audio/video synchronization timing in advance, but because whether only ordinary decoding is performed or band extension is also performed depends on the playback device, it is difficult to always specify the optimal synchronization timing.

Furthermore, the decoding device 30 requires an additional band extension unit 34 for band extension, so it uses more resources than a decoding device that does not perform band extension.

For these reasons, a decoding device that performs band extension is required to reduce the delay caused by band extension while suppressing the increase in resources.

The present invention has been made in view of these circumstances, and makes it possible to reduce the delay caused by band extension during decoding while suppressing an increase in resources on the decoding side.

A decoding device according to a first aspect of the present invention includes: acquisition means for acquiring, as an encoded result, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, a high-band envelope of the audio signal, and a degree of concentration of the high-band spectrum of the audio signal; generation means for generating a spectrum using the normalized low-band spectrum and the high-band envelope from the encoded result acquired by the acquisition means; randomization means for randomizing, based on the degree of concentration, the phase of the spectrum generated by the generation means; and synthesis means for denormalizing the low-band spectrum using the low-band envelope from the encoded result acquired by the acquisition means, combining either the spectrum randomized by the randomization means or the spectrum generated by the generation means with the denormalized low-band spectrum, and using the combined result as the full-band spectrum.
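The first-aspect decoder can be sketched entirely in the frequency domain, so that a single inverse MDCT of the combined spectrum yields the full-band output without a separate time-domain post-processing stage. This Python sketch assumes equal-width bands of `band_width` bins, treats a boolean `rnd_flag` as the decoded concentration decision, and models phase randomization of real MDCT coefficients as random sign flips; all of these, and the function names, are assumptions for illustration rather than the patented implementation.

```python
import random


def generate_pseudo_high(norm_low, env_h, band_width):
    """Copy the normalized low band upward and scale each high band by ENV-H.

    Since norm_low is already envelope-normalized, multiplying by the
    high-band envelope gives the pseudo high band its target level.
    """
    out = []
    for b, e in enumerate(env_h):
        for k in range(band_width):
            src = norm_low[(b * band_width + k) % len(norm_low)]
            out.append(src * e)
    return out


def randomize_phase(spec, rnd_flag, rng):
    """If rnd_flag is set, flip the sign of each coefficient at random.

    Random signs stand in for phase randomization of real-valued MDCT
    coefficients (an assumption of this sketch).
    """
    if not rnd_flag:
        return list(spec)
    return [x if rng.random() < 0.5 else -x for x in spec]


def full_band_spectrum(norm_low, env_l, env_h, band_width, rnd_flag, seed=0):
    """Denormalize the low band, append the (optionally randomized) pseudo high band."""
    rng = random.Random(seed)
    low = [x * env_l[k // band_width] for k, x in enumerate(norm_low)]
    high = randomize_phase(generate_pseudo_high(norm_low, env_h, band_width), rnd_flag, rng)
    return low + high  # one spectrum, ready for a single inverse MDCT
```

Because the high band is attached before the inverse transform, the output timing is the same as for ordinary decoding, which is the latency property the first aspect aims for.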

The decoding method and program according to the first aspect of the present invention correspond to the decoding device according to the first aspect.

In the first aspect of the present invention, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, a high-band envelope of the audio signal, and a degree of concentration of the high-band spectrum of the audio signal are acquired as an encoded result. A spectrum is generated using the low-band spectrum and the high-band envelope from the acquired encoded result, and its phase is randomized based on the degree of concentration. The low-band spectrum is denormalized using the low-band envelope from the acquired encoded result, and either the randomized spectrum or the generated spectrum is combined with the denormalized low-band spectrum; the combined result becomes the full-band spectrum.

A decoding device according to a second aspect of the present invention includes: acquisition means for acquiring, as an encoded result, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, and a high-band envelope of the audio signal; generation means for generating a spectrum using the normalized low-band spectrum and the high-band envelope from the encoded result acquired by the acquisition means; determination means for determining a degree of concentration of the low-band spectrum based on the normalized low-band spectrum from the encoded result acquired by the acquisition means; randomization means for randomizing, based on the degree of concentration determined by the determination means, the phase of the spectrum generated by the generation means; and synthesis means for denormalizing the low-band spectrum using the low-band envelope from the encoded result acquired by the acquisition means, combining either the spectrum randomized by the randomization means or the spectrum generated by the generation means with the denormalized low-band spectrum, and using the combined result as the full-band spectrum.
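In the second aspect, the decoder derives the degree of concentration itself from the normalized low-band spectrum. One hypothetical measure, used in the sketch below, is the share of energy held by the few strongest coefficients. The measure, the `top` count, the threshold, and the rule that a spread-out (noise-like) spectrum triggers phase randomization are all assumptions, as are the function names.

```python
def concentration(spec, top=2):
    """Hypothetical concentration: share of energy in the `top` strongest bins (0..1)."""
    energies = sorted((x * x for x in spec), reverse=True)
    total = sum(energies)
    return sum(energies[:top]) / total if total > 0 else 0.0


def decide_random_flag(norm_low, threshold=0.5, top=2):
    """Return True (randomize phase) when energy is spread out, i.e. noise-like.

    The direction of this rule and the threshold are assumptions.
    """
    return concentration(norm_low, top) < threshold
```

A peaky (tonal) low band such as `[0, 0, 5, 0, 0, 0, 0, 0.1]` scores close to 1 and leaves the phase intact, while an evenly spread band such as `[1, -1] * 4` scores 0.25 and triggers randomization.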

The decoding method and program according to the second aspect of the present invention correspond to the decoding device according to the second aspect.

In the second aspect of the present invention, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, and a high-band envelope of the audio signal are acquired as an encoded result. A spectrum is generated using the normalized low-band spectrum and the high-band envelope from the acquired encoded result; a degree of concentration of the low-band spectrum is determined based on the normalized low-band spectrum; and the phase of the generated spectrum is randomized based on the determined degree of concentration. The low-band spectrum is denormalized using the low-band envelope from the acquired encoded result, and either the randomized spectrum or the generated spectrum is combined with the denormalized low-band spectrum; the combined result becomes the full-band spectrum.

The encoding device according to the third aspect of the present invention includes: determination means for determining a degree of concentration of a high-band spectrum of an audio signal based on that high-band spectrum; extraction means for extracting, from the spectrum of the audio signal, the envelope of the low-band spectrum and the envelope of the high-band spectrum; normalization means for normalizing the low-band spectrum using the envelope of the low-band spectrum; and multiplexing means for multiplexing the degree of concentration determined by the determination means, the low-band and high-band envelopes extracted by the extraction means, and the low-band spectrum normalized by the normalization means, and using the result as an encoding result.

The encoding method and program according to the third aspect of the present invention correspond to the encoding device according to the third aspect.

In the third aspect of the present invention, the degree of concentration of the high-band spectrum of an audio signal is determined based on that high-band spectrum; the envelope of the low-band spectrum and the envelope of the high-band spectrum are extracted from the spectrum of the audio signal; the low-band spectrum is normalized using its envelope; and the determined degree of concentration, the extracted low-band and high-band envelopes, and the normalized low-band spectrum are multiplexed into an encoding result.

The decoding device according to the first or second aspect and the encoding device according to the third aspect may each be an independent device or an internal block constituting a single device.

According to the first and second aspects of the present invention, the delay caused by band extension during decoding can be reduced while an increase in resources is suppressed.

Further, according to the third aspect of the present invention, encoding can be performed such that the delay caused by band extension during decoding is reduced and an increase in resources on the decoding side is suppressed.

FIG. 1 is a block diagram illustrating an example of the configuration of an encoding device.
FIG. 2 is a flowchart describing encoding processing by the encoding device of FIG. 1.
FIG. 3 is a block diagram illustrating an example of the configuration of a decoding device.
FIG. 4 is a diagram describing the signals output from the inverse MDCT unit and the band synthesis filter.
FIG. 5 is a flowchart describing decoding processing by the decoding device of FIG. 3.
FIG. 6 is a block diagram illustrating a configuration example of a first embodiment of an encoding device to which the present invention is applied.
FIG. 7 is a diagram describing the signals output from the MDCT unit and the quantization unit of FIG. 6.
FIG. 8 is a flowchart describing encoding processing by the encoding device of FIG. 6.
FIG. 9 is a block diagram illustrating a configuration example of a decoding device that decodes a bit stream encoded by the encoding device of FIG. 6.
FIG. 10 is a diagram describing the signal output from the inverse MDCT unit of FIG. 9.
FIG. 11 is a diagram describing the difference in decoding results with and without phase randomization.
FIG. 12 is a diagram describing characteristics of the high-band spectrum (SP-H).
FIG. 13 is a diagram describing characteristics of the high-band spectrum (SP-H).
FIG. 14 is a diagram describing characteristics of the high-band spectrum (SP-H).
FIG. 15 is a diagram describing characteristics of the high-band spectrum (SP-H).
FIG. 16 is a diagram describing characteristics of the high-band spectrum (SP-H).
FIG. 17 is a flowchart describing decoding processing by the decoding device of FIG. 9.
FIG. 18 is a block diagram illustrating a configuration example of a second embodiment of a decoding device to which the present invention is applied.
FIG. 19 is a flowchart describing decoding processing by the decoding device of FIG. 18.
FIG. 20 is a diagram illustrating a configuration example of a computer.

<First Embodiment>

[Configuration Example of the First Embodiment of the Encoding Device]

FIG. 6 is a block diagram illustrating a configuration example of a first embodiment of an encoding device to which the present invention is applied.

In FIG. 6, components identical to those in FIG. 1 are denoted by the same reference numerals, and redundant descriptions are omitted as appropriate.

The configuration of the encoding device 50 in FIG. 6 differs from that in FIG. 1 mainly in that a quantization unit 51 and a multiplexing unit 52 are provided in place of the quantization unit 12 and the multiplexing unit 13. The encoding device 50 generates a bit stream by multiplexing a random flag RND (described in detail later) in addition to the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H.

Specifically, the quantization unit 51 of the encoding device 50 includes a determination unit 61, an extraction unit 62, a normalization unit 63, and a partial quantization unit 64.

Based on the high-band spectrum SP-H within the spectrum SP supplied from the MDCT unit 11, the determination unit 61 determines the degree of concentration D of the high-band spectrum SP-H, for example, by Equation 1.

In Equation 1, max(SP-H) denotes the maximum value of the high-band spectrum SP-H, and ave(SP-H) denotes its average value.

According to Equation 1, when the high-band component of the audio to be encoded is highly tonal and the distribution of the high-band spectrum SP-H is strongly skewed, the degree of concentration D is large; when the high-band component is highly noise-like and the distribution of SP-H is flat, D is small.
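Equation 1 itself is not reproduced in this text. As a minimal sketch, assuming it is the peak-to-mean ratio D = max(SP-H) / ave(SP-H) taken over coefficient magnitudes (the absolute value is an assumption, since MDCT coefficients are signed), the degree of concentration could be computed as below. The function name is hypothetical.

```python
def concentration_degree(high_band):
    """Degree of concentration D as max/ave of |SP-H| (assumed form of Equation 1)."""
    mags = [abs(v) for v in high_band]
    mean = sum(mags) / len(mags)
    if mean == 0.0:
        return 0.0  # silent band: treated as flat (assumption, not from the source)
    return max(mags) / mean


tonal = [0.0, 0.0, 8.0, 0.0, 0.0, 0.0, 0.0, 0.0]  # one dominant peak
flat = [1.0, -1.0, 1.0, -1.0]                      # evenly spread energy
print(concentration_degree(tonal))  # 8.0
print(concentration_degree(flat))   # 1.0
```

A peaky (tonal) band yields a large D, while a flat (noise-like) band yields D close to 1, which matches the behavior described for Equation 1.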

The determination unit 61 determines the random flag RND based on the degree of concentration D. The random flag RND indicates whether, during band extension processing in the decoding device described later, the phase of the pseudo high-band spectrum (a spectrum approximating the high-band spectrum SP-H, generated from the low-band spectrum SP-L and the high-band envelope ENV-H) is to be randomized.

For example, when the degree of concentration D is larger than a threshold preset in the encoding device 50, that is, when the high-band spectrum SP-H is highly tonal, the random flag RND is set to 0, indicating no randomization. Conversely, when D is less than or equal to the preset threshold, that is, when SP-H is highly noise-like, RND is set to 1, indicating randomization. The determination unit 61 supplies the determined random flag RND to the multiplexing unit 52.
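The threshold rule above reduces to a single comparison. In this sketch the threshold value is a hypothetical placeholder, because the preset threshold of the encoding device 50 is not given; note that D equal to the threshold yields RND = 1.

```python
def decide_random_flag(d, threshold):
    """RND = 0 (keep phase) if D > threshold, else 1 (randomize phase)."""
    return 0 if d > threshold else 1


THRESHOLD = 4.0  # hypothetical value; the preset threshold is not specified
print(decide_random_flag(8.0, THRESHOLD))  # 0: tonal high band
print(decide_random_flag(1.0, THRESHOLD))  # 1: noise-like high band
```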

Like the quantization unit 12 in FIG. 1, the extraction unit 62 extracts an envelope from each of the high-band spectrum SP-H and the low-band spectrum SP-L in the spectrum SP supplied from the MDCT unit 11.

Like the quantization unit 12, the normalization unit 63 normalizes the low-band spectrum SP-L using the low-band envelope ENV-L.
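The envelope definition used by the quantization unit 12 is not given in this section. As a minimal sketch, assuming the envelope is the per-band RMS over equal-width bands (a hypothetical choice), extraction and normalization could look like this:

```python
import math


def band_envelope(spectrum, band_width):
    """Per-band RMS of the coefficients (hypothetical envelope definition)."""
    env = []
    for start in range(0, len(spectrum), band_width):
        band = spectrum[start:start + band_width]
        env.append(math.sqrt(sum(v * v for v in band) / len(band)))
    return env


def normalize(spectrum, env, band_width):
    """Divide each coefficient by the envelope value of the band it belongs to."""
    return [v / env[i // band_width] if env[i // band_width] > 0.0 else 0.0
            for i, v in enumerate(spectrum)]


sp_l = [3.0, -3.0, 1.0, 1.0]
env_l = band_envelope(sp_l, 2)
print(env_l)                      # [3.0, 1.0]
print(normalize(sp_l, env_l, 2))  # [1.0, -1.0, 1.0, 1.0]
```

After normalization each band has unit RMS, so the envelope carries the level and the normalized spectrum carries the fine structure.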

The partial quantization unit 64 quantizes the normalized low-band spectrum SP-L and supplies the resulting low-band spectrum SP-L to the multiplexing unit 52. Like the quantization unit 12, the partial quantization unit 64 also quantizes the extracted high-band envelope ENV-H and low-band envelope ENV-L and supplies them to the multiplexing unit 52.

The multiplexing unit 52 multiplexes the random flag RND supplied from the determination unit 61 of the quantization unit 51 with the low-band envelope ENV-L, low-band spectrum SP-L, and high-band envelope ENV-H supplied from the partial quantization unit 64, and outputs the resulting bit stream. This bit stream is recorded on a recording medium (not shown) or transmitted to the decoding device.
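The actual bit stream syntax is not described here, and the real stream carries quantized values. The sketch below uses a hypothetical byte layout (a 7-byte header with RND and three lengths, followed by float32 values) purely to illustrate multiplexing and the matching demultiplexing performed later by the decoding device.

```python
import struct

HEADER = "<BHHH"  # RND, len(ENV-L), len(SP-L), len(ENV-H); hypothetical layout


def multiplex(rnd, env_l, sp_l, env_h):
    """Pack RND and the three parameter arrays into one byte string."""
    header = struct.pack(HEADER, rnd, len(env_l), len(sp_l), len(env_h))
    count = len(env_l) + len(sp_l) + len(env_h)
    body = struct.pack(f"<{count}f", *env_l, *sp_l, *env_h)
    return header + body


def demultiplex(stream):
    """Inverse of multiplex(): recover RND, ENV-L, SP-L and ENV-H."""
    rnd, n_el, n_sl, n_eh = struct.unpack_from(HEADER, stream, 0)
    values = struct.unpack_from(f"<{n_el + n_sl + n_eh}f", stream, struct.calcsize(HEADER))
    env_l = list(values[:n_el])
    sp_l = list(values[n_el:n_el + n_sl])
    env_h = list(values[n_el + n_sl:])
    return rnd, env_l, sp_l, env_h


stream = multiplex(1, [1.0, 0.5], [-2.0, 0.25, 3.0], [4.0])
print(len(stream))          # 31
print(demultiplex(stream))  # (1, [1.0, 0.5], [-2.0, 0.25, 3.0], [4.0])
```

The single-flag overhead is what lets the decoder skip any tonality analysis of its own in this embodiment.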

[Description of Signals in the Encoding Device]

FIG. 7 is a diagram describing the signals output from the MDCT unit 11 and the quantization unit 51 of the encoding device 50 in FIG. 6.

As shown in A of FIG. 7, the spectrum SP output from the MDCT unit 11 is a full-band spectrum. In contrast, as shown in B of FIG. 7, the signals output from the quantization unit 51, other than the random flag RND, are the low-band spectrum SP-L, the low-band envelope ENV-L, and the high-band envelope ENV-H.

[Description of Processing by the Encoding Device]

FIG. 8 is a flowchart describing encoding processing by the encoding device 50 of FIG. 6. This processing starts, for example, when an audio PCM signal is input to the encoding device 50.

In step S51 of FIG. 8, as in step S11 of FIG. 2, the MDCT unit 11 performs an MDCT on the PCM signal, which is the time-domain audio signal input to the encoding device 50, to generate the spectrum SP, a frequency-domain signal. The MDCT unit 11 supplies the generated spectrum SP to the quantization unit 51.

In step S52, the determination unit 61 of the quantization unit 51 determines the degree of concentration D of the high-band spectrum SP-H, within the spectrum SP supplied from the MDCT unit 11, by Equation 1 described above.

In step S53, the determination unit 61 determines the random flag RND based on the degree of concentration D and supplies it to the multiplexing unit 52, and the processing proceeds to step S54.

Steps S54 to S56 are the same as steps S12 to S14 of FIG. 2, so their description is omitted.

After step S56, in step S57, the multiplexing unit 52 multiplexes the random flag RND, low-band envelope ENV-L, low-band spectrum SP-L, and high-band envelope ENV-H supplied from the quantization unit 51, and outputs the resulting bit stream. The processing then ends.

[Configuration Example of the Decoding Device]

FIG. 9 is a block diagram illustrating a configuration example of a decoding device that decodes a bit stream encoded by the encoding device 50 of FIG. 6.

The decoding device 70 of FIG. 9 includes a demultiplexing unit 71, an inverse quantization unit 72, a high-band component generation unit 73, a phase randomization unit 74, and an inverse MDCT unit 75. The decoding device 70 performs band extension processing concurrently with the decoding of the low-band spectrum SP-L.

Specifically, the demultiplexing unit 71 (acquisition means) acquires the bit stream encoded by the encoding device 50 of FIG. 6, separates it into the random flag RND, low-band envelope ENV-L, low-band spectrum SP-L, and high-band envelope ENV-H, and supplies them to the inverse quantization unit 72.

Like the inverse quantization unit 32 of FIG. 3, the inverse quantization unit 72 inversely quantizes each of the low-band envelope ENV-L, low-band spectrum SP-L, and high-band envelope ENV-H supplied from the demultiplexing unit 71.

The inverse quantization unit 72 supplies the inversely quantized low-band envelope ENV-L to the inverse MDCT unit 75, and the low-band spectrum SP-L to both the inverse MDCT unit 75 and the high-band component generation unit 73. It also supplies the high-band envelope ENV-H to the high-band component generation unit 73 and the random flag RND to the phase randomization unit 74.

The high-band component generation unit 73 generates a high-band spectrum, called the pseudo high-band spectrum, using the low-band spectrum SP-L and the high-band envelope ENV-H supplied from the inverse quantization unit 72. For example, the high-band component generation unit 73 copies the low-band spectrum SP-L and shapes the copy using the high-band envelope ENV-H to obtain the pseudo high-band spectrum.

The pseudo high-band spectrum may be generated, for example, by the method described in Patent Document 1, previously filed by the present applicant, or by any other method. The high-band component generation unit 73 supplies the generated pseudo high-band spectrum to the phase randomization unit 74.
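The method of Patent Document 1 is not reproduced here. As an illustrative stand-in, the copy-and-shape idea can be sketched as a plain spectral copy: each high band takes the normalized low-band bins in order (wrapping around if the high band is wider) and is scaled so that its RMS matches the corresponding ENV-H value. The per-band RMS convention and equal band widths are assumptions.

```python
import math


def pseudo_high_band(sp_l_norm, env_h, band_width):
    """Copy normalized SP-L upward band by band and scale each band to ENV-H (illustrative)."""
    out = []
    for b, target in enumerate(env_h):
        # Source bins wrap around the low band if the high band is wider.
        src = [sp_l_norm[(b * band_width + j) % len(sp_l_norm)] for j in range(band_width)]
        rms = math.sqrt(sum(v * v for v in src) / band_width)
        gain = target / rms if rms > 0.0 else 0.0
        out.extend(v * gain for v in src)
    return out


print(pseudo_high_band([1.0, -1.0, 2.0, 2.0], [0.5, 4.0], 2))  # [0.5, -0.5, 4.0, 4.0]
```

Because the fine structure is copied from the low band, any tonal peaks in SP-L reappear in the pseudo high band, which is exactly the situation the phase randomization below addresses.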

The phase randomization unit 74 randomizes the phase of the pseudo high-band spectrum supplied from the high-band component generation unit 73, based on the random flag RND supplied from the inverse quantization unit 72.

Specifically, when the random flag RND is 1, indicating randomization, the phase randomization unit 74 randomizes the sign (+/-) of the pseudo high-band spectrum by Equation 2.

In Equation 2, SP-H denotes the high-band spectrum and i denotes the spectral index.

According to Equation 2, -1 is multiplied as many times as the value of the least significant bit of the return value of the random function rand() (that is, zero times or once), so each coefficient of the high-band spectrum SP-H is multiplied at random by either -1 or 1.

When the random flag RND is 0, indicating no randomization, the phase randomization unit 74 does not randomize the phase of the pseudo high-band spectrum.
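Equation 2 is not reproduced in this text, but the description above implies multiplying each coefficient by (-1) raised to the least significant bit of rand(). A minimal sketch under that reading follows; Python's `random.Random.getrandbits(1)` stands in for `rand() & 1`, and the function name is hypothetical.

```python
import random


def randomize_signs(pseudo_high, rnd, rng=None):
    """If RND == 1, multiply each SP-H(i) by (-1)**(random bit); otherwise pass through."""
    if rnd == 0:
        return list(pseudo_high)
    if rng is None:
        rng = random.Random()
    # getrandbits(1) plays the role of the least significant bit of rand().
    return [v * (-1) ** rng.getrandbits(1) for v in pseudo_high]


print(randomize_signs([1.0, 2.0, 3.0], 0))  # [1.0, 2.0, 3.0]
print(randomize_signs([1.0, 2.0, 3.0], 1, random.Random(0)))
```

Flipping signs leaves every coefficient's magnitude, and hence the band energy shaped by ENV-H, unchanged; only the sign (the MDCT-domain "phase") is randomized.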

The phase randomization unit 74 supplies the pseudo high-band spectrum, whether or not its phase has been randomized, to the inverse MDCT unit 75.

The inverse MDCT unit 75 (synthesis means) denormalizes the low-band spectrum SP-L using the low-band envelope ENV-L supplied from the inverse quantization unit 72. It then synthesizes the denormalized low-band spectrum SP-L with the pseudo high-band spectrum supplied from the phase randomization unit 74, performs an inverse MDCT on the resulting full-band spectrum (a frequency-domain signal) to obtain a full-band PCM signal (a time-domain signal), and outputs this PCM signal as the decoding result.
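The frequency-domain part of this synthesis can be sketched as denormalization followed by concatenation, with the low band occupying the lower bins. This assumes the same hypothetical equal-width band layout as the envelope sketch; the result would then be passed to an inverse MDCT.

```python
def full_band_spectrum(sp_l_norm, env_l, band_width, pseudo_high):
    """Denormalize SP-L with ENV-L, then append the pseudo high band above it."""
    low = [v * env_l[i // band_width] for i, v in enumerate(sp_l_norm)]
    return low + list(pseudo_high)


print(full_band_spectrum([1.0, -1.0, 0.5, 0.5], [3.0, 2.0], 2, [0.25, -0.25]))
# [3.0, -3.0, 1.0, 1.0, 0.25, -0.25]
```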

As described above, the decoding device 70 generates the pseudo high-band spectrum concurrently with decoding the low-band spectrum SP-L. The time the decoding device 70 requires for decoding is therefore approximately the same as that of an ordinary decoding device that performs decoding only. That is, the decoding device 70 of FIG. 9 can output the decoding result a time T0 after the bit stream is input, and no delay arises from band extension.

[Description of Signals in the Decoding Device]

FIG. 10 is a diagram describing the signal output from the inverse MDCT unit 75 of the decoding device 70 in FIG. 9.

As shown in FIG. 10, the signal output from the inverse MDCT unit 75 is the PCM signal obtained by frequency-to-time conversion of the synthesis of two parts: the low-band spectrum SP-L restored using the low-band envelope ENV-L, and the pseudo high-band spectrum generated from the high-band envelope ENV-H and the low-band spectrum SP-L.

[Description of the Effect of Phase Randomization]

FIGS. 11 to 16 are diagrams describing the effect of phase randomization by the phase randomization unit 74 of FIG. 9.

FIG. 11 is a diagram describing the difference in decoding results with and without phase randomization.

As shown in FIG. 11, the encoding device 50 of FIG. 6 encodes the PCM signal in sections of fixed length called frames, and consecutive frames usually overlap by 50%. Specifically, the (J-1)-th frame and the following J-th frame overlap by half a frame.

FIG. 11 illustrates the case in which a highly tonal spectrum, shown on the left of FIG. 11, is encoded.

In this case, as shown at the upper right of FIG. 11, if the spectral phase is not randomized when the spectra of the (J-1)-th and J-th frames are decoded, the phase of the spectrum in their overlap period is correctly restored by combining the spectra and signs of the two frames. The restored spectrum of the overlap period is therefore highly tonal.

In contrast, as shown at the lower right, if the spectral phase is randomized when the spectra of the (J-1)-th and J-th frames are decoded, the signs of the two frames' spectra do not necessarily agree, so the phase of the spectrum in the overlap period is not restored correctly. The signal of the overlap period restored by the decoding device 70 therefore has a spectrum in which the tonality of the pre-encoding spectrum has collapsed.

When the tonality of a spectrum collapses, energy that should be concentrated in specific spectral components leaks into neighboring components. The spectral peaks are thus lowered relative to the original spectrum, and the leaked energy raises the spectral valleys. As a result, the spectrum becomes noise-like.

Thus, when the phase is randomized during decoding, a spectrum that was tonal before encoding is converted into a noise-like spectrum.
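The overlap mechanism of FIG. 11 can be checked numerically with a small MDCT/IMDCT pair and 50% overlap-add. This is a sketch, not the codec's actual transform: the frame size, the sine window, and the orthonormal sqrt(2/N) scaling are assumptions chosen so that time-domain aliasing cancels exactly between adjacent frames. For simplicity, the sign flips here are applied to every coefficient rather than only to the high band.

```python
import math
import random


def sine_window(n2):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 for a 2N-sample frame."""
    return [math.sin(math.pi * (n + 0.5) / n2) for n in range(n2)]


def mdct(frame, window):
    """Forward MDCT of a windowed 2N-sample frame, orthonormal sqrt(2/N) scaling."""
    N = len(frame) // 2
    scale = math.sqrt(2.0 / N)
    x = [f * w for f, w in zip(frame, window)]
    return [scale * sum(x[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                        for n in range(2 * N)) for k in range(N)]


def imdct(coeffs, window):
    """Inverse MDCT returning 2N windowed samples that still contain aliasing."""
    N = len(coeffs)
    scale = math.sqrt(2.0 / N)
    return [window[n] * scale * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                                    for k in range(N)) for n in range(2 * N)]


def overlap_add(signal, N, flip_signs=False, seed=0):
    """Transform 50%-overlapping frames and overlap-add them; optionally flip signs at random."""
    rng = random.Random(seed)
    window = sine_window(2 * N)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * N + 1, N):
        coeffs = mdct(signal[start:start + 2 * N], window)
        if flip_signs:
            coeffs = [c * (-1) ** rng.getrandbits(1) for c in coeffs]
        for n, v in enumerate(imdct(coeffs, window)):
            out[start + n] += v
    return out


tone = [math.sin(2 * math.pi * 0.1 * n) for n in range(64)]
N = 8
for flip in (False, True):
    y = overlap_add(tone, N, flip_signs=flip)
    err = max(abs(y[n] - tone[n]) for n in range(N, len(tone) - N))
    print(f"randomized signs={flip}: max interior error {err:.2e}")
```

Without sign flips, the aliasing of each frame is cancelled by its neighbor and the interior samples of the tone are reconstructed to rounding error. With independent random sign flips per frame, the aliasing terms no longer cancel, and the reconstructed tone is corrupted, which is the loss of tonality described for the lower right of FIG. 11.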

FIGS. 12 to 16 are diagrams describing characteristics of the high-band spectrum SP-H.

As shown in A of FIG. 12, when the low-band spectrum SP-L is highly tonal, the high-band spectrum SP-H is often highly tonal as well. This can be inferred from the fact that instruments such as wind and string instruments emit sound waves that combine a fundamental frequency with harmonic components at integer multiples of it.

When a spectrum consisting of such a highly tonal low-band spectrum SP-L and high-band spectrum SP-H is band-extension encoded, and the pseudo high-band spectrum is generated during band extension decoding simply by folding back the low-band spectrum SP-L, the pseudo high-band spectrum is highly tonal, as shown in B of FIG. 12. The audio corresponding to the decoding result therefore sounds natural, with little perceptual incongruity.

Accordingly, the encoding device 50 of FIG. 6 sets the random flag RND to 0 when the degree of concentration D is larger than the preset threshold, that is, when the high-band component of the audio to be encoded is tonal. The decoding device 70 then does not randomize the phase of the pseudo high-band spectrum, so the decoded audio sounds natural.

On the other hand, as shown in A of FIG. 13 and A of FIG. 14, when the low-band spectrum SP-L is highly noise-like, the spectrum becomes even more noise-like toward higher frequencies. This can be inferred as follows: in instruments such as cymbals and maracas, which produce highly noise-like (that is, non-tonal) sounds such as striking or impact sounds, higher-frequency vibrations propagate through the instrument, so the amplitudes and phases of the individual vibration components become more intricately entangled at higher frequencies, increasing noisiness.

When a spectrum consisting of such a highly noise-like low-band spectrum SP-L and high-band spectrum SP-H is band-extension encoded, the pseudo high-band spectrum generated from SP-L during band extension decoding is also highly noise-like, as shown in B of FIG. 13. Consequently, whether the phase of the pseudo high-band spectrum is left unrandomized (B of FIG. 13) or randomized (B of FIG. 14), the pseudo high-band spectrum is highly noise-like and the decoded audio sounds natural.

However, even a highly noise-like sound from an instrument such as a cymbal or maracas may contain tonal vibration components in its low band. In addition, such instruments sound mainly in the high band, and the low band may contain other, highly tonal audio. Therefore, as shown in A of FIG. 15 and A of FIG. 16, the low-band spectrum SP-L may be highly tonal even when the high-band spectrum SP-H is highly noise-like.

When a spectrum consisting of such a highly tonal low-band spectrum SP-L and a highly noise-like high-band spectrum SP-H is band-extension encoded, the pseudo high-band spectrum generated from SP-L during band extension decoding may contain tonal components, as shown in B of FIG. 15. If its phase is not randomized, the high-band audio in the decoding result loses its original noise-like character and becomes tonal like the low-band audio, producing a perceptually unnatural sound.

In contrast, if the phase of the pseudo high-band spectrum is randomized, the randomized pseudo high-band spectrum is noise-like, as shown in B of FIG. 16, even when the original pseudo high-band spectrum contains tonal components. The decoded audio therefore sounds natural.

In summary, when the high-band spectrum SP-H is noise-like, randomization is optional if the low-band spectrum SP-L is also noise-like, but necessary if SP-L is tonal. Because randomization never hurts in this case, always randomizing whenever SP-H is noise-like yields a decoding result with little perceptual incongruity based on the degree of concentration D of the high band alone.

Accordingly, the encoding device 50 in FIG. 6 sets the random flag RND to 1 when the concentration degree D is equal to or less than a preset threshold, that is, when the high-band component of the audio being encoded is noise-like. The decoding device 70 then randomizes the phase of the pseudo high-band spectrum, so the decoded audio sounds perceptually natural.
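The encoder-side rule above, together with the envelope extraction and normalization the encoding device performs, can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the concentration degree is taken as the peak magnitude divided by the mean magnitude of SP-H (the defining equation is not reproduced in this section), each envelope is a per-band RMS over equal-width bands, quantization is omitted, and the names `concentration`, `band_envelope`, `encode_frame`, and `THRESHOLD` are hypothetical.

```python
import numpy as np

THRESHOLD = 4.0  # hypothetical preset threshold for the concentration degree D

def concentration(spectrum):
    """Assumed measure of D: peak magnitude divided by mean magnitude."""
    mag = np.abs(spectrum)
    mean = mag.mean()
    return float(mag.max() / mean) if mean > 0 else 0.0

def band_envelope(spectrum, band_width):
    """Hypothetical envelope: RMS of each equal-width band."""
    bands = spectrum.reshape(-1, band_width)
    return np.sqrt(np.mean(bands ** 2, axis=1))

def encode_frame(spectrum, split, band_width, threshold=THRESHOLD):
    """Split one MDCT frame into SP-L / SP-H and build the fields to multiplex."""
    sp_l, sp_h = spectrum[:split], spectrum[split:]
    env_l = band_envelope(sp_l, band_width)
    env_h = band_envelope(sp_h, band_width)
    # Normalize SP-L by its envelope; silent bands keep a gain of 1.
    gains = np.repeat(np.where(env_l > 0, env_l, 1.0), band_width)
    sp_l_norm = sp_l / gains
    # RND = 1 when the high band is noise-like (D at or below the threshold).
    rnd = 1 if concentration(sp_h) <= threshold else 0
    return {"RND": rnd, "ENV-L": env_l, "ENV-H": env_h, "SP-L": sp_l_norm}
```

For a flat high band D is 1, so RND becomes 1; a single strong peak in an otherwise empty high band raises D well above the threshold, so RND becomes 0.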

Because sounds that are highly noise-like in the low band and highly tonal in the high band hardly exist in nature, a spectrum consisting of a highly noise-like low-band spectrum SP-L and a highly tonal high-band spectrum SP-H is not considered.

[Description of processing of the decoding device]

FIG. 17 is a flowchart describing the decoding process performed by the decoding device 70 in FIG. 9. This decoding process starts, for example, when a bit stream encoded by the encoding device 50 is input to the decoding device 70.

In step S71 of FIG. 17, the decomposition unit 71 acquires the bit stream encoded by the encoding device 50 and decomposes it into the random flag RND, the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H. The decomposition unit 71 supplies RND, ENV-L, SP-L, and ENV-H to the inverse quantization unit 72.

In step S72, the inverse quantization unit 72 inverse-quantizes each of the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H supplied from the decomposition unit 71. The inverse quantization unit 72 supplies the inverse-quantized ENV-L to the inverse MDCT unit 75, and supplies SP-L to both the inverse MDCT unit 75 and the high-band component generation unit 73. It also supplies ENV-H to the high-band component generation unit 73 and the random flag RND to the phase randomization unit 74.

In step S73, the high-band component generation unit 73 generates a pseudo high-band spectrum using the low-band spectrum SP-L and the high-band envelope ENV-H supplied from the inverse quantization unit 72, and supplies the generated pseudo high-band spectrum to the phase randomization unit 74.
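Step S73 relies on a generation method described elsewhere in the document and not reproduced in this section. Purely as a hypothetical stand-in, the sketch below uses a simple copy-up: the normalized low band is tiled upward to fill the high band, each band is rescaled to unit RMS, and then scaled by its ENV-H value. The function name, the equal-width band layout, and the RMS-based shaping are all assumptions, not necessarily what unit 73 does.

```python
import numpy as np

def generate_pseudo_high(sp_l_norm, env_h, band_width):
    """Hypothetical copy-up: tile the normalized low band upward, then shape
    it with the high-band envelope ENV-H (one gain per equal-width band)."""
    n_high = len(env_h) * band_width
    reps = -(-n_high // len(sp_l_norm))  # ceiling division
    source = np.tile(sp_l_norm, reps)[:n_high]
    bands = source.reshape(-1, band_width)
    # Rescale each band to unit RMS so that ENV-H sets its level exactly.
    rms = np.sqrt(np.mean(bands ** 2, axis=1, keepdims=True))
    rms[rms == 0] = 1.0
    shaped = bands / rms * env_h[:, None]
    return shaped.reshape(-1)
```

Because the copied bins come from SP-L, any tonal structure in the low band is carried into the pseudo high band, which is exactly the situation that the phase randomization in steps S74 and S75 addresses.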

In step S74, the phase randomization unit 74 determines whether the random flag RND supplied from the inverse quantization unit 72 is 1. If RND is determined to be 1 in step S74, then in step S75 the phase randomization unit 74 randomizes the phase of the pseudo high-band spectrum supplied from the high-band component generation unit 73 according to Equation 2 described above. The phase randomization unit 74 then supplies the phase-randomized pseudo high-band spectrum to the inverse MDCT unit 75, and the process proceeds to step S76.

If, on the other hand, RND is determined in step S74 not to be 1, that is, to be 0, the phase randomization unit 74 supplies the pseudo high-band spectrum to the inverse MDCT unit 75 as is, without randomizing its phase. The process then proceeds to step S76.
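Steps S74 and S75 amount to a branch on RND. Because MDCT coefficients are real-valued, one plausible reading of "randomizing the phase" is to invert the sign of each coefficient at random; the sketch below adopts that interpretation as an assumption and is not a reproduction of Equation 2, which is not shown in this section. `maybe_randomize` is a hypothetical name, and `rng` is a NumPy `Generator`.

```python
import numpy as np

def maybe_randomize(pseudo_high, rnd, rng):
    """Sketch of steps S74-S75: when RND == 1, flip the sign of each
    coefficient at random (assumed stand-in for Equation 2); when RND == 0,
    pass the pseudo high-band spectrum through unchanged."""
    if rnd != 1:
        return pseudo_high
    signs = rng.choice(np.array([-1.0, 1.0]), size=pseudo_high.shape)
    return pseudo_high * signs
```

Random sign inversion keeps every bin's magnitude, so the level set by ENV-H in step S73 is preserved, while the regular sign pattern that a tonal component leaves across neighboring bins is broken up.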

In step S76, the inverse MDCT unit 75 denormalizes the low-band spectrum SP-L using the low-band envelope ENV-L supplied from the inverse quantization unit 72.

In step S77, the inverse MDCT unit 75 combines the denormalized low-band spectrum SP-L with the pseudo high-band spectrum supplied from the phase randomization unit 74, and performs an inverse MDCT on the resulting full-band spectrum to obtain a full-band PCM signal. The inverse MDCT unit 75 outputs this full-band PCM signal as the decoding result, and the process ends.
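Steps S76 and S77 can be sketched with a direct (O(N²)) MDCT pair. This is a minimal illustration assuming one ENV-L gain per equal-width band and a rectangular window for brevity; a practical codec would apply an analysis/synthesis window satisfying the Princen-Bradley condition, with the scaling adjusted accordingly. All function names are hypothetical, and the forward `mdct` is included only so that the time-domain aliasing cancellation of the inverse transform can be checked.

```python
import numpy as np

def denormalize(sp_l_norm, env_l, band_width):
    """Step S76: multiply each bin by its band's envelope gain
    (hypothetical layout: one ENV-L value per equal-width band)."""
    return sp_l_norm * np.repeat(env_l, band_width)

def mdct(x):
    """Forward MDCT: 2N samples -> N coefficients (used only for checking)."""
    N = len(x) // 2
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ x

def imdct(X):
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (1/N scaling)."""
    N = len(X)
    n = np.arange(2 * N)
    k = np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return basis @ X / N

def synthesize_frame(sp_l_norm, env_l, pseudo_high, band_width):
    """Step S77: join the denormalized low band with the pseudo high band and
    inverse-transform the full-band spectrum. Overlap-adding consecutive
    frames (not shown here) yields the PCM output."""
    sp_l = denormalize(sp_l_norm, env_l, band_width)
    full_band = np.concatenate([sp_l, pseudo_high])
    return imdct(full_band)
```

Since the low and high bands are joined in the MDCT domain, a single inverse transform produces the full-band frame; no separate band division or band synthesis filter bank is involved.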

As described above, the decoding device 70 generates a pseudo high-band spectrum from the low-band spectrum SP-L before the inverse MDCT, and randomizes it according to the random flag RND, which is determined from the concentration degree of the high-band spectrum SP-H. In this way it restores the high-band component of the spectrum of the encoded audio.

This makes it possible to use the low-band spectrum SP-L to restore, as the high-band component of the encoded audio's spectrum, a spectrum that matches the high-band spectrum SP-H relatively well. Because the high-band component is restored from SP-L, the decoding of SP-L and the band extension processing can be performed at the same time, which reduces the delay caused by band extension. As a result, a clear, pleasant-sounding full-band PCM signal that does not sound muffled is output as the decoding result after roughly the same elapsed time as in a decoding device that performs no band extension.

Furthermore, the decoding device 70 produces a noise-like pseudo high-band spectrum by randomizing the phase of the pseudo high-band spectrum generated from SP-L. Compared with simply generating a random spectrum as the pseudo high-band spectrum, this yields a pseudo high-band spectrum that matches the high-band spectrum SP-H more closely.

In addition, since the decoding device 70 generates the low-band and high-band components of the spectrum before the inverse MDCT, it does not need the band division filter 41 and the band synthesis filter 43 that the decoding device 30 in FIG. 3 requires for band extension. Compared with the decoding device 30 in FIG. 3, the resources needed for band extension processing, such as the amount of processing, circuit size, and code size, can therefore be reduced.

<Second Embodiment>

[Configuration example of the second embodiment of the decoding device]

FIG. 18 is a block diagram showing a configuration example of a second embodiment of a decoding device to which the present invention is applied.

In FIG. 18, components identical to those in FIG. 3 or FIG. 9 are given the same reference numerals, and duplicate descriptions are omitted where appropriate.

The configuration of the decoding device 100 in FIG. 18 differs from that of the decoding device 70 in FIG. 9 mainly in that the decomposition unit 31 and the inverse quantization unit 32 are provided in place of the decomposition unit 71 and the inverse quantization unit 72, and in that a determination unit 101 is newly provided. The decoding device 100 determines the random flag RND based on the low-band spectrum SP-L contained in the bit stream encoded by the encoding device 10 in FIG. 1.

Specifically, the determination unit 101 determines the concentration degree D' of the low-band spectrum SP-L from the SP-L inverse-quantized by the inverse quantization unit 32, for example by Equation 3 below.

In Equation 3, max(SP-L) denotes the maximum value of the low-band spectrum SP-L, and ave(SP-L) denotes its average value.

According to Equation 3, when the low-band component of the audio being encoded is highly tonal and the distribution of SP-L is strongly skewed, the concentration degree D' is large; when the low-band component is highly noise-like and the distribution of SP-L is flat, D' is small.

The determination unit 101 determines the random flag RND based on the concentration degree D'. Specifically, when D' is greater than a threshold preset in the decoding device 100, that is, when SP-L is highly tonal, the determination unit 101 sets RND to 0. When D' is equal to or less than the preset threshold, that is, when SP-L is highly noise-like, it sets RND to 1. The determination unit 101 then supplies the determined RND to the phase randomization unit 74. As a result, the phase of the pseudo high-band spectrum is not randomized when SP-L is highly tonal and is randomized when SP-L is highly noise-like, so the decoded audio has perceptually sufficient quality.
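The decision made by the determination unit 101 can be sketched as follows. Equation 3 itself is not reproduced in this section, so the sketch assumes the ratio D' = max(|SP-L|) / ave(|SP-L|), computed on magnitudes because MDCT coefficients take both signs; the function names and the threshold value used in the usage note are hypothetical.

```python
import numpy as np

def low_band_concentration(sp_l):
    """Assumed form of Equation 3: D' = max(|SP-L|) / ave(|SP-L|)."""
    mag = np.abs(sp_l)
    mean = mag.mean()
    return float(mag.max() / mean) if mean > 0 else 0.0

def decide_random_flag(sp_l, threshold):
    """Determination unit 101 sketch: RND = 0 for a tonal low band
    (D' above the threshold), RND = 1 for a noise-like one (D' at or below it)."""
    return 0 if low_band_concentration(sp_l) > threshold else 1
```

With a threshold of 4, a flat SP-L gives D' = 1 and RND = 1, while a single peak among eight bins gives D' = 8 and RND = 0. Because no flag is transmitted in this embodiment, the same rule must run identically in every decoder.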

FIG. 19 is a flowchart describing the decoding process performed by the decoding device 100 in FIG. 18. This decoding process starts, for example, when a bit stream encoded by the encoding device 10 in FIG. 1 is input to the decoding device 100.

In step S91 of FIG. 19, the decomposition unit 31 decomposes the bit stream encoded by the encoding device 10 into the low-band envelope ENV-L, the low-band spectrum SP-L, and the high-band envelope ENV-H, and supplies them to the inverse quantization unit 32.

The processing in steps S92 and S93 is the same as that in steps S72 and S73 of FIG. 17, so its description is omitted.

After step S93, in step S94, the determination unit 101 determines the concentration degree D' of the low-band spectrum SP-L by Equation 3 described above, based on the SP-L inverse-quantized by the inverse quantization unit 32.

In step S95, the determination unit 101 determines the random flag RND based on D', supplies RND to the phase randomization unit 74, and advances the process to step S96.

The processing in steps S96 to S99 is the same as that in steps S74 to S77 of FIG. 17, so its description is omitted.

<Third Embodiment>

[Description of a computer to which the present invention is applied]

The series of encoding and decoding processes described above can be performed by hardware or by software. When they are performed by software, a program constituting the software is installed on a general-purpose computer or the like.

FIG. 20 shows a configuration example of an embodiment of a computer on which the program that executes the above-described series of processes is installed.

The program can be recorded in advance in the storage unit 208 or the ROM (Read Only Memory) 202, which serve as recording media built into the computer.

Alternatively, the program can be stored (recorded) on removable media 211, which can be provided as so-called packaged software. Examples of the removable media 211 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory.

Besides being installed on the computer from the removable media 211 through the drive 210, the program can be downloaded to the computer over a communication or broadcast network and installed in the built-in storage unit 208. That is, the program can be transferred wirelessly to the computer from a download site via an artificial satellite for digital satellite broadcasting, or transferred by wire to the computer via a network such as a LAN (Local Area Network) or the Internet.

The computer incorporates a CPU (Central Processing Unit) 201, to which an input/output interface 205 is connected via a bus 204.

When the user inputs a command through the input/output interface 205, for example by operating the input unit 206, the CPU 201 executes the program stored in the ROM 202 accordingly. Alternatively, the CPU 201 loads the program stored in the storage unit 208 into the RAM (Random Access Memory) 203 and executes it.

The CPU 201 thereby performs the processing according to the flowcharts described above, or the processing carried out by the configurations in the block diagrams described above. As necessary, the CPU 201 outputs the processing result from the output unit 207, transmits it from the communication unit 209, or records it in the storage unit 208, for example via the input/output interface 205.

The input unit 206 includes a keyboard, a mouse, a microphone, and the like. The output unit 207 includes an LCD (Liquid Crystal Display), a speaker, and the like.

In this specification, the processing that the computer performs according to the program need not be performed in time series in the order described in the flowcharts. That is, it also includes processing executed in parallel or individually (for example, parallel processing or object-based processing).

The program may be processed by a single computer (processor) or processed in a distributed manner by multiple computers. The program may also be transferred to a remote computer and executed there.

Embodiments of the present invention are not limited to those described above, and various modifications are possible without departing from the gist of the present invention.

50: encoding device
52: multiplexing unit
61: determination unit
62: extraction unit
63: normalization unit
70: decoding device
71: decomposition unit
73: high-band component generation unit
74: phase randomization unit
75: inverse MDCT unit
100: decoding device
101: decomposition unit
101: determination unit

Claims

A decoding device comprising:
acquisition means for acquiring, as an encoding result, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, a high-band envelope of the audio signal, and a concentration degree of a high-band spectrum of the audio signal;
generation means for generating a spectrum using the normalized low-band spectrum and the high-band envelope of the encoding result acquired by the acquisition means;
randomization means for randomizing the phase of the spectrum generated by the generation means based on the concentration degree; and
synthesis means for denormalizing the low-band spectrum using the low-band envelope of the encoding result acquired by the acquisition means, synthesizing the spectrum randomized by the randomization means or the spectrum generated by the generation means with the denormalized low-band spectrum, and using the synthesis result as a full-band spectrum.

2. The decoding device according to claim 1, wherein the randomization means does not randomize the phase of the spectrum generated by the generation means when the concentration degree is greater than a predetermined threshold, and randomizes the phase of the spectrum generated by the generation means when the concentration degree is equal to or less than the predetermined threshold.

The decoding device according to claim 1, wherein
the acquisition means acquires the low-band envelope, the low-band spectrum, the high-band envelope, and a random flag that is determined based on the concentration degree and is information indicating whether the randomization means is to perform randomization, and
the randomization means randomizes the phase of the spectrum and supplies the result to the synthesis means when the random flag is information indicating randomization, and supplies the spectrum to the synthesis means without randomizing its phase when the random flag is information indicating no randomization.

A decoding method comprising, by a decoding device:
an acquisition step of acquiring, as an encoding result, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, a high-band envelope of the audio signal, and a concentration degree of a high-band spectrum of the audio signal;
a generation step of generating a spectrum using the normalized low-band spectrum and the high-band envelope of the encoding result acquired in the acquisition step;
a randomization step of randomizing the phase of the spectrum generated in the generation step based on the concentration degree; and
a synthesis step of denormalizing the low-band spectrum using the low-band envelope of the encoding result acquired in the acquisition step, synthesizing the spectrum randomized in the randomization step or the spectrum generated in the generation step with the denormalized low-band spectrum, and using the synthesis result as a full-band spectrum.

A program for causing a computer to execute processing comprising:
an acquisition step of acquiring, as an encoding result, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, a high-band envelope of the audio signal, and a concentration degree of a high-band spectrum of the audio signal;
a generation step of generating a spectrum using the normalized low-band spectrum and the high-band envelope of the encoding result acquired in the acquisition step;
a randomization step of randomizing the phase of the spectrum generated in the generation step based on the concentration degree; and
a synthesis step of denormalizing the low-band spectrum using the low-band envelope of the encoding result acquired in the acquisition step, synthesizing the spectrum randomized in the randomization step or the spectrum generated in the generation step with the denormalized low-band spectrum, and using the synthesis result as a full-band spectrum.

A decoding device comprising:
acquisition means for acquiring, as an encoding result, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, and a high-band envelope of the audio signal;
generation means for generating a spectrum using the normalized low-band spectrum and the high-band envelope of the encoding result acquired by the acquisition means;
determination means for determining a concentration degree of the low-band spectrum based on the normalized low-band spectrum of the encoding result acquired by the acquisition means;
randomization means for randomizing the phase of the spectrum generated by the generation means based on the concentration degree determined by the determination means; and
synthesis means for denormalizing the low-band spectrum using the low-band envelope of the encoding result acquired by the acquisition means, synthesizing the spectrum randomized by the randomization means or the spectrum generated by the generation means with the denormalized low-band spectrum, and using the synthesis result as a full-band spectrum.

7. The decoding device according to claim 6, wherein the randomization means does not randomize the phase of the spectrum generated by the generation means when the concentration degree is greater than a predetermined threshold, and randomizes the phase of the spectrum generated by the generation means when the concentration degree is equal to or less than the predetermined threshold.

The decoding device according to claim 6, wherein
the determination means further determines a random flag, which is information indicating whether the randomization means is to perform randomization, as information indicating no randomization when the concentration degree of the low-band spectrum is greater than a predetermined threshold, and as information indicating randomization when the concentration degree of the low-band spectrum is equal to or less than the predetermined threshold, and
the randomization means randomizes the phase of the spectrum and supplies the result to the synthesis means when the random flag is information indicating randomization, and supplies the spectrum to the synthesis means without randomizing its phase when the random flag is information indicating no randomization.

A decoding method comprising, by a decoding device:
an acquisition step of acquiring, as an encoding result, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, and a high-band envelope of the audio signal;
a generation step of generating a spectrum using the normalized low-band spectrum and the high-band envelope of the encoding result acquired in the acquisition step;
a determination step of determining a concentration degree of the low-band spectrum based on the normalized low-band spectrum of the encoding result acquired in the acquisition step;
a randomization step of randomizing the phase of the spectrum generated in the generation step based on the concentration degree determined in the determination step; and
a synthesis step of denormalizing the low-band spectrum using the low-band envelope of the encoding result acquired in the acquisition step, synthesizing the spectrum randomized in the randomization step or the spectrum generated in the generation step with the denormalized low-band spectrum, and using the synthesis result as a full-band spectrum.

A program for causing a computer to execute processing comprising:
an acquisition step of acquiring, as an encoding result, a low-band envelope of an audio signal, a low-band spectrum normalized using the low-band envelope, and a high-band envelope of the audio signal;
a generation step of generating a spectrum using the normalized low-band spectrum and the high-band envelope of the encoding result acquired in the acquisition step;
a determination step of determining a concentration degree of the low-band spectrum based on the normalized low-band spectrum of the encoding result acquired in the acquisition step;
a randomization step of randomizing the phase of the spectrum generated in the generation step based on the concentration degree determined in the determination step; and
a synthesis step of denormalizing the low-band spectrum using the low-band envelope of the encoding result acquired in the acquisition step, synthesizing the spectrum randomized in the randomization step or the spectrum generated in the generation step with the denormalized low-band spectrum, and using the synthesis result as a full-band spectrum.

An encoding device comprising:
determination means for determining a concentration degree of a high-band spectrum of an audio signal based on the high-band spectrum;
extraction means for extracting an envelope of a low-band spectrum and an envelope of the high-band spectrum from the spectrum of the audio signal;
normalization means for normalizing the low-band spectrum using the envelope of the low-band spectrum; and
multiplexing means for multiplexing the concentration degree determined by the determination means, the envelope of the low-band spectrum and the envelope of the high-band spectrum extracted by the extraction means, and the low-band spectrum normalized by the normalization means to produce an encoding result.

The encoding device according to claim 11, wherein
the determination means further determines a random flag, which is information indicating whether a decoding device that decodes the encoding result is to randomize a predetermined spectrum that it generates as the high-band spectrum, as information indicating no randomization when the concentration degree is greater than a predetermined threshold, and as information indicating randomization when the concentration degree is equal to or less than the predetermined threshold, and
the multiplexing means multiplexes the random flag, the envelope of the low-band spectrum, the envelope of the high-band spectrum, and the normalized low-band spectrum to produce the encoding result.

An encoding method comprising, by an encoding device:
a determination step of determining a concentration degree of a high-band spectrum of an audio signal based on the high-band spectrum;
an extraction step of extracting an envelope of a low-band spectrum and an envelope of the high-band spectrum from the spectrum of the audio signal;
a normalization step of normalizing the low-band spectrum using the envelope of the low-band spectrum; and
a multiplexing step of multiplexing the concentration degree determined in the determination step, the envelope of the low-band spectrum and the envelope of the high-band spectrum extracted in the extraction step, and the low-band spectrum normalized in the normalization step to produce an encoding result.

A program for causing a computer to execute processing comprising:
a determination step of determining a concentration degree of a high-band spectrum of an audio signal based on the high-band spectrum;
an extraction step of extracting an envelope of a low-band spectrum and an envelope of the high-band spectrum from the spectrum of the audio signal;
a normalization step of normalizing the low-band spectrum using the envelope of the low-band spectrum; and
a multiplexing step of multiplexing the concentration degree determined in the determination step, the envelope of the low-band spectrum and the envelope of the high-band spectrum extracted in the extraction step, and the low-band spectrum normalized in the normalization step to produce an encoding result.