JP5892395B2

JP5892395B2 - Encoding apparatus, encoding method, and program

Info

Publication number: JP5892395B2
Application number: JP2014160283A
Authority: JP
Inventors: 鈴木　志朗; 松村　祐樹; 松本　淳; 前田　祐児; 戸栗　康裕
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2014-08-06
Filing date: 2014-08-06
Publication date: 2016-03-23
Anticipated expiration: 2030-03-31
Also published as: JP2014240974A

Description

The present invention relates to an encoding apparatus, an encoding method, and a program, and in particular to an encoding apparatus, an encoding method, and a program that make it possible to reduce the delay time caused by band extension during decoding and to suppress an increase in resources on the decoding side.

Transform coding methods such as MP3 (Moving Picture Experts Group Audio Layer-3), AAC (Advanced Audio Coding), and ATRAC (Adaptive Transform Acoustic Coding) are generally well known as methods for encoding audio signals.

In such encoding methods, coding efficiency can be improved by including in the encoding result only the envelope of the high-frequency spectrum, rather than the high-frequency spectrum itself, which carries a large amount of information. In this case, at the time of decoding, a high-frequency spectrum is generated by duplicating the low-frequency spectrum through translation, folding, or the like. The envelope of the generated high-frequency spectrum alone is then brought close to the envelope of the original high-frequency spectrum included in the encoding result, which improves the perceived sound quality. This decoding technique is called band extension and is already widely known.
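The replication step described above can be illustrated with a minimal sketch. It is not taken from the specification: the function names, the use of an RMS level per band as the envelope, and the fixed band size are all assumptions made for illustration, and only the translation (copy) variant is shown.

```python
import math

def band_envelope(spectrum, band_size):
    """RMS level of each band of band_size coefficients (assumed envelope definition)."""
    return [
        math.sqrt(sum(c * c for c in spectrum[i:i + band_size]) / band_size)
        for i in range(0, len(spectrum), band_size)
    ]

def extend_band(low_spectrum, high_envelope, band_size):
    """Build a pseudo high-frequency spectrum from the low-frequency spectrum.

    The low band is translated upward (copied cyclically), then each band is
    rescaled so that its envelope matches the transmitted high-band envelope.
    """
    n_high = len(high_envelope) * band_size
    # Translation: repeat the low-band coefficients until the high band is filled.
    copied = [low_spectrum[i % len(low_spectrum)] for i in range(n_high)]
    current = band_envelope(copied, band_size)
    out = []
    for b, target in enumerate(high_envelope):
        # Only the envelope is matched; the fine structure stays that of the low band.
        gain = target / current[b] if current[b] > 0.0 else 0.0
        out.extend(c * gain for c in copied[b * band_size:(b + 1) * band_size])
    return out
```

In this sketch only the per-band level of the copied spectrum is adjusted, which mirrors the point made above that the envelope, not the spectrum itself, is what the encoding result carries for the high band.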

FIG. 1 is a block diagram showing an example of the configuration of an encoding device that includes only the envelope of the high-frequency spectrum in the encoding result.

The encoding device 10 of FIG. 1 includes an MDCT (Modified Discrete Cosine Transform) unit 11, a quantization unit 12, and a multiplexing unit 13. The encoding device 10 is the same as a generally well-known transform encoding device, except that it does not include the high-frequency spectrum SP-H in the encoding result. To simplify the description of the figure, the quantization unit 12 is assumed to perform not only quantization but also extraction and normalization of the quantization target.

Specifically, the MDCT unit 11 of the encoding device 10 performs an MDCT on a PCM (Pulse Code Modulation) signal, which is a time-domain audio signal input to the encoding device 10, and generates a spectrum SP, which is a frequency-domain signal. The MDCT unit 11 supplies the generated spectrum SP to the quantization unit 12.

The quantization unit 12 extracts an envelope from each of the high-frequency spectrum SP-H, which is the high-frequency component of the spectrum SP supplied from the MDCT unit 11, and the low-frequency spectrum SP-L, which is the low-frequency component. The quantization unit 12 quantizes the high-frequency envelope ENV-H, which is the extracted envelope of the high-frequency spectrum SP-H, and the low-frequency envelope ENV-L, which is the extracted envelope of the low-frequency spectrum SP-L. The quantization unit 12 supplies the quantized high-frequency envelope ENV-H and low-frequency envelope ENV-L to the multiplexing unit 13. In this specification, to simplify the description, a signal keeps the same name (SP-L, SP-H, and so on) before and after quantization or encoding.

The quantization unit 12 also normalizes the low-frequency spectrum SP-L using the low-frequency envelope ENV-L, quantizes the normalized low-frequency spectrum SP-L, and supplies the resulting low-frequency spectrum SP-L to the multiplexing unit 13.

In this way, for the low-frequency component of the spectrum SP the quantization unit 12 includes both the envelope and the normalized spectrum in the encoding result, whereas for the high-frequency component it includes only the envelope. This improves coding efficiency.

The multiplexing unit 13 multiplexes the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H supplied from the quantization unit 12, and outputs the resulting bitstream. This bitstream is recorded on a recording medium (not shown) or transmitted to a decoding device.

FIG. 2 is a flowchart illustrating the encoding process performed by the encoding device 10 of FIG. 1. This encoding process starts, for example, when an audio PCM signal is input to the encoding device 10.

In step S11 of FIG. 2, the MDCT unit 11 performs an MDCT on the PCM signal, which is a time-domain audio signal input to the encoding device 10, and generates the spectrum SP, which is a frequency-domain signal. The MDCT unit 11 supplies the generated spectrum SP to the quantization unit 12.

In step S12, the quantization unit 12 extracts an envelope from each of the high-frequency spectrum SP-H, which is the high-frequency component of the spectrum SP supplied from the MDCT unit 11, and the low-frequency spectrum SP-L, which is the low-frequency component.

In step S13, the quantization unit 12 normalizes the low-frequency spectrum SP-L using the low-frequency envelope ENV-L.

In step S14, the quantization unit 12 quantizes the extracted high-frequency envelope ENV-H, the low-frequency envelope ENV-L, and the normalized low-frequency spectrum SP-L. The quantization unit 12 then supplies the quantized high-frequency envelope ENV-H, low-frequency envelope ENV-L, and normalized low-frequency spectrum SP-L to the multiplexing unit 13.

In step S15, the multiplexing unit 13 multiplexes the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H supplied from the quantization unit 12, and outputs the resulting bitstream. The process then ends.
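Steps S11 to S15 can be sketched as follows. This is only an illustration under stated assumptions: the specification does not give the MDCT window, the envelope definition, the band layout, the quantizer, or the bitstream format, so the unwindowed direct-form MDCT, the peak-per-band envelope, the uniform quantizers, the parameter names (`split`, `band_size`, `env_step`, `spec_levels`), and the dictionary standing in for the multiplexed bitstream are all hypothetical.

```python
import math

def mdct(frame):
    """Direct-form MDCT of one frame of 2N samples, giving N coefficients (no window)."""
    n2 = len(frame)
    n = n2 // 2
    return [
        sum(frame[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for t in range(n2))
        for k in range(n)
    ]

def envelope(spectrum, band_size):
    """Peak magnitude of each band (assumed envelope definition)."""
    return [max(abs(c) for c in spectrum[i:i + band_size])
            for i in range(0, len(spectrum), band_size)]

def encode_frame(pcm, split, band_size=4, env_step=0.5, spec_levels=15):
    """Encode one frame; `split` is the index separating SP-L from SP-H."""
    sp = mdct(pcm)                                    # S11: PCM -> spectrum SP
    sp_l, sp_h = sp[:split], sp[split:]
    env_l = envelope(sp_l, band_size)                 # S12: extract ENV-L and ENV-H
    env_h = envelope(sp_h, band_size)
    norm_l = [c / env_l[i // band_size] if env_l[i // band_size] > 0.0 else 0.0
              for i, c in enumerate(sp_l)]            # S13: normalize SP-L by ENV-L
    q_env_l = [round(e / env_step) for e in env_l]    # S14: uniform quantization
    q_env_h = [round(e / env_step) for e in env_h]
    q_sp_l = [round(c * spec_levels) for c in norm_l]
    # S15: "multiplexing" is modeled as a dict; SP-H itself is never included.
    return {"ENV-L": q_env_l, "SP-L": q_sp_l, "ENV-H": q_env_h}
```

Note that the returned structure contains no high-frequency spectrum, only its envelope, which is the property that distinguishes the encoding device 10 from an ordinary transform encoder.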

FIG. 3 is a block diagram showing an example of the configuration of a decoding device that decodes the bitstream encoded by the encoding device 10 of FIG. 1.

The decoding device 30 of FIG. 3 includes a decomposition unit 31, an inverse quantization unit 32, an inverse MDCT unit 33, and a band extension unit 34.

The decomposition unit 31, the inverse quantization unit 32, and the inverse MDCT unit 33 of the decoding device 30 restore only the low-frequency component of the PCM signal, in the same manner as an ordinary transform decoding device.

Specifically, the decomposition unit 31 acquires the bitstream encoded by the encoding device 10, decomposes it into the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H, and supplies them to the inverse quantization unit 32.

The inverse quantization unit 32 performs inverse quantization on each of the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H supplied from the decomposition unit 31. The inverse quantization unit 32 then supplies the inversely quantized low-frequency envelope ENV-L and low-frequency spectrum SP-L to the inverse MDCT unit 33, and supplies the high-frequency envelope ENV-H to the band extension unit 34.

The inverse MDCT unit 33 denormalizes the low-frequency spectrum SP-L using the low-frequency envelope ENV-L supplied from the inverse quantization unit 32. The inverse MDCT unit 33 also performs an inverse MDCT on the denormalized low-frequency spectrum SP-L, which is a frequency-domain signal, to obtain a PCM signal, which is a time-domain signal. This PCM signal has no high-frequency component and corresponds to audio that sounds muffled. The inverse MDCT unit 33 supplies this PCM signal to the band extension unit 34.
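The inverse quantization and denormalization of the low-frequency spectrum can be sketched as below. The uniform quantizer parameters (`env_step`, `spec_levels`) and the fixed `band_size` are hypothetical, since the specification does not define the quantizer; the sketch only shows that each dequantized coefficient is scaled back by the envelope value of its band.

```python
def dequantize_and_denormalize(q_spectrum, q_envelope, band_size, env_step, spec_levels):
    """Recover SP-L from quantized normalized coefficients and the quantized ENV-L.

    Assumes (hypothetically) that the encoder quantized the envelope with a
    uniform step env_step and the normalized spectrum with spec_levels levels.
    """
    # Inverse quantization of the envelope.
    env = [q * env_step for q in q_envelope]
    # Inverse quantization of each coefficient, then denormalization by its band envelope.
    return [q / spec_levels * env[i // band_size] for i, q in enumerate(q_spectrum)]
```

The inverse MDCT that follows this step is omitted here.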

The band extension unit 34 includes a band division filter 41, a high-frequency component generation unit 42, and a band synthesis filter 43. The band extension unit 34 performs band extension processing, which improves the sound quality of the PCM signal obtained by the inverse MDCT unit 33, a signal without high-frequency components, by extending its frequency band.

Specifically, the band division filter 41 of the band extension unit 34 divides the PCM signal supplied from the inverse MDCT unit 33 into a high-frequency component and a low-frequency component. Because this PCM signal has no high-frequency component, the band division filter 41 discards the high-frequency component of the divided PCM signal. The band division filter 41 supplies the low-frequency PCM signal BS-L, which is the low-frequency component of the divided PCM signal, to the high-frequency component generation unit 42 and the band synthesis filter 43.

The high-frequency component generation unit 42 generates a high-frequency PCM signal using the low-frequency PCM signal BS-L supplied from the band division filter 41 and the high-frequency envelope ENV-H supplied from the inverse quantization unit 32, and treats it as the pseudo high-frequency PCM signal BS-H. A method for generating the pseudo high-frequency PCM signal BS-H is described, for example, in Patent Document 1, filed earlier by the present applicant. The high-frequency component generation unit 42 supplies the pseudo high-frequency PCM signal BS-H to the band synthesis filter 43.

The band synthesis filter 43 combines the low-frequency PCM signal BS-L supplied from the band division filter 41 with the pseudo high-frequency PCM signal BS-H supplied from the high-frequency component generation unit 42, and outputs the full-band PCM signal as the decoding result.

The audio corresponding to the full-band PCM signal output in this way is less muffled than the audio corresponding to the PCM signal without high-frequency components, and sounds bright and pleasant to listen to.

FIG. 4 is a diagram illustrating the signals output from the inverse MDCT unit 33 and the band synthesis filter 43. In FIG. 4, the horizontal axis represents frequency and the vertical axis represents signal level. The same applies to FIG. 7, FIG. 10, and FIGS. 12 to 16, described later.

The signal output from the inverse MDCT unit 33 is, as shown in FIG. 4A, the PCM signal of the low-frequency spectrum SP-L denormalized using the low-frequency envelope ENV-L. The signal output from the band synthesis filter 43 is, as shown in FIG. 4B, a PCM signal whose low-frequency component is the PCM signal of the low-frequency spectrum SP-L denormalized using the low-frequency envelope ENV-L, and whose high-frequency component is the pseudo high-frequency PCM signal BS-H generated from the high-frequency envelope ENV-H and the low-frequency PCM signal BS-L.

FIG. 5 is a flowchart illustrating the decoding process performed by the decoding device 30 of FIG. 3. This decoding process starts, for example, when a bitstream encoded by the encoding device 10 is input to the decoding device 30.

In step S31 of FIG. 5, the decomposition unit 31 decomposes the bitstream input to the decoding device 30 into the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H, and supplies them to the inverse quantization unit 32.

In step S32, the inverse quantization unit 32 performs inverse quantization on each of the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H supplied from the decomposition unit 31. The inverse quantization unit 32 supplies the inversely quantized low-frequency envelope ENV-L and low-frequency spectrum SP-L to the inverse MDCT unit 33, and supplies the high-frequency envelope ENV-H to the band extension unit 34.

In step S33, the inverse MDCT unit 33 denormalizes the low-frequency spectrum SP-L using the low-frequency envelope ENV-L supplied from the inverse quantization unit 32.

In step S34, the inverse MDCT unit 33 performs an inverse MDCT on the denormalized low-frequency spectrum SP-L, which is a frequency-domain signal, to obtain a PCM signal, which is a time-domain signal. The inverse MDCT unit 33 supplies this PCM signal to the band extension unit 34.

In step S35, the band division filter 41 of the band extension unit 34 divides the PCM signal supplied from the inverse MDCT unit 33 into a high-frequency component and a low-frequency component. The band division filter 41 then discards the high-frequency component of the divided PCM signal and supplies the low-frequency PCM signal BS-L, which is the low-frequency component of the divided PCM signal, to the high-frequency component generation unit 42 and the band synthesis filter 43.

In step S36, the high-frequency component generation unit 42 generates the pseudo high-frequency PCM signal BS-H using the low-frequency PCM signal BS-L supplied from the band division filter 41 and the high-frequency envelope ENV-H supplied from the inverse quantization unit 32. The high-frequency component generation unit 42 supplies the pseudo high-frequency PCM signal BS-H to the band synthesis filter 43.

In step S37, the band synthesis filter 43 combines the low-frequency PCM signal BS-L supplied from the band division filter 41 with the pseudo high-frequency PCM signal BS-H supplied from the high-frequency component generation unit 42 to obtain the full-band PCM signal. The band synthesis filter 43 outputs the full-band PCM signal, and the process ends.

The band extension technique described above is already used in the international standard HE-AAC (High-Efficiency Advanced Audio Coding) and in the stereo high-quality mode of LPEC (trademark).

As described above, in the conventional band extension technique, the band extension process is performed as post-processing of the decoding of the low-frequency spectrum SP-L. This increases the degree of freedom in generating the pseudo high-frequency PCM signal BS-H. That is, the pseudo high-frequency PCM signal BS-H can be generated not from the low-frequency spectrum SP-L, which is a frequency-domain signal, but from the low-frequency PCM signal BS-L, which is a time-domain signal.

In addition, by setting the processing block size of the encoding and decoding processes and the processing block size of the band extension process independently of each other, the frequency analysis accuracy and the time resolution can each be optimized.

When the pseudo high-frequency PCM signal is generated by the method described in Patent Document 1, complicated processing is required: a noise-like spectrum is generated from the high-frequency envelope ENV-H, a tone-like spectrum is generated from the high-frequency envelope ENV-H and the low-frequency PCM signal BS-L, and the two spectra are compared.

This process of generating a noise-like spectrum and a tone-like spectrum is essential for improving the matching accuracy between the low-frequency spectrum and the high-frequency spectrum, which is needed to generate perceptually high-quality audio. It is also performed in the decoding devices described in Patent Documents 2 and 3.

Patent Document 1: Japanese Patent No. 3861770
Patent Document 2: Japanese Patent No. 3646938
Patent Document 3: Japanese Patent No. 3646939

As described above, conventional band extension techniques have been researched, developed, and put into practical use on the premise that the band extension process is performed as post-processing of the decoding of the low-frequency spectrum SP-L. Consequently, the full-band PCM signal is output only after the normal decoding process by the decomposition unit 31, the inverse quantization unit 32, and the inverse MDCT unit 33 has finished (time T0 in the example of FIG. 3) and the processing time of the band extension unit 34 has then elapsed (time T1 in the example of FIG. 3).

This is not a serious problem when the decoding device 30 is provided in a playback device that plays back audio only. However, when the decoding device 30 is provided in a playback device that also plays back video in synchronization with audio, for example, the time at which the PCM signal is output differs between the case where only normal decoding is performed and the case where band extension is also performed, which makes it difficult to output video and audio in synchronization.

Solving this requires delaying the video playback timing, but buffering video requires far more memory than buffering audio, so resources increase. Shifting the synchronization timing of video and audio in advance is also conceivable, but whether only normal decoding is performed or band extension is also performed depends on the playback device, so it is difficult to always specify the optimal synchronization timing.

In addition, the decoding device 30 must be newly provided with the band extension unit 34 in order to perform band extension, so it requires more resources than a decoding device that does not perform band extension.

For these reasons, a decoding device that performs band extension is required to reduce the delay time caused by band extension and to suppress the increase in resources.

The present invention has been made in view of such circumstances, and makes it possible to reduce the delay time caused by band extension during decoding and to suppress an increase in resources on the decoding side.

An encoding apparatus according to one aspect of the present invention includes: determination means for determining, based on the high-frequency spectrum of an audio signal, concentration information representing the bias of the distribution of the high-frequency spectrum; extraction means for extracting the envelope of the high-frequency spectrum from the spectrum of the audio signal; and multiplexing means for multiplexing the concentration information determined by the determination means, the envelope of the high-frequency spectrum extracted by the extraction means, and the low-frequency spectrum to produce an encoding result.

An encoding method and a program according to one aspect of the present invention correspond to the encoding apparatus according to one aspect of the present invention.

In one aspect of the present invention, concentration information representing the bias of the distribution of the high-frequency spectrum is determined based on the high-frequency spectrum of an audio signal, the envelope of the high-frequency spectrum is extracted from the spectrum of the audio signal, and the determined concentration information, the extracted envelope of the high-frequency spectrum, and the low-frequency spectrum are multiplexed to produce an encoding result.

According to the present invention, encoding can be performed in such a way that the delay time caused by band extension during decoding is reduced and an increase in resources on the decoding side is suppressed.

FIG. 1 is a block diagram showing an example of the configuration of an encoding device.
FIG. 2 is a flowchart illustrating the encoding process performed by the encoding device of FIG. 1.
FIG. 3 is a block diagram showing an example of the configuration of a decoding device.
FIG. 4 is a diagram illustrating the signals output from the inverse MDCT unit and the band synthesis filter.
FIG. 5 is a flowchart illustrating the decoding process performed by the decoding device of FIG. 3.
FIG. 6 is a block diagram showing a configuration example of the first embodiment of an encoding device to which the present invention is applied.
FIG. 7 is a diagram illustrating the signals output from the MDCT unit and the quantization unit of FIG. 6.
FIG. 8 is a flowchart illustrating the encoding process performed by the encoding device of FIG. 6.
FIG. 9 is a block diagram showing a configuration example of a decoding device that decodes the bitstream encoded by the encoding device of FIG. 6.
FIG. 10 is a diagram illustrating the signal output from the inverse MDCT unit of FIG. 9.
FIG. 11 is a diagram illustrating the difference in decoding results with and without phase randomization.
FIG. 12 is a diagram illustrating the characteristics of the high-frequency spectrum SP-H.
FIG. 13 is a diagram illustrating the characteristics of the high-frequency spectrum SP-H.
FIG. 14 is a diagram illustrating the characteristics of the high-frequency spectrum SP-H.
FIG. 15 is a diagram illustrating the characteristics of the high-frequency spectrum SP-H.
FIG. 16 is a diagram illustrating the characteristics of the high-frequency spectrum SP-H.
FIG. 17 is a flowchart illustrating the decoding process performed by the decoding device of FIG. 9.
FIG. 18 is a block diagram showing a configuration example of the second embodiment of a decoding device to which the present invention is applied.
FIG. 19 is a flowchart illustrating the decoding process performed by the decoding device of FIG. 18.
FIG. 20 is a diagram showing a configuration example of a computer.

<First Embodiment>
[Configuration Example of First Embodiment of Encoding Device]
FIG. 6 is a block diagram showing a configuration example of the first embodiment of an encoding device to which the present invention is applied.

Of the components shown in FIG. 6, those identical to the components in FIG. 1 are given the same reference numerals. Redundant descriptions are omitted as appropriate.

The configuration of the encoding device 50 of FIG. 6 differs from the configuration of FIG. 1 mainly in that a quantization unit 51 and a multiplexing unit 52 are provided in place of the quantization unit 12 and the multiplexing unit 13. The encoding device 50 generates a bitstream by multiplexing a random flag RND (described in detail later) in addition to the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H.

Specifically, the quantization unit 51 of the encoding device 50 includes a determination unit 61, an extraction unit 62, a normalization unit 63, and a partial quantization unit 64.

The determination unit 61 determines the concentration D of the high-frequency spectrum SP-H based on the high-frequency spectrum SP-H within the spectrum SP supplied from the MDCT unit 11, for example by the following equation (1).

D = max(SP-H) / ave(SP-H)   ... (1)

In equation (1), max(SP-H) represents the maximum value of the high-frequency spectrum SP-H, and ave(SP-H) represents the average value of the high-frequency spectrum SP-H.

According to equation (1), when the high-frequency component of the audio to be encoded is strongly tonal and the distribution of the high-frequency spectrum SP-H is heavily biased, the concentration D is large. When the high-frequency component of the audio to be encoded is strongly noise-like and the distribution of the high-frequency spectrum SP-H is flat, the concentration D is small.
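Equation (1) can be worked through with a short sketch. The specification does not say whether max and ave are taken over signed coefficients or over magnitudes; the sketch assumes magnitudes, and the function name and the zero-spectrum fallback are hypothetical.

```python
def concentration(high_spectrum):
    """Concentration D = max(SP-H) / ave(SP-H), computed on coefficient magnitudes.

    Using magnitudes is an assumption; an all-zero spectrum returns 0.0
    to avoid division by zero (also an assumption).
    """
    mags = [abs(c) for c in high_spectrum]
    mean = sum(mags) / len(mags)
    return max(mags) / mean if mean > 0.0 else 0.0

# A single dominant line (tonal) gives a large D; a flat spectrum gives D = 1.
tonal = [0.0, 0.0, 8.0, 0.0]
flat = [1.0, -1.0, 1.0, -1.0]
print(concentration(tonal))  # 4.0
print(concentration(flat))   # 1.0
```

Under this definition D is at least 1 for any non-zero spectrum and grows as the energy concentrates in fewer coefficients.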

The determination unit 61 determines a random flag RND based on the concentration D. The random flag RND indicates whether, during band extension processing in the decoding device described later, to randomize the phase of the spectrum that is generated from the low-frequency spectrum SP-L and the high-frequency envelope ENV-H to imitate the high-frequency spectrum SP-H.

For example, when the concentration D is greater than a threshold preset in the encoding device 50, that is, when the high-frequency spectrum SP-H is strongly tonal, the random flag RND is set to 0, indicating that randomization is not performed. When the concentration D is less than or equal to the preset threshold, that is, when the high-frequency spectrum SP-H is strongly noise-like, the random flag RND is set to 1, indicating that randomization is performed. The determination unit 61 supplies the determined random flag RND to the multiplexing unit 52.
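The decision rule above reduces to a single comparison, sketched below. The function name is hypothetical, and the threshold is left as a parameter because the specification does not give its value.

```python
def random_flag(concentration_d, threshold):
    """Return RND: 0 (do not randomize) if D exceeds the threshold, else 1 (randomize)."""
    # D > threshold: strongly tonal high band, so the phase is kept as is.
    # D <= threshold: noise-like high band, so the phase is randomized.
    return 0 if concentration_d > threshold else 1
```

Note that a concentration exactly equal to the threshold falls on the randomizing side, matching the "less than or equal to" wording above.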

抽出部６２は、図１の量子化部１２と同様に、MDCT部１１から供給されるスペクトルSPのうちの高域スペクトルSP-Hおよび低域スペクトルSP-Lから、それぞれエンベロープを抽出する。 Similar to the quantization unit 12 in FIG. 1, the extraction unit 62 extracts envelopes from the high-frequency spectrum SP-H and the low-frequency spectrum SP-L in the spectrum SP supplied from the MDCT unit 11.

正規化部６３は、量子化部１２と同様に、低域エンベロープENV-Lを用いて、低域スペクトルSP-Lを正規化する。 Similar to the quantization unit 12, the normalization unit 63 normalizes the low frequency spectrum SP-L using the low frequency envelope ENV-L.

部分量子化部６４は、正規化された低域スペクトルSP-Lに対して量子化を行い、その結果得られる低域スペクトルSP-Lを多重化部５２に供給する。また、部分量子化部６４は、量子化部１２と同様に、抽出された高域エンベロープENV-Hと低域エンベロープENV-Lを量子化する。部分量子化部６４は、量子化部１２と同様に、量子化された高域エンベロープENV-Hと低域エンベロープENV-Lを、多重化部５２に供給する。 The partial quantization unit 64 quantizes the normalized low frequency spectrum SP-L and supplies the resulting low frequency spectrum SP-L to the multiplexing unit 52. Similarly to the quantization unit 12, the partial quantization unit 64 quantizes the extracted high frequency envelope ENV-H and low frequency envelope ENV-L. Similar to the quantization unit 12, the partial quantization unit 64 supplies the quantized high frequency envelope ENV-H and low frequency envelope ENV-L to the multiplexing unit 52.

多重化部５２は、量子化部５１の決定部６１から供給されるランダムフラグRND、並びに、部分量子化部６４から供給される低域エンベロープENV-L、低域スペクトルSP-L、および高域エンベロープENV-Hを多重化する。多重化部５２は、その結果得られるビットストリームを出力する。このビットストリームは、図示せぬ記録媒体に記録されたり、復号装置に伝送されたりする。 The multiplexing unit 52 multiplexes the random flag RND supplied from the determination unit 61 of the quantization unit 51 with the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H supplied from the partial quantization unit 64. The multiplexing unit 52 outputs the resulting bitstream. This bitstream is recorded on a recording medium (not shown) or transmitted to a decoding device.

[符号化装置における信号の説明］ [Description of Signals in the Encoding Device]
図７は、図６の符号化装置５０のMDCT部１１および量子化部５１から出力される信号を説明する図である。 FIG. 7 is a diagram illustrating the signals output from the MDCT unit 11 and the quantization unit 51 of the encoding device 50 in FIG. 6.

図７Ａに示すように、MDCT部１１から出力されるスペクトルSPは、全帯域のスペクトルである。これに対して、量子化部５１から出力されるランダムフラグRND以外の信号は、図７Ｂに示すように、低域スペクトルSP-L、低域エンベロープENV-L、および高域エンベロープENV-Hである。 As shown in FIG. 7A, the spectrum SP output from the MDCT unit 11 covers the entire band. In contrast, as shown in FIG. 7B, the signals other than the random flag RND output from the quantization unit 51 are the low-frequency spectrum SP-L, the low-frequency envelope ENV-L, and the high-frequency envelope ENV-H.

[符号化装置の処理の説明］ [Description of Processing of the Encoding Device]
図８は、図６の符号化装置５０による符号化処理を説明するフローチャートである。この符号化処理は、例えば、符号化装置５０に音声のPCM信号が入力されたとき開始される。 FIG. 8 is a flowchart explaining the encoding process performed by the encoding device 50 of FIG. 6. This encoding process starts, for example, when an audio PCM signal is input to the encoding device 50.

図８のステップＳ５１において、MDCT部１１は、図２のステップＳ１１の処理と同様に、符号化装置５０に入力された音声の時間領域信号であるPCM信号に対してMDCTを行い、周波数領域信号であるスペクトルSPを生成する。MDCT部１１は、生成されたスペクトルSPを量子化部５１に供給する。 In step S51 of FIG. 8, as in step S11 of FIG. 2, the MDCT unit 11 performs MDCT on the PCM signal, which is the time-domain audio signal input to the encoding device 50, and generates the spectrum SP, which is a frequency-domain signal. The MDCT unit 11 supplies the generated spectrum SP to the quantization unit 51.

ステップＳ５２において、量子化部５１の決定部６１は、MDCT部１１から供給されるスペクトルSPのうちの高域スペクトルSP-Hに基づいて、上述した式（１）により、高域スペクトルSP-Hの集中度Dを決定する。 In step S52, the determination unit 61 of the quantization unit 51 determines the degree of concentration D of the high-frequency spectrum SP-H by Equation (1) above, based on the high-frequency spectrum SP-H in the spectrum SP supplied from the MDCT unit 11.

ステップＳ５３において、決定部６１は、集中度Dに基づいてランダムフラグRNDを決定する。決定部６１は、決定されたランダムフラグRNDを多重化部５２に供給し、処理をステップＳ５４に進める。 In step S53, the determination unit 61 determines the random flag RND based on the concentration degree D. The determination unit 61 supplies the determined random flag RND to the multiplexing unit 52, and the process proceeds to step S54.

ステップＳ５４乃至Ｓ５６の処理は、図２のステップＳ１２乃至Ｓ１４の処理と同様であるので、説明は省略する。 The processing in steps S54 to S56 is the same as the processing in steps S12 to S14 in FIG. 2, so its description is omitted.

ステップＳ５６の処理後、ステップＳ５７において、多重化部５２は、量子化部５１から供給されるランダムフラグRND、低域エンベロープENV-L、低域スペクトルSP-L、および高域エンベロープENV-Hを多重化し、その結果得られるビットストリームを出力する。そして処理は終了する。 After step S56, in step S57, the multiplexing unit 52 multiplexes the random flag RND, the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H supplied from the quantization unit 51, and outputs the resulting bitstream. The process then ends.

[復号装置の構成例］ [Configuration Example of the Decoding Device]
図９は、図６の符号化装置５０により符号化されたビットストリームを復号する復号装置の構成例を示すブロック図である。 FIG. 9 is a block diagram illustrating a configuration example of a decoding device that decodes the bitstream encoded by the encoding device 50 of FIG. 6.

図９の復号装置７０は、分解化部７１、逆量子化部７２、高域成分生成部７３、位相ランダム部７４、および逆MDCT部７５により構成される。復号装置７０は、帯域拡張処理を低域スペクトルSP-Lの復号処理と同時に行う。 The decoding device 70 of FIG. 9 includes a decomposition unit 71, an inverse quantization unit 72, a high-frequency component generation unit 73, a phase random unit 74, and an inverse MDCT unit 75. The decoding device 70 performs the band extension process simultaneously with the decoding of the low-frequency spectrum SP-L.

具体的には、分解化部７１（取得手段）は、図６の符号化装置５０により符号化されたビットストリームを取得する。分解化部７１は、そのビットストリームをランダムフラグRND、低域エンベロープENV-L、低域スペクトルSP-L、および高域エンベロープENV-Hに分解し、逆量子化部７２に供給する。 Specifically, the decomposition unit 71 (acquisition means) acquires the bitstream encoded by the encoding device 50 of FIG. The decomposition unit 71 decomposes the bit stream into a random flag RND, a low-frequency envelope ENV-L, a low-frequency spectrum SP-L, and a high-frequency envelope ENV-H, and supplies the result to the inverse quantization unit 72.

逆量子化部７２は、図３の逆量子化部３２と同様に、分解化部７１から供給される低域エンベロープENV-L、低域スペクトルSP-L、および高域エンベロープENV-Hそれぞれに対して逆量子化を行う。 Like the inverse quantization unit 32 in FIG. 3, the inverse quantization unit 72 performs inverse quantization on each of the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H supplied from the decomposition unit 71.

逆量子化部７２は、逆量子化された低域エンベロープENV-Lを逆MDCT部７５に供給し、低域スペクトルSP-Lを逆MDCT部７５と高域成分生成部７３に供給する。また、逆量子化部７２は、高域エンベロープENV-Hを高域成分生成部７３に供給し、逆量子化部７２は、ランダムフラグRNDを位相ランダム部７４に供給する。 The inverse quantization unit 72 supplies the inversely quantized low frequency envelope ENV-L to the inverse MDCT unit 75, and supplies the low frequency spectrum SP-L to the inverse MDCT unit 75 and the high frequency component generation unit 73. The inverse quantization unit 72 supplies the high frequency envelope ENV-H to the high frequency component generation unit 73, and the inverse quantization unit 72 supplies the random flag RND to the phase random unit 74.

高域成分生成部７３は、逆量子化部７２から供給される低域スペクトルSP-Lと高域エンベロープENV-Hを用いて高域のスペクトルを生成し、擬似高域スペクトルとする。具体的には、例えば、高域成分生成部７３は、低域スペクトルSP-Lを複製し、複製されたスペクトルを高域エンベロープENV-Hを用いて変形し、擬似高域スペクトルとする。 The high frequency component generation unit 73 generates a high frequency spectrum using the low frequency spectrum SP-L and the high frequency envelope ENV-H supplied from the inverse quantization unit 72, and generates a pseudo high frequency spectrum. Specifically, for example, the high frequency component generation unit 73 duplicates the low frequency spectrum SP-L, deforms the duplicated spectrum using the high frequency envelope ENV-H, and generates a pseudo high frequency spectrum.

この擬似高域スペクトルの生成方法としては、例えば、本出願人が先に出願した特許文献１に記載された方法を用いることもできるし、それ以外の方法を用いることもできる。高域成分生成部７３は、生成された擬似高域スペクトルを位相ランダム部７４に供給する。 As a method for generating the pseudo high frequency spectrum, for example, the method described in Patent Document 1 previously filed by the present applicant can be used, and other methods can also be used. The high frequency component generation unit 73 supplies the generated pseudo high frequency spectrum to the phase random unit 74.
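
One possible way to picture the copy-and-shape step is sketched below. The source does not fix the generation method, so everything here is an assumption made for illustration: the low band is copied upward by plain translation, the high-frequency envelope is taken to hold one RMS target per sub-band, and `band_size` and the function name are hypothetical.

```python
import math


def generate_pseudo_high_band(low_band, env_high, band_size):
    """Copy the low band upward and rescale each sub-band to the target envelope."""
    n_high = len(env_high) * band_size
    # Translate (repeat) the low band until it covers the whole high band.
    copied = [low_band[i % len(low_band)] for i in range(n_high)]
    out = []
    for b, target in enumerate(env_high):
        band = copied[b * band_size:(b + 1) * band_size]
        rms = math.sqrt(sum(x * x for x in band) / band_size)
        gain = target / rms if rms > 0.0 else 0.0  # silent source band stays silent
        out.extend(x * gain for x in band)
    return out


pseudo = generate_pseudo_high_band([3.0, -4.0, 1.0, 1.0], [2.0, 0.5], band_size=2)
print(pseudo)
```

Because only a gain is applied per sub-band, the fine structure (peaks and signs) of the copied low band survives, which is what makes the later sign randomization meaningful.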

位相ランダム部７４は、逆量子化部７２から供給されるランダムフラグRNDに基づいて、高域成分生成部７３から供給される擬似高域スペクトルの位相をランダム化する。 The phase random unit 74 randomizes the phase of the pseudo high frequency spectrum supplied from the high frequency component generating unit 73 based on the random flag RND supplied from the inverse quantization unit 72.

具体的には、位相ランダム部７４は、ランダムフラグRNDがランダム化することを表す1である場合、以下の式（２）により、擬似高域スペクトルの符号（sign,+/-）をランダム化する。 Specifically, when the random flag RND is 1, which indicates that randomization is performed, the phase random unit 74 randomizes the sign (+/-) of the pseudo high-frequency spectrum by the following Equation (2).

SP-H(i) = (-1)^(rand() & 0x1) × SP-H(i) ・・・（２）

なお、式（２）において、SP-Hは高域スペクトルを表し、iはスペクトル番号を表す。 In Formula (2), SP-H represents a high frequency spectrum, and i represents a spectrum number.

式（２）によれば、「-1」をランダム関数rand()の返り値の下位1ビットの回数だけ掛け合わせることで、高域スペクトルSP-Hの符号が-1か1のどちらかにランダムに割り当てられる。 According to Equation (2), "-1" is raised to the power given by the least significant bit of the return value of the random function rand() (that is, 0 or 1), so each component of the high-frequency spectrum SP-H is randomly multiplied by either 1 or -1.

一方、ランダムフラグRNDがランダム化しないことを表す0である場合、位相ランダム部７４は、擬似高域スペクトルの位相をランダム化しない。 On the other hand, when the random flag RND is 0 indicating that randomization is not performed, the phase random unit 74 does not randomize the phase of the pseudo high frequency spectrum.
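
The behaviour of the phase random unit for both flag values can be sketched as below. This is an illustration under stated assumptions: Python's `random.getrandbits(1)` stands in for `rand() & 0x1`, and the function names are hypothetical rather than taken from the source.

```python
import random


def randomize_signs(pseudo_high, rng=random):
    """Equation (2): multiply each component by (-1)**b, where b is one random bit."""
    return [(-1) ** rng.getrandbits(1) * x for x in pseudo_high]


def phase_random_unit(pseudo_high, rnd, rng=random):
    """Randomize signs only when RND == 1; otherwise pass the spectrum through."""
    if rnd == 1:
        return randomize_signs(pseudo_high, rng)
    return list(pseudo_high)


rng = random.Random(0)  # seeded only to make the sketch repeatable
spectrum = [1.0, -2.0, 3.0, -4.0]
print(phase_random_unit(spectrum, 1, rng))  # same magnitudes, random signs
print(phase_random_unit(spectrum, 0, rng))  # unchanged
```

Only the signs change, so the magnitude of every component, and therefore the envelope applied when the pseudo high-frequency spectrum was generated, is preserved.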

位相ランダム部７４は、位相がランダム化された擬似高域スペクトル、または、位相がランダム化されなかった擬似高域スペクトルを、逆MDCT部７５に供給する。 The phase random unit 74 supplies a pseudo high frequency spectrum whose phase is randomized or a pseudo high frequency spectrum whose phase is not randomized to the inverse MDCT unit 75.

逆MDCT部７５（合成手段）は、逆量子化部７２から供給される低域エンベロープENV-Lを用いて、低域スペクトルSP-Lを逆正規化する。そして、逆MDCT部７５は、逆正規化された低域スペクトルSP-Lと位相ランダム部７４から供給される擬似高域スペクトルを合成する。逆MDCT部７５は、合成の結果得られる周波数領域信号である全帯域のスペクトルに対して逆MDCTを行い、時間領域信号である全帯域のPCM信号を得る。逆MDCT部７５は、その全帯域のPCM信号を復号結果として出力する。 The inverse MDCT unit 75 (combining means) denormalizes the low frequency spectrum SP-L using the low frequency envelope ENV-L supplied from the inverse quantization unit 72. Then, the inverse MDCT unit 75 synthesizes the denormalized low frequency spectrum SP-L and the pseudo high frequency spectrum supplied from the phase random unit 74. The inverse MDCT unit 75 performs inverse MDCT on the spectrum of the entire band that is the frequency domain signal obtained as a result of the synthesis, and obtains the PCM signal of the entire band that is the time domain signal. The inverse MDCT unit 75 outputs the PCM signal of the entire band as a decoding result.

以上のように、復号装置７０は、低域スペクトルSP-Lの復号と同時に、擬似高域スペクトルの生成を行う。従って、復号装置７０において復号に要する時間は、復号のみを行う通常の復号装置において復号に要する時間と略同一である。即ち、図９の復号装置７０では、ビットストリームが入力されてから、時刻T0後に復号結果を出力することができる。つまり、復号装置７０では、帯域拡張による遅延が発生しない。 As described above, the decoding device 70 generates a pseudo high frequency spectrum simultaneously with the decoding of the low frequency spectrum SP-L. Therefore, the time required for decoding in the decoding device 70 is substantially the same as the time required for decoding in a normal decoding device that performs only decoding. That is, the decoding device 70 in FIG. 9 can output the decoding result after time T0 after the bitstream is input. That is, in the decoding device 70, no delay due to bandwidth expansion occurs.

[復号装置における信号の説明］ [Description of Signals in the Decoding Device]
図１０は、図９の復号装置７０の逆MDCT部７５から出力される信号を説明する図である。 FIG. 10 is a diagram illustrating the signal output from the inverse MDCT unit 75 of the decoding device 70 of FIG. 9.

逆MDCT部７５から出力される信号は、図１０に示すような低域エンベロープENV-Lを用いて正規化された低域スペクトルSP-Lと、図１０に示すような高域エンベロープENV-Hと低域スペクトルSP-Lから生成された擬似高域スペクトルの合成結果の周波数変換後のPCM信号である。 The signal output from the inverse MDCT unit 75 includes a low frequency spectrum SP-L normalized using a low frequency envelope ENV-L as shown in FIG. 10 and a high frequency envelope ENV-H as shown in FIG. And a PCM signal after frequency conversion of the synthesized result of the pseudo high frequency spectrum generated from the low frequency spectrum SP-L.

[位相のランダム化による効果の説明］ [Description of the Effect of Phase Randomization]
図１１乃至図１６は、図９の位相ランダム部７４による位相のランダム化の効果を説明する図である。 FIGS. 11 to 16 are diagrams explaining the effect of phase randomization by the phase random unit 74 of FIG. 9.

図１１は、位相のランダム化の有無による復号結果の差を説明する図である。 FIG. 11 is a diagram for explaining a difference in decoding results depending on the presence / absence of phase randomization.

図１１に示すように、図６の符号化装置５０では、フレームと呼ばれる一定の長さを有する区間毎にPCM信号が符号化されるが、そのフレームは、通常、50%ずつオーバーラップされて設定される。具体的には、図１１に示すように、J-1番目のフレームと、その次のJ番目のフレームは、0.5フレーム分だけオーバーラップして設定される。 As shown in FIG. 11, the encoding device 50 of FIG. 6 encodes the PCM signal in fixed-length sections called frames, and consecutive frames are usually set to overlap by 50%. Specifically, as shown in FIG. 11, the (J-1)-th frame and the following J-th frame overlap by half a frame.
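
The 50% overlap can be made concrete with a small framing sketch. It assumes an even frame length and simply drops any trailing samples that do not fill a whole frame; the function name is hypothetical and padding or windowing is left out.

```python
def split_into_frames(pcm, frame_len):
    """Split a PCM sequence into frames that overlap their neighbours by 50%."""
    hop = frame_len // 2  # each frame starts half a frame after the previous one
    return [pcm[s:s + frame_len]
            for s in range(0, len(pcm) - frame_len + 1, hop)]


frames = split_into_frames(list(range(8)), frame_len=4)
print(frames)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7]]
```

Every sample except those at the very start and end belongs to two frames, so the decoded output in an overlap period is always the combination of two decoded frames.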

図１１では、図１１の左側に示すように、トーン性が高いスペクトルが符号化されている場合について説明する。 In FIG. 11, as shown on the left side of FIG. 11, a case where a spectrum having high tone characteristics is encoded will be described.

この場合、図１１の右側の上段に示すように、J-1番目とJ番目のフレームのスペクトルの復号時にスペクトルの位相がランダム化されないと、J-1番目とJ番目のフレームのオーバーラップ期間のスペクトルの位相は、J-1番目とJ番目のフレームのスペクトルと符号の合成により、正確に復元される。従って、復元されたオーバーラップ期間のスペクトルは、トーン性が高いスペクトルとなる。 In this case, as shown in the upper right part of FIG. 11, if the phase of the spectrum is not randomized when the spectra of the (J-1)-th and J-th frames are decoded, the phase of the spectrum in the overlap period of the two frames is restored accurately by combining the spectra and signs of the (J-1)-th and J-th frames. The restored spectrum of the overlap period is therefore strongly tonal.

一方、右側の下段に示すように、J-1番目とJ番目のフレームのスペクトルの復号時にスペクトルの位相がランダム化されると、J-1番目とJ番目のフレームのスペクトルの符号は必ずしも一致しなくなる。従って、オーバーラップ期間のスペクトルの位相は、正確に復元されない。よって、復号装置７０において復元されたオーバーラップ期間の信号は、符号化前のスペクトルが有していたトーン性が崩れたスペクトルとなる。 On the other hand, as shown in the lower right part, if the phase of the spectrum is randomized when the spectra of the (J-1)-th and J-th frames are decoded, the signs of the spectra of the two frames no longer necessarily match. The phase of the spectrum in the overlap period is therefore not restored accurately. As a result, the overlap-period signal restored by the decoding device 70 has a spectrum in which the tonal character of the pre-encoding spectrum is broken.

スペクトルのトーン性が崩されると、本来特定のスペクトルに集中しているはずのエネルギーが周囲のスペクトルに漏れ出してしまう。これにより、本来のスペクトルに比べてスペクトルのピーク（山）が抑制され、周囲に漏れだしたエネルギーがスペクトルの谷のエネルギーを押し上げる。その結果、スペクトルがノイズ性を有するようになる。 When the tone of the spectrum is destroyed, energy that should have been concentrated on a specific spectrum leaks to the surrounding spectrum. As a result, the peak (crest) of the spectrum is suppressed compared to the original spectrum, and the energy leaked to the surroundings pushes up the energy of the valley of the spectrum. As a result, the spectrum becomes noisy.

以上のように、復号時に位相のランダム化が行われると、符号化前にトーン性を有していたスペクトルが、ノイズ性を有するスペクトルに変換される。 As described above, when phase randomization is performed at the time of decoding, a spectrum having tone characteristics before encoding is converted into a spectrum having noise characteristics.

図１２乃至図１６は、高域スペクトルSP-Hの特性について説明する図である。 FIGS. 12 to 16 are diagrams describing the characteristics of the high-frequency spectrum SP-H.

図１２Ａに示すように、低域スペクトルSP-Lのトーン性が高い場合、高域スペクトルSP-Hのトーン性も高いことが多い。これは、管楽器、弦楽器といった楽器類が、基本周波数とその整数倍の高調波成分を組み合わせた音波を発していることから推測することができる。 As shown in FIG. 12A, when the tone characteristics of the low frequency spectrum SP-L are high, the tone characteristics of the high frequency spectrum SP-H are often high. This can be inferred from the fact that musical instruments such as wind instruments and stringed instruments emit sound waves that combine a fundamental frequency and a harmonic component that is an integral multiple of the fundamental frequency.

このようにトーン性の高い低域スペクトルSP-Lと高域スペクトルSP-Hからなるスペクトルが帯域拡張符号化された場合、帯域拡張復号時に、擬似高域スペクトルが低域スペクトルSP-Lを単純に折り返すことにより生成されると、図１２Ｂに示すように、擬似高域スペクトルは、トーン性の高いスペクトルとなる。従って、復号結果に対応する音声は、聴覚的に違和感が少ない音声となる。 When a spectrum consisting of such a strongly tonal low-frequency spectrum SP-L and high-frequency spectrum SP-H is band-extension encoded, and the pseudo high-frequency spectrum is generated during band-extension decoding by simply folding back the low-frequency spectrum SP-L, the pseudo high-frequency spectrum is strongly tonal, as shown in FIG. 12B. The audio corresponding to the decoding result therefore sounds natural to the listener.

よって、図６の符号化装置５０は、集中度Dが予め設定されている閾値よりも大きい場合、即ち符号化対象の音声の高域成分にトーン性がある場合、ランダムフラグRNDを0にする。これにより、復号装置７０では、擬似高域スペクトルの位相がランダム化されないので、復号結果に対応する音声は、聴覚的に違和感が少ない音声となる。 Therefore, the encoding apparatus 50 in FIG. 6 sets the random flag RND to 0 when the degree of concentration D is larger than a preset threshold value, that is, when the high frequency component of the encoding target speech has tone characteristics. . Thereby, in the decoding device 70, since the phase of the pseudo high frequency spectrum is not randomized, the sound corresponding to the decoding result is a sound that is less audibly strange.

一方、図１３Ａおよび図１４Ａに示すように、低域スペクトルSP-Lのノイズ性が高い場合、高域になるほどよりノイズ性が高くなる。これは、ノイズ性の高い、即ち非トーン性を有する打撃音や衝撃音などの音を発するシンバルやマラカスなどの楽器において、高域の振動ほど楽器内で伝播されるため、高域の音ほど各振動要素の振幅や位相が複雑に絡み合い、ノイズ性が高くなることから推測できる。 On the other hand, as shown in FIGS. 13A and 14A, when the noise characteristic of the low-frequency spectrum SP-L is high, the noise characteristic becomes higher as the frequency becomes higher. This is because, in a musical instrument such as a cymbal or maraca that emits a sound such as a hitting sound or an impact sound having a high noise characteristic, that is, a non-tone characteristic, the higher frequency vibration is propagated in the musical instrument. It can be inferred from the fact that the amplitude and phase of each vibration element are intertwined in a complicated manner and the noise property is increased.

このようにノイズ性の高い低域スペクトルSP-Lと高域スペクトルSP-Hからなるスペクトルが帯域拡張符号化された場合、図１３Ｂに示すように、帯域拡張復号時に低域スペクトルSP-Lを用いて生成される擬似高域スペクトルは、ノイズ性の高いスペクトルとなる。従って、図１３Ｂに示すように擬似高域スペクトルの位相のランダム化が行われなくても、図１４Ｂに示すようにランダム化が行われても、擬似高域スペクトルのノイズ性は高くなり、復号結果に対応する音声は、聴覚的に違和感が少ない音声となる。 When a spectrum composed of the low-frequency spectrum SP-L and the high-frequency spectrum SP-H having high noise characteristics is band-encoded as shown in FIG. 13B, the low-frequency spectrum SP-L is obtained at the time of band expansion decoding as shown in FIG. The pseudo high-frequency spectrum generated by using the spectrum becomes a spectrum with high noise characteristics. Therefore, even if the phase of the pseudo high frequency spectrum is not randomized as shown in FIG. 13B, or the randomization is performed as shown in FIG. The sound corresponding to the result is a sound that is less audibly strange.

しかしながら、シンバルやマラカスなどの楽器のノイズ性の高い音であっても、低域成分には、トーン的な振動成分が含まれている場合がある。また、シンバルやマラカスなどの楽器の音の周波数は主に高域であり、低域成分には別のトーン性の高い音声が含まれている可能性もある。従って、図１５Ａや図１６Ａに示すように、高域スペクトルSP-Hのノイズ性が高い場合であっても、低域スペクトルSP-Lのトーン性が高い場合がある。 However, there are cases where a low-frequency component includes a tone-like vibration component even for a noisy sound of a musical instrument such as a cymbal or maraca. Moreover, the frequency of the sound of musical instruments such as cymbals and maracas is mainly in the high range, and there is a possibility that another high tone characteristic sound is included in the low range component. Accordingly, as shown in FIG. 15A and FIG. 16A, even if the noise characteristics of the high frequency spectrum SP-H are high, the tone characteristics of the low frequency spectrum SP-L may be high.

このようなトーン性の高い低域スペクトルSP-Lとノイズ性の高い高域スペクトルSP-Hからなるスペクトルが帯域拡張符号化された場合、図１５Ｂに示すように、帯域拡張復号時に低域スペクトルSP-Lを用いて生成される擬似高域スペクトルには、トーン性成分が含まれている可能性がある。従って、図１５Ｂに示すように擬似高域スペクトルの位相がランダム化されないと、復号結果に対応する高域の音声が、本来のノイズ性を有さず、低域の音声と同様にトーン性を有することになり、聴覚的に違和感が多い音声となる。 When a spectrum composed of such a low-frequency spectrum SP-L having a high tone characteristic and a high-frequency spectrum SP-H having a high noise characteristic is subjected to band extension coding, as shown in FIG. The pseudo high frequency spectrum generated using SP-L may include a tone component. Therefore, as shown in FIG. 15B, if the phase of the pseudo high frequency spectrum is not randomized, the high frequency sound corresponding to the decoding result does not have the original noise characteristic, and the tone characteristic is the same as the low frequency sound. As a result, the sound is audibly uncomfortable.

これに対して、擬似高域スペクトルの位相がランダム化されると、元の擬似高域スペクトルにトーン性成分が含まれている場合であっても、図１６Ｂに示すように、ランダム化後の擬似高域スペクトルはノイズ性を有する。従って、復号結果に対応する音声は、聴覚的に違和感が少ない音声となる。 In contrast, when the phase of the pseudo high-frequency spectrum is randomized, the pseudo high-frequency spectrum after randomization is noise-like, as shown in FIG. 16B, even if the original pseudo high-frequency spectrum contains tonal components. The audio corresponding to the decoding result therefore sounds natural to the listener.

以上のように、高域スペクトルSP-Hがノイズ性を有する場合、低域スペクトルSP-Lもノイズ性を有する場合には、ランダム化は行われても行われなくてもよいが、低域スペクトルSP-Lがトーン性を有する場合には、ランダム化を行う必要がある。従って、高域スペクトルSP-Hがノイズ性を有する場合、常にランダム化が行われるようにすることで、集中度Dに基づいて聴覚的に違和感の少ない復号結果が得られるようにすることができる。 As described above, when the high-frequency spectrum SP-H is noise-like, randomization may or may not be performed if the low-frequency spectrum SP-L is also noise-like, but it must be performed if the low-frequency spectrum SP-L is tonal. Therefore, by always performing randomization when the high-frequency spectrum SP-H is noise-like, a decoding result that sounds natural to the listener can be obtained based on the degree of concentration D alone.

よって、図６の符号化装置５０は、集中度Dが予め設定されている閾値以下である場合、即ち符号化対象の音声の高域成分にノイズ性がある場合、ランダムフラグRNDを1にする。これにより、復号装置７０では、擬似高域スペクトルの位相がランダム化されるので、復号結果に対応する音声は、聴覚的に違和感が少ない音声となる。 Therefore, the encoding apparatus 50 of FIG. 6 sets the random flag RND to 1 when the degree of concentration D is equal to or less than a preset threshold value, that is, when the high frequency component of the speech to be encoded is noisy. . As a result, in the decoding device 70, the phase of the pseudo high frequency spectrum is randomized, so that the sound corresponding to the decoding result is a sound that is less audibly strange.

なお、低域でノイズ性が高く、高域でトーン性が高い音声は自然界にほとんど存在しないため、ノイズ性の高い低域スペクトルSP-Lとトーン性の高い高域スペクトルSP-Hからなるスペクトルについては考慮しない。 Note that audio that is strongly noise-like in the low band and strongly tonal in the high band hardly exists in nature, so a spectrum consisting of a noise-like low-frequency spectrum SP-L and a tonal high-frequency spectrum SP-H is not considered.

[復号装置の処理の説明］ [Description of Processing of the Decoding Device]
図１７は、図９の復号装置７０による復号処理を説明するフローチャートである。この復号処理は、例えば、符号化装置５０により符号化されたビットストリームが復号装置７０に入力されたとき開始される。 FIG. 17 is a flowchart explaining the decoding process performed by the decoding device 70 of FIG. 9. This decoding process starts, for example, when a bitstream encoded by the encoding device 50 is input to the decoding device 70.

図１７のステップＳ７１において、分解化部７１は、符号化装置５０により符号化されたビットストリームを取得し、そのビットストリームをランダムフラグRND、低域エンベロープENV-L、低域スペクトルSP-L、および高域エンベロープENV-Hに分解する。分解化部７１は、ランダムフラグRND、低域エンベロープENV-L、低域スペクトルSP-L、および高域エンベロープENV-Hを逆量子化部７２に供給する。 In step S71 of FIG. 17, the decomposition unit 71 acquires the bitstream encoded by the encoding device 50 and decomposes it into the random flag RND, the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H. The decomposition unit 71 supplies the random flag RND, the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H to the inverse quantization unit 72.

ステップＳ７２において、逆量子化部７２は、分解化部７１から供給される低域エンベロープENV-L、低域スペクトルSP-L、および高域エンベロープENV-Hそれぞれに対して逆量子化を行う。逆量子化部７２は、逆量子化された低域エンベロープENV-Lを逆MDCT部７５に供給し、低域スペクトルSP-Lを逆MDCT部７５と高域成分生成部７３に供給する。また、逆量子化部７２は、高域エンベロープENV-Hを高域成分生成部７３に供給し、逆量子化部７２は、ランダムフラグRNDを位相ランダム部７４に供給する。 In step S72, the inverse quantization unit 72 performs inverse quantization on each of the low frequency envelope ENV-L, the low frequency spectrum SP-L, and the high frequency envelope ENV-H supplied from the decomposition unit 71. The inverse quantization unit 72 supplies the inversely quantized low frequency envelope ENV-L to the inverse MDCT unit 75, and supplies the low frequency spectrum SP-L to the inverse MDCT unit 75 and the high frequency component generation unit 73. The inverse quantization unit 72 supplies the high frequency envelope ENV-H to the high frequency component generation unit 73, and the inverse quantization unit 72 supplies the random flag RND to the phase random unit 74.

ステップＳ７３において、高域成分生成部７３は、逆量子化部７２から供給される低域スペクトルSP-Lと高域エンベロープENV-Hを用いて擬似高域スペクトルを生成する。高域成分生成部７３は、生成された擬似高域スペクトルを位相ランダム部７４に供給する。 In step S73, the high frequency component generation unit 73 generates a pseudo high frequency spectrum using the low frequency spectrum SP-L and the high frequency envelope ENV-H supplied from the inverse quantization unit 72. The high frequency component generation unit 73 supplies the generated pseudo high frequency spectrum to the phase random unit 74.

ステップＳ７４において、位相ランダム部７４は、逆量子化部７２から供給されるランダムフラグRNDが1であるかどうかを判定する。ステップＳ７４でランダムフラグRNDが1であると判定された場合、ステップＳ７５において、位相ランダム部７４は、上述した式（２）により、高域成分生成部７３から供給される擬似高域スペクトルの位相をランダム化する。そして、位相ランダム部７４は、位相がランダム化された擬似高域スペクトルを逆MDCT部７５に供給し、処理をステップＳ７６に進める。 In step S74, the phase random unit 74 determines whether the random flag RND supplied from the inverse quantization unit 72 is 1. If the random flag RND is determined to be 1 in step S74, then in step S75 the phase random unit 74 randomizes the phase of the pseudo high-frequency spectrum supplied from the high-frequency component generation unit 73 by Equation (2) above. The phase random unit 74 then supplies the phase-randomized pseudo high-frequency spectrum to the inverse MDCT unit 75, and the process proceeds to step S76.

一方、ステップＳ７４でランダムフラグRNDが1ではない、即ちランダムフラグRNDが0であると判定された場合、位相ランダム部７４は、擬似高域スペクトルの位相をランダム化せず、そのまま逆MDCT部７５に供給する。そして、処理はステップＳ７６に進む。 On the other hand, if it is determined in step S74 that the random flag RND is not 1, that is, the random flag RND is 0, the phase random unit 74 does not randomize the phase of the pseudo high-frequency spectrum and supplies it unchanged to the inverse MDCT unit 75. The process then proceeds to step S76.

ステップＳ７６において、逆MDCT部７５は、逆量子化部７２から供給される低域エンベロープENV-Lを用いて、低域スペクトルSP-Lを逆正規化する。 In step S76, the inverse MDCT unit 75 denormalizes the low-frequency spectrum SP-L using the low-frequency envelope ENV-L supplied from the inverse quantization unit 72.

ステップＳ７７において、逆MDCT部７５は、逆正規化された低域スペクトルSP-Lと位相ランダム部７４から供給される擬似高域スペクトルを合成し、その結果得られる全帯域のスペクトルに対して逆MDCTを行い、全帯域のPCM信号を得る。そして、逆MDCT部７５は、その全帯域のPCM信号を復号結果として出力し、処理を終了する。 In step S77, the inverse MDCT unit 75 combines the denormalized low-frequency spectrum SP-L with the pseudo high-frequency spectrum supplied from the phase random unit 74, performs inverse MDCT on the resulting full-band spectrum, and obtains a full-band PCM signal. The inverse MDCT unit 75 then outputs the full-band PCM signal as the decoding result, and the process ends.
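
Steps S76 and S77 can be sketched roughly as follows. The sketch rests on assumptions that are not in the source: the low-frequency envelope is taken to hold one scale factor per sub-band, denormalization is taken to be a multiplication by that factor, and the inverse MDCT is the textbook direct formula with a 1/N scale. Windowing and overlap-add with the neighbouring frame are omitted, and all function names are hypothetical.

```python
import math


def denormalize(norm_low, env_low, band_size):
    """Undo envelope normalization: scale each sub-band by its envelope value."""
    return [x * env_low[i // band_size] for i, x in enumerate(norm_low)]


def imdct(spectrum):
    """Textbook inverse MDCT: N coefficients -> 2N time samples (no window)."""
    n = len(spectrum)
    return [
        sum(c * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
            for k, c in enumerate(spectrum)) / n
        for t in range(2 * n)
    ]


def synthesize(norm_low, env_low, pseudo_high, band_size):
    """Denormalize the low band, append the pseudo high band, then inverse MDCT."""
    full_band = denormalize(norm_low, env_low, band_size) + list(pseudo_high)
    return imdct(full_band)


pcm = synthesize([1.0, -1.0, 0.5, 0.5], [2.0, 4.0], [0.3, -0.3], band_size=2)
print(len(pcm))  # 12 time samples from 6 spectral coefficients
```

The point of the sketch is the ordering: both bands already exist in the MDCT domain before the single inverse transform, so no separate band-splitting or band-synthesis filter stage is involved.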

以上のように、復号装置７０は、逆MDCT前の低域スペクトルSP-Lを用いて擬似高域スペクトルを生成し、高域スペクトルSP-Hの集中度に基づいて決定されたランダムフラグRNDにしたがって擬似高域スペクトルをランダム化することにより、符号化対象の音声のスペクトルの高域成分を復元する。 As described above, the decoding device 70 restores the high-frequency component of the spectrum of the audio to be encoded by generating a pseudo high-frequency spectrum from the low-frequency spectrum SP-L before inverse MDCT and randomizing that pseudo high-frequency spectrum according to the random flag RND, which is determined from the degree of concentration of the high-frequency spectrum SP-H.

これにより、低域スペクトルSP-Lを用いて、高域スペクトルSP-Hに比較的合致するスペクトルを、符号化対象の音声のスペクトルの高域成分として復元することができる。従って、低域スペクトルSP-Lを用いて符号化対象の音声のスペクトルの高域成分を復元することにより、低域スペクトルSP-Lの復号処理と帯域拡張処理を同時に行うことができ、帯域拡張による遅延時間を削減することができる。その結果、篭らず、きらびやかで聞き心地の良い全帯域の音声のPCM信号が、復号結果として、帯域拡張処理を行わない復号装置の場合と略同一の時間経過後に出力される。 Thereby, using the low-frequency spectrum SP-L, a spectrum that relatively matches the high-frequency spectrum SP-H can be restored as a high-frequency component of the spectrum of the speech to be encoded. Therefore, by restoring the high-frequency component of the speech spectrum to be encoded using the low-frequency spectrum SP-L, the low-frequency spectrum SP-L can be decoded and expanded at the same time. The delay time due to can be reduced. As a result, the PCM signal of the full-band voice that is not harsh, brilliant, and comfortable to listen to is output as a decoding result after almost the same time as in the case of the decoding device that does not perform the band expansion process.

また、復号装置７０は、低域スペクトルSP-Lを用いて生成された擬似高域スペクトルの位相をランダム化することにより、ノイズ性を有する擬似高域スペクトルを生成するので、ただ単にランダムなスペクトルを擬似高域スペクトルとして生成する場合に比べて、より高域スペクトルSP-Hに合致した擬似高域スペクトルを生成することができる。 In addition, the decoding device 70 generates a noise-like pseudo high-frequency spectrum by randomizing the phase of the pseudo high-frequency spectrum generated from the low-frequency spectrum SP-L. Compared with simply generating a random spectrum as the pseudo high-frequency spectrum, this yields a pseudo high-frequency spectrum that matches the high-frequency spectrum SP-H more closely.

さらに、復号装置７０は、逆MDCT前にスペクトルの低域成分と高域成分を生成するので、帯域拡張処理のために、図３の復号装置３０のように帯域分割フィルタ４１および帯域合成フィルタ４３を備える必要がない。従って、図３の復号装置３０に比べて、帯域拡張処理のための処理量、回路規模、コードサイズなどのリソースを削減することができる。 Furthermore, because the decoding device 70 generates the low-frequency and high-frequency components of the spectrum before inverse MDCT, it does not need the band division filter 41 and the band synthesis filter 43 that the decoding device 30 of FIG. 3 requires for band extension processing. Compared with the decoding device 30 of FIG. 3, resources for band extension processing such as processing load, circuit scale, and code size can therefore be reduced.

＜第２実施の形態＞ <Second Embodiment>
［復号装置の第２実施の形態の構成例］ [Configuration Example of the Decoding Device of the Second Embodiment]
図１８は、本発明を適用した復号装置の第２実施の形態の構成例を示すブロック図である。 FIG. 18 is a block diagram illustrating a configuration example of the second embodiment of the decoding device to which the present invention is applied.

図１８に示す構成のうち、図３や図９の構成と同じ構成には同じ符号を付してある。重複する説明については適宜省略する。 Among the configurations shown in FIG. 18, the same reference numerals are given to the same configurations as the configurations in FIG. 3 and FIG. 9. The overlapping description will be omitted as appropriate.

図１８の復号装置１００の構成は、主に、分解化部７１、逆量子化部７２の代わりに、分解化部３１、逆量子化部３２が設けられている点、および、新たに決定部１０１が設けられている点が、図９の復号装置７０の構成と異なる。復号装置１００は、図１の符号化装置１０により符号化されたビットストリームに含まれる低域スペクトルSP-Lに基づいてランダムフラグRNDを決定する。 The configuration of the decoding device 100 of FIG. 18 differs from that of the decoding device 70 of FIG. 9 mainly in that a decomposition unit 31 and an inverse quantization unit 32 are provided instead of the decomposition unit 71 and the inverse quantization unit 72, and in that a determination unit 101 is newly provided. The decoding device 100 determines the random flag RND based on the low-frequency spectrum SP-L included in the bitstream encoded by the encoding device 10 of FIG. 1.

具体的には、決定部１０１は、逆量子化部３２により逆量子化された低域スペクトルSP-Lに基づいて、例えば、以下の式（３）により、低域スペクトルSP-Lの集中度Ｄ´を決定する。 Specifically, the determination unit 101, based on the low frequency spectrum SP-L inversely quantized by the inverse quantization unit 32, for example, by the following formula (3), the degree of concentration of the low frequency spectrum SP-L D ′ is determined.

Ｄ´=max(SP-L)/ave(SP-L) ・・・（３） D '= max (SP-L) / ave (SP-L) (3)

なお、式（３）において、max(SP-L)は、低域スペクトルSP-Lの最大値を表し、ave(SP-L)は、低域スペクトルSP-Lの平均値を表す。 In Expression (3), max (SP-L) represents the maximum value of the low-frequency spectrum SP-L, and ave (SP-L) represents the average value of the low-frequency spectrum SP-L.

According to Expression (3), when the low-frequency component of the audio to be encoded is highly tonal and the distribution of the low-frequency spectrum SP-L is strongly biased, the concentration degree D' is large. When the low-frequency component is highly noise-like and the distribution of the low-frequency spectrum SP-L is flat, the concentration degree D' is small.

The determination unit 101 determines the random flag RND based on the concentration degree D'. Specifically, when the concentration degree D' is greater than a threshold set in advance in the decoding device 100, that is, when the low-frequency spectrum SP-L is highly tonal, the determination unit 101 sets the random flag RND to 0. When the concentration degree D' is equal to or less than the preset threshold, that is, when the low-frequency spectrum SP-L is highly noise-like, the determination unit 101 sets the random flag RND to 1. The determination unit 101 then supplies the determined random flag RND to the phase random unit 74. As a result, the phase of the pseudo high-frequency spectrum is not randomized when the low-frequency spectrum SP-L is highly tonal, and is randomized when the low-frequency spectrum SP-L is highly noise-like. The audio corresponding to the decoding result therefore has perceptually sufficient sound quality.
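The decision made by the determination unit 101 can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the use of coefficient magnitudes for max(SP-L) and ave(SP-L), the handling of an all-zero frame, and the threshold value in the usage note are not taken from the specification.

```python
from typing import Sequence


def concentration_degree(sp_l: Sequence[float]) -> float:
    """Return D' = max(SP-L) / ave(SP-L) as in Expression (3).

    Assumption: the maximum and average are taken over coefficient
    magnitudes, since the text does not say how signs are handled.
    """
    magnitudes = [abs(x) for x in sp_l]
    if not magnitudes:
        raise ValueError("SP-L must contain at least one coefficient")
    average = sum(magnitudes) / len(magnitudes)
    if average == 0.0:
        # Assumption: an all-zero frame is treated as flat (noise-like).
        return 0.0
    return max(magnitudes) / average


def random_flag(sp_l: Sequence[float], threshold: float) -> int:
    """Return RND: 0 if D' > threshold (tonal), 1 otherwise (noise-like)."""
    return 0 if concentration_degree(sp_l) > threshold else 1
```

With a hypothetical threshold of 2.0, a frame such as [0, 0, 8, 0] gives D' = 4 and RND = 0 (no randomization), whereas a flat frame such as [1, 1, 1, 1] gives D' = 1 and RND = 1. A frame whose D' equals the threshold is treated as noise-like, matching the "equal to or less than" condition in the text.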

[Description of Decoding Device Processing]
FIG. 19 is a flowchart illustrating the decoding process performed by the decoding device 100 in FIG. 18. This decoding process starts, for example, when a bitstream encoded by the encoding device 10 in FIG. 1 is input to the decoding device 100.

In step S91 of FIG. 19, the decomposing unit 31 decomposes the bitstream encoded by the encoding device 10 into the low-frequency envelope ENV-L, the low-frequency spectrum SP-L, and the high-frequency envelope ENV-H, and supplies them to the inverse quantization unit 32.

The processing in steps S92 and S93 is the same as that in steps S72 and S73 of FIG. 17, so its description is omitted.

After step S93, in step S94, the determination unit 101 determines the concentration degree D' of the low-frequency spectrum SP-L by Expression (3) above, based on the low-frequency spectrum SP-L inversely quantized by the inverse quantization unit 32.

In step S95, the determination unit 101 determines the random flag RND based on the concentration degree D'. The determination unit 101 then supplies the random flag RND to the phase random unit 74, and the process proceeds to step S96.

The processing in steps S96 to S99 is the same as that in steps S74 to S77 of FIG. 17, so its description is omitted.
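How the random flag RND supplied in step S95 might gate the phase random unit 74 can be sketched as follows. This is a rough illustration, not the method of the specification: the function name is hypothetical, and modelling phase randomization of real-valued MDCT coefficients as random sign inversion is an assumption made only for this example.

```python
import random
from typing import List, Optional, Sequence


def apply_random_flag(
    pseudo_high: Sequence[float],
    rnd: int,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Randomize the pseudo high-frequency spectrum only when RND is 1.

    Assumption: for real-valued coefficients, "phase" is modelled as the
    sign, so randomization flips each sign with probability 1/2.
    """
    if rnd == 0:
        # Tonal low band: pass the pseudo high band through unchanged.
        return list(pseudo_high)
    generator = rng if rng is not None else random.Random()
    return [x if generator.random() < 0.5 else -x for x in pseudo_high]
```

Passing a seeded random.Random instance makes the output reproducible for testing. In either branch the coefficient magnitudes are unchanged, so the spectral envelope is not affected by this step.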

<Third Embodiment>
[Description of Computer to which the Present Invention is Applied]
The series of encoding and decoding processes described above can be performed by hardware or by software. When the series of encoding and decoding processes is performed by software, a program constituting the software is installed in a general-purpose computer or the like.

FIG. 20 shows a configuration example of an embodiment of a computer in which a program for executing the series of processes described above is installed.

The program can be recorded in advance in a storage unit 208 or a ROM (Read Only Memory) 202 serving as a recording medium built into the computer.

Alternatively, the program can be stored (recorded) on a removable medium 211. Such a removable medium 211 can be provided as so-called packaged software. Examples of the removable medium 211 include a flexible disk, a CD-ROM (Compact Disc Read Only Memory), an MO (Magneto Optical) disk, a DVD (Digital Versatile Disc), a magnetic disk, and a semiconductor memory.

In addition to being installed on the computer from the removable medium 211 described above via a drive 210, the program can be downloaded to the computer via a communication network or a broadcast network and installed in the built-in storage unit 208. That is, the program can be transferred to the computer wirelessly, for example from a download site via an artificial satellite for digital satellite broadcasting, or transferred to the computer by wire via a network such as a LAN (Local Area Network) or the Internet.

The computer includes a CPU (Central Processing Unit) 201, and an input/output interface 205 is connected to the CPU 201 via a bus 204.

When a command is input via the input/output interface 205, for example by the user operating an input unit 206, the CPU 201 executes the program stored in the ROM 202 accordingly. Alternatively, the CPU 201 loads the program stored in the storage unit 208 into a RAM (Random Access Memory) 203 and executes it.

The CPU 201 thereby performs the processing according to the flowcharts described above, or the processing performed by the configurations in the block diagrams described above. As necessary, the CPU 201 then, for example via the input/output interface 205, outputs the processing result from an output unit 207, transmits it from a communication unit 209, or records it in the storage unit 208.

The input unit 206 includes a keyboard, a mouse, a microphone, and the like. The output unit 207 includes an LCD (Liquid Crystal Display), a speaker, and the like.

In this specification, the processing that the computer performs according to the program does not necessarily have to be performed in time series in the order described in the flowcharts. That is, the processing that the computer performs according to the program also includes processing executed in parallel or individually (for example, parallel processing or object-based processing).

The program may be processed by a single computer (processor) or may be processed in a distributed manner by a plurality of computers. The program may also be transferred to a remote computer and executed there.

Embodiments of the present invention are not limited to the embodiments described above, and various modifications can be made without departing from the gist of the present invention.

50 encoding device, 52 multiplexing unit, 61 determination unit, 62 extraction unit, 63 normalization unit, 70 decoding device, 71 decomposing unit, 73 high-frequency component generation unit, 74 phase random unit, 75 inverse MDCT unit, 100 decoding device, 101 decomposing unit, 101 determination unit

Claims

An encoding device comprising:
determining means for determining, based on a high-frequency spectrum of an audio signal, concentration information representing a bias in the distribution of the high-frequency spectrum;
extraction means for extracting an envelope of the high-frequency spectrum from the spectrum of the audio signal; and
multiplexing means for multiplexing the concentration information determined by the determining means, the envelope of the high-frequency spectrum extracted by the extraction means, and a low-frequency spectrum to obtain an encoding result.

The encoding device according to claim 1, wherein
the concentration information is a random flag indicating whether a decoding device that decodes the encoding result is to randomize a predetermined spectrum when generating that spectrum as the high-frequency spectrum based on the concentration information,
the determining means determines the random flag to be information indicating that randomization is not to be performed when the degree of concentration is greater than a predetermined threshold, and determines the random flag to be information indicating that randomization is to be performed when the degree of concentration is equal to or less than the predetermined threshold, and
the multiplexing means multiplexes the random flag, the envelope of the high-frequency spectrum, and the low-frequency spectrum to obtain the encoding result.

The encoding device according to claim 1, further comprising
normalization means for normalizing the low-frequency spectrum using an envelope of the low-frequency spectrum, wherein
the extraction means also extracts the envelope of the low-frequency spectrum from the spectrum of the audio signal, and
the multiplexing means multiplexes the concentration information, the envelope of the high-frequency spectrum, the envelope of the low-frequency spectrum, and the low-frequency spectrum normalized by the normalization means to obtain the encoding result.

An encoding method in which an encoding device performs:
a determination step of determining, based on a high-frequency spectrum of an audio signal, concentration information representing a bias in the distribution of the high-frequency spectrum;
an extraction step of extracting an envelope of the high-frequency spectrum from the spectrum of the audio signal; and
a multiplexing step of multiplexing the concentration information determined in the determination step, the envelope of the high-frequency spectrum extracted in the extraction step, and a low-frequency spectrum to obtain an encoding result.

A program for causing a computer to execute processing comprising:
a determination step of determining, based on a high-frequency spectrum of an audio signal, concentration information representing a bias in the distribution of the high-frequency spectrum;
an extraction step of extracting an envelope of the high-frequency spectrum from the spectrum of the audio signal; and
a multiplexing step of multiplexing the concentration information determined in the determination step, the envelope of the high-frequency spectrum extracted in the extraction step, and a low-frequency spectrum to obtain an encoding result.