JP2011154383A

JP2011154383A - Voice encoding device, voice decoding device and methods thereof

Info

Publication number: JP2011154383A
Application number: JP2011054916A
Authority: JP
Inventors: Masahiro Oshikiri; 正浩押切; Tomohito Yamanashi; 智史山梨; Toshiyuki Morii; 利幸森井
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2007-03-02
Filing date: 2011-03-14
Publication date: 2011-08-11
Anticipated expiration: 2028-02-26
Also published as: JP5294713B2; JP5236032B2; JP2009042739A; JP5236033B2; JP2011154384A

Abstract

PROBLEM TO BE SOLVED: To accurately identify a band having a large error from among all bands with a small amount of computation.
SOLUTION: A first position specifying unit 201 uses first layer error transform coefficients, which represent the error of the first layer decoded signal relative to the input signal, to search the entire band of the input signal for a band with a large error, using a relatively wide bandwidth. It then generates first position information indicating the identified band. A second position specifying unit 202 searches the band identified by the first position specifying unit 201 for a target frequency band with a large error, using a relatively narrow bandwidth, and generates second position information indicating the identified target frequency band. An encoding unit 203 encodes the first layer error transform coefficients contained in the target frequency band and generates encoded information. The first position information, the second position information, and the encoded information are transmitted to the communication partner.
COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a speech encoding apparatus, a speech decoding apparatus, and methods thereof used in a communication system that employs a scalable coding scheme.

In mobile communication systems, speech signals must be compressed to a low bit rate before transmission so that radio resources and the like are used efficiently. At the same time, there is demand for better call quality and for call services that provide a strong sense of presence. Meeting this demand requires high-quality encoding not only of speech signals but also of other signals, such as audio signals with a wider bandwidth.

A promising approach to these two conflicting requirements is to integrate multiple coding techniques hierarchically. This approach combines two layers. The first layer encodes the input signal at a low bit rate using a model suited to speech signals. The second layer encodes the difference signal between the input signal and the first layer decoded signal using a model that also suits non-speech signals. The bitstream produced by such an encoder is scalable: a decoded signal can be obtained even from only part of the bitstream. For this reason, hierarchical coding of this kind is generally called scalable coding (layered coding).

Because of this property, scalable coding can flexibly support communication between networks with different bit rates. It is therefore well suited to future network environments in which diverse networks are integrated by the IP protocol.

One example of scalable coding that uses techniques standardized in MPEG-4 (Moving Picture Experts Group phase-4) is the technique disclosed in Non-Patent Document 1. In the first layer, this technique uses CELP (Code Excited Linear Prediction) coding, which is suited to speech signals. In the second layer, it applies transform coding such as AAC (Advanced Audio Coding) or TwinVQ (Transform Domain Weighted Interleave Vector Quantization) to the residual signal obtained by subtracting the first layer decoded signal from the original signal.

Non-Patent Document 2, on the other hand, discloses a method that hierarchically encodes the MDCT coefficients of a desired frequency band, using a modularized TwinVQ coder as the basic building block. Sharing this module and using it multiple times yields simple and highly flexible scalable coding. In this method the subband encoded in each layer is basically fixed in advance. However, the document also discloses a configuration in which the position of the subband encoded in each layer is varied within a predetermined band according to the characteristics of the input signal.

Non-Patent Document 1: Sukeichi Miki (ed.), "All about MPEG-4", 1st ed., Kogyo Chosakai Publishing, September 30, 1998, pp. 126-127.
Non-Patent Document 2: Akio Jin et al., "Scalable Audio Coding Constructed from Hierarchical Transform Coding Basic Modules", IEICE Transactions A, Vol. J83-A, No. 3, pp. 241-252, March 2000.
Non-Patent Document 3: "AMR Wideband Speech Codec; Transcoding functions", 3GPP TS 26.190, March 2001.
Non-Patent Document 4: "Source-Controlled Variable-Rate Multimode Wideband Speech Codec (VMR-WB), Service Options 62 and 63 for Spread Spectrum Systems", 3GPP2 C.S0052-A, April 2005.
Non-Patent Document 5: "7/10/15 kHz band-scalable speech coding using bandwidth extension by pitch filtering", Proceedings of the Acoustical Society of Japan Meeting, 3-11-4, pp. 327-328, March 2004.

However, improving the speech quality of the output signal depends heavily on how the subband (target frequency band) of the second layer encoding unit is set. In the method disclosed in Non-Patent Document 2, the subband encoded in the second layer is fixed in advance (FIG. 21(A)). The method therefore always improves the quality of the same predetermined subband. When the error components are concentrated in other bands, it cannot improve speech quality sufficiently.

Non-Patent Document 2 also describes varying the position of the subband encoded in each layer within a predetermined band according to the characteristics of the input signal (FIG. 21(B)). However, the possible subband positions are still confined to that predetermined band, so this does not solve the problem described above. Conversely, allowing the subband to lie anywhere in the entire band of the input signal (FIG. 21(C)) increases the computation needed to determine the subband position. The subband position must be determined separately for each layer, so this problem grows more serious as the number of layers increases.

The present invention has been made in view of these points. Its object is to provide a speech encoding apparatus, a speech decoding apparatus, and methods thereof that, in a scalable coding scheme, accurately identify a band having a large error from among all bands with a small amount of computation.

An encoding apparatus according to a first aspect of the present invention adopts a configuration comprising: transform means for transforming an input signal into transform coefficients; specifying means for identifying a target frequency band to be encoded; and encoding means for encoding, among the transform coefficients, those contained in the target frequency band. The specifying means comprises: first position specifying means for searching, at a predetermined first step size, for a first band that has a bandwidth wider than the target frequency band and in which the transform coefficients are largest, and for generating first position information indicating the identified first band; second position specifying means for searching the first band for the target frequency band at a second step size finer than the first step size, and for generating second position information indicating the identified target frequency band; and encoding means for encoding the transform coefficients contained in the target frequency band identified by the first position information and the second position information, thereby generating encoded information.

A decoding apparatus according to a second aspect of the present invention adopts a configuration comprising: receiving means for receiving (a) encoded data obtained by encoding the transform coefficients contained in a target frequency band to be encoded, (b) first position information indicating a first band that has a bandwidth wider than the target frequency band and in which the transform coefficients are largest, and (c) second position information indicating the target frequency band within the first band; decoding means for decoding the encoded data to generate decoded transform coefficients; and arrangement means for identifying the target frequency band based on the first position information and the second position information, and for placing the decoded transform coefficients in the target frequency band.
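The arrangement step of the decoder can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation. Coefficient indices stand in for frequencies. The mapping from the two pieces of position information to a start index (`target_band_start`, with the hypothetical parameters `first_band_lows` and `fine_step`) is an assumption: it models the first band as supplying a lower edge, followed by evenly spaced candidate start positions.

```python
def target_band_start(first_pos, second_pos, first_band_lows, fine_step):
    # HYPOTHETICAL mapping: the target band starts at the lower edge of the
    # first band (1-based first_pos), plus (second_pos - 1) fine steps.
    return first_band_lows[first_pos - 1] + (second_pos - 1) * fine_step


def arrange_decoded(num_coeffs, start, decoded):
    # Arrangement means: place the decoded transform coefficients in the
    # target band and leave every coefficient outside it at zero.
    if start < 0 or start + len(decoded) > num_coeffs:
        raise ValueError("target band does not fit in the spectrum")
    spectrum = [0.0] * num_coeffs
    spectrum[start:start + len(decoded)] = decoded
    return spectrum
```

For example, with hypothetical first-band lower edges [0, 40, 80] and a fine step of 4, first position "2" and second position "3" give a start index of 40 + 2 * 4 = 48.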

An encoding method according to a third aspect of the present invention comprises: a transform step of transforming an input signal into transform coefficients; a specifying step of identifying a target frequency band to be encoded; and an encoding step of encoding, among the transform coefficients, those contained in the target frequency band. The specifying step comprises: a first position specifying step of searching, at a predetermined first step size, for a first band that has a bandwidth wider than the target frequency band and in which the transform coefficients are largest, and of generating first position information indicating the identified first band; a second position specifying step of searching the first band for the target frequency band at a second step size finer than the first step size, and of generating second position information indicating the identified target frequency band; and an encoding step of encoding the transform coefficients contained in the target frequency band identified by the first position information and the second position information, thereby generating encoded information.

A decoding method according to a fourth aspect of the present invention comprises: a receiving step of receiving (a) encoded data obtained by encoding the transform coefficients contained in a target frequency band to be encoded, (b) first position information indicating a first band that has a bandwidth wider than the target frequency band and in which the transform coefficients are largest, and (c) second position information indicating the target frequency band within the first band; a decoding step of decoding the encoded data to generate decoded transform coefficients; and an arrangement step of identifying the target frequency band based on the first position information and the second position information, and of placing the decoded transform coefficients in the target frequency band.

According to the present invention, a band having a large error can be accurately identified from among all bands with a small amount of computation, which improves sound quality.

FIG. 1 is a block diagram showing the main configuration of the encoding apparatus according to Embodiment 1 of the present invention.
FIG. 2 is a block diagram showing the configuration of the second layer encoding unit shown in FIG. 1.
FIG. 3 is a diagram showing the positions of the bands identified by the first position specifying unit shown in FIG. 2.
FIG. 4 is a diagram showing other positions of the bands identified by the first position specifying unit shown in FIG. 2.
FIG. 5 is a diagram showing the position of the target frequency band identified by the second position specifying unit shown in FIG. 2.
FIG. 6 is a block diagram showing the configuration of the encoding unit shown in FIG. 2.
FIG. 7 is a block diagram showing the main configuration of the decoding apparatus according to Embodiment 1 of the present invention.
FIG. 8 is a diagram showing the configuration of the second layer decoding unit shown in FIG. 7.
FIG. 9 is a diagram showing the first layer decoded error transform coefficients output from the arrangement unit shown in FIG. 8.
FIG. 10 is a diagram showing the positions of the target frequencies identified by the second position specifying unit shown in FIG. 2.
FIG. 11 is a block diagram showing another configuration of the encoding unit shown in FIG. 6.
FIG. 12 is a block diagram showing another configuration of the second layer decoding unit shown in FIG. 8.
FIG. 13 is a block diagram showing the configuration of the second layer encoding unit of the encoding apparatus according to Embodiment 3 of the present invention.
FIG. 14 is a diagram showing the positions of the target frequencies identified by the plurality of sub-position specifying units of the encoding apparatus according to Embodiment 3.
FIG. 15 is a block diagram showing the configuration of the second layer encoding unit of the encoding apparatus according to Embodiment 4 of the present invention.
FIG. 16 is a block diagram showing the configuration of the encoding unit shown in FIG. 15.
FIG. 17 is a diagram showing the encoding unit for the case where each second position information candidate stored in the second position information codebook of FIG. 16 has three target frequencies.
FIG. 18 is a block diagram showing another configuration of the encoding unit shown in FIG. 15.
FIG. 19 is a block diagram showing the configuration of the second layer encoding unit according to Embodiment 5 of the present invention.
FIG. 20 is a diagram showing the positions of the bands identified by the first position specifying unit shown in FIG. 19.
FIG. 21 is a diagram showing the encoding bands of the second layer encoding unit of a conventional speech encoding apparatus.
FIG. 22 is a block diagram showing the main configuration of the encoding apparatus according to Embodiment 6.
FIG. 23 is a block diagram showing the configuration of the first layer encoding unit of the encoding apparatus shown in FIG. 22.
FIG. 24 is a block diagram showing the configuration of the first layer decoding unit of the encoding apparatus shown in FIG. 22.
FIG. 25 is a block diagram showing the main configuration of the decoding apparatus corresponding to the encoding apparatus shown in FIG. 22.
FIG. 26 is a block diagram showing the main configuration of the encoding apparatus according to Embodiment 7.
FIG. 27 is a block diagram showing the main configuration of the decoding apparatus corresponding to the encoding apparatus shown in FIG. 26.
FIG. 28 is a block diagram showing the main configuration of an encoding apparatus according to another aspect of Embodiment 7.
FIG. 29 is a diagram showing the positions of the bands in the second layer encoding unit shown in FIG. 28.
FIG. 30 is a diagram showing the positions of the bands in the third layer encoding unit shown in FIG. 28.
FIG. 31 is a diagram showing the positions of the bands in the fourth layer encoding unit shown in FIG. 28.
FIG. 32 is a block diagram showing the main configuration of the decoding apparatus corresponding to the encoding apparatus shown in FIG. 28.
FIG. 33 is a diagram showing other positions of the bands in the second layer encoding unit shown in FIG. 28.
FIG. 34 is a diagram showing other positions of the bands in the third layer encoding unit shown in FIG. 28.
FIG. 35 is a diagram showing other positions of the bands in the fourth layer encoding unit shown in FIG. 28.
FIG. 36 is a diagram for explaining the operation of the first position specifying unit according to Embodiment 8.
FIG. 37 is a block diagram showing the configuration of the first position specifying unit according to Embodiment 8.
FIG. 38 is a diagram illustrating how the first position information is constructed in the first position information construction unit according to Embodiment 8.
FIG. 39 is a diagram for explaining the decoding process according to Embodiment 8.
FIG. 40 is a diagram for explaining a variation of Embodiment 8.
FIG. 41 is a diagram for explaining a variation of Embodiment 8.

Embodiments of the present invention are described in detail below with reference to the accompanying drawings.

(Embodiment 1)
FIG. 1 is a block diagram showing the main configuration of the encoding apparatus according to Embodiment 1 of the present invention. Encoding apparatus 100 shown in FIG. 1 includes frequency domain transform unit 101, first layer encoding unit 102, first layer decoding unit 103, subtraction unit 104, second layer encoding unit 105, and multiplexing unit 106.

Frequency domain transform unit 101 transforms the time-domain input signal into a frequency-domain signal (input transform coefficients) and outputs the input transform coefficients to first layer encoding unit 102.

First layer encoding unit 102 encodes the input transform coefficients to generate first layer encoded data. It outputs the first layer encoded data to first layer decoding unit 103 and multiplexing unit 106.

First layer decoding unit 103 decodes the first layer encoded data to generate first layer decoded transform coefficients, and outputs them to subtraction unit 104.

Subtraction unit 104 subtracts the first layer decoded transform coefficients generated by first layer decoding unit 103 from the input transform coefficients to generate first layer error transform coefficients. It outputs these first layer error transform coefficients to second layer encoding unit 105.

Second layer encoding unit 105 encodes the first layer error transform coefficients output from subtraction unit 104 to generate second layer encoded data, and outputs the second layer encoded data to multiplexing unit 106.

Multiplexing unit 106 multiplexes the first layer encoded data obtained by first layer encoding unit 102 and the second layer encoded data obtained by second layer encoding unit 105 to form a bitstream. It outputs this bitstream to the transmission channel as the final encoded data.
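The data flow of FIG. 1 up to second layer encoding unit 105 can be sketched in Python as follows. The first layer codec here is a hypothetical stand-in (a coarse uniform scalar quantizer), not the first layer coder the patent has in mind. The sketch only shows how subtraction unit 104 forms the first layer error transform coefficients from the input and first layer decoded transform coefficients.

```python
def first_layer_codec(coeffs, step=1.0):
    # HYPOTHETICAL stand-in for first layer encoding unit 102 followed by
    # first layer decoding unit 103: quantize each coefficient to the
    # nearest multiple of `step`, then reconstruct it.
    return [round(c / step) * step for c in coeffs]


def first_layer_error(input_coeffs, step=1.0):
    # Subtraction unit 104: input transform coefficients minus first layer
    # decoded transform coefficients = first layer error transform coefficients.
    decoded = first_layer_codec(input_coeffs, step)
    return [x - d for x, d in zip(input_coeffs, decoded)]


error = first_layer_error([0.2, 1.7, -2.4])
# error is approximately [0.2, -0.3, -0.4]; this is what the second layer encodes.
```

With this quantizer, each error coefficient is bounded by half the quantizer step. A real first layer gives no such uniform bound, which is why the second layer searches for the band where the error is largest.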

FIG. 2 is a block diagram showing the configuration of second layer encoding unit 105 shown in FIG. 1. Second layer encoding unit 105 shown in FIG. 2 includes first position specifying unit 201, second position specifying unit 202, encoding unit 203, and multiplexing unit 204.

First position specifying unit 201 uses the first layer error transform coefficients input from subtraction unit 104 to search, with a predetermined bandwidth and a predetermined step size, for the band in which the target frequency band to be encoded can lie. It outputs information indicating the identified band, as first position information, to second position specifying unit 202, encoding unit 203, and multiplexing unit 204. Details of first position specifying unit 201 are described later. The identified band may also be called a "range" or "region".

Second position specifying unit 202 searches the band identified by first position specifying unit 201 for the target frequency band. It uses a narrower bandwidth and a finer step size than first position specifying unit 201. It outputs information indicating the identified target frequency band, as second position information, to encoding unit 203 and multiplexing unit 204. Details of second position specifying unit 202 are described later.
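To make the saving of the two-stage search concrete, the following Python sketch counts the candidate positions each approach evaluates. All sizes are hypothetical: 280 coefficients, a 20-coefficient target band, a fine step of 4, and first-stage bands of 80 coefficients at a step of 40. The candidate count is only a rough proxy for computation, because each first-stage evaluation sums over more coefficients than a second-stage one.

```python
def num_positions(span, width, step):
    # Number of start positions for a window of `width` coefficients moved
    # in increments of `step` so that it stays inside `span` coefficients.
    return (span - width) // step + 1


def full_search_count(total, target_width, fine_step):
    # Single-stage search: slide the target band over the whole spectrum.
    return num_positions(total, target_width, fine_step)


def two_stage_count(total, first_width, coarse_step, target_width, fine_step):
    # Stage 1: coarse bands over the whole spectrum.
    # Stage 2: fine search for the target band inside the one chosen band.
    return (num_positions(total, first_width, coarse_step)
            + num_positions(first_width, target_width, fine_step))


print(full_search_count(280, 20, 4))          # 66 candidates
print(two_stage_count(280, 80, 40, 20, 4))    # 6 + 16 = 22 candidates
```

The gap widens as the spectrum grows. The full search scales with the total width divided by the fine step. The two-stage search pays that fine-step cost only inside a single first band.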

Encoding unit 203 encodes the first layer error transform coefficients contained in the target frequency band identified by the first position information and the second position information. It generates encoded information and outputs it to multiplexing unit 204. Details of encoding unit 203 are described later.

Multiplexing unit 204 multiplexes the first position information, the second position information, and the encoded information to generate and output second layer encoded data. Multiplexing unit 204 is not essential; this information may instead be output directly to multiplexing unit 106 shown in FIG. 1.
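A minimal Python sketch of what multiplexing unit 204 could do. The field order (first position, second position, encoded information) and the widths (2 and 4 bits) are hypothetical, since the text does not specify a bitstream layout. The encoded information is treated as an opaque string of '0'/'1' characters. Two bits are enough for the first position values "1" to "3" used later in this embodiment.

```python
def pack_second_layer(first_pos, second_pos, payload, first_bits=2, second_bits=4):
    """Concatenate [first position | second position | encoded information], MSB first.

    Field order and widths are HYPOTHETICAL; `payload` is the encoded
    information as a string of '0'/'1' characters.
    """
    for value, width in ((first_pos, first_bits), (second_pos, second_bits)):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
    return (format(first_pos, f"0{first_bits}b")
            + format(second_pos, f"0{second_bits}b")
            + payload)


def unpack_second_layer(bits, first_bits=2, second_bits=4):
    # Inverse of pack_second_layer under the same hypothetical layout.
    first_pos = int(bits[:first_bits], 2)
    second_pos = int(bits[first_bits:first_bits + second_bits], 2)
    return first_pos, second_pos, bits[first_bits + second_bits:]
```

For example, `pack_second_layer(2, 5, "1011")` yields `"1001011011"`, and unpacking that string recovers `(2, 5, "1011")`.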

FIG. 3 is a diagram showing the bands identified by first position specifying unit 201 shown in FIG. 2.

In FIG. 3, first position specifying unit 201 identifies one of three bands set in advance with a predetermined bandwidth. It outputs the position information of this band, as first position information, to second position specifying unit 202, encoding unit 203, and multiplexing unit 204. Each band shown in FIG. 3 is set to a bandwidth equal to or greater than that of the target frequency band: band 1 covers F_1 ≤ f < F_3, band 2 covers F_2 ≤ f < F_4, and band 3 covers F_3 ≤ f < F_5. In this embodiment all bands have the same bandwidth, but the bands may also be given different bandwidths. For example, following the critical bandwidths of human hearing, bands in the low range may be made narrower and bands in the high range wider.
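The FIG. 3 layout, in which band i runs from F_i up to (but not including) F_(i+2), can be generated from a list of boundaries as in the following Python sketch. Treating frequencies as transform coefficient indices is an assumption. The boundary values in the example are illustrative only. The function returns inclusive (lowest, highest) index pairs, i.e. FRL(i) and FRH(i) in the notation used next.

```python
def overlapping_bands(edges):
    """Build FIG. 3-style candidate bands from boundary indices F_1..F_M.

    Band i covers F_i <= k < F_(i+2): each band spans two adjacent boundary
    intervals, so consecutive bands overlap by one interval.
    Returns inclusive (FRL, FRH) index pairs.
    """
    return [(edges[i], edges[i + 2] - 1) for i in range(len(edges) - 2)]


print(overlapping_bands([0, 40, 80, 120, 160]))  # [(0, 79), (40, 119), (80, 159)]
```

With five boundaries this yields exactly three bands, matching bands 1 to 3 in FIG. 3. The first-stage step size is then the spacing between boundaries, F_n - F_(n-1).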

Next, the method by which first position specifying unit 201 identifies a band is described. First position specifying unit 201 identifies a band on the basis of the energy of the first layer error transform coefficients. Denoting the first layer error transform coefficients by e_1(k), the energy E_R(i) of the first layer error transform coefficients contained in each band is calculated by the following equation (1).

$$E_R(i) = \sum_{k=\mathrm{FRL}(i)}^{\mathrm{FRH}(i)} e_1(k)^2 \qquad (1)$$

Here, i is the identifier of a band, FRL(i) is the lowest frequency of band i, and FRH(i) is the highest frequency of band i.
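The first-stage selection can be sketched in Python as below. The sketch assumes that equation (1) sums the squared first layer error transform coefficients over k = FRL(i), ..., FRH(i) inclusive, with frequencies represented as coefficient indices, and that the band with the largest E_R(i) is chosen. It is an illustration, not the patent's code.

```python
def band_energy(e1, frl, frh):
    # E_R(i): sum of squared first layer error transform coefficients
    # for k = FRL(i) .. FRH(i), inclusive.
    return sum(e1[k] ** 2 for k in range(frl, frh + 1))


def select_first_band(e1, bands):
    # bands: list of inclusive (FRL(i), FRH(i)) pairs for i = 1, 2, ...
    # Returns the 1-based number of the band with the largest E_R(i),
    # i.e. the value output as first position information.
    energies = [band_energy(e1, lo, hi) for lo, hi in bands]
    return 1 + max(range(len(bands)), key=lambda i: energies[i])


e1 = [0, 0, 0, 0, 1, 1, 3, 3, 0, 0, 0, 0]
bands = [(0, 5), (2, 9), (6, 11)]
print([band_energy(e1, lo, hi) for lo, hi in bands])  # [2, 20, 18]
print(select_first_band(e1, bands))                   # 2
```

Because the bands overlap, a large error near a band boundary (indices 6 and 7 here) is fully contained in at least one candidate band. In this example that is band 2, which wins.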

In this way, the band in which the energy of the first layer error transform coefficients is large is identified, and the first layer error transform coefficients contained in that band are encoded. This reduces the error of the decoded signal relative to the input signal and improves speech quality.

Instead of the energy of the first layer error transform coefficients, a normalized energy NE_R(i), normalized by the bandwidth, may be calculated as in the following equation (2).

$$NE_R(i) = \frac{1}{\mathrm{FRH}(i) - \mathrm{FRL}(i) + 1} \sum_{k=\mathrm{FRL}(i)}^{\mathrm{FRH}(i)} e_1(k)^2 \qquad (2)$$

As the criterion for identifying the band, the energy of the first layer error transform coefficients may also be replaced by energies weighted to reflect human auditory characteristics. These are WE_R(i) and its bandwidth-normalized counterpart WNE_R(i), calculated by equations (3) and (4). Here, w(k) represents a weight related to human auditory characteristics.

$$WE_R(i) = \sum_{k=\mathrm{FRL}(i)}^{\mathrm{FRH}(i)} w(k)\, e_1(k)^2 \qquad (3)$$

$$WNE_R(i) = \frac{1}{\mathrm{FRH}(i) - \mathrm{FRL}(i) + 1} \sum_{k=\mathrm{FRL}(i)}^{\mathrm{FRH}(i)} w(k)\, e_1(k)^2 \qquad (4)$$
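The four selection criteria of equations (1) to (4) differ only in whether a weight is applied and whether the sum is divided by the number of coefficients. The Python sketch below folds them into one function. It assumes that the weight multiplies the squared coefficient and that normalization divides by FRH(i) - FRL(i) + 1. How w(k) is derived (for example, from a masking threshold) is left open here, as in the text.

```python
def band_criterion(e1, frl, frh, w=None, normalize=False):
    """Selection criterion for band i spanning FRL(i)..FRH(i), inclusive.

    w=None,  normalize=False -> E_R(i)    (eq. 1)
    w=None,  normalize=True  -> NE_R(i)   (eq. 2)
    w given, normalize=False -> WE_R(i)   (eq. 3)
    w given, normalize=True  -> WNE_R(i)  (eq. 4)
    ASSUMPTION: the weight multiplies the squared coefficient.
    """
    ks = range(frl, frh + 1)
    total = sum((1.0 if w is None else w[k]) * e1[k] ** 2 for k in ks)
    return total / len(ks) if normalize else total


e1 = [2, 2, 1, 1, 1, 1, 1, 1]
print(band_criterion(e1, 0, 1), band_criterion(e1, 0, 7))      # 8.0 14.0
print(band_criterion(e1, 0, 1, normalize=True),
      band_criterion(e1, 0, 7, normalize=True))                # 4.0 1.75
```

The example shows why normalization matters when bands differ in width, as the text permits. The wider band 0..7 wins on raw energy simply because it contains more coefficients. The narrow band 0..1, where the error is concentrated, wins once the energy is normalized.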

In this case, first position specifying unit 201 gives larger weights to perceptually important frequencies, so that bands containing them are more likely to be selected. It gives smaller weights to less important frequencies, so that bands containing them are less likely to be selected. Because perceptually important bands are thus selected preferentially, the same sound-quality improvement as described above is obtained. The weights may be derived, for example, from an auditory masking threshold calculated from the input signal or the first layer decoded signal, or from the loudness characteristics of human hearing.

As another band selection method, the band may be selected only from the bands located in the low range, below a preset reference frequency (Fx). In the example of FIG. 4, the band is selected from bands 1 to 8. The reason for restricting band selection in this way is as follows. Speech signals characteristically have a harmonic structure, in which spectral peaks appear at regular frequency intervals. These peaks are more pronounced in the low range than in the high range. The quantization error produced by encoding (the error spectrum, or error transform coefficients) behaves the same way and is more peaky in the low range. So even when the low-range error spectrum has less energy than the high-range one, its stronger peaks are more likely to exceed the auditory masking threshold (the level above which humans can perceive a sound). The result is perceptible degradation of sound quality.

With this method, because the reference frequency is set in advance, the target frequency band is determined from the low-frequency region, where the error transform coefficients (or error vector) are more strongly peaked than in the high-frequency region above Fx. The peaks of the error transform coefficients can therefore be suppressed, and sound quality can be improved.

Alternatively, the band may be selected only from bands located in the low-to-mid range. In the example of FIG. 3, band 3 is excluded from the candidates and a band is selected from bands 1 and 2, so that the target frequency band is determined within the low-to-mid range.

In the following, the first position specifying unit 201 outputs “1” as the first position information when it specifies band 1, “2” when it specifies band 2, and “3” when it specifies band 3.
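The coarse search performed by the first position specifying unit 201 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the error transform coefficients are assumed to be a list indexed by frequency bin, each coarse band is described by a hypothetical start bin and a common width (bands may overlap when the step is smaller than the width), and an optional reference bin `fx` excludes any band extending above Fx. The helper name `first_position` is hypothetical. It returns the 1-based band number used as the first position information.

```python
from typing import Optional, Sequence


def first_position(err: Sequence[float], starts: Sequence[int], width: int,
                   weights: Optional[Sequence[float]] = None,
                   fx: Optional[int] = None) -> int:
    """Coarse band search (hypothetical helper, not from the source).

    err:     first layer error transform coefficients, one per bin
    starts:  assumed start bins F_1, F_2, ... of the coarse bands
    width:   assumed common width of each coarse band, in bins
    weights: optional perceptual weights w(k); 1.0 everywhere if omitted
    fx:      optional reference bin Fx; bands reaching above it are skipped
    Returns the 1-based number of the band with the largest weighted energy.
    """
    if weights is None:
        weights = [1.0] * len(err)
    best_band, best_energy = 0, float("-inf")
    for band, lo in enumerate(starts, start=1):
        hi = lo + width
        if fx is not None and hi > fx:
            continue  # band extends above the reference frequency Fx
        energy = sum(weights[k] * err[k] ** 2 for k in range(lo, hi))
        if energy > best_energy:  # strict '>' keeps the lowest band on ties
            best_band, best_energy = band, energy
    if best_band == 0:
        raise ValueError("no coarse band lies entirely below fx")
    return best_band


if __name__ == "__main__":
    err = [0, 0, 1, 1, 3, 3, 0, 0, 0, 0, 5, 5]
    print(first_position(err, starts=[0, 4, 8], width=4))        # 3
    print(first_position(err, starts=[0, 4, 8], width=4, fx=8))  # 2
```

Restricting candidates to the low-to-mid range, as in the FIG. 3 example, amounts to passing only the start bins of the permitted bands.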

FIG. 5 is a diagram showing the position of the target frequency band specified by the second position specifying unit 202 shown in FIG. 2.

Within the band specified by the first position specifying unit 201, the second position specifying unit 202 specifies a target frequency band with a finer step size, and outputs the position information of that target frequency band as second position information to the encoding unit 203 and the multiplexing unit 204.

Next, the method by which the second position specifying unit 202 specifies the target frequency band is described. As an example, assume that the first position information output from the first position specifying unit 201 shown in FIG. 2 is “2”, and let BW be the width of the target frequency band. The lowest frequency $F_2$ of band 2 is taken as the starting point and, for convenience, is denoted $G_1$. The lowest frequencies of the other target frequency bands that the second position specifying unit 202 can specify are denoted $G_2$ to $G_N$. The step size of the target frequency bands specified by the second position specifying unit 202 is $G_n - G_{n-1}$, whereas the step size of the bands specified by the first position specifying unit 201 is $F_n - F_{n-1}$, with $G_n - G_{n-1} < F_n - F_{n-1}$.

From the candidate target frequency bands whose lowest frequencies are $G_1, \ldots, G_N$, the second position specifying unit 202 specifies the target frequency band based on the energy of the first layer error transform coefficients or a similar criterion. For example, it calculates the energy of the first layer error transform coefficients for each of the N candidates using Equation (5), specifies the candidate whose energy $E_R(n)$ is largest, and outputs the position information of that target frequency band as the second position information.
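The fine search inside the selected band can be sketched in the same way. This is a minimal sketch, assuming that candidate windows start at $G_n = G_1 + (n-1)\cdot\text{step}$, that each window must stay inside the coarse band, and that the criterion is the plain or weighted sum of squared error coefficients; the helper name `second_position` and its parameters are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def second_position(err: Sequence[float], band_start: int, band_width: int,
                    bw: int, step: int = 1,
                    weights: Optional[Sequence[float]] = None
                    ) -> Tuple[int, int]:
    """Fine search inside the coarse band (hypothetical helper).

    Candidate windows start at G_n = band_start + (n - 1) * step, are bw
    bins wide, and are assumed to stay inside the coarse band. Returns
    (n, G_n) for the window with the largest energy, i.e. the maximiser of
    E_R(n), or of the weighted energy WE_R(n) when weights are given.
    """
    if not 1 <= bw <= band_width:
        raise ValueError("bw must lie between 1 and the coarse band width")
    if weights is None:
        weights = [1.0] * len(err)
    best_n, best_g, best_energy = 0, 0, float("-inf")
    last_start = band_start + band_width - bw  # last window that still fits
    for n, g in enumerate(range(band_start, last_start + 1, step), start=1):
        energy = sum(weights[k] * err[k] ** 2 for k in range(g, g + bw))
        if energy > best_energy:  # first maximum wins on ties
            best_n, best_g, best_energy = n, g, energy
    return best_n, best_g


if __name__ == "__main__":
    err = [0.0] * 8 + [0, 1, 4, 4, 1, 0, 0, 0]  # coarse band covers bins 8..15
    print(second_position(err, band_start=8, band_width=8, bw=2))          # (3, 10)
    print(second_position(err, band_start=8, band_width=8, bw=2, step=2))  # (2, 10)
```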

As described above, when the criterion is the energy $WE_R(n)$ of the first layer error transform coefficients weighted to reflect human auditory characteristics, $WE_R(n)$ is calculated by Equation (6). Here, $w(k)$ represents a weight related to human auditory characteristics. The weight may be derived, for example, from an auditory masking threshold calculated from the input signal or the first layer decoded signal, or from the loudness characteristics of human hearing.
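Equations (5) and (6) are referenced in this description but are not reproduced in this text. A plausible form, consistent with the surrounding definitions, is shown below. It assumes that $e_1(k)$ denotes the first layer error transform coefficient at bin $k$ and that candidate $n$ spans the $BW$ bins starting at $G_n$; the exact expressions in the original filing may differ.

```latex
% assumed form of Equation (5): unweighted energy of candidate n
E_R(n) = \sum_{k=G_n}^{G_n+BW-1} e_1(k)^2, \qquad n = 1, \ldots, N

% assumed form of Equation (6): perceptually weighted energy of candidate n
WE_R(n) = \sum_{k=G_n}^{G_n+BW-1} w(k)\, e_1(k)^2, \qquad n = 1, \ldots, N
```

The selected target frequency band is then the candidate $n$ that maximizes $E_R(n)$, or $WE_R(n)$ when weighting is used.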

In this case, the second position specifying unit 202 assigns larger weights to perceptually more important frequencies, so that target frequency bands containing them are more likely to be selected, and smaller weights to less important frequencies, so that target frequency bands containing them are less likely to be selected. Because perceptually important target frequency bands are selected preferentially, sound quality can be further improved.

FIG. 6 is a block diagram showing the configuration of the encoding unit 203 shown in FIG. 2. The encoding unit 203 shown in FIG. 6 includes a target signal configuration unit 301, an error calculation unit 302, a search unit 303, a shape codebook 304, and a gain codebook 305.

The target signal configuration unit 301 specifies the target frequency band using the first position information input from the first position specifying unit 201 and the second position information input from the second position specifying unit 202. It then extracts the portion of the first layer error transform coefficients (input from the subtraction unit 104) that lies in the target frequency band, and outputs the extracted coefficients to the error calculation unit 302 as the target signal. These extracted first layer error transform coefficients are denoted $e_1(k)$.

The error calculation unit 302 calculates the error E by Equation (7) from three inputs: the i-th shape candidate, input from the shape codebook 304, which stores candidates representing the shape of the error transform coefficients (shape candidates); the m-th gain candidate, input from the gain codebook 305, which stores candidates representing their gain (gain candidates); and the target signal input from the target signal configuration unit 301. It outputs the calculated error E to the search unit 303.

Here, $sh(i,k)$ represents the i-th shape candidate and $ga(m)$ represents the m-th gain candidate.

Based on the error E calculated by the error calculation unit 302, the search unit 303 searches for the combination of shape candidate and gain candidate that minimizes E, and outputs the resulting shape information and gain information as encoded information to the multiplexing unit 204 shown in FIG. 2. Here, the shape information is the parameter i, and the gain information is the parameter m, that minimize the error E.
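The shape-gain search can be illustrated with an exhaustive double loop. This is a minimal sketch, assuming small list-based codebooks and the squared-error criterion described above; the optional `weights` argument corresponds to the perceptually weighted variant (Equation (8)). The helper name `search_shape_gain` is hypothetical, and practical coders typically reduce this search cost, which the text does not address.

```python
from typing import Optional, Sequence, Tuple


def search_shape_gain(target: Sequence[float],
                      shapes: Sequence[Sequence[float]],
                      gains: Sequence[float],
                      weights: Optional[Sequence[float]] = None
                      ) -> Tuple[int, int, float]:
    """Exhaustive shape-gain codebook search (illustrative sketch).

    target:  e_1(k), the error coefficients in the target band (length BW)
    shapes:  shape codebook; shapes[i][k] plays the role of sh(i, k)
    gains:   gain codebook; gains[m] plays the role of ga(m)
    weights: optional perceptual weights w(k) for the weighted error
    Returns (i, m, E): shape index, gain index and the minimum error.
    """
    if weights is None:
        weights = [1.0] * len(target)
    best = (-1, -1, float("inf"))
    for i, shape in enumerate(shapes):
        if len(shape) != len(target):
            raise ValueError("shape length must equal the target band width")
        for m, gain in enumerate(gains):
            err = sum(w * (t - gain * s) ** 2
                      for w, t, s in zip(weights, target, shape))
            if err < best[2]:  # first minimum wins on ties
                best = (i, m, err)
    return best


if __name__ == "__main__":
    print(search_shape_gain([1.0, 2.0], [[1.0, 0.0], [0.5, 1.0]], [1.0, 2.0]))
    # (1, 1, 0.0): shape index i = 1, gain index m = 1
```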

The error calculation unit 302 may instead obtain the error E by Equation (8), which applies larger weights to perceptually important parts of the spectrum so that they have a greater influence on the error. Here, $w(k)$ represents a weight related to human auditory characteristics.
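Equations (7) and (8) are likewise referenced but not reproduced in this text. A plausible form is shown below, assuming that the extracted target $e_1(k)$ and the weights $w(k)$ are re-indexed so that $k = 0$ is the lowest bin of the target frequency band; the exact expressions in the original filing may differ.

```latex
% assumed form of Equation (7): squared error of shape i scaled by gain m
E = \sum_{k=0}^{BW-1} \bigl( e_1(k) - ga(m)\, sh(i,k) \bigr)^2

% assumed form of Equation (8): perceptually weighted squared error
E = \sum_{k=0}^{BW-1} w(k) \bigl( e_1(k) - ga(m)\, sh(i,k) \bigr)^2
```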

By increasing the weights of perceptually important frequencies, which increases the influence of quantization distortion at those frequencies, and decreasing the weights of less important frequencies, which reduces the influence of quantization distortion there, subjective quality can be improved.

FIG. 7 is a block diagram showing the main configuration of the decoding apparatus according to this embodiment. The decoding apparatus 600 shown in FIG. 7 includes a separation unit 601, a first layer decoding unit 602, a second layer decoding unit 603, an addition unit 604, a switching unit 605, a time domain conversion unit 606, and a post filter 607.

The separation unit 601 separates the bit stream received over the communication channel into first layer encoded data and second layer encoded data, and outputs the first layer encoded data to the first layer decoding unit 602 and the second layer encoded data to the second layer decoding unit 603. When the input bit stream contains both first layer and second layer encoded data, the separation unit 601 outputs “2” to the switching unit 605 as layer information; when the bit stream contains only first layer encoded data, it outputs “1”. All encoded data may also have been discarded. In that case, the decoding unit of each layer performs predetermined error compensation (concealment) processing, and the post filter operates as if the layer information were “1”. This embodiment is described on the assumption that the decoding apparatus receives either all of the encoded data or encoded data from which the second layer encoded data has been discarded.

The first layer decoding unit 602 decodes the first layer encoded data to generate first layer decoded transform coefficients, and outputs them to the addition unit 604 and the switching unit 605.

The second layer decoding unit 603 decodes the second layer encoded data to generate first layer decoded error transform coefficients, and outputs them to the addition unit 604.

The addition unit 604 adds the first layer decoded transform coefficients and the first layer decoded error transform coefficients to generate second layer decoded transform coefficients, and outputs them to the switching unit 605.

Based on the layer information input from the separation unit 601, the switching unit 605 outputs either the first layer decoded transform coefficients (when the layer information is “1”) or the second layer decoded transform coefficients (when it is “2”) to the time domain conversion unit 606 as the decoded transform coefficients.
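The combined behaviour of the addition unit 604 and the switching unit 605 is simple enough to state directly in code. This is a minimal sketch, assuming the coefficients are plain lists of equal length; the function name `select_decoded_coefficients` is hypothetical.

```python
from typing import List, Optional, Sequence


def select_decoded_coefficients(first_layer: Sequence[float],
                                decoded_error: Optional[Sequence[float]],
                                layer_info: int) -> List[float]:
    """Sketch of addition unit 604 followed by switching unit 605.

    layer_info == 1: only first layer data arrived; pass it through.
    layer_info == 2: add the decoded error to form the second layer output.
    """
    if layer_info == 1:
        return list(first_layer)
    if layer_info == 2:
        if decoded_error is None or len(decoded_error) != len(first_layer):
            raise ValueError("layer 2 needs error coefficients of equal length")
        # addition unit 604: second layer = first layer + decoded error
        return [a + b for a, b in zip(first_layer, decoded_error)]
    raise ValueError("layer_info must be 1 or 2")


if __name__ == "__main__":
    print(select_decoded_coefficients([1.0, 2.0, 3.0], [0.5, 0.0, -1.0], 2))
    # [1.5, 2.0, 2.0]
```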

The time domain conversion unit 606 converts the decoded transform coefficients into a time domain signal to generate a decoded signal, and outputs it to the post filter 607.

The post filter 607 applies post-filtering to the decoded signal output from the time domain conversion unit 606 to generate the output signal.

FIG. 8 is a diagram showing the configuration of the second layer decoding unit 603 shown in FIG. 7. The second layer decoding unit 603 shown in FIG. 8 includes a shape codebook 701, a gain codebook 702, a multiplication unit 703, and an arrangement unit 704.

The shape codebook 701 selects the shape candidate $sh(i,k)$ indicated by the shape information contained in the second layer encoded data output from the separation unit 601, and outputs it to the multiplication unit 703.

The gain codebook 702 selects the gain candidate $ga(m)$ indicated by the gain information contained in the second layer encoded data output from the separation unit 601, and outputs it to the multiplication unit 703.

The multiplication unit 703 multiplies the shape candidate $sh(i,k)$ by the gain candidate $ga(m)$ and outputs the result to the arrangement unit 704.

The arrangement unit 704 places the gain-scaled shape candidate input from the multiplication unit 703 in the target frequency band specified by the first position information and the second position information contained in the second layer encoded data output from the separation unit 601, and outputs the result to the addition unit 604 as the first layer decoded error transform coefficients.
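The second layer decoding path of FIG. 8 can be sketched as follows. This is a minimal illustration, assuming a hypothetical mapping from the two position indices to a start bin, $G_n = F_m + (n-1)\cdot\text{step}$, and list-based codebooks; the names `target_start_bin` and `second_layer_decode` are hypothetical. Bins outside the target frequency band stay zero, which matches the sparse error spectrum depicted in FIG. 9.

```python
from typing import List, Sequence


def target_start_bin(band_starts: Sequence[int], first_pos: int,
                     second_pos: int, step: int) -> int:
    """Hypothetical mapping G_n = F_m + (n - 1) * step, with 1-based m, n."""
    return band_starts[first_pos - 1] + (second_pos - 1) * step


def second_layer_decode(num_bins: int, shapes: Sequence[Sequence[float]],
                        gains: Sequence[float], shape_index: int,
                        gain_index: int, start: int) -> List[float]:
    """Sketch of units 701 to 704: scale the selected shape by the selected
    gain and place it in the target band; all other bins stay zero."""
    scaled = [gains[gain_index] * s for s in shapes[shape_index]]  # unit 703
    if start < 0 or start + len(scaled) > num_bins:
        raise ValueError("target band falls outside the spectrum")
    out = [0.0] * num_bins
    out[start:start + len(scaled)] = scaled  # arrangement unit 704
    return out


if __name__ == "__main__":
    start = target_start_bin([0, 4, 8], first_pos=2, second_pos=2, step=1)
    print(start)  # 5
    print(second_layer_decode(12, [[1.0, -1.0]], [0.5, 2.0], 0, 1, start))
```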

FIG. 9 is a diagram showing the first layer decoded error transform coefficients output from the arrangement unit 704 shown in FIG. 8. Here, $F_m$ represents the frequency specified by the first position information, and $G_n$ represents the frequency specified by the second position information.

As described above, according to this embodiment, the first position specifying unit 201 searches the entire band of the input signal, with a predetermined bandwidth and a predetermined step size, for a band with a large error and specifies it. Within that band, the second position specifying unit 202 then searches for and specifies the target frequency band using a bandwidth narrower than, and a step size finer than, those predetermined values. As a result, a band with a large error can be identified accurately among all bands with a small amount of computation, and sound quality can be improved.

(Embodiment 2)
Embodiment 2 describes another method by which the second position specifying unit 202 specifies the target frequency band. FIG. 10 is a diagram showing the position of the target frequency specified by the second position specifying unit 202 shown in FIG. 2. Unlike the second position specifying unit of the encoding apparatus described in Embodiment 1, the second position specifying unit of the encoding apparatus according to this embodiment specifies a single target frequency. The shape candidate of the error transform coefficient corresponding to a single target frequency is a pulse (or line spectrum). In this embodiment, the encoding apparatus has the same configuration as that shown in FIG. 1 except for the internal configuration of the encoding unit 203, and the decoding apparatus has the same configuration as that shown in FIG. 7 except for the internal configuration of the second layer decoding unit 603. Their descriptions are therefore omitted, and only the encoding unit 203 and the second layer decoding unit 603 of the decoding apparatus, which relate to the second position specification, are described.

In this embodiment, the second position specifying unit 202 specifies a single target frequency within the band specified by the first position specifying unit 201, so a single first layer error transform coefficient is selected for encoding. The case in which the first position specifying unit 201 has specified band 2 is used as an example. Denoting the bandwidth of the target frequency band by BW, this embodiment has BW = 1.

Specifically, as shown in FIG. 10, for each of the target frequency candidates $G_1, \ldots, G_N$ contained in band 2, the second position specifying unit 202 calculates the energy of the first layer error transform coefficient by Equation (5) above, or the perceptually weighted energy by Equation (6) above. It then specifies the target frequency $G_n$ ($1 \le n \le N$) with the largest calculated energy, and outputs the position information of $G_n$ as second position information to the encoding unit 203.
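With BW = 1 the fine search reduces to picking the single bin with the largest (optionally weighted) squared error. This is a minimal sketch, assuming unit spacing between candidates and $G_1$ equal to the first bin of the coarse band; the helper name `single_target` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def single_target(err: Sequence[float], band_start: int, band_width: int,
                  weights: Optional[Sequence[float]] = None) -> Tuple[int, int]:
    """BW = 1 case: return (n, G_n) for the bin of the coarse band with the
    largest (optionally weighted) squared error. Assumes unit spacing and
    G_1 = band_start."""
    if weights is None:
        weights = [1.0] * len(err)
    bins = range(band_start, band_start + band_width)
    g = max(bins, key=lambda k: weights[k] * err[k] ** 2)  # first max on ties
    return g - band_start + 1, g


if __name__ == "__main__":
    err = [0.0] * 4 + [0.2, -1.5, 0.9, 1.1]  # coarse band covers bins 4..7
    print(single_target(err, band_start=4, band_width=4))  # (2, 5)
```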

FIG. 11 is a block diagram showing the configuration of another form of the encoding unit 203 shown in FIG. 6. The encoding unit 203 shown in FIG. 11 omits the shape codebook 304 from the configuration of FIG. 6. This configuration corresponds to the case where the signal output from the shape codebook 304 is always “1”.

The encoding unit 203 encodes the first layer error transform coefficient at the target frequency $G_n$ specified by the second position specifying unit 202, generates encoded information, and outputs it to the multiplexing unit 204. Because the target frequency input from the second position specifying unit 202 is a single frequency, only one first layer error transform coefficient is encoded. The encoding unit 203 therefore needs no shape information from the shape codebook 304; it searches only the gain codebook 305 and outputs the resulting gain information to the multiplexing unit 204 as encoded information.
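With the shape fixed to “1”, the search reduces to scalar quantization of one coefficient against the gain codebook. This is a minimal sketch, assuming the gain codebook holds signed values (the text does not say how the sign of the coefficient is conveyed); the helper name `search_gain_only` is hypothetical.

```python
from typing import Sequence, Tuple


def search_gain_only(target: float, gains: Sequence[float]) -> Tuple[int, float]:
    """Single-coefficient case: the shape is fixed to 1, so the error is
    (e_1 - ga(m))^2. Assumes a signed gain codebook. Returns (m, ga(m))."""
    m = min(range(len(gains)), key=lambda j: (target - gains[j]) ** 2)
    return m, gains[m]


if __name__ == "__main__":
    codebook = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]  # hypothetical signed gains
    print(search_gain_only(-0.8, codebook))  # (1, -1.0)
    print(search_gain_only(1.6, codebook))   # (5, 2.0)
```

On the decoder side (FIG. 12), the arrangement unit then simply writes `gains[m]` at the single target bin and leaves every other bin at zero.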

FIG. 12 is a block diagram showing the configuration of another form of the second layer decoding unit 603 shown in FIG. 8. The second layer decoding unit 603 shown in FIG. 12 omits the shape codebook 701 and the multiplication unit 703 from the configuration of FIG. 8. This configuration corresponds to the case where the signal output from the shape codebook 701 is always “1”.

The arrangement unit 704 places the gain candidate, selected from the gain codebook according to the gain information, at the single target frequency specified by the first position information and the second position information contained in the second layer encoded data output from the separation unit 601, and outputs the result to the addition unit 604 as the first layer decoded error transform coefficients.

As described above, according to this embodiment, the second position specifying unit 202 specifies a single target frequency within the band specified by the first position specifying unit 201. Because line spectra can then be represented accurately, the sound quality of strongly tonal signals such as vowels (signals whose spectra show many peaks) can be improved.

(Embodiment 3)
Embodiment 3 describes another method by which the second position specifying unit specifies the target frequency band. In this embodiment, the encoding apparatus has the same configuration as that shown in FIG. 1 except for the internal configuration of the second layer encoding unit 105, so its description is omitted.

FIG. 13 is a block diagram showing the configuration of the second layer encoding unit 105 of the encoding apparatus according to this embodiment. The second layer encoding unit 105 shown in FIG. 13 differs from that of FIG. 2 in that it includes a second position specifying unit 301 in place of the second position specifying unit 202. Components identical to those of the second layer encoding unit 105 shown in FIG. 2 are given the same reference numbers, and their descriptions are omitted.

The second position specifying unit 301 shown in FIG. 13 includes a first sub-position specifying unit 311-1, a second sub-position specifying unit 311-2, ..., a J-th sub-position specifying unit 311-J, and a multiplexing unit 312.

The sub-position specifying units 311-1, ..., 311-J each specify a different target frequency within the band specified by the first position specifying unit 201. Specifically, the n-th sub-position specifying unit 311-n specifies the n-th target frequency within that band, excluding the target frequencies already specified by the first to (n-1)-th sub-position specifying units 311-1, ..., 311-(n-1).

FIG. 14 is a diagram showing the positions of the target frequencies specified by the sub-position specifying units 311-1, ..., 311-J of the encoding apparatus according to this embodiment. The example assumes that the first position specifying unit 201 has specified band 2 and that the second position specifying unit 301 specifies the positions of J target frequencies.

As shown in FIG. 14(A), the first sub-position specifying unit 311-1 specifies one target frequency (here, $G_3$) from the target frequency candidates in band 2, and outputs its position information to the multiplexing unit 312 and to the second sub-position specifying unit 311-2.

As shown in FIG. 14(B), the second sub-position specifying unit 311-2 specifies one target frequency (here, $G_{N-1}$) from the candidates in band 2, excluding $G_3$, the target frequency specified by the first sub-position specifying unit 311-1. It outputs the position information of that target frequency to the multiplexing unit 312 and to the third sub-position specifying unit 311-3.

Similarly, as shown in FIG. 14(C), the J-th sub-position specifying unit 311-J selects one target frequency (here, $G_5$) from the candidates in band 2, excluding the J-1 target frequencies specified by the first to (J-1)-th sub-position specifying units 311-1, ..., 311-(J-1), and outputs position information identifying it to the multiplexing unit 312.

The multiplexing unit 312 multiplexes the J pieces of position information input from the sub-position specifying units 311-1, ..., 311-J to generate the second position information, and outputs it to the encoding unit 203 and the multiplexing unit 204. The multiplexing unit 312 is not essential; the J pieces of position information may instead be output directly to the encoding unit 203 and the multiplexing unit 204.
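The cascade of sub-position specifying units can be sketched as a sequential selection in which each stage excludes the bins chosen by earlier stages. This is a minimal sketch: the per-bin energy criterion is an assumption carried over from Embodiment 2, since the text does not state which criterion each stage uses, and the helper name `sub_positions` is hypothetical.

```python
from typing import List, Optional, Sequence


def sub_positions(err: Sequence[float], band_start: int, band_width: int,
                  j_count: int,
                  weights: Optional[Sequence[float]] = None) -> List[int]:
    """Sequential search by sub-position units 311-1 ... 311-J (sketch).

    Stage n picks the remaining bin of the coarse band with the largest
    (optionally weighted) squared error, after excluding the bins chosen by
    stages 1 .. n-1. The energy criterion is assumed, not from the source.
    Returns the chosen bins in selection order.
    """
    if not 1 <= j_count <= band_width:
        raise ValueError("j_count must lie between 1 and the band width")
    if weights is None:
        weights = [1.0] * len(err)
    chosen: List[int] = []
    for _ in range(j_count):
        remaining = [k for k in range(band_start, band_start + band_width)
                     if k not in chosen]
        chosen.append(max(remaining, key=lambda k: weights[k] * err[k] ** 2))
    return chosen


if __name__ == "__main__":
    err = [0.1, 0.5, 3.0, 0.2, -2.0, 0.0, 1.0, 0.3]
    print(sub_positions(err, band_start=0, band_width=8, j_count=3))  # [2, 4, 6]
```

With a fixed per-bin criterion this is equivalent to taking the J largest bins; the sequential structure only matters if a later stage's criterion depends on earlier choices, which the text leaves open.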

In this way, the second position specifying unit 301 specifies J target frequencies within the band specified by the first position specifying unit 201 and can therefore represent multiple peaks, further improving the sound quality of strongly tonal signals such as vowels. Moreover, because the J target frequencies need only be chosen from within the band specified by the first position specifying unit 201, the number of possible combinations of target frequencies is far smaller than when they are chosen from the entire band. This allows both a lower bit rate and a lower computational load.
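The reduction in the number of combinations can be made concrete with a small calculation. The sizes below are hypothetical and chosen only for illustration (320 bins split into 5 non-overlapping coarse bands of 64 bins, J = 3); the bit counts assume fixed-length indices with no entropy coding.

```python
from math import ceil, comb, log2


def index_bits(count: int) -> int:
    """Bits needed for a fixed-length index over `count` alternatives."""
    return ceil(log2(count)) if count > 1 else 0


# Hypothetical sizes, not from the source: 320 bins split into 5
# non-overlapping coarse bands of 64 bins, and J = 3 target frequencies.
TOTAL_BINS, BAND_BINS, NUM_BANDS, J = 320, 64, 5, 3

full_band = comb(TOTAL_BINS, J)   # choose J bins anywhere in the spectrum
within_band = comb(BAND_BINS, J)  # choose J bins inside one coarse band

full_bits = index_bits(full_band)
# two-stage: first position (which band) + second position (which J bins)
two_stage_bits = index_bits(NUM_BANDS) + index_bits(within_band)

print(full_band, full_bits)         # 5410240 23
print(within_band, two_stage_bits)  # 41664 19
```

The trade-off is that the two-stage scheme cannot express combinations that straddle coarse bands; that restriction buys the smaller index and the smaller search space.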

(Embodiment 4)
Embodiment 4 describes another encoding method used in the second layer encoding unit 105. In this embodiment, the encoding apparatus has the same configuration as that shown in FIG. 1 except for the internal configuration of the second layer encoding unit 105, so its description is omitted.

FIG. 15 is a block diagram showing the configuration of the second layer encoding unit 105 in another form of the encoding apparatus according to this embodiment. The second layer encoding unit 105 shown in FIG. 15 omits the second position specifying unit 202 shown in FIG. 2, and includes an encoding unit 221 in place of the encoding unit 203 shown in FIG. 2.

The encoding unit 221 determines the second position information so as to minimize the quantization distortion produced when the error transform coefficients at the target frequencies are encoded. The candidate second position information is stored in a second position information codebook 321.

FIG. 16 is a block diagram showing the configuration of the encoding unit 221 shown in FIG. 15. The encoding unit 221 shown in FIG. 16 adds a second position information codebook 321 to the encoding unit 203 shown in FIG. 6 and includes a search unit 322 in place of the search unit 303. Components identical to those of the encoding unit 203 shown in FIG. 6 are given the same reference numbers, and their descriptions are omitted.

In accordance with a control signal from the search unit 322 (described below), the second position information codebook 321 selects one item of second position information from its stored candidates and outputs it to the target signal configuration unit 301. In the second position information codebook 321 shown in FIG. 16, each black dot represents the position of the target frequency of a second position information candidate.

The target signal configuration unit 301 specifies the target frequencies using the first position information input from the first position specifying unit 201 and the second position information selected from the second position information codebook 321. It then extracts the portion of the first layer error transform coefficients (input from the subtraction unit 104) at those target frequencies, and outputs the extracted coefficients to the error calculation unit 302 as the target signal.

Based on the error E input from the error calculation unit 302, the search unit 322 searches for the combination of shape candidate, gain candidate, and second position information candidate that minimizes E, and outputs the resulting shape information, gain information, and second position information as encoded information to the multiplexing unit 204 shown in FIG. 15. The search unit 322 also outputs to the second position information codebook 321 a control signal instructing it to select a second position information candidate and output it to the target signal configuration unit 301.
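The closed-loop search of Embodiment 4 can be sketched as a triple loop over position candidates, shapes, and gains. This is a minimal sketch, assuming that each hypothetical position codebook entry lists bin offsets from the start of the coarse band (one offset per target frequency, so FIG. 16 corresponds to one offset and FIG. 17 to three), and that every shape has one value per offset; the helper name `joint_search` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def joint_search(err: Sequence[float], band_start: int,
                 position_codebook: Sequence[Sequence[int]],
                 shapes: Sequence[Sequence[float]], gains: Sequence[float],
                 weights: Optional[Sequence[float]] = None
                 ) -> Tuple[int, int, int, float]:
    """Closed-loop search over position, shape and gain (sketch of unit 322).

    position_codebook: hypothetical candidates; each entry lists bin offsets
                       from band_start, one per target frequency
    shapes:            shape codebook; each shape has one value per offset
    Returns (p, i, m, E) minimising the (optionally weighted) error E.
    """
    if weights is None:
        weights = [1.0] * len(err)
    best = (-1, -1, -1, float("inf"))
    for p, offsets in enumerate(position_codebook):
        bins = [band_start + o for o in offsets]
        target = [err[k] for k in bins]  # role of target signal unit 301
        w = [weights[k] for k in bins]
        for i, shape in enumerate(shapes):
            if len(shape) != len(bins):
                raise ValueError("each shape needs one value per position")
            for m, gain in enumerate(gains):
                e = sum(wk * (t - gain * s) ** 2
                        for wk, t, s in zip(w, target, shape))
                if e < best[3]:  # first minimum wins on ties
                    best = (p, i, m, e)
    return best


if __name__ == "__main__":
    err = [0.0, 0.0, 0.0, 2.0, 0.0, 0.0, -1.0, 0.0]
    print(joint_search(err, 0, [(1,), (3,), (6,)], [[1.0]], [-1.0, 1.0, 2.0]))
    # (1, 0, 2, 0.0): position candidate 1 (bin 3) with gain 2.0
```

Passing a single all-ones shape of the right length reproduces the gain-only configuration of FIG. 18, in which only the gain codebook and the position codebook are searched.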

Thus, according to this embodiment, the second position information is determined so as to minimize the quantization distortion produced when the error transform coefficients at the target frequencies are encoded. Because the final quantization distortion is thereby reduced, speech quality can be improved.

This embodiment has been described using the example in which the second position information codebook 321 of FIG. 16 stores second position information candidates that each contain a single target frequency. The present invention is not limited to this. As shown in FIG. 17, the second position information codebook 321 may store candidates that each contain a plurality of target frequencies. FIG. 17 shows the encoding unit 221 for the case where each second position information candidate stored in the second position information codebook 321 has three target frequencies.

Also, this embodiment has been described using the example in which the error calculating unit 302 of FIG. 16 calculates the error E from the shape codebook 304 and the gain codebook 305. The present invention is not limited to this. As shown in FIG. 18, the shape codebook 304 may be omitted and the error E calculated from the gain codebook 305 alone. FIG. 18 is a block diagram showing another configuration of the encoding unit 221 shown in FIG. 15. This configuration corresponds to the case where the signal output from the shape codebook 304 is always 1. The shape then consists of a plurality of pulses and the shape codebook 304 is unnecessary. The search unit 322 therefore searches only the gain codebook 305 and the second position information codebook 321, and outputs the resulting gain information and second position information to the multiplexing unit 204 shown in FIG. 15 as encoded information.

This embodiment has assumed that the second position information codebook 321 actually reserves a storage area and stores the second position information candidates. The present invention is not limited to this. The second position information codebook 321 may instead generate the candidates according to a predetermined procedure, in which case it needs no storage area.
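One way to realize a codebook without a storage area is to enumerate candidates on demand in a fixed order. The encoder and decoder can then both regenerate the candidate for a given index. The sketch below uses a hypothetical rule: every combination of `num_targets` offsets on a grid of spacing `step` inside the band is a candidate. The patent does not specify the actual procedure, and the function names are illustrative.

```python
from itertools import combinations, islice
from typing import Iterator, Tuple

def second_position_candidates(band_width: int, num_targets: int,
                               step: int = 1) -> Iterator[Tuple[int, ...]]:
    """Generate second position candidates on demand (hypothetical rule).

    Each candidate is a tuple of `num_targets` distinct offsets taken from
    a grid of spacing `step` inside a band of `band_width` bins. Nothing is
    stored; candidates come out in a fixed lexicographic order.
    """
    grid = range(0, band_width, step)
    return combinations(grid, num_targets)

def candidate_at(index: int, band_width: int, num_targets: int,
                 step: int = 1) -> Tuple[int, ...]:
    """Regenerate the candidate with a given index (e.g. at the decoder)."""
    gen = second_position_candidates(band_width, num_targets, step)
    try:
        return next(islice(gen, index, None))
    except StopIteration:
        raise IndexError("candidate index out of range") from None
```

For example, `candidate_at(1, 8, 3, 2)` walks the grid (0, 2, 4, 6) and returns `(0, 2, 6)`. Only the index needs to be transmitted.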

(Embodiment 5)
Embodiment 5 describes another method by which the first position specifying unit specifies a band. In this embodiment the encoding apparatus is identical to the one shown in FIG. 1 except for the internal configuration of the second layer encoding unit 105, so its description is omitted.

FIG. 19 is a block diagram showing the configuration of the second layer encoding unit 105 of the encoding apparatus according to this embodiment. The second layer encoding unit 105 shown in FIG. 19 has a first position specifying unit 231 in place of the first position specifying unit 201 shown in FIG. 2.

A calculating unit (not shown) performs pitch analysis on the input signal to obtain the pitch period, and calculates the pitch frequency as the reciprocal of that period. Alternatively, the calculating unit may calculate the pitch frequency from the first layer encoded data generated by the first layer encoding unit 102. Because the first layer encoded data is transmitted anyway, no separate information specifying the pitch frequency then needs to be transmitted. The calculating unit also outputs pitch period information specifying the pitch period to the multiplexing unit 106.
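The patent does not say how the pitch analysis is performed. As an illustration only, the sketch below substitutes a plain autocorrelation peak search, a common textbook technique. The search range of 60 to 400 Hz and the function name `estimate_pitch` are assumptions. The pitch frequency is returned as the reciprocal of the period, expressed in Hz as fs / period.

```python
import numpy as np

def estimate_pitch(x, fs, f_min=60.0, f_max=400.0):
    """Autocorrelation pitch estimate (illustrative substitute technique).

    Returns (pitch period in samples, pitch frequency in Hz).
    The 60-400 Hz search range is an assumed default.
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(x) - 1)
    # Autocorrelation at non-negative lags only
    r = np.correlate(x, x, mode="full")[len(x) - 1:]
    period = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return period, fs / period  # pitch frequency = reciprocal of the period
```

For a 200 Hz sine sampled at 8 kHz, the estimate is a period of 40 samples and a pitch frequency of 200 Hz. A real codec would refine this, for example with fractional lags or by reusing the first layer's pitch parameters as the text suggests.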

Based on the pitch frequency input from the calculating unit (not shown), the first position specifying unit 231 specifies a band of a predetermined, relatively wide bandwidth. It outputs the position information of that band as first position information to the second position specifying unit 202, the encoding unit 203 and the multiplexing unit 204.

FIG. 20 shows the positions of the bands specified by the first position specifying unit 231 shown in FIG. 19. The three bands shown in FIG. 20 are the regions near the integer multiples of reference frequencies F_1 to F_3, which are derived from the input pitch frequency PF. Each reference frequency is obtained by adding a predetermined value to PF. As a concrete example, the values −1, 0 and +1 are used here, giving F_1 = PF − 1, F_2 = PF and F_3 = PF + 1.
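The harmonic band candidates of FIG. 20 can be sketched as follows. Several choices here are assumptions not taken from the patent: frequencies are treated as integer bin indices, each "band" is the set of bins within ±`half_width` of every multiple of F_k, and the candidate whose bins carry the largest error energy is selected. The function names are hypothetical.

```python
import numpy as np

def harmonic_bins(ref_freq, num_bins, half_width=1):
    """Bins within +/-half_width of every integer multiple of ref_freq (in bins)."""
    if ref_freq < 1:
        raise ValueError("reference frequency must be at least one bin")
    bins = set()
    k = 1
    while k * ref_freq < num_bins:
        centre = k * ref_freq
        for b in range(centre - half_width, centre + half_width + 1):
            if 0 <= b < num_bins:
                bins.add(b)
        k += 1
    return sorted(bins)

def select_harmonic_band(err_coeffs, pf, offsets=(-1, 0, 1), half_width=1):
    """Pick F_k = pf + offset whose harmonic bins hold the most error energy (assumed rule)."""
    err = np.asarray(err_coeffs, dtype=float)
    best_idx, best_energy = 0, -1.0
    for idx, off in enumerate(offsets):
        energy = float(np.sum(err[harmonic_bins(pf + off, len(err), half_width)] ** 2))
        if energy > best_energy:
            best_idx, best_energy = idx, energy
    return best_idx, harmonic_bins(pf + offsets[best_idx], len(err), half_width)
```

With PF = 10 and error energy concentrated at bins 11, 22, 33, 44 and 55, the sketch selects F_3 = PF + 1, index 2. Only the index of the chosen offset would need to be sent as first position information.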

Bands are placed relative to integer multiples of the pitch frequency for the following reason. Speech signals, particularly in vowel segments with strong pitch periodicity, show spectral peaks near integer multiples of the reciprocal of the pitch period, that is, of the pitch frequency. This property is called harmonic structure. Large errors in the first layer error transform coefficients also tend to occur near these integer multiples.

Thus, in this embodiment the first position specifying unit 231 specifies a band near integer multiples of the pitch frequency. The target frequencies finally specified by the second position specifying unit 202 therefore lie near integer multiples of the pitch frequency, and speech quality can be improved with little computation.

(Embodiment 6)
Embodiment 6 describes the case where the encoding method of the present invention is applied to an encoding apparatus whose first layer encoding unit substitutes an approximate signal, such as noise, for the high-frequency part during encoding. FIG. 22 is a block diagram showing the main configuration of encoding apparatus 220 according to this embodiment. Encoding apparatus 220 shown in FIG. 22 includes a first layer encoding unit 2201, a first layer decoding unit 2202, a delay unit 2203, a subtracting unit 104, a frequency domain transform unit 101, a second layer encoding unit 105 and a multiplexing unit 106. In encoding apparatus 220 of FIG. 22, components identical to those of encoding apparatus 100 shown in FIG. 1 are given the same reference numerals, and their description is omitted.

The first layer encoding unit 2201 of this embodiment uses a scheme that substitutes an approximate signal, such as noise, for the high-frequency part. The perceptually less important high-frequency part is represented by an approximate signal. The bits saved are allocated to the perceptually important low-frequency (or low-to-mid-frequency) part, which improves fidelity to the original signal in that band and thus improves overall sound quality. Examples are the AMR-WB scheme (Non-Patent Document 3) and the VMR-WB scheme (Non-Patent Document 4).

The first layer encoding unit 2201 encodes the input signal to generate first layer encoded data, and outputs it to the multiplexing unit 106 and the first layer decoding unit 2202. The first layer encoding unit 2201 is described in detail later.

The first layer decoding unit 2202 decodes the first layer encoded data input from the first layer encoding unit 2201 to generate a first layer decoded signal, and outputs it to the subtracting unit 104. The first layer decoding unit 2202 is described in detail later.

Next, the first layer encoding unit 2201 is described in detail with reference to FIG. 23, a block diagram showing its configuration in encoding apparatus 220. As shown in FIG. 23, the first layer encoding unit 2201 consists of a downsampling unit 2210 and a core encoding unit 2220.

The downsampling unit 2210 downsamples the time-domain input signal to the desired sampling rate and outputs the downsampled time-domain signal to the core encoding unit 2220.

The core encoding unit 2220 encodes the output signal of the downsampling unit 2210 to generate first layer encoded data, and outputs it to the first layer decoding unit 2202 and the multiplexing unit 106.

Next, the first layer decoding unit 2202 is described in detail with reference to FIG. 24, a block diagram showing its configuration in encoding apparatus 220. As shown in FIG. 24, the first layer decoding unit 2202 consists of a core decoding unit 2230, an upsampling unit 2240 and a high-frequency component adding unit 2250.

The core decoding unit 2230 decodes the first layer encoded data input from the core encoding unit 2220 to generate a decoded signal, which it outputs to the upsampling unit 2240. It also outputs the decoded LPC coefficients obtained during decoding to the high-frequency component adding unit 2250.

The upsampling unit 2240 upsamples the decoded signal output from the core decoding unit 2230 to the same sampling rate as the input signal, and outputs the upsampled signal to the high-frequency component adding unit 2250.

The high-frequency component adding unit 2250 generates an approximate signal for the high-frequency components of the signal upsampled by the upsampling unit 2240, for example by the methods described in Non-Patent Documents 3 and 4. It uses this signal to fill in the missing high-frequency part.

FIG. 25 is a block diagram showing the main configuration of the decoding apparatus corresponding to the encoding apparatus of this embodiment. Decoding apparatus 250 of FIG. 25 has the same basic configuration as decoding apparatus 600 shown in FIG. 7, but includes a first layer decoding unit 2501 in place of the first layer decoding unit 602. Like the first layer decoding unit 2202 of the encoding apparatus, the first layer decoding unit 2501 consists of a core decoding unit, an upsampling unit and a high-frequency component adding unit (not shown). Their detailed description is omitted here.

Specifically, a signal such as noise, which both the encoder and the decoder can generate without side information, is passed through a synthesis filter built from the decoded LPC coefficients supplied by the core decoding unit. The filter output is used as the approximate signal for the high-frequency components. The high-frequency components of the input signal and of the first layer decoded signal then have completely different, essentially uncorrelated waveforms. Subtracting one from the other therefore adds their energies rather than cancelling them, and the high-frequency energy of the error signal produced by the subtracting unit becomes larger than that of the input signal itself. As a result, the second layer encoding unit tends to select bands located in the perceptually less important high-frequency part.
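The effect described above can be reproduced numerically. The sketch below is illustrative only. It generates noise, passes it through an all-pole LPC synthesis filter (the sign convention A(z) = 1 − Σ a_i z^−i is assumed) and keeps only the high band with a crude FFT brick-wall filter. It then scales the result to a given energy; this scaling rule is an assumption, and AMR-WB uses its own gain computation. Subtracting this approximation from an unrelated high-band waveform roughly doubles the energy instead of reducing it. All function names are hypothetical.

```python
import numpy as np

def lpc_synthesis(excitation, a):
    """All-pole filter 1/A(z), A(z) = 1 - sum_i a[i-1] z^-i (sign convention assumed)."""
    y = np.zeros(len(excitation))
    for n, e in enumerate(excitation):
        acc = e
        for i, ai in enumerate(a, start=1):
            if n >= i:
                acc += ai * y[n - i]
        y[n] = acc
    return y

def high_band(x, cutoff_bin):
    """Crude brick-wall high-pass: zero every rFFT bin below cutoff_bin."""
    spec = np.fft.rfft(x)
    spec[:cutoff_bin] = 0.0
    return np.fft.irfft(spec, n=len(x))

def noise_high_band(n, a, cutoff_bin, target_energy, rng):
    """Noise -> LPC synthesis -> high-pass, scaled to target_energy (scaling rule assumed)."""
    approx = high_band(lpc_synthesis(rng.standard_normal(n), a), cutoff_bin)
    return approx * np.sqrt(target_energy / np.sum(approx ** 2))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x_high = high_band(rng.standard_normal(4096), 1024)  # stand-in for the input's high band
    y_high = noise_high_band(4096, [0.5], 1024, np.sum(x_high ** 2), np.random.default_rng(1))
    ratio = np.sum((x_high - y_high) ** 2) / np.sum(x_high ** 2)
    print(round(ratio, 2))  # close to 2: error energy exceeds the input's high-band energy
```

The ratio near 2 is the reason an unconstrained second layer would be drawn toward the high band.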

In this embodiment, encoding apparatus 220 uses a first layer encoding unit 2201 that substitutes an approximate signal, such as noise, for the high-frequency part, as described above. Bands are selected only from the low-frequency part below a preset reference frequency. Even if the high-frequency energy of the error signal (or error transform coefficients) increases, the perceptually sensitive low-frequency part can therefore still be chosen as the encoding target of the second layer encoding unit, and sound quality improves.
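A minimal sketch of restricting band selection to the region below a reference frequency follows. It assumes that bins stand in for frequencies, that bands have a fixed width and step, and that the selection criterion is maximum error energy (as in the first position search of earlier embodiments). The name `select_low_band` is hypothetical.

```python
import numpy as np

def select_low_band(err_coeffs, band_width, step, ref_bin):
    """Return the start bin of the highest-energy band lying entirely below ref_bin."""
    err = np.asarray(err_coeffs, dtype=float)
    best_start, best_energy = None, -1.0
    # Only bands whose last bin is below ref_bin are candidates
    for start in range(0, ref_bin - band_width + 1, step):
        energy = float(np.sum(err[start:start + band_width] ** 2))
        if energy > best_energy:
            best_start, best_energy = start, energy
    return best_start
```

Consider an error spectrum with a large, noise-induced error at bins 50 to 57 and a small one at bins 10 to 13. With `ref_bin=32` the sketch picks the low band starting at bin 8. Without the restriction (`ref_bin=64`) it would pick the high band starting at bin 48.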

This embodiment has been described using a configuration in which no information about the high-frequency part is sent to the decoder. The present invention is not limited to this. For example, as in Non-Patent Document 5, the high-frequency signal may be encoded at a lower bit rate than the low-frequency part and sent to the decoder.

In encoding apparatus 220 shown in FIG. 22, the subtracting unit 104 takes the difference between time-domain signals, but it may instead take the difference between frequency-domain transform coefficients. In that case, the frequency domain transform unit 101 is placed between the delay unit 2203 and the subtracting unit 104 to obtain the input transform coefficients. A second frequency domain transform unit 101 is added between the first layer decoding unit 2202 and the subtracting unit 104 to obtain the first layer decoded transform coefficients. The subtracting unit 104 then takes the difference between the input transform coefficients and the first layer decoded transform coefficients, and supplies the resulting error transform coefficients directly to the second layer encoding unit. This configuration allows subtraction to be tailored to each band, for example taking the difference in one band but not in another, which further improves sound quality.
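Per-band subtraction in the frequency domain can be sketched as below. The band edges and the per-band on/off mask are hypothetical inputs. One plausible policy, assumed here and not stated in the patent, is to subtract in the waveform-coded low band and pass the input through unchanged in the noise-filled high band, where subtraction would only add energy.

```python
import numpy as np

def per_band_error(input_coeffs, decoded_coeffs, band_edges, subtract_mask):
    """Error transform coefficients with subtraction enabled band by band.

    band_edges:    bin boundaries, e.g. [0, lo_hi_split, num_bins]
    subtract_mask: one bool per band; True -> input - decoded, False -> input only
    """
    x = np.asarray(input_coeffs, dtype=float)
    y = np.asarray(decoded_coeffs, dtype=float)
    err = x.copy()  # bands without subtraction keep the input coefficients
    bands = zip(band_edges[:-1], band_edges[1:])
    for (lo, hi), use_diff in zip(bands, subtract_mask):
        if use_diff:
            err[lo:hi] = x[lo:hi] - y[lo:hi]
    return err
```

For example, `per_band_error([1, 2, 3, 4], [1, 1, 9, 9], [0, 2, 4], [True, False])` subtracts only in the first band and returns `[0, 1, 3, 4]`.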

(Embodiment 7)
Embodiment 7 describes the case where the encoding method of the present invention is applied to encoding and decoding apparatuses of other configurations. FIG. 26 is a block diagram showing the main configuration of encoding apparatus 260 according to this embodiment.

Encoding apparatus 260 shown in FIG. 26 is encoding apparatus 220 of FIG. 22 with a weighting filter unit 2601 added. In encoding apparatus 260 of FIG. 26, components identical to those in FIG. 22 are given the same reference numerals, and their description is omitted.

The weighting filter unit 2601 applies perceptual weighting to the error signal input from the subtracting unit 104, and outputs the filtered signal to the frequency domain transform unit 101. The weighting filter unit 2601 has a spectral characteristic that is the inverse of the spectral envelope of the input signal. It thus flattens (whitens) the spectrum, or brings it close to flat. For example, the weighting filter W(z) is constructed from the decoded LPC coefficients obtained by the first layer decoding unit 2202 as in the following equation (9):

$$W(z) = 1 - \sum_{i=1}^{NP} \alpha(i)\,\gamma^{i}\,z^{-i} \qquad (9)$$

Here, α(i) are the decoded LPC coefficients, NP is the LPC order, and γ is a parameter in the range 0 ≤ γ ≤ 1 that controls the degree of spectral flattening (whitening). The larger γ is, the stronger the flattening; γ = 0.92 is one example value.
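Equation (9) is an FIR filter whose taps are the bandwidth-expanded LPC coefficients α(i)·γ^i. The sketch below applies it directly. It assumes the sign convention W(z) = 1 − Σ α(i) γ^i z^−i shown above and a zero initial filter state. The function name `weighting_filter` is hypothetical.

```python
import numpy as np

def weighting_filter(x, alpha, gamma=0.92):
    """Apply W(z) = 1 - sum_{i=1}^{NP} alpha(i) * gamma**i * z**-i (FIR, zero initial state)."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for i, a in enumerate(alpha, start=1):
        # Subtract the delayed, bandwidth-expanded contribution of tap i
        y[i:] -= a * gamma ** i * x[:len(x) - i]
    return y
```

The impulse response exposes the taps directly. `weighting_filter([1, 0, 0, 0], [0.5, 0.25], gamma=0.5)` gives `[1, -0.25, -0.0625, 0]`.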

Decoding apparatus 270 shown in FIG. 27 is decoding apparatus 250 of FIG. 25 with a synthesis filter unit 2701 added. In decoding apparatus 270 of FIG. 27, components identical to those in FIG. 25 are given the same reference numerals, and their description is omitted.

The synthesis filter unit 2701 filters the signal input from the time domain transform unit 606 to restore the flattened spectrum to its original characteristic, and outputs the filtered signal to the adding unit 604. The synthesis filter unit 2701 has the inverse spectral characteristic of the weighting filter of equation (9), that is, a characteristic matching the spectral envelope of the input signal. Using equation (9), the synthesis filter B(z) is expressed as the following equation (10):

$$B(z) = \frac{1}{W(z)} = \frac{1}{1 - \sum_{i=1}^{NP} \alpha(i)\,\gamma^{i}\,z^{-i}} \qquad (10)$$

Here α(i), NP and γ are the same as in equation (9).
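Equation (10) is the all-pole inverse of equation (9). The sketch below implements it as a direct recursion and checks the round trip: applying W(z) (written here with `np.convolve`) and then B(z) returns the original signal. The same assumed sign convention and zero initial state apply, and `synthesis_filter` is a hypothetical name.

```python
import numpy as np

def synthesis_filter(x, alpha, gamma=0.92):
    """Apply B(z) = 1 / (1 - sum_{i=1}^{NP} alpha(i) * gamma**i * z**-i), zero initial state."""
    x = np.asarray(x, dtype=float)
    coeffs = [a * gamma ** i for i, a in enumerate(alpha, start=1)]
    y = np.zeros_like(x)
    for n in range(len(x)):
        acc = x[n]
        for i, c in enumerate(coeffs, start=1):
            if n >= i:
                acc += c * y[n - i]  # feedback from past outputs
        y[n] = acc
    return y

if __name__ == "__main__":
    alpha, gamma = [0.5, 0.2], 0.9
    x = np.random.default_rng(0).standard_normal(32)
    # W(z) as an explicit FIR: taps [1, -alpha(1)*gamma, -alpha(2)*gamma**2]
    wx = np.convolve(x, [1.0, -alpha[0] * gamma, -alpha[1] * gamma ** 2])[:len(x)]
    print(np.allclose(synthesis_filter(wx, alpha, gamma), x))  # True: B(z) undoes W(z)
```

Because B(z) re-imposes the spectral envelope, any coding distortion added between the two filters is shaped by that envelope. The next paragraph explains why this emphasizes low-frequency distortion.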

In encoding and decoding apparatuses such as those above, the spectral envelope of a speech signal generally has more energy in the low-frequency part than in the high-frequency part. Even if the coding distortion before the synthesis filter is equal in the low- and high-frequency parts, the low-frequency distortion is therefore larger after the synthesis filter. When speech is compressed to a low bit rate for transmission, coding distortion cannot be made small enough. The synthesis filter in the decoder then amplifies the low-frequency energy of the coding distortion, and quality degradation in the low-frequency part readily becomes audible.

In the encoding method of this embodiment, target frequencies are chosen from the low-frequency part below the reference frequency. The low-frequency part is therefore readily selected as the encoding target of the second layer encoding unit 105, which reduces low-frequency coding distortion. Even when the synthesis filter emphasizes the low-frequency part, the low-frequency coding distortion is hard to perceive, and sound quality improves.

In this embodiment the subtracting unit 104 of encoding apparatus 260 takes the difference between time-domain signals. The present invention is not limited to this; the subtracting unit may instead take the difference between frequency-domain transform coefficients. Specifically, the weighting filter unit 2601 and the frequency domain transform unit 101 are placed between the delay unit 2203 and the subtracting unit 104 to obtain the input transform coefficients. A second weighting filter unit 2601 and frequency domain transform unit 101 are added between the first layer decoding unit 2202 and the subtracting unit 104 to obtain the first layer decoded transform coefficients. The subtracting unit 104 then takes the difference between the input transform coefficients and the first layer decoded transform coefficients, and supplies the resulting error transform coefficients directly to the second layer encoding unit 105. This configuration likewise allows subtraction to be tailored to each band, for example taking the difference in one band but not in another, which further improves sound quality.

This embodiment has also been described for the case where encoding apparatus 220 has two layers. The present invention is not limited to this; the encoding hierarchy may have more than two layers, as in encoding apparatus 280 shown in FIG. 28.

FIG. 28 is a block diagram showing the main configuration of encoding apparatus 280. Compared with encoding apparatus 100 shown in FIG. 1, it adds a second layer decoding unit 2801, a third layer encoding unit 2802, a third layer decoding unit 2803, a fourth layer encoding unit 2804 and two adders 2805, and it has three subtracting units 104.

The third layer encoding unit 2802 and fourth layer encoding unit 2804 shown in FIG. 28 have the same configuration and operation as the second layer encoding unit 105 shown in FIG. 1. The second layer decoding unit 2801 and third layer decoding unit 2803 have the same configuration and operation as the first layer decoding unit 103 shown in FIG. 1. The band positions in each layer encoding unit are described here with reference to FIG. 29.

As an example of the band arrangement in each layer encoding unit, FIG. 29A shows the band positions in the second layer encoding unit, FIG. 29B those in the third layer encoding unit, and FIG. 29C those in the fourth layer encoding unit. Each layer has four bands.

More specifically, the second layer encoding unit 105 arranges its four bands so as not to exceed the layer-2 reference frequency Fx(L2). The third layer encoding unit 2802 arranges its four bands so as not to exceed the layer-3 reference frequency Fx(L3), and the fourth layer encoding unit 2804 arranges its bands so as not to exceed the layer-4 reference frequency Fx(L4). The reference frequencies satisfy Fx(L2) < Fx(L3) < Fx(L4). In layer 2, which has a low bit rate, the band to be encoded is therefore chosen from the perceptually sensitive low-frequency part. In higher layers, with higher bit rates, it is chosen from a range that extends progressively further into the high-frequency part.
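The layer-dependent arrangement can be sketched as four equally spaced bands per layer, each ending at or below that layer's reference frequency Fx(L). The actual positions in FIG. 29 are not given in the text. The even spacing, the bin units and the function names below are therefore assumptions. The only property taken from the text is that each layer stays below its own Fx(L) and that Fx increases with the layer.

```python
def layer_bands(ref_bin, band_width, num_bands=4):
    """num_bands equally spaced (start, end) bands, none extending past ref_bin."""
    last_start = ref_bin - band_width
    if last_start < 0:
        raise ValueError("band wider than the allowed range")
    if num_bands == 1:
        return [(0, band_width)]
    starts = [round(k * last_start / (num_bands - 1)) for k in range(num_bands)]
    return [(s, s + band_width) for s in starts]

def layered_bands(ref_bins, band_width, num_bands=4):
    """ref_bins maps layer -> Fx(L) in bins; Fx must increase with the layer number."""
    layers = sorted(ref_bins)
    refs = [ref_bins[layer] for layer in layers]
    if any(lo >= hi for lo, hi in zip(refs, refs[1:])):
        raise ValueError("expected Fx(L2) < Fx(L3) < Fx(L4)")
    return {layer: layer_bands(ref_bins[layer], band_width, num_bands) for layer in layers}
```

With hypothetical values Fx(L2) = 40, Fx(L3) = 70 and Fx(L4) = 100 bins and a 10-bin bandwidth, layer 2 gets contiguous bands covering bins 0 to 40. Layer 4's four bands spread out to reach bin 100.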

This configuration emphasizes the low-frequency part in lower layers and covers a wider band in higher layers, and so achieves higher sound quality for speech signals.

FIG. 30 is a block diagram showing the main configuration of decoding apparatus 300, which corresponds to encoding apparatus 280 shown in FIG. 28. Decoding apparatus 300 of FIG. 30 is decoding apparatus 600 of FIG. 7 with a third layer decoding unit 3001, a fourth layer decoding unit 3002 and two adders 604 added. The third layer decoding unit 3001 and fourth layer decoding unit 3002 have the same configuration and operation as the second layer decoding unit 603 of decoding apparatus 600 shown in FIG. 7, so their detailed description is omitted here.

As another example of the band arrangement in each layer encoding unit, FIG. 31A shows the positions of four bands in the second layer encoding unit 105. FIG. 31B shows the positions of six bands in the third layer encoding unit 2802, and FIG. 31C the positions of eight bands in the fourth layer encoding unit 2804.

In FIG. 31, the bands in each layer encoding unit are equally spaced. In a lower layer, as in FIG. 31A, only bands in the low-frequency part are encoding targets. The number of bands eligible for encoding increases in higher layers, as in FIGS. 31B and 31C.

With this configuration, the bands in each layer are equally spaced. When a lower layer selects the band to be encoded, only a few candidate bands lie in the low-frequency part, so both the amount of computation and the bit rate are reduced.
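The saving in the lower layers follows from simple counting. A layer with n equally spaced candidate bands needs n energy evaluations to choose one, and ⌈log2 n⌉ bits to signal its index. The sketch below tabulates this for the 4 / 6 / 8 band counts of FIG. 31. The spacing and bandwidth of 8 bins are hypothetical values.

```python
def equally_spaced_bands(num_bands, spacing, band_width):
    """(start, end) bins of num_bands bands starting at 0, `spacing` bins apart."""
    return [(k * spacing, k * spacing + band_width) for k in range(num_bands)]

def position_bits(num_candidates):
    """Bits needed to index one of num_candidates bands: ceil(log2(n)), 0 when n == 1."""
    return (num_candidates - 1).bit_length()

if __name__ == "__main__":
    # Hypothetical setup mirroring the band counts of FIGS. 31A-31C
    for layer, n in [(2, 4), (3, 6), (4, 8)]:
        bands = equally_spaced_bands(n, spacing=8, band_width=8)
        print(layer, n, position_bits(n), bands[-1][1])
    # prints: "2 4 2 32", "3 6 3 48", "4 8 3 64"
```

Layer 2 thus searches four bands confined to the low-frequency part and spends 2 bits on the position. Layers 3 and 4 cover more of the spectrum at 3 bits each.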

(Embodiment 8)
Embodiment 8 of the present invention differs from Embodiment 1 only in the operation of the first position specifying unit, which is numbered 801 in this embodiment to indicate the difference. When specifying the band in which the target frequencies to be encoded may lie, the first position specifying unit 801 first divides the full band into a plurality of partial bands. It then searches each partial band with a predetermined bandwidth and step size, and combines the bands found in the individual partial bands to form the band in which the target frequencies to be encoded may lie.

The operation of the first position specifying unit 801 is described with reference to FIG. 32. FIG. 32 shows the case where the number of partial bands is N = 2, with partial band 1 covering the low-frequency part and partial band 2 covering the high-frequency part. In partial band 1, one band is selected from a set of bands of predetermined bandwidth defined in advance; the position information of this band is called first partial band position information. Likewise, in partial band 2, one band is selected from a set of bands of predetermined bandwidth defined in advance; its position information is called second partial band position information.

Next, the first position specifying unit 801 combines the band selected in partial band 1 with the band selected in partial band 2 to form a combined band. This combined band is the band specified by the first position specifying unit 801, and the second position specifying unit 202 then specifies the second position information within it. For example, suppose band 2 is selected in partial band 1 and band 4 in partial band 2. The first position specifying unit 801 then combines these two bands, as shown in the lower part of FIG. 32, to form the band from which the frequencies to be encoded may be taken.
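The partial-band search and combination can be sketched as an independent maximum-energy search in each partial band followed by a union of the chosen bins. The energy criterion, the fixed width and step, and the function names are assumptions made for illustration. The patent states only that each partial band is searched with a predetermined bandwidth and step size.

```python
import numpy as np

def best_band_in_partial(err, lo, hi, width, step):
    """Start bin of the highest-energy band of `width` bins inside [lo, hi)."""
    best, best_energy = None, -1.0
    for start in range(lo, hi - width + 1, step):
        energy = float(np.sum(err[start:start + width] ** 2))
        if energy > best_energy:
            best, best_energy = start, energy
    return best

def combined_band(err_coeffs, partial_bands, width, step):
    """Per-partial-band indices (nth partial band position info) and the combined bins."""
    err = np.asarray(err_coeffs, dtype=float)
    indices, bins = [], []
    for lo, hi in partial_bands:
        start = best_band_in_partial(err, lo, hi, width, step)
        indices.append((start - lo) // step)  # index of the band within its partial band
        bins.extend(range(start, start + width))
    return indices, bins
```

With two partial bands of 16 bins, 4-bin bands and a 4-bin step, large errors at bins 5 and 29 yield indices `[1, 3]`. The combined band is then bins 4 to 7 plus bins 28 to 31. The second position search runs over these bins only.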

FIG. 33 is a block diagram showing the configuration of first position specifying unit 801 for the case where the number of partial bands is N. In FIG. 33, the first layer error transform coefficients input from subtracting unit 104 are supplied to each of partial band 1 specifying unit 811-1 through partial band N specifying unit 811-N. Each partial band n specifying unit 811-n (n = 1 to N) selects one band from predetermined partial band n and outputs information indicating the position of the selected band (n-th partial band position information) to first position information configuring unit 812.

First position information configuring unit 812 configures the first position information from the n-th partial band position information (n = 1 to N) input from each partial band n specifying unit 811-n, and outputs the first position information to second position specifying unit 202, encoding unit 203, and multiplexing unit 204.

FIG. 34 illustrates how first position information configuring unit 812 configures the first position information. As shown in the figure, first position information configuring unit 812 forms the first position information by concatenating, in order, the first partial band position information (A1 bits) through the N-th partial band position information (AN bits). The bit length An of each n-th partial band position information is determined by the number of candidate bands contained in partial band n, so the values of An may differ from one another.
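The concatenation in FIG. 34 can be sketched as packing each index into An bits, most significant field first. This is a sketch under assumptions: An is taken as the ceiling of log2 of the number of candidate bands in partial band n, which is one natural choice the text does not spell out, and `bits_for`, `pack_first_position`, and `unpack_first_position` are hypothetical names.

```python
import math

def bits_for(num_candidates):
    """Bits An needed to index num_candidates candidate bands (0 if there is only one)."""
    return math.ceil(math.log2(num_candidates)) if num_candidates > 1 else 0

def pack_first_position(indices, num_candidates):
    """Concatenate n-th partial band indices (n = 1..N) into one integer; return (word, total bits)."""
    word, total = 0, 0
    for idx, count in zip(indices, num_candidates):
        if not 0 <= idx < count:
            raise ValueError("partial band index out of range")
        a_n = bits_for(count)
        word = (word << a_n) | idx
        total += a_n
    return word, total

def unpack_first_position(word, num_candidates):
    """Inverse of pack_first_position: peel fields off from the last partial band backwards."""
    indices = []
    for count in reversed(num_candidates):
        a_n = bits_for(count)
        indices.append(word & ((1 << a_n) - 1))
        word >>= a_n
    return indices[::-1]

# Two partial bands with 7 candidates each: A1 = A2 = 3 bits.
print(pack_first_position([2, 4], [7, 7]))  # (20, 6)
```

With 8 and 4 candidates the field widths become 3 and 2 bits, illustrating that An may differ per partial band.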

FIG. 35 illustrates how, in the decoding process of this embodiment, the first layer decoding error transform coefficients are obtained using the first position information and the second position information. The case of two partial bands is described here as an example. In the following description, the names and reference numerals of the components of second layer decoding unit 603 according to Embodiment 1 are reused.

Arrangement unit 704 uses the second position information to rearrange the shape candidate that multiplication unit 703 has multiplied by the gain candidate. Arrangement unit 704 then uses the first position information to further rearrange the result into partial band 1 and partial band 2. Arrangement unit 704 outputs the signal obtained in this way as the first layer decoding error transform coefficients.
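The two-stage rearrangement performed by arrangement unit 704 can be sketched as follows. The sketch assumes that the second position information is a list of positions inside the combined band, that a single gain scales the whole shape, and that the first position information has already been decoded into band start coefficients; `decode_coefficients` and all numbers are hypothetical.

```python
import numpy as np

def decode_coefficients(shape, gain, pulse_positions, band_starts, bandwidths, num_coeffs):
    """Rebuild first layer decoding error transform coefficients from a gain-scaled shape."""
    # Stage 1 (second position information): place the scaled shape inside the combined band.
    combined = np.zeros(sum(bandwidths))
    combined[pulse_positions] = gain * np.asarray(shape, dtype=float)
    # Stage 2 (first position information): map each segment of the combined band
    # back to the band it came from in partial band 1, partial band 2, ...
    out = np.zeros(num_coeffs)
    offset = 0
    for start, width in zip(band_starts, bandwidths):
        out[start:start + width] = combined[offset:offset + width]
        offset += width
    return out

coeffs = decode_coefficients([1, -1, 1], 0.5, [2, 9, 15], [8, 48], [8, 8], 64)
print(np.nonzero(coeffs)[0].tolist())  # [10, 49, 55]
```

Positions 0 to 7 of the combined band land in the band chosen from partial band 1 and positions 8 to 15 in the band chosen from partial band 2, which is the mapping FIG. 35 depicts.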

According to this embodiment, the first position specifying unit selects one band from each partial band, so at least one decoded spectrum can be placed in every partial band. Compared with embodiments that determine a single band from the entire band, this allows a plurality of bands whose sound quality is to be improved to be set in advance. This embodiment is effective, for example, when the quality of both the low-frequency part and the high-frequency part is to be improved at the same time.

In addition, according to this embodiment, the subjective quality of the decoded signal can be improved even when the lower layer (the first layer in this embodiment) performs low-bit-rate coding. A configuration that uses CELP for the lower layer is one example. Because CELP is a coding scheme based on waveform matching, it encodes so that the quantization distortion is smaller in the low-frequency part, where the energy is large, than in the high-frequency part. As a result, the high-frequency spectrum is attenuated, which is perceived as muffledness (a lack of bandwidth). At the same time, because CELP here operates at a low bit rate, it cannot sufficiently suppress the low-frequency quantization distortion either, and this distortion is perceived as noisiness. In this embodiment, a band to be encoded is selected from each of the low-frequency part and the high-frequency part, so the two different degradation factors, noisiness in the low-frequency part and muffledness in the high-frequency part, are addressed simultaneously and subjective quality can be improved.

Furthermore, according to this embodiment, the band selected from the low-frequency part and the band selected from the high-frequency part are combined into a combined band, and the spectral shape is determined within that combined band. This enables adaptive processing: for a frame in which the low-frequency part needs more quality improvement than the high-frequency part, a spectral shape emphasizing the low-frequency part is selected, and for a frame in which the high-frequency part needs more improvement than the low-frequency part, a spectral shape emphasizing the high-frequency part is selected. Such adaptive processing improves subjective quality. For example, when the spectral shape is represented by pulses, more pulses can be placed in the low-frequency part in frames that need low-frequency improvement, and more pulses in the high-frequency part in frames that need high-frequency improvement.
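The adaptive behavior can be illustrated with a deliberately simplified pulse picker. This is a sketch, not the codebook search of Embodiment 1: it assumes pulses are placed at the largest-magnitude coefficients of the 16-coefficient combined band, whose first 8 coefficients come from the low partial band. `pick_pulses`, `split_counts`, and both frames are hypothetical.

```python
import numpy as np

def pick_pulses(target, num_pulses):
    """Sorted combined-band indices of the num_pulses largest-magnitude coefficients."""
    order = np.argsort(-np.abs(target), kind="stable")
    return np.sort(order[:num_pulses])

def split_counts(positions, low_width):
    """How many pulses fall in the low segment vs the high segment of the combined band."""
    low = int(np.sum(positions < low_width))
    return low, len(positions) - low

# Frame A: error dominated by the low segment. Frame B: by the high segment.
frame_a = np.array([5, 4, 3, 2, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0], dtype=float)
frame_b = np.zeros(16)
frame_b[[1, 9, 11, 13, 15]] = [1, 4, 3, 5, 2]
print(split_counts(pick_pulses(frame_a, 4), 8))  # (4, 0)
print(split_counts(pick_pulses(frame_b, 4), 8))  # (0, 4)
```

The same number of pulses is spent in every frame, but because they are searched over the combined band they migrate to whichever segment needs them.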

As a variation of this embodiment, a fixed band may always be selected in a specific partial band, as shown in FIG. 36. In the example of FIG. 36, band 4 is always selected in partial band 2 and forms part of the combined band. As with the basic form of this embodiment, the band whose sound quality is to be improved can be set in advance; in addition, because partial band position information for partial band 2 is no longer needed, the number of bits representing the first position information shown in FIG. 34 can be reduced.
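The bit saving of the fixed-band variation is simple arithmetic. The sketch assumes, as a hypothetical choice, that each freely chosen partial band costs the ceiling of log2 of its candidate count and that a fixed partial band costs nothing; `first_position_bits` is a hypothetical name.

```python
import math

def first_position_bits(num_candidates, fixed=()):
    """Total bits of the first position information.

    num_candidates[n] is the number of candidate bands in partial band n (0-based);
    partial bands listed in `fixed` always use the same band and send no index.
    """
    total = 0
    for n, count in enumerate(num_candidates):
        if n in fixed or count <= 1:
            continue
        total += math.ceil(math.log2(count))
    return total

print(first_position_bits([7, 7]))             # 6 bits: both partial bands searched
print(first_position_bits([7, 7], fixed={1}))  # 3 bits: partial band 2 fixed, as in FIG. 36
```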

Although FIG. 36 shows an example in which a fixed range is always selected in the high-frequency part (partial band 2), the invention is not limited to this: a fixed range may instead always be selected in the low-frequency part (partial band 1), or in a mid-frequency partial band not shown in FIG. 36.

As another variation of this embodiment, the bandwidth of the candidate bands may differ from one partial band to another, as shown in FIG. 37. FIG. 37 illustrates a case where the candidate bands set in partial band 2 are narrower than those set in partial band 1.

The embodiments of the present invention have been described above.

In the present invention, the arrangement of bands in each layer encoding unit is not limited to the examples described above. For example, the bands may be configured to be narrow in lower layers and wide in higher layers.

In each of the above embodiments, the band of the current frame may be selected in association with the band selected in a past frame. For example, the band of the current frame may be determined from bands located near the band selected in the previous frame. Alternatively, the band candidates for the current frame may be rearranged around the band selected in the previous frame, and the band of the current frame may be determined from these rearranged candidates. Range information may also be transmitted only once every several frames; in frames where it is not transmitted, the range indicated by previously transmitted range information is used (intermittent transmission of band information).
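Rearranging the candidates around the previous frame's band can be sketched as a local search. The sketch assumes candidates lie within a hypothetical `radius` of step positions on either side of the previous start, that candidates falling outside the spectrum are dropped, and that the largest-energy criterion is used; `local_candidates` and `select_near_previous` are hypothetical names. The encoder and decoder must apply identical clipping so that the transmitted index `k` means the same band on both sides.

```python
import numpy as np

def local_candidates(prev_start, radius, step, bandwidth, num_coeffs):
    """Candidate band starts within +/- radius steps of the previous frame's band."""
    starts = [prev_start + k * step for k in range(-radius, radius + 1)]
    return [s for s in starts if 0 <= s <= num_coeffs - bandwidth]

def select_near_previous(err, prev_start, radius, step, bandwidth):
    """Return (index k to transmit, start of the selected band) for the current frame."""
    starts = local_candidates(prev_start, radius, step, bandwidth, len(err))
    energies = [float(np.sum(err[s:s + bandwidth] ** 2)) for s in starts]
    k = int(np.argmax(energies))
    return k, starts[k]

err = np.zeros(64)
err[20:24] = 1.0
err[26] = 1.0
print(local_candidates(16, 2, 4, 8, 64))        # [8, 12, 16, 20, 24]
print(select_near_previous(err, 16, 2, 4, 8))   # (3, 20)
```

With `radius` = 2 only five candidates exist, so the index costs 3 bits instead of the bits needed to address every band in the full spectrum.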

In each of the above embodiments, the band of the current layer may also be selected in association with the band selected in a lower layer. For example, the band of the current layer may be determined from bands located near the band selected in the lower layer. Alternatively, the band candidates for the current layer may be rearranged around the band selected in the lower layer, and the band of the current layer may be determined from these rearranged candidates. Range information may also be transmitted only once every several frames; in frames where it is not transmitted, the range indicated by previously transmitted range information is used (intermittent transmission of band information).

The present invention places no limit on the number of layers in scalable coding.

Although the above embodiments assume a speech signal as the decoded signal, the present invention is not limited to this; the decoded signal may also be, for example, an audio signal.

Although the above embodiments have been described taking hardware implementations as examples, the present invention can also be implemented in software.

Each functional block used in the description of the above embodiments is typically implemented as an LSI, which is an integrated circuit. The blocks may be implemented as individual chips, or some or all of them may be integrated into a single chip. Although the term LSI is used here, the circuit may also be called an IC, system LSI, super LSI, or ultra LSI depending on the degree of integration.

The method of circuit integration is not limited to LSI; implementation with dedicated circuitry or a general-purpose processor is also possible. An FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacture, or a reconfigurable processor in which the connections and settings of circuit cells inside the LSI can be reconfigured, may also be used.

Furthermore, if an integrated circuit technology that replaces LSI emerges from advances in semiconductor technology or from another derived technology, the functional blocks may naturally be integrated using that technology. Application of biotechnology is one possibility.

The present invention is suitable for use in encoding devices, decoding devices, and the like used in scalable coding communication systems.

101 Frequency domain transform unit
102, 2201 First layer encoding unit
103, 2202 First layer decoding unit
104 Subtracting unit
105 Second layer encoding unit
106, 204 Multiplexing unit
201, 801 First position specifying unit
202 Second position specifying unit
203, 221 Encoding unit
301 Target signal configuring unit
302 Error calculating unit
303 Search unit
304 Shape codebook
305 Gain codebook
311-1, ..., 311-J Sub-position specifying unit
321 Second position information codebook
601 Separating unit
602, 2501 First layer decoding unit
603, 2502 Second layer decoding unit
604 Adding unit
605 Switching unit
606 Time domain transform unit
607 Post filter
701 Shape codebook
702 Gain codebook
703 Multiplication unit
704 Arrangement unit
2203 Delay unit
2210 Downsampling unit
2220 Core encoding unit
2230 Core decoding unit
2240 Upsampling unit
2250 High-frequency component adding unit
2601 Weighting filter unit
2701 Synthesis filter unit
2801 Second layer decoding unit
2802 Third layer encoding unit
2803 Third layer decoding unit
2804 Fourth layer encoding unit
3001 Third layer decoding unit
3002 Fourth layer decoding unit

Claims

A speech encoding device comprising:
first layer encoding means for encoding an input speech signal to generate first layer encoded data;
first layer decoding means for decoding the first layer encoded data to generate a first layer decoded signal;
first layer error transform coefficient calculating means for transforming a first layer error signal, which is the error between the input speech signal and the first layer decoded signal, into the frequency domain to calculate first layer error transform coefficients; and
second layer encoding means for encoding the first layer error transform coefficients to generate second layer encoded data,
wherein the second layer encoding means comprises:
band selecting means for selecting a first band, from among a plurality of band candidates arranged with a predetermined bandwidth and a step size narrower than the bandwidth, on the basis of the energy of the first layer error transform coefficients in each band candidate, and generating first position information indicating the position of the selected first band;
pulse position specifying means for specifying the positions of a plurality of pulses from among pulse candidate positions set, within the selected first band, with a step size smaller than the step size of the band candidates, and generating second position information indicating the specified pulse positions; and
encoded data generating means for generating the second layer encoded data using the first position information and the second position information.

The speech encoding device according to claim 1, wherein the pulse position specifying means specifies the pulse positions on the basis of the energy of the first layer error transform coefficients.

The speech encoding device according to claim 1, wherein
the second layer encoding means further comprises gain encoding means for generating, on the basis of the first layer error transform coefficients, gain information indicating the amplitude of the pulse at each pulse position, and
the encoded data generating means generates the second layer encoded data further using the gain information.

The speech encoding device according to claim 1, wherein the band selecting means selects the first band from a low-frequency part below a preset reference frequency.
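The two-stage search recited in claims 1 to 4 can be sketched as follows. This is an illustrative sketch only: it assumes largest-energy band selection, pulses placed at the largest-magnitude candidate positions, and, as a hypothetical simplification of the gain information, one common gain (the mean pulse magnitude) plus per-pulse signs instead of a gain codebook. The optional `max_coeff` argument is a hypothetical stand-in for the reference frequency of claim 4; `encode_second_layer` is a hypothetical name.

```python
import numpy as np

def encode_second_layer(err, bandwidth, band_step, pulse_step, num_pulses, max_coeff=None):
    """Return (first position index, pulse positions in band, pulse signs, common gain)."""
    if not (pulse_step < band_step < bandwidth):
        raise ValueError("need pulse_step < band_step < bandwidth")
    hi = len(err) if max_coeff is None else min(max_coeff, len(err))
    # Band candidates overlap, because the band step is narrower than the bandwidth.
    starts = range(0, hi - bandwidth + 1, band_step)
    energies = [float(np.sum(err[s:s + bandwidth] ** 2)) for s in starts]
    band_index = int(np.argmax(energies))
    start = starts[band_index]
    # Pulse candidate positions on a finer grid inside the selected band.
    cand = np.arange(0, bandwidth, pulse_step)
    order = np.argsort(-np.abs(err[start + cand]), kind="stable")
    chosen = np.sort(cand[order[:num_pulses]])
    amps = err[start + chosen]
    gain = float(np.mean(np.abs(amps)))
    return band_index, chosen.tolist(), np.sign(amps).tolist(), gain

err = np.zeros(32)
err[11], err[13], err[15] = 1.0, 3.0, -2.0
print(encode_second_layer(err, 8, 4, 1, 2))  # (2, [5, 7], [1.0, -1.0], 2.5)
```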

A speech decoding device comprising:
receiving means for receiving first layer encoded data, obtained in a speech encoding device by encoding an input speech signal, and second layer encoded data, obtained in the speech encoding device by transforming a first layer error signal, which is the error between the input speech signal and a first layer decoded signal obtained by decoding the first layer encoded data, into the frequency domain to calculate first layer error transform coefficients and encoding those coefficients;
first layer decoding means for decoding the first layer encoded data to generate the first layer decoded signal;
second layer decoding means for decoding the second layer encoded data to generate first layer decoding error transform coefficients;
time domain transforming means for transforming the first layer decoding error transform coefficients into the time domain to generate a first layer decoded error signal; and
adding means for adding the first layer decoded signal and the first layer decoded error signal to generate a decoded signal,
wherein the second layer decoding means
decodes the second layer encoded data to generate first position information indicating the position of a first band having a predetermined bandwidth and second position information indicating the positions of a plurality of pulses within the first band, and
specifies the positions of the plurality of pulses using the first position information and the second position information to generate the first layer decoding error transform coefficients.

The speech decoding device according to claim 5, wherein the second layer decoding means decodes the second layer encoded data to generate gain information indicating the amplitude of the pulses, and generates the first layer decoding error transform coefficients further using the gain information.
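The decoder side of claims 5 and 6 can be sketched as the inverse of a two-stage search. The sketch assumes the first position information is an index into candidate bands spaced `band_step` apart, the second position information is a list of pulse positions relative to the band start, and the gain information is one common gain with per-pulse signs; these representations and the name `decode_second_layer` are hypothetical.

```python
import numpy as np

def decode_second_layer(band_index, positions, signs, gain, bandwidth, band_step, num_coeffs):
    """Rebuild first layer decoding error transform coefficients from decoded parameters."""
    start = band_index * band_step          # first position information -> band start
    coeffs = np.zeros(num_coeffs)
    for pos, sign in zip(positions, signs):
        if not 0 <= pos < bandwidth:
            raise ValueError("pulse position outside the first band")
        coeffs[start + pos] = gain * sign   # second position + gain information -> pulse
    return coeffs

coeffs = decode_second_layer(2, [5, 7], [1.0, -1.0], 2.5, 8, 4, 32)
print(np.nonzero(coeffs)[0].tolist())  # [13, 15]
```

In the device of claim 5 these coefficients would then be transformed to the time domain and added to the first layer decoded signal; that step depends on the transform used and is not shown.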

A speech encoding method comprising:
a first layer encoding step of encoding an input speech signal to generate first layer encoded data;
a first layer decoding step of decoding the first layer encoded data to generate a first layer decoded signal;
a first layer error transform coefficient calculating step of transforming a first layer error signal, which is the error between the input speech signal and the first layer decoded signal, into the frequency domain to calculate first layer error transform coefficients; and
a second layer encoding step of encoding the first layer error transform coefficients to generate second layer encoded data,
wherein the second layer encoding step comprises:
a band selecting step of selecting a first band, from among a plurality of band candidates arranged with a predetermined bandwidth and a step size narrower than the bandwidth, on the basis of the energy of the first layer error transform coefficients in each band candidate, and generating first position information indicating the position of the selected first band;
a pulse position specifying step of specifying the positions of a plurality of pulses from among pulse candidate positions set, within the selected first band, with a step size smaller than the step size of the band candidates, and generating second position information indicating the specified pulse positions; and
an encoded data generating step of generating the second layer encoded data using the first position information and the second position information.

A speech decoding method comprising:
a receiving step of receiving first layer encoded data, obtained in a speech encoding method by encoding an input speech signal, and second layer encoded data, obtained in the speech encoding method by transforming a first layer error signal, which is the error between the input speech signal and a first layer decoded signal obtained by decoding the first layer encoded data, into the frequency domain to calculate first layer error transform coefficients and encoding those coefficients;
a first layer decoding step of decoding the first layer encoded data to generate the first layer decoded signal;
a second layer decoding step of decoding the second layer encoded data to generate first layer decoding error transform coefficients;
a time domain transforming step of transforming the first layer decoding error transform coefficients into the time domain to generate a first layer decoded error signal; and
an adding step of adding the first layer decoded signal and the first layer decoded error signal to generate a decoded signal,
wherein the second layer decoding step comprises:
decoding the second layer encoded data to generate first position information indicating the position of a first band having a predetermined bandwidth and second position information indicating the positions of a plurality of pulses within the first band; and
specifying the positions of the plurality of pulses using the first position information and the second position information to generate the first layer decoding error transform coefficients.