JP2013057792A

JP2013057792A - Speech coding device and speech coding method

Info

Publication number: JP2013057792A
Application number: JP2011195892A
Authority: JP
Inventors: Toshiyuki Morii; 利幸森井
Original assignee: Panasonic Corp
Current assignee: Panasonic Corp
Priority date: 2011-09-08
Filing date: 2011-09-08
Publication date: 2013-03-28

Abstract

PROBLEM TO BE SOLVED: To use the information source code efficiently when coding a time-series signal, to suppress the amount of computation, and to improve coding performance for frequency-domain signals. SOLUTION: A first adjustment unit 102 applies a nonlinear amplitude adjustment to a first frequency spectrum. A CELP coding unit 104 performs CELP coding followed by decoding to generate a decoded signal. A second MDCT unit 105 applies an orthogonal transform to the decoded signal to generate a second frequency spectrum. A second adjustment unit 106 gives the second frequency spectrum the inverse of the characteristic that the first adjustment unit 102 gave the first frequency spectrum during amplitude adjustment. A subtraction unit 107 subtracts the second frequency spectrum, with the inverse characteristic applied, from the first frequency spectrum to generate a residual spectrum. An FPC coding unit 108 codes the residual spectrum into an FPC code.

Description

The present invention relates to a speech coding apparatus and a speech coding method that use scalable codec technology with a multilayer structure.

In mobile communication, compression coding of digital speech and image information is essential for making effective use of the transmission band. Expectations are particularly high for speech codec technology, which is widely used in mobile phones. There is growing demand for coding that delivers higher sound quality than conventional high-efficiency coding at high compression ratios. Because speech codecs are used by the public, standardization is indispensable, and companies around the world are actively conducting research and development toward it.

In recent years, ITU-T (International Telecommunication Union - Telecommunication Standardization Sector) and MPEG (Moving Picture Experts Group) have studied the standardization of scalable codecs with a multilayer structure. ITU-T has issued Recommendation G.718 (see Non-Patent Document 1). In this scheme, the first and second layers are coded with CELP (Code Excited Linear Prediction). The third and subsequent layers are transform-coded with the MDCT (Modified Discrete Cosine Transform). A distinctive feature of the transform coding is its use of FPC (Factorial Pulse Coding) to code music signals efficiently.

CELP is the basic scheme, established some 20 years ago, that models the human speech production mechanism and makes skillful use of vector quantization. It greatly improved the performance of speech coding. Performance improved further with the invention of fixed excitations built from a small number of pulses, such as the algebraic codebook described in Non-Patent Document 2.

In G.718, the coding residual signal of the CELP layers is transform-coded by FPC. In this configuration, attenuating the CELP decoded signal tends to improve overall performance. G.718 exploits this tendency by transmitting an attenuation coefficient with 2 bits.

CELP coding is performed so as to minimize the coding distortion with respect to the target signal. Even so, attenuating the CELP decoded signal, rather than using it as is, improves the overall coding performance after FPC coding. This clearly shows a difference in coding capability between CELP and FPC.

CELP codes the signal locally with high accuracy along the time axis. In the frequency spectrum (sometimes simply called the spectrum), however, its errors are spread evenly over all frequencies. FPC, on the other hand, codes the frequency spectrum with a small number of pulses. It therefore codes some spectral components accurately, while the error in the remaining components is not reduced.

The coding distortion is spread uniformly over the frequency spectrum of the CELP synthesized sound. It is therefore statistically common that lowering the power of the CELP decoded spectrum, distortion included, gives a smaller overall coding distortion once the CELP decoded signal is combined with the FPC-coded spectrum.

Attenuating the CELP decoded signal reduces the overall coding distortion only in a statistical majority of cases, so in some cases it is better not to attenuate. G.718 therefore runs the FPC encoder for each of four attenuation coefficients and keeps the one that minimizes the overall coding distortion.
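The closed-loop search just described can be sketched as follows. This is a minimal illustration, not G.718's procedure, and it rests on these assumptions:

- The gain values are hypothetical; the actual G.718 gain table is not given here.
- `crude_pulse_coder` is a toy stand-in for FPC that keeps only the largest-magnitude residual coefficients.

The toy data mimic a CELP decoded spectrum whose peak is right but whose error is spread flat across all bins.

```python
# Hypothetical candidate gains; the actual G.718 gain table is not given here.
CANDIDATE_GAINS = (1.0, 0.9, 0.8, 0.7)


def crude_pulse_coder(residual, num_pulses):
    """Toy stand-in for FPC: keep the num_pulses largest-magnitude
    coefficients exactly and zero the rest."""
    order = sorted(range(len(residual)), key=lambda i: abs(residual[i]), reverse=True)
    kept = set(order[:num_pulses])
    return [r if i in kept else 0.0 for i, r in enumerate(residual)]


def search_attenuation(target, celp_spec, num_pulses):
    """Run the coder once per candidate gain and return (best_gain, best_error),
    where the error is the total squared error of the CELP-plus-pulse result."""
    best = None
    for g in CANDIDATE_GAINS:
        scaled = [g * c for c in celp_spec]                 # attenuated CELP spectrum
        residual = [x - s for x, s in zip(target, scaled)]  # what the pulse coder sees
        coded = crude_pulse_coder(residual, num_pulses)
        recon = [s + q for s, q in zip(scaled, coded)]
        err = sum((x - y) ** 2 for x, y in zip(target, recon))
        if best is None or err < best[1]:
            best = (g, err)
    return best


# CELP-like decoded spectrum: correct peak, but error spread evenly over all bins.
gain, error = search_attenuation([4.0, 0.0, 0.0, 0.0], [4.0, 1.0, 1.0, 1.0], num_pulses=1)
```

With this toy input, the search picks the strongest attenuation (0.7). The single pulse then fixes the peak, and the flat error in the other bins has been shrunk along with the CELP power. Note that the coder runs once per candidate gain; this is the computational cost the invention aims to avoid.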

Another known technique uses noise shaping, which controls the amount of coding distortion, to gather the spectrum into portions that FPC codes easily (for example, Patent Document 1). Patent Document 1 discloses a process called TNS (Temporal Noise Shaping), used in the standard coding scheme AAC (Advanced Audio Coding).

In Patent Document 1, the MDCT coefficients, which form a frequency-axis signal, are treated as a time-axis signal and passed through an LPC (Linear Prediction Coding) filter. This concentrates the noise where the time-axis amplitude is large. It improves the sound quality of signals with low pitch frequencies, such as male speech.

Patent Document 1: JP 2002-23798 A

Non-Patent Document 1: Recommendation ITU-T G.718, June 2008.
Non-Patent Document 2: Salami, Laflamme, Adoul, "8 kbit/s ACELP Coding of Speech with 10 ms Speech-Frame: a Candidate for CCITT Standardization," Proc. IEEE ICASSP 94, pp. II-97 to II-100.

However, the conventional apparatus reduces the overall coding distortion by lowering the power of the CELP decoded spectrum together with its coding distortion. As a result, the information source code spent on CELP coding is wasted.

The conventional apparatus also requires a large amount of computation, because the encoder is run once for each of the four attenuation coefficients.

Furthermore, Patent Document 1 discloses noise shaping in spectral coding but does not describe noise shaping in time-axis coding such as CELP. It therefore remains unclear how to perform, within CELP, the kind of noise shaping that improves FPC coding performance.

An object of the present invention is to provide a speech coding apparatus and a speech coding method that:
- avoid wasting the information source code when coding a time-series signal,
- suppress the amount of computation, and
- improve coding performance for frequency-domain signals.

A speech coding apparatus according to the present invention comprises:
- first orthogonal transform means that applies an orthogonal transform to a target signal to be coded to generate a first frequency spectrum;
- first adjustment means that applies a nonlinear first amplitude adjustment to the first frequency spectrum;
- inverse orthogonal transform means that applies the inverse of the orthogonal transform to the first frequency spectrum after the first amplitude adjustment to generate a time signal;
- first coding means that codes the time signal and then decodes it to generate a decoded signal;
- second orthogonal transform means that applies an orthogonal transform to the decoded signal to generate a second frequency spectrum;
- second adjustment means that gives the second frequency spectrum the inverse of the characteristic given to the first frequency spectrum in the first amplitude adjustment;
- subtraction means that subtracts the second frequency spectrum, to which the second adjustment means has given the inverse characteristic, from the first frequency spectrum generated by the first orthogonal transform means to generate a residual spectrum; and
- second coding means that transform-codes the residual spectrum.

A speech coding method according to the present invention comprises:
- a first orthogonal transform step of applying an orthogonal transform to a target signal to be coded to generate a first frequency spectrum;
- a first adjustment step of applying a nonlinear amplitude adjustment to the first frequency spectrum;
- an inverse orthogonal transform step of applying the inverse of the orthogonal transform to the first frequency spectrum whose amplitude was adjusted in the first adjustment step to generate a time signal;
- a first coding step of coding the time signal and then decoding it to generate a decoded signal;
- a second orthogonal transform step of applying an orthogonal transform to the decoded signal to generate a second frequency spectrum;
- a second adjustment step of giving the second frequency spectrum the inverse of the characteristic given to the first frequency spectrum in the first adjustment step;
- a subtraction step of subtracting the second frequency spectrum, to which the inverse characteristic was given in the second adjustment step, from the first frequency spectrum generated in the first orthogonal transform step to generate a residual spectrum; and
- a second coding step of transform-coding the residual spectrum.

According to the present invention, it is possible to:
- avoid wasting the information source code when coding a time-series signal,
- suppress the amount of computation, and
- improve coding performance for frequency-domain signals.

FIG. 1 is a block diagram showing the configuration of a speech coding apparatus according to an embodiment of the present invention.
FIG. 2 is a block diagram showing configuration example 1 of the first adjustment unit in the embodiment of the present invention.
FIG. 3 is a block diagram showing configuration example 2 of the first adjustment unit in the embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings.

(Embodiment)
<Configuration of speech coding apparatus>
FIG. 1 is a block diagram showing the configuration of speech coding apparatus 100 according to an embodiment of the present invention. In FIG. 1, CELP is used as an example of time-series coding, and FPC as an example of the subsequent transform coding.

The first MDCT unit 101 obtains a first frequency spectrum by applying an MDCT orthogonal transform to the input target signal (target vector) to be coded. Specifically, it multiplies the target signal by a sine window twice the subframe length, and then by the MDCT matrix, to obtain the first frequency spectrum.

The target signal is typically a speech signal, but other inputs are possible. It may be a music signal, an audio signal containing both speech and music, or any of these signals after some preprocessing applied before input to the first MDCT unit 101.
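The windowed MDCT described above can be sketched in pure Python. The sketch makes these assumptions:

- It uses the textbook O(N²) definition, with window length 2N for a subframe of length N, rather than a fast algorithm.
- The IMDCT (used later by the IMDCT unit 103) is scaled by 2/N. With a sine window satisfying w[n]² + w[n+N]² = 1, overlap-adding consecutive 50%-overlapped frames then reconstructs the input (time-domain aliasing cancellation).
- Function names are illustrative.

```python
import math


def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]


def _kernel(n, k, N):
    # MDCT basis: cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))
    return math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))


def mdct(frame):
    """Sine-windowed MDCT: 2N time samples -> N spectral coefficients."""
    N = len(frame) // 2
    w = sine_window(N)
    return [sum(w[n] * frame[n] * _kernel(n, k, N) for n in range(2 * N))
            for k in range(N)]


def imdct(coeffs):
    """Inverse MDCT: N coefficients -> 2N windowed samples (to be overlap-added)."""
    N = len(coeffs)
    w = sine_window(N)
    return [w[n] * (2.0 / N) * sum(coeffs[k] * _kernel(n, k, N) for k in range(N))
            for n in range(2 * N)]


# Two frames with hop N: their overlapping halves add back to the input.
N = 4
x = [0.3, -1.2, 0.8, 0.5, -0.7, 1.1, 0.2, -0.4, 0.9, -0.6, 0.1, 0.4]
y0, y1 = imdct(mdct(x[0:2 * N])), imdct(mdct(x[N:3 * N]))
middle = [y0[N + i] + y1[i] for i in range(N)]  # approximately equals x[N:2N]
```

Fast FFT-based algorithms compute the same coefficients. As the text notes, any such method may be used.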

The first MDCT unit 101 outputs the obtained first frequency spectrum to the first adjustment unit 102 and the subtraction unit 107. Various methods of reducing the computational cost of the DCT (Discrete Cosine Transform) have been studied, and the first MDCT unit 101 may use any of them.

The first adjustment unit 102 adjusts the amplitude of the first frequency spectrum input from the first MDCT unit 101. It outputs the amplitude-adjusted first frequency spectrum to the IMDCT unit 103. It also sends a code of the predetermined parameter used for the amplitude adjustment, as coded data, to a speech decoding apparatus (not shown). The specific amplitude adjustment method is described later.

The IMDCT unit 103 applies the inverse of the orthogonal transform performed in the first MDCT unit 101 to the first frequency spectrum input from the first adjustment unit 102. This yields a time-series signal (hereinafter the "time signal"), which the IMDCT unit 103 outputs to the CELP coding unit 104.

The CELP coding unit 104 codes and then decodes the time signal input from the IMDCT unit 103 to obtain a decoded signal. It outputs the decoded signal to the second MDCT unit 105. It also sends the code obtained by CELP coding, as coded data, to the speech decoding apparatus (not shown).

The second MDCT unit 105 obtains a second frequency spectrum by applying an MDCT orthogonal transform to the decoded signal input from the CELP coding unit 104. It outputs the second frequency spectrum to the second adjustment unit 106.

The second adjustment unit 106 gives the second frequency spectrum input from the second MDCT unit 105 the inverse of the characteristic that the first adjustment unit 102 gave the first frequency spectrum. Specifically, it expands the portions that the first adjustment unit 102 attenuated. It outputs the second frequency spectrum, with the inverse characteristic applied, to the subtraction unit 107. The specific method of applying the inverse characteristic is described later.

The subtraction unit 107 subtracts the second frequency spectrum input from the second adjustment unit 106 from the first frequency spectrum input from the first MDCT unit 101. The result is a residual spectrum that FPC can code easily, which the subtraction unit 107 outputs to the FPC coding unit 108.
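The chain that produces the residual spectrum can be sketched for a single frame as follows. This is an illustrative sketch, under these assumptions:

- Every stage is passed in as a hypothetical callable: the two adjustments, the IMDCT/MDCT pair, and a time-domain "code then decode" stand-in for CELP.
- Single-frame processing is a simplification. A real encoder overlaps MDCT frames and runs CELP on the continuous time signal.

```python
def encoder_residual(target_spec, adjust, inverse_adjust, to_time, to_freq, time_codec):
    """Residual spectrum for one frame, following the order of FIG. 1.

    target_spec     first frequency spectrum (output of the first MDCT unit)
    adjust          first adjustment: nonlinear amplitude compression
    to_time         IMDCT
    time_codec      time-domain code-then-decode (CELP stand-in)
    to_freq         second MDCT
    inverse_adjust  second adjustment: inverse characteristic of `adjust`
    """
    adjusted = adjust(target_spec)
    decoded_time = time_codec(to_time(adjusted))
    second_spec = inverse_adjust(to_freq(decoded_time))
    # Residual handed to the transform coder (FPC in the embodiment).
    return [x - y for x, y in zip(target_spec, second_spec)]


# Toy usage: identity transforms, halving/doubling as the adjustment pair,
# and a truncating "codec" that loses the fractional part.
identity = lambda s: list(s)
half = lambda s: [v / 2 for v in s]
double = lambda s: [v * 2 for v in s]
truncate = lambda s: [float(int(v)) for v in s]
residual = encoder_residual([1.0, 2.0], half, double, identity, identity, truncate)
```

Because the inverse adjustment is applied before subtraction, the residual is measured against the unadjusted target. Only genuine coding error is left for the transform coder.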

The FPC coding unit 108 transform-codes the residual spectrum input from the subtraction unit 107 using FPC and outputs the coded data, which is sent to the speech decoding apparatus (not shown). The FPC coding and decoding algorithms are described in detail in ITU-T Recommendation G.718, so their description is omitted here.
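FPC represents a spectrum as a set of signed unit-magnitude pulses. Several pulses may stack on one position, and the configuration is indexed combinatorially. The sketch below counts such configurations, that is, integer vectors of length n whose absolute values sum to m. The base-2 logarithm of this count is the number of bits a combinatorial index needs.

This is only the standard counting argument, shown under that framing. It is not G.718's actual indexing procedure, and the function names are illustrative.

```python
from math import ceil, comb, log2


def pulse_configurations(n, m):
    """Number of integer vectors of length n with sum of |values| equal to m.

    k = number of occupied positions: choose them (comb(n, k)), give each a
    sign (2**k), and split m unit pulses among them with each getting at
    least one (comb(m - 1, k - 1)).
    """
    if m == 0:
        return 1
    return sum(2 ** k * comb(n, k) * comb(m - 1, k - 1) for k in range(1, min(n, m) + 1))


def bits_needed(n, m):
    """Bits for a fixed-length index over all configurations."""
    return ceil(log2(pulse_configurations(n, m)))


# Two pulses on three positions: (+-2 at one spot) 6 ways + (+-1, +-1) 12 ways.
count = pulse_configurations(3, 2)
```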

<Configuration of first adjustment unit>
Two configurations of the first adjustment unit 102 are described below: configuration example 1 and configuration example 2. Either may be used in this embodiment.

(Configuration example 1)
FIG. 2 is a block diagram showing configuration example 1 of the first adjustment unit 102.

In configuration example 1, the first adjustment unit 102 follows this amplitude adjustment policy. Among the samples of the first frequency spectrum, samples with relatively small spectral values are left unattenuated, and samples with larger spectral values are attenuated more strongly.

For example, the first adjustment unit 102 leaves samples whose spectral values are below a threshold unattenuated. It attenuates samples whose spectral values are at or above the threshold, more strongly the larger their spectral values are. The amplitude adjustment in configuration example 1 is therefore nonlinear.

The average value calculation unit 201 receives the first frequency spectrum output from the first MDCT unit 101 and computes its average value m by equation (1).

The average value calculation unit 201 outputs the average value m to the amplitude adjustment unit 202.

The amplitude adjustment unit 202 obtains a threshold t by equation (2), using a constant and the average value m input from the average value calculation unit 201.
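Equations (1) and (2) are not reproduced in this text. A minimal sketch, under these assumptions:

- Equation (1) is taken to be the arithmetic mean of the spectral magnitudes |X(k)|. This is consistent with the later remark that the average is computed as an arithmetic mean.
- Equation (2) is taken to be t = c·m. The value c = 2.0 is a hypothetical constant.

```python
def mean_amplitude(spectrum):
    """Assumed form of equation (1): arithmetic mean of |X(k)|."""
    return sum(abs(x) for x in spectrum) / len(spectrum)


def threshold(spectrum, c=2.0):
    """Assumed form of equation (2): t = c * m (c = 2.0 is hypothetical)."""
    return c * mean_amplitude(spectrum)
```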

The amplitude adjustment unit 202 adjusts the amplitude of the first frequency spectrum input from the first MDCT unit 101 according to equation (3), using the threshold t.

Equation (3) is continuous up to its first derivative and switches from a linear function to a logarithmic function at the threshold. The amplitude adjustment unit 202 thus leaves spectral components whose amplitude is at or below the threshold unchanged. It attenuates those whose amplitude exceeds the threshold with a logarithmic function.
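Equation (3) itself is not reproduced here. One function that matches the description is:

- f(x) = x for |x| ≤ t
- f(x) = sign(x)·t·(1 + ln(|x|/t)) for |x| > t

At |x| = t the log branch equals t and has slope t/|x| = 1, so the value and first derivative both match the linear branch. The exact form used in the patent may differ; this is an assumed stand-in.

```python
import math


def compress(x, t):
    """Assumed form of equation (3): identity up to t, logarithmic beyond t,
    continuous with a continuous first derivative at |x| = t."""
    a = abs(x)
    if a <= t:
        return x
    return math.copysign(t * (1.0 + math.log(a / t)), x)
```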

The amplitude adjustment unit 202 also outputs the threshold to the second adjustment unit 106. The speech decoding apparatus (not shown) must also perform the inverse transform. The amplitude adjustment unit 202 therefore codes the threshold t, the parameter used for the amplitude adjustment, and sends the code of t to the speech decoding apparatus. One possible coding method is scalar quantization by rounding to an integer.

To avoid adverse effects from the coding distortion of t, the threshold obtained by equation (2) is first coded and then decoded. The decoded threshold t is then used both in the processing of equation (3) and in the second adjustment unit 106.
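The "code, decode, then use the decoded value" rule can be sketched with integer rounding as the scalar quantizer. The sketch assumes a quantization step of exactly 1 and a non-negative threshold. The function names are hypothetical.

```python
def encode_threshold(t):
    """Scalar quantization by rounding to an integer index (step 1 assumed)."""
    return int(round(t))


def decode_threshold(index):
    """Decoder-side reconstruction of the threshold."""
    return float(index)


def local_threshold(t):
    """Threshold the encoder actually uses: the decoded value, so that the
    encoder's equation (3) and the decoder's equation (6) share the same t."""
    return decode_threshold(encode_threshold(t))
```

Using the decoded value on the encoder side means that quantization error in t changes only the shape of the mapping, not its invertibility. The decoder's expansion exactly undoes the encoder's compression.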

The amplitude adjustment unit 202 then outputs the amplitude-adjusted first frequency spectrum to the IMDCT unit 103.

(Configuration example 2)
FIG. 3 is a block diagram showing configuration example 2 of the first adjustment unit 102.

In configuration example 2, the first adjustment unit 102 follows this amplitude adjustment policy. Among the samples of the first frequency spectrum, samples whose spectral values change relatively gently are left unattenuated, and samples whose spectral values change in a pulse-like manner are attenuated.

For example, consider the change in spectral value between adjacent samples of the first frequency spectrum. The first adjustment unit 102 leaves samples for which this change is below a threshold unattenuated. It attenuates samples for which the change is at or above the threshold, more strongly the larger the change is. The amplitude adjustment is therefore nonlinear in configuration example 2 as well.

The pulse analysis unit 301 obtains an attenuation coefficient vector from the first frequency spectrum input from the first MDCT unit 101. Specifically, it determines pulse-likeness by analyzing the relationship between the spectral value of a center sample and the spectral values of the samples around it. It does so, for example, by comparing the change in spectral value with a threshold as described above.

Where the pulse analysis unit 301 judges a sample to be pulse-like, it stores a constant value less than 1; otherwise, it stores 1.0. The attenuation coefficient vector is thus obtained according to equation (4).
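Equation (4) is not reproduced here. A minimal sketch, under these hypothetical choices:

- A sample counts as pulse-like when |X(k)| exceeds the larger of its two neighbours' magnitudes by at least `delta`. Edges treat the missing neighbour as 0.
- The "constant value less than 1" is `beta`.

The criterion, `delta = 1.0` and `beta = 0.5` are all assumptions, not the patent's values.

```python
def attenuation_vector(spectrum, delta=1.0, beta=0.5):
    """Assumed form of equation (4): g(k) = beta (< 1) for pulse-like samples,
    1.0 otherwise. Pulse-like (hypothetical criterion): |X(k)| minus the larger
    neighbouring magnitude is at least delta."""
    n = len(spectrum)
    g = []
    for k in range(n):
        left = abs(spectrum[k - 1]) if k > 0 else 0.0
        right = abs(spectrum[k + 1]) if k < n - 1 else 0.0
        is_pulse = abs(spectrum[k]) - max(left, right) >= delta
        g.append(beta if is_pulse else 1.0)
    return g
```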

The pulse analysis unit 301 outputs the attenuation coefficient vector to the amplitude adjustment unit 302 and the second adjustment unit 106. The attenuation coefficient vector serves to attenuate pulse-like spectral values.

Because the speech decoding apparatus must also perform the inverse transform, the pulse analysis unit 301 codes the attenuation coefficient vector, the parameter used for the amplitude adjustment, and sends its code to the speech decoding apparatus. Possible coding methods include:
- sending the vector as is, with attenuated samples set to "1" and unattenuated samples set to "0";
- dividing the vector into several sections and vector-quantizing each section;
- limiting the number of attenuated positions in advance, and coding and transmitting only part of the attenuation coefficient vector.

As an extreme case, the attenuation coefficient vector of equation (4) may be derived from the spectrum obtained in the second adjustment unit 106, so that no information is sent at all. In this case, no code for the attenuation coefficient vector is transmitted.
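The first option (one flag per sample, "1" for attenuated and "0" for untouched) can be sketched as packing the flags into an integer. This sketch assumes that:

- both sides know the frame length and the constant `beta`;
- the bit order is most-significant first.

The function names are hypothetical.

```python
def vector_to_flags(g):
    """1 = attenuated sample, 0 = untouched sample (one bit per sample)."""
    return [1 if gk < 1.0 else 0 for gk in g]


def pack_bits(flags):
    """Pack flags MSB-first into an integer code."""
    value = 0
    for f in flags:
        value = (value << 1) | f
    return value


def unpack_bits(value, n):
    """Recover n flags from the integer code (frame length n known to both sides)."""
    return [(value >> (n - 1 - i)) & 1 for i in range(n)]


def flags_to_vector(flags, beta=0.5):
    """Decoder side: rebuild the attenuation vector with the shared constant beta."""
    return [beta if f else 1.0 for f in flags]
```

This costs one bit per spectral sample. That cost is what motivates the section-wise vector quantization and the limited-position variants listed above.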

The amplitude adjustment unit 302 adjusts the amplitude of the first frequency spectrum according to equation (5), using the attenuation coefficient vector input from the pulse analysis unit 301.

The amplitude adjustment unit 302 then outputs the amplitude-adjusted first frequency spectrum to the IMDCT unit 103.

<Operation of second adjustment unit>
The operation of the second adjustment unit 106 is described below for two cases: when the first adjustment unit 102 has configuration example 1, and when it has configuration example 2.

(Operation when the first adjustment unit has configuration example 1)
The second adjustment unit 106 adjusts the amplitude adaptively according to the overall tendency of the amplitude of the second frequency spectrum. Its adjustment corresponds to the inverse transform of the first adjustment unit 102. It expands the amplitude nonlinearly where the amplitude exceeds the threshold t obtained by equation (2).

Based on the threshold t input from the amplitude adjustment unit 202, the second adjustment unit 106 performs the inverse transform according to equation (6).

By equation (6), the second adjustment unit 106 leaves samples of the second frequency spectrum whose spectral values are below the threshold t unexpanded. It expands samples whose spectral values are at or above t, more strongly the larger their spectral values are.
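Equation (6) is not reproduced here either. If equation (3) has the assumed form f(x) = sign(x)·t·(1 + ln(|x|/t)) for |x| > t, then its exact inverse is:

- e(y) = y for |y| ≤ t
- e(y) = sign(y)·t·exp(|y|/t − 1) for |y| > t

The sketch below implements that assumed inverse. It is illustrative and is not taken from the patent.

```python
import math


def expand(y, t):
    """Assumed form of equation (6): inverse of the assumed equation (3).
    Values up to t pass through; larger values are expanded exponentially,
    so larger inputs are stretched more."""
    a = abs(y)
    if a <= t:
        return y
    return math.copysign(t * math.exp(a / t - 1.0), y)
```

Because the assumed compression maps (t, ∞) onto (t, ∞) monotonically, this expansion recovers any compressed value exactly, apart from floating-point rounding.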

(Operation when the first adjustment unit has configuration example 2)
The second adjustment unit 106 adjusts the amplitude adaptively according to the overall tendency of the amplitude of the second frequency spectrum. Its adjustment corresponds to the inverse transform of the first adjustment unit 102. It expands the amplitude nonlinearly where the change in spectral value between adjacent samples exceeds the threshold.

Based on the attenuation coefficient vector input from the pulse analysis unit 301, the second adjustment unit 106 performs the inverse transform according to equation (7).

By equation (7), the second adjustment unit 106 treats the samples of the second frequency spectrum according to the change in spectral value between adjacent samples. Samples for which this change is below the threshold are left unexpanded. Samples for which the change is at or above the threshold are expanded, more strongly the larger the change is.
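Equations (5) and (7) are not reproduced here. A minimal sketch, assuming they are elementwise scaling by the attenuation coefficient vector g and division by it:

- X'(k) = g(k)·X(k)
- Y'(k) = Y(k)/g(k)

Division is safe because every entry of g is either 1.0 or a positive constant below 1. The function names are hypothetical.

```python
def apply_attenuation(spectrum, g):
    """Assumed form of equation (5): X'(k) = g(k) * X(k)."""
    return [gk * x for gk, x in zip(g, spectrum)]


def undo_attenuation(spectrum, g):
    """Assumed form of equation (7): Y'(k) = Y(k) / g(k); every g(k) > 0."""
    return [y / gk for gk, y in zip(g, spectrum)]


g = [1.0, 0.5, 1.0]
x = [0.2, 3.0, -0.1]
restored = undo_attenuation(apply_attenuation(x, g), g)  # round trip back to x
```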

A speech decoding apparatus corresponding to the speech coding apparatus of this embodiment works as follows:
1. It decodes the threshold t, or the attenuation coefficient vector, from the transmitted code.
2. It performs the same operations as the decoding part of the CELP coding unit 104, the second MDCT unit 105 and the second adjustment unit 106 of the speech coding apparatus. This generates the decoded frequency spectrum of the CELP synthesized sound.
3. It adds to that spectrum the decoded FPC frequency spectrum obtained by decoding the FPC code.

Each block operates as described above, so a detailed description is omitted.

<Effects of the present embodiment>
According to the present invention, the information source codes used in CELP encoding are not wasted and the amount of computation can be reduced. In addition, by realizing noise shaping within the CELP encoding, FPC encoding performance can be improved.

<Modification of the present embodiment>
In the above embodiment, configuration examples 1 and 2 were shown as configurations of the first adjustment unit. However, the present invention is not limited to these; any other configuration and operation that attenuates spectral values may be used. This is because the present invention does not depend directly on the configuration and operation of the adjustment units before and after the CELP encoding unit.

In the above embodiment, the first adjustment unit computes the average value as an arithmetic mean. However, the present invention is not limited to this, and the mode or the median may be used instead. When the median is used, the same effect can be obtained by searching for the amplitude value at a pre-specified rank and using it.
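The alternatives for the representative amplitude can be sketched with Python's `statistics` module. This is an illustrative sketch only: the `rank` and `step` parameters are hypothetical, and the mode is taken after rounding amplitudes to a grid because continuous amplitudes rarely repeat exactly. For an odd number of samples, the amplitude at rank `(n - 1) // 2` is exactly the median, which is the "pre-specified rank" idea in the text.

```python
import statistics


def representative_amplitude(spectrum, method="mean", rank=None, step=0.5):
    """Representative amplitude of a spectrum.

    method: "mean"   - arithmetic mean (as in the embodiment)
            "median" - median amplitude
            "rank"   - amplitude at a pre-specified rank (0 = smallest)
            "mode"   - most frequent amplitude after rounding to a
                       hypothetical grid of width `step`
    """
    amps = [abs(float(x)) for x in spectrum]
    if method == "mean":
        return statistics.mean(amps)
    if method == "median":
        return statistics.median(amps)
    if method == "rank":
        # A full sort for clarity; a selection algorithm would avoid it.
        return sorted(amps)[rank]
    if method == "mode":
        return statistics.mode([step * round(a / step) for a in amps])
    raise ValueError(f"unknown method: {method}")


spec = [1, -3, 2, 8, 4]
middle = (len(spec) - 1) // 2
```

With `spec` above, the mean is 3.6, while the median and the rank-`middle` amplitude are both 3.0; the median-based choices are less affected by the single large sample.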

In the above embodiment, the subsequent-stage encoder is the FPC encoding unit. However, the present invention is not limited to this, and an encoding unit that performs transform coding, such as TCX (Transform Coded Excitation), AMR-WB+ (Extended Adaptive Multi-Rate Wideband), or AAC (Advanced Audio Coding), may be provided instead. This is because any coding scheme that can efficiently encode a pulse-like residual spectrum may be used, and the present invention is also effective with coders other than FPC.

In the above embodiment, the CELP encoding unit is used. However, the present invention is not limited to this, and an encoding unit that performs time-series coding, such as MPC (Multiple Pulse Coding) or ADPCM, may be provided instead. This is because these schemes can also encode time-series signals efficiently and, as with CELP, the attenuation is effective for the subsequent spectral encoding of their decoded signal.

In the above embodiment, MDCT is used as the orthogonal transform. However, the present invention is not limited to this, and another orthogonal transform such as the DCT (Discrete Cosine Transform) or the FFT may be used. This is because any method that yields a frequency spectrum can be applied to the above embodiment.
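The MDCT/IMDCT pair used by the first MDCT unit 101, the IMDCT unit 103, and the second MDCT unit 105 can be sketched in direct form as below. The frame length, the sine window, 50% overlap-add, and the 2/N scaling of the inverse are assumptions chosen so that the round trip reconstructs the input exactly (time-domain aliasing cancellation); the document does not specify them. Per the paragraph above, a DCT or FFT could take the transform's place.

```python
import math


def mdct(frame):
    """Direct-form MDCT: 2N (windowed) samples -> N coefficients."""
    two_n = len(frame)
    n_half = two_n // 2
    return [
        sum(frame[n] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
            for n in range(two_n))
        for k in range(n_half)
    ]


def imdct(coeffs):
    """Inverse MDCT with 2/N scaling, so sine-windowed overlap-add is exact."""
    n_half = len(coeffs)
    return [
        (2.0 / n_half)
        * sum(coeffs[k] * math.cos(math.pi / n_half * (n + 0.5 + n_half / 2) * (k + 0.5))
              for k in range(n_half))
        for n in range(2 * n_half)
    ]


def sine_window(n_half):
    """Sine window; satisfies w[n]**2 + w[n + N]**2 == 1 (Princen-Bradley)."""
    return [math.sin(math.pi * (n + 0.5) / (2 * n_half)) for n in range(2 * n_half)]


def mdct_roundtrip(x, n_half):
    """Analysis (MDCT) and synthesis (IMDCT + overlap-add) with hop n_half."""
    if len(x) % n_half:
        raise ValueError("len(x) must be a multiple of n_half")
    w = sine_window(n_half)
    padded = [0.0] * n_half + list(x) + [0.0] * n_half
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - n_half, n_half):
        frame = [w[i] * padded[start + i] for i in range(2 * n_half)]
        y = imdct(mdct(frame))  # spectral processing would happen between these
        for i in range(2 * n_half):
            out[start + i] += w[i] * y[i]
    return out[n_half:n_half + len(x)]
```

Each output sample is covered by two overlapping frames whose aliasing terms cancel, which is why the padded edges are discarded. The O(N^2) direct form is for readability; practical codecs compute the MDCT through an FFT.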

In the above embodiment, the apparatuses are referred to as a speech encoding apparatus and a speech decoding apparatus, but "speech" here is used in a broad sense. That is, the input signal to the speech encoding apparatus and the decoded signal from the speech decoding apparatus may be any signal, such as a speech signal, a music signal, or an acoustic signal containing both speech and music.

Although the above embodiment has been described using a hardware implementation as an example, the present invention can also be implemented in software in cooperation with hardware.

Each functional block used in the description of the above embodiment is typically implemented as an LSI, which is an integrated circuit. The blocks may each be made into a separate chip, or some or all of them may be integrated into a single chip. Although the term LSI is used here, it may also be called an IC, system LSI, super LSI, or ultra LSI depending on the degree of integration.

The method of circuit integration is not limited to LSI; implementation with dedicated circuitry or a general-purpose processor is also possible. An FPGA (Field Programmable Gate Array) that can be programmed after LSI manufacture, or a reconfigurable processor in which the connections or settings of circuit cells inside the LSI can be reconfigured, may also be used.

Furthermore, if an integrated circuit technology that replaces LSI emerges through advances in semiconductor technology or another derived technology, the functional blocks may naturally be integrated using that technology. The application of biotechnology is one such possibility.

The speech encoding apparatus and speech encoding method according to the present invention are suitable for use with scalable codec technology having a multilayer structure.

Description of Symbols
100 Speech encoding apparatus
101 First MDCT unit
102 First adjustment unit
103 IMDCT unit
104 CELP encoding unit
105 Second MDCT unit
106 Second adjustment unit
107 Subtraction unit
108 FPC encoding unit

Claims

1. A speech encoding apparatus comprising:
first orthogonal transform means for performing an orthogonal transform on a target signal to be encoded to generate a first frequency spectrum;
first adjustment means for performing a nonlinear first amplitude adjustment on the first frequency spectrum;
inverse orthogonal transform means for performing the inverse of the orthogonal transform on the first frequency spectrum subjected to the first amplitude adjustment to generate a time signal;
first encoding means for encoding the time signal and then decoding it to generate a decoded signal;
second orthogonal transform means for performing an orthogonal transform on the decoded signal to generate a second frequency spectrum;
second adjustment means for giving the second frequency spectrum a characteristic inverse to the characteristic given to the first frequency spectrum by the first amplitude adjustment;
subtraction means for generating a residual spectrum by subtracting the second frequency spectrum, given the inverse characteristic by the second adjustment means, from the first frequency spectrum generated by the first orthogonal transform means; and
second encoding means for transform-encoding the residual spectrum.

2. The speech encoding apparatus according to claim 1, wherein the first adjustment means performs the first amplitude adjustment by attenuating those samples of the first frequency spectrum whose spectral value is greater than or equal to a threshold, more strongly as the spectral value increases.

3. The speech encoding apparatus according to claim 2, wherein the second adjustment means gives the inverse characteristic to the second frequency spectrum by performing, adaptively according to the amplitude of the second frequency spectrum, a second amplitude adjustment that expands those samples of the second frequency spectrum whose spectral value is greater than or equal to a threshold, more strongly as the spectral value increases.

4. The speech encoding apparatus according to claim 1, wherein the first adjustment means performs the first amplitude adjustment by attenuating samples of the first frequency spectrum at which the change in spectral value between adjacent samples is greater than or equal to a threshold, more strongly as the change increases.

5. The speech encoding apparatus according to claim 4, wherein the second adjustment means gives the inverse characteristic to the second frequency spectrum by performing, adaptively according to the amplitude of the second frequency spectrum, a second amplitude adjustment that expands samples of the second frequency spectrum at which the change in spectral value between adjacent samples is greater than or equal to a threshold, more strongly as the change increases.

6. A speech encoding method comprising:
a first orthogonal transform step of performing an orthogonal transform on a target signal to be encoded to generate a first frequency spectrum;
a first adjustment step of performing a nonlinear amplitude adjustment on the first frequency spectrum;
an inverse orthogonal transform step of performing the inverse of the orthogonal transform on the first frequency spectrum whose amplitude has been adjusted in the first adjustment step, to generate a time signal;
a first encoding step of encoding the time signal and then decoding it to generate a decoded signal;
a second orthogonal transform step of performing an orthogonal transform on the decoded signal to generate a second frequency spectrum;
a second adjustment step of giving the second frequency spectrum a characteristic inverse to the characteristic given to the first frequency spectrum in the first adjustment step;
a subtraction step of generating a residual spectrum by subtracting the second frequency spectrum, given the inverse characteristic in the second adjustment step, from the first frequency spectrum generated in the first orthogonal transform step; and
a second encoding step of transform-encoding the residual spectrum.
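For readers implementing the claimed structure, the following Python sketch walks through the steps of claims 1 to 3 (equivalently, the method claim) for a single frame. Every concrete choice is an assumption, not the patent's implementation: an orthonormal DCT-II replaces the MDCT (the description permits other transforms), uniform quantizers stand in for both the CELP core and the FPC layer, and the nonlinear adjustment is a hypothetical power law with parameters `t` and `alpha`, whose exact inverse can be applied to the second spectrum without side information.

```python
import math


def dct(x):
    """Orthonormal DCT-II (stand-in for the MDCT units)."""
    n = len(x)
    return [math.sqrt((1 if k == 0 else 2) / n)
            * sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
            for k in range(n)]


def idct(c):
    """Inverse of the orthonormal DCT-II (a DCT-III)."""
    n = len(c)
    return [sum(math.sqrt((1 if k == 0 else 2) / n) * c[k]
                * math.cos(math.pi * (i + 0.5) * k / n) for k in range(n))
            for i in range(n)]


def compress(spec, t, alpha):
    """First adjustment (claim 2 style): |X| >= t becomes t*(|X|/t)**(1-alpha),
    i.e. attenuation by (t/|X|)**alpha, stronger for larger values."""
    return [math.copysign(t * (abs(v) / t) ** (1 - alpha), v) if abs(v) >= t else v
            for v in spec]


def expand(spec, t, alpha):
    """Second adjustment (claim 3 style): exact inverse of compress, driven by
    the amplitudes of the spectrum it is applied to."""
    return [math.copysign(t * (abs(v) / t) ** (1 / (1 - alpha)), v) if abs(v) >= t else v
            for v in spec]


def quantize(values, step):
    """Hypothetical stand-in for a lossy coder (encode + decode): rounding."""
    return [step * round(v / step) for v in values]


def encode_frame(x, t=1.0, alpha=0.5, core_step=0.2, residual_step=0.05):
    first = dct(x)                                    # first orthogonal transform
    time_signal = idct(compress(first, t, alpha))     # first adjustment + inverse transform
    core_decoded = quantize(time_signal, core_step)   # stand-in for CELP encode/decode
    second = expand(dct(core_decoded), t, alpha)      # second transform + second adjustment
    residual = [a - b for a, b in zip(first, second)] # subtraction
    return second, quantize(residual, residual_step)  # stand-in for the FPC layer


x = [4 * math.sin(2 * math.pi * 3 * i / 32) + (1.5 if i == 10 else 0.0) for i in range(32)]
second, coded_residual = encode_frame(x)
# A real decoder regenerates `second` from the core bitstream with the same
# expand step; it is returned directly here as a shortcut.
decoded_spectrum = [a + b for a, b in zip(second, coded_residual)]
```

Whatever error the core layer introduces, the decoded spectrum differs from the first spectrum only by the residual layer's quantization error (at most `residual_step / 2` per coefficient in this sketch).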