JP2001100799A

JP2001100799A - Method and device for sound encoding and computer readable recording medium stored with sound encoding algorithm

Info

Publication number: JP2001100799A
Application number: JP28071599A
Authority: JP
Inventors: Atsushi Yamane; 淳山根
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 1999-09-30
Filing date: 1999-09-30
Publication date: 2001-04-13
Anticipated expiration: 2019-09-30
Also published as: JP4007730B2

Abstract

PROBLEM TO BE SOLVED: To improve sound quality by reducing the difference between the pitch period of an adaptive code vector and that of an excitation code vector without increasing the amount of computation required to search the excitation codebook. SOLUTION: A digital speech input signal is divided into frames by a frame division part 12, and the frames are divided into subframes by a subframe division part 14. A formant parameter extraction part 16 analyzes each frame to extract and encode formant parameters, a pitch period extraction part 18 uses the subframes and the formant parameters to extract and encode the pitch period of each subframe, and a noise source extraction part 20 extracts and encodes the noise source component of each subframe. When a pulse would have to be placed at a non-integer sample position, pulses are instead placed at a plurality of integer sample positions on either side of that position.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech coding apparatus, a speech coding method, and a recording medium. More specifically, it relates to a speech coding apparatus and a speech coding method for encoding digital speech signals used in speech coding application systems such as answering machines, facsimile machines with an answering function, voice communication, voice memos, voice mail systems, digital mobile phones, DSVD modems, and Internet telephony, and to a computer-readable recording medium on which a speech coding algorithm is recorded.

[0002]

[Prior Art] In recent years, applications that use speech coding technology to encode speech signals at low bit rates, such as mobile phones, Internet telephony, and DSVD modems, have become widespread, and demand for higher-quality speech coding has grown accordingly.

[0003] For example, the mainstream speech coding scheme at 6 kbps to 16 kbps is CELP (Code Excited Linear Prediction coding system). Schemes based on the CELP concept are used in the applications mentioned above and in various standards, such as the one for Japanese digital mobile phones.

[0004] CELP uses a source/vocal-tract separation model of speech based on linear prediction, and achieves low-bit-rate coding by quantizing the vocal tract information and vector-quantizing the excitation (sound source) information. Specifically, the input speech is first divided into frames, which are the processing units. Each frame is analyzed to extract formant parameters that represent the formant components of the speech, which correspond to the vocal tract information, and these parameters are quantized. The excitation information is often quantized on subframes, which are shorter than frames, and often in two stages using two kinds of codebooks.

[0005] The first stage is an adaptive codebook search using an adaptive codebook that represents the pitch information of the speech. The adaptive codebook is constructed adaptively for each subframe: for each of a predetermined number of pitch candidates, one pitch length of the excitation information up to the immediately preceding subframe is repeated to fill the subframe length, giving a vector called an adaptive code vector. Next, each adaptive code vector is passed through a vocal tract filter built from the formant parameters quantized as described above, to generate a synthesized vector. The distance between each synthesized vector and the input subframe is then measured, and the adaptive code vector that gives the shortest distance, that is, the optimal pitch, is selected.

[0006] The second stage is an excitation codebook search (also called a noise codebook search), which uses a fixed codebook of a predetermined number of vectors (excitation code vectors) called an excitation codebook (also called a noise codebook). Types of excitation codebook include trained codebooks constructed by training, random codebooks constructed from random numbers, algebraic excitation codebooks constructed algebraically from a few pulses, and partial combinations of these.

[0007] In this stage, the target signal for the excitation codebook search is first generated by subtracting, from the subframe vector, the synthesized vector obtained by passing the selected optimal adaptive code vector through the vocal tract filter. Next, each excitation code vector is passed through the vocal tract filter, in the same way as the adaptive code vectors, to generate a synthesized vector. The distance between the target signal and each synthesized vector is then measured, and the excitation code vector that gives the shortest distance is selected as optimal.
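The subtraction that produces the excitation codebook target can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name `excitation_target` and the explicit `gain` argument are assumptions (the text only says the synthesized adaptive vector is subtracted), and the synthesized vector is assumed to have been computed already.

```python
def excitation_target(subframe, synthesized_adaptive, gain=1.0):
    """Return subframe minus gain * synthesized adaptive code vector.

    `gain` is a hypothetical parameter for the adaptive codebook gain;
    with gain=1.0 this is the plain subtraction described in the text.
    """
    if len(subframe) != len(synthesized_adaptive):
        raise ValueError("both vectors must have the subframe length")
    return [s - gain * y for s, y in zip(subframe, synthesized_adaptive)]


# A subframe that the scaled adaptive contribution explains completely
# leaves an all-zero target for the excitation codebook search.
target = excitation_target([1.0, 2.0, 3.0], [0.5, 1.0, 1.5], gain=2.0)
```

Whatever periodicity the adaptive contribution fails to remove stays in `target`, which is the situation the following paragraphs discuss.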

[0008]

[Problems to Be Solved by the Invention] However, in the conventional speech coding apparatus and method described above, the excitation codebook is intended to represent the components of the speech other than the formant and pitch components. Consequently, when the pitch length is shorter than the subframe, pitch-based periodicity remains in the target signal of the excitation codebook search, and an ordinary excitation codebook search then suffers from increased distortion.

[0009] Naturally, this tendency becomes more pronounced as the subframe length increases. In recent years, with the spread of Internet telephony, there has been demand for higher-quality low-delay coding at bit rates around 4 kbps. To realize a low-delay scheme at 4 kbps with CELP, the subframe length must be about 10 ms, longer than the 5 to 8 ms conventionally used, which makes solving the above problem all the more important.

[0010] As one way of addressing this problem, a pitch periodization technique has been proposed in which the excitation code vector is made periodic in accordance with the pitch period. This makes it possible to represent components that are periodic but non-stationary, and it improves the coding quality of voiced speech. The technique is still in use today.

[0011] Taking one sample length as the integer unit, there are two kinds of pitch: integer-length pitch (integer pitch) and fractional-length pitch (non-integer pitch). Using non-integer pitch allows more accurate pitch extraction. An adaptive code vector for a non-integer pitch can be constructed by upsampling based on the sampling theorem, as described in Japanese Patent Application Laid-Open No. 9-319396 and in the paper by Miki, Moriya, Mano, and Omuro, "CELP coding with pitch synchronous noise excitation source (PSI-CELP)," IEICE Transactions (A), Vol. J77-A, No. 3, pp. 314-324 (1994).

[0012] The algebraic excitation codebook constructs each excitation code vector from a plurality of pulses. Compared with other codebooks of the same size (the same number of excitation code vectors), it requires far less memory to store the codebook and far less computation to search, and it is therefore widely used.

[0013] However, if the pulses that make up an algebraic excitation code vector are periodized for a non-integer pitch using the sampling theorem, pulses must be placed at many sample points, and the low computational cost of the multipulse excitation is lost. In this case, therefore, pitch periodization is performed approximately, using the integer part obtained by truncating the fractional part of the pitch.
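The truncation-based approximation can be sketched in a few lines. This is a hedged illustration: the function name `periodize_truncated` is hypothetical, and the sketch assumes that periodization repeats each pulse forward at multiples of the pitch, a detail this passage does not spell out.

```python
import math


def periodize_truncated(pulses, pitch, subframe_len):
    """Repeat each (position, amplitude) pulse every floor(pitch) samples.

    The fractional part of `pitch` is simply discarded, which is the
    conventional approximation described in the text.
    """
    step = math.floor(pitch)
    if step < 1:
        raise ValueError("pitch must be at least one sample")
    vec = [0.0] * subframe_len
    for pos, amp in pulses:
        p = pos
        while p < subframe_len:
            vec[p] += amp
            p += step
    return vec


# One pulse at sample 2, pitch 20.5 samples, 80-sample subframe.
vec = periodize_truncated([(2, 1.0)], 20.5, 80)
pulse_positions = [n for n, v in enumerate(vec) if v != 0.0]
```

With a true pitch of 20.5 samples the repetitions should fall at 2, 22.5, 43, and 63.5, but truncation puts them at 2, 22, 42, and 62, so the error grows by half a sample with every repetition.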

[0014] Although this approach realizes pitch periodization simply, if only approximately, the period of the adaptive code vector then differs from that of the noise source vector, and the difference affects sound quality. The effect is particularly large for short pitches, such as those of female voices.

[0015] The present invention has been made in view of the above. Its object is to provide a speech coding apparatus, a speech coding method, and a computer-readable recording medium on which a speech coding algorithm is recorded, which can improve sound quality by reducing the difference between the pitch period of the adaptive code vector and that of the excitation code vector without greatly increasing the amount of computation required for the excitation codebook search.

[0016]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 is a speech coding apparatus that encodes a digital speech signal, comprising: frame dividing means for dividing the digital speech signal into frames; subframe dividing means for dividing each frame into shorter subframes; formant parameter extracting means for extracting and encoding formant parameters from the frame; pitch period extracting means for extracting and encoding the pitch period of the subframe using the subframe and the formant parameters; and noise source extracting means for extracting and encoding the noise source component of the subframe using the subframe, the formant parameters, the pitch period, and a multipulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein, when the noise source extracting means needs to place a pulse at a non-integer sample position, it places pulses at a plurality of integer sample positions on either side of that non-integer sample position. According to the invention of claim 1, noise source extraction can be performed with high accuracy, and higher sound quality can be achieved.

[0017] The invention according to claim 2 is the speech coding apparatus according to claim 1, wherein, when the noise source extracting means needs to place a pulse at a non-integer sample position α whose fractional part is β (0 < β < 1), it places a pulse with (1 − β) times the pulse amplitude at the largest integer sample position smaller than α, namely (α − β), and a pulse with β times the pulse amplitude at the smallest integer sample position larger than α, namely (α − β + 1). According to the invention of claim 2, noise source extraction can be performed with high accuracy using a simpler configuration, and higher sound quality can be achieved.
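The two-pulse rule of claim 2 amounts to linear interpolation between the neighbouring integer positions. A minimal sketch, assuming a hypothetical helper name `split_pulse` and floating-point positions:

```python
import math


def split_pulse(alpha, amplitude):
    """Split a pulse at position alpha into pulses at integer positions.

    With beta = fractional part of alpha, the pulse at floor(alpha) gets
    (1 - beta) * amplitude and the pulse at floor(alpha) + 1 gets
    beta * amplitude. An integer alpha is returned as a single pulse
    (the claim only covers 0 < beta < 1).
    """
    base = math.floor(alpha)
    beta = alpha - base
    if beta == 0.0:
        return [(base, amplitude)]
    return [(base, (1.0 - beta) * amplitude), (base + 1, beta * amplitude)]


# A pulse of amplitude 2.0 at position 22.25 (beta = 0.25).
pulses = split_pulse(22.25, 2.0)
```

Here `pulses` is `[(22, 1.5), (23, 0.5)]`: the two amplitudes sum to the original amplitude, and the nearer integer position receives the larger share. Only two samples are touched per pulse, which keeps the sparse structure of the multipulse code vector.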

[0018] The invention according to claim 3 is a speech coding apparatus that encodes a digital speech signal, comprising: frame dividing means for dividing the digital speech signal into frames; subframe dividing means for dividing each frame into shorter subframes; formant parameter extracting means for extracting and encoding formant parameters from the frame; pitch period extracting means for extracting and encoding the pitch period of the subframe using the subframe and the formant parameters; and noise source extracting means for extracting and encoding the noise source component of the subframe using the subframe, the formant parameters, the pitch period, and a multipulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein the pitch period extracting means extracts non-integer pitch periods as well as integer pitch periods, the noise source extracting means includes pitch periodizing means for periodizing the code vectors in accordance with the pitch period, and, when the pitch periodizing means needs to place a pulse at a non-integer sample position, it places pulses at a plurality of integer sample positions on either side of that non-integer sample position. According to the invention of claim 3, pitch periodization of a multipulse excitation for a non-integer pitch can be performed with high accuracy, and higher sound quality can be achieved.

[0019] The invention according to claim 4 is the speech coding apparatus according to claim 3, wherein, when the pitch periodizing means needs to place a pulse at a non-integer sample position α whose fractional part is β (0 < β < 1), it places a pulse with (1 − β) times the pulse amplitude at the largest integer sample position smaller than α, namely (α − β), and a pulse with β times the pulse amplitude at the smallest integer sample position larger than α, namely (α − β + 1). According to the invention of claim 4, pitch periodization of a multipulse excitation for a non-integer pitch can be performed with high accuracy using a simpler configuration, and higher sound quality can be achieved.
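Applied to pitch periodization, the rule of claim 4 can be sketched as below. The sketch is an illustration under assumptions: the name `periodize_fractional` is hypothetical, periodization is assumed to repeat each pulse forward at multiples of the (possibly non-integer) pitch, and a share that would fall just past the end of the subframe is dropped.

```python
import math


def periodize_fractional(pulses, pitch, subframe_len):
    """Periodize (position, amplitude) pulses with a non-integer pitch.

    Each repetition lands at alpha = position + k * pitch. If alpha is
    not an integer, its amplitude is shared between floor(alpha) and
    floor(alpha) + 1 with weights (1 - beta) and beta.
    """
    if pitch <= 0:
        raise ValueError("pitch must be positive")
    vec = [0.0] * subframe_len
    for pos, amp in pulses:
        k = 0
        while pos + k * pitch < subframe_len:
            alpha = pos + k * pitch
            base = math.floor(alpha)
            beta = alpha - base
            vec[base] += (1.0 - beta) * amp
            if beta > 0.0 and base + 1 < subframe_len:
                vec[base + 1] += beta * amp
            k += 1
    return vec


# One pulse at sample 2, pitch 20.5 samples, 80-sample subframe.
vec = periodize_fractional([(2, 1.0)], 20.5, 80)
nonzero = {n: v for n, v in enumerate(vec) if v != 0.0}
```

For this input the repetitions at 22.5 and 63.5 are each split into two half-amplitude pulses, while those at 2 and 43 stay single pulses, so the code vector follows the true period while gaining at most one extra sample per repetition.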

[0020] The invention according to claim 5 is a speech coding method for encoding a digital speech signal, comprising: a frame dividing step of dividing the digital speech signal into frames; a subframe dividing step of dividing each frame into shorter subframes; a formant parameter extracting step of extracting and encoding formant parameters from the frame; a pitch period extracting step of extracting and encoding the pitch period of the subframe using the subframe and the formant parameters; and a noise source extracting step of extracting and encoding the noise source component of the subframe using the subframe, the formant parameters, the pitch period, and a multipulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein, when a pulse needs to be placed at a non-integer sample position in the noise source extracting step, pulses are placed at a plurality of integer sample positions on either side of that non-integer sample position. According to the invention of claim 5, noise source extraction can be performed with high accuracy, and higher sound quality can be achieved.

[0021] The invention according to claim 6 is the speech coding method according to claim 5, wherein, when a pulse needs to be placed at a non-integer sample position α whose fractional part is β (0 < β < 1) in the noise source extracting step, a pulse with (1 − β) times the pulse amplitude is placed at the largest integer sample position smaller than α, namely (α − β), and a pulse with β times the pulse amplitude is placed at the smallest integer sample position larger than α, namely (α − β + 1). According to the invention of claim 6, noise source extraction can be performed with high accuracy using a simpler configuration, and higher sound quality can be achieved.

[0022] The invention according to claim 7 is a speech coding method for encoding a digital speech signal, comprising: a frame dividing step of dividing the digital speech signal into frames; a subframe dividing step of dividing each frame into shorter subframes; a formant parameter extracting step of extracting and encoding formant parameters from the frame; a pitch period extracting step of extracting and encoding the pitch period of the subframe using the subframe and the formant parameters; and a noise source extracting step of extracting and encoding the noise source component of the subframe using the subframe, the formant parameters, the pitch period, and a multipulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein the pitch period extracting step extracts non-integer pitch periods as well as integer pitch periods, the noise source extracting step includes a pitch periodizing step of periodizing the code vectors in accordance with the pitch period, and, when a pulse needs to be placed at a non-integer sample position in the pitch periodizing step, pulses are placed at a plurality of integer sample positions on either side of that non-integer sample position. According to the invention of claim 7, pitch periodization of a multipulse excitation for a non-integer pitch can be performed with high accuracy, and higher sound quality can be achieved.

[0023] The invention according to claim 8 is the speech coding method according to claim 7, wherein, when a pulse needs to be placed at a non-integer sample position α whose fractional part is β (0 < β < 1) in the pitch periodizing step, a pulse with (1 − β) times the pulse amplitude is placed at the largest integer sample position smaller than α, namely (α − β), and a pulse with β times the pulse amplitude is placed at the smallest integer sample position larger than α, namely (α − β + 1). According to the invention of claim 8, pitch periodization of a multipulse excitation for a non-integer pitch can be performed with high accuracy using a simpler configuration, and higher sound quality can be achieved.

[0024] The invention according to claim 9 is a computer-readable recording medium on which a speech coding algorithm for encoding a digital speech signal is recorded, the algorithm comprising: a frame dividing step of dividing the digital speech signal into frames; a subframe dividing step of dividing each frame into shorter subframes; a formant parameter extracting step of extracting and encoding formant parameters from the frame; a pitch period extracting step of extracting and encoding the pitch period of the subframe using the subframe and the formant parameters; and a noise source extracting step of extracting and encoding the noise source component of the subframe using the subframe, the formant parameters, the pitch period, and a multipulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein the pitch period extracting step extracts non-integer pitch periods as well as integer pitch periods, the noise source extracting step includes a pitch periodizing step of periodizing the code vectors in accordance with the pitch period, and, when a pulse needs to be placed at a non-integer sample position in the pitch periodizing step, pulses are placed at a plurality of integer sample positions on either side of that non-integer sample position.

[0025] According to the invention of claim 9, a speech coding algorithm that can perform pitch periodization of a multipulse excitation for a non-integer pitch with high accuracy and achieve higher sound quality is recorded on a recording medium. The algorithm is thereby machine-readable, and its operations can be carried out by a computer.

[0026] The invention according to claim 10 is the computer-readable recording medium on which the speech coding algorithm according to claim 9 is recorded, wherein, when a pulse needs to be placed at a non-integer sample position α whose fractional part is β (0 < β < 1) in the pitch periodizing step, a pulse with (1 − β) times the pulse amplitude is placed at the largest integer sample position smaller than α, namely (α − β), and a pulse with β times the pulse amplitude is placed at the smallest integer sample position larger than α, namely (α − β + 1).

[0027] According to the invention of claim 10, a speech coding algorithm that can perform pitch periodization of a multipulse excitation for a non-integer pitch with high accuracy using a simpler configuration, and thereby achieve higher sound quality, is recorded on a recording medium. The algorithm is thereby machine-readable, and its operations can be carried out by a computer.

[0028]

[Embodiments of the Invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing an example configuration of the speech coding apparatus according to this embodiment, which encodes a digital speech input signal based on CELP (Code Excited Linear Prediction coding system).

[0029] The speech coding apparatus 10 includes a frame dividing unit 12, a subframe dividing unit 14, a formant parameter extracting unit 16, a pitch period extracting unit 18, a noise source extracting unit 20, a gain quantizing unit 30, and so on. The noise source extracting unit 20 in turn comprises a target signal constructing unit 22, a pitch periodizing unit 24, a distance calculating unit 26, an optimal vector determining unit 28, and so on.

[0030] The frame dividing unit 12 divides the digital speech input signal into processing units called frames. The frame length is, for example, 20 to 30 ms. The DC component may be removed by passing the signal through a high-pass filter before or after frame division.

[0031] The subframe dividing unit 14 divides each frame into processing units called subframes. The number of subframes per frame is, for example, 2 to 6.
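The frame and subframe division can be sketched as follows. The 8 kHz sampling rate, the helper name `split_blocks`, and the choice to drop a trailing partial block are assumptions for illustration; the 20 ms frame and 2 subframes per frame are taken from the example ranges in the text.

```python
def split_blocks(samples, block_len):
    """Split a sequence into consecutive blocks of block_len samples.

    A trailing partial block is dropped (an assumption; a real coder
    would buffer it until enough samples arrive).
    """
    if block_len < 1:
        raise ValueError("block_len must be positive")
    last_start = len(samples) - block_len
    return [samples[i:i + block_len] for i in range(0, last_start + 1, block_len)]


SAMPLE_RATE = 8000           # Hz, assumed telephone-band sampling rate
FRAME_MS = 20                # within the 20 to 30 ms range in the text
SUBFRAMES_PER_FRAME = 2      # within the 2 to 6 range in the text

frame_len = SAMPLE_RATE * FRAME_MS // 1000        # 160 samples
subframe_len = frame_len // SUBFRAMES_PER_FRAME   # 80 samples, i.e. 10 ms

signal = list(range(400))                         # stand-in for speech samples
frames = split_blocks(signal, frame_len)
subframes = [split_blocks(frame, subframe_len) for frame in frames]
```

With these values each subframe is 10 ms long, the subframe length that the background section associates with low-delay coding around 4 kbps.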

[0032] The formant parameter extracting unit 16 analyzes each divided frame and extracts formant parameters. Examples of formant parameters include LPC (Linear Prediction Coding) coefficients, LSF (Line Spectrum Frequency) parameters, LPC spectrum coefficients, and reflection coefficients. Formant parameters can be extracted by, for example, linear prediction analysis. One method of linear prediction analysis is to compute the autocorrelation function of the frame and then obtain the LPC coefficients by the Levinson-Durbin recursion. A window function such as a Hamming window or a Hanning window may be applied to the frame before the autocorrelation function is computed.
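The analysis chain named above (window, autocorrelation, Levinson-Durbin recursion) can be sketched in plain Python. The function names and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_p z^-p are assumptions; the sign of the reflection coefficients also varies between references.

```python
import math


def hamming(n):
    """Hamming window of length n."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]


def autocorrelation(x, max_lag):
    """Autocorrelation r[0..max_lag] of the sequence x."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return (a, reflection, error) with a = [a_1..a_p] of A(z)."""
    a = [1.0] + [0.0] * order
    err = r[0]
    reflection = []
    for m in range(1, order + 1):
        if err <= 0.0:
            raise ValueError("autocorrelation is not positive definite")
        acc = r[m] + sum(a[k] * r[m - k] for k in range(1, m))
        k_m = -acc / err
        reflection.append(k_m)
        updated = a[:]
        for k in range(1, m):
            updated[k] = a[k] + k_m * a[m - k]
        updated[m] = k_m
        a = updated
        err *= 1.0 - k_m * k_m
    return a[1:], reflection, err


# Autocorrelation of an ideal first-order process x[n] = 0.5 x[n-1] + e[n].
lpc, reflection, err = levinson_durbin([1.0, 0.5, 0.25], order=2)
```

For this input the recursion recovers a_1 = -0.5 and a_2 = 0, that is A(z) = 1 - 0.5 z^-1. For a speech frame, one would compute `autocorrelation([w * s for w, s in zip(hamming(len(frame)), frame)], order)` first and pass the result to `levinson_durbin`.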

[0033] The extracted formant parameters of the frame are quantized and then transmitted or stored. Methods for quantizing the frame's formant parameters include scalar quantization, vector quantization, multi-stage vector quantization, and split vector quantization. When quantizing formant parameters, it is desirable to use parameters with good quantization efficiency, such as LSP or LSF.

[0034] The formant parameters of each subframe are calculated from the quantized formant parameters of the frame. One way to calculate them is by interpolation between the quantized formant parameters of the current and past frames; interpolation methods include linear interpolation and quadratic interpolation.
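Linear interpolation of subframe parameters can be sketched as below. The function name and the weighting (the last subframe uses the current frame's parameters unchanged) are assumptions, since the text does not fix the interpolation weights.

```python
def interpolate_subframe_params(prev_params, curr_params, num_subframes):
    """Linearly interpolate parameter vectors for each subframe.

    Subframe i (1-based) uses weight w = i / num_subframes on the
    current frame's quantized parameters and (1 - w) on the previous
    frame's. The weights are an illustrative choice.
    """
    if len(prev_params) != len(curr_params):
        raise ValueError("parameter vectors must have the same length")
    result = []
    for i in range(1, num_subframes + 1):
        w = i / num_subframes
        result.append([(1.0 - w) * p + w * c
                       for p, c in zip(prev_params, curr_params)])
    return result


# Two parameters per frame, two subframes per frame.
sub_params = interpolate_subframe_params([0.0, 1.0], [1.0, 3.0], 2)
```

In this example the first subframe gets the midpoint `[0.5, 2.0]` and the second gets the current frame's values `[1.0, 3.0]`. Interpolation is usually applied to a representation such as LSP or LSF rather than to LPC coefficients directly, because interpolated LPC coefficients do not necessarily give a stable filter.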

[0035] The pitch period extracting unit 18 extracts a pitch period for each of the subframes described above. Pitch extraction methods include adaptive codebook search and methods that use the autocorrelation function. In the case of adaptive codebook search, for example, a predetermined number of pitch candidates are prepared in advance. The candidates include integer pitches, which are integer multiples of the sampling interval, and non-integer pitches, and the following processing is performed for every candidate.

[0036] First, an adaptive code vector is generated. It is formed by cutting one pitch length out of the excitation vector accumulated up to the immediately preceding subframe and repeating that segment until the subframe length is reached. A synthesized speech vector is then generated from the adaptive code vector and the formant parameters of the subframe, for example by applying to the adaptive code vector a linear filter constructed from the subframe's formant parameters.
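A minimal sketch of these two steps, restricted to integer pitch values. The function names, the all-pole filter form, the sign convention A(z) = 1 + Σ a_k z^-k, and the zero initial filter state are assumptions made for illustration.

```python
def adaptive_code_vector(past_excitation, pitch, subframe_len):
    """Repeat the last `pitch` samples of the past excitation up to subframe length."""
    if not 0 < pitch <= len(past_excitation):
        raise ValueError("pitch must be within the stored excitation history")
    segment = past_excitation[-pitch:]
    return [segment[n % pitch] for n in range(subframe_len)]


def synthesize(excitation, lpc):
    """All-pole synthesis filter 1/A(z), with A(z) = 1 + sum_k lpc[k-1] * z^-k.

    Assumption: the filter memory is zero at the start of the subframe.
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc, start=1):
            if n - k >= 0:
                acc -= a * out[n - k]
        out.append(acc)
    return out
```

A non-integer pitch candidate would additionally require interpolating the past excitation, which is omitted here.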

[0037] Next, the minimum distance is calculated, that is, the distance between the subframe and the synthesized speech vector when the latter is multiplied by whatever gain brings it closest to the subframe. Perceptual weighting may be introduced into this calculation so that the perceptual error is minimized. This processing is carried out for every pitch among the candidates, and the adaptive code vector and pitch period that give the smallest minimum distance are selected. The code assigned to the pitch period of the subframe is transmitted or stored.
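The search just described can be sketched as below, using a plain squared-error distance. Perceptual weighting is left out, and the function names and the dictionary-based candidate table are hypothetical conveniences, not part of the specification.

```python
def min_distance(target, synth):
    """Return (gain, distance) for the gain that best matches synth to target.

    With x = target and y = synth, the optimal gain is <x,y>/<y,y> and the
    resulting squared error is <x,x> - <x,y>^2 / <y,y>.
    """
    xy = sum(x * y for x, y in zip(target, synth))
    yy = sum(y * y for y in synth)
    xx = sum(x * x for x in target)
    if yy == 0.0:
        return 0.0, xx
    return xy / yy, xx - xy * xy / yy


def search_best(target, candidates):
    """candidates maps a pitch candidate to its synthesized speech vector."""
    best = None
    for pitch, synth in candidates.items():
        gain, dist = min_distance(target, synth)
        if best is None or dist < best[2]:
            best = (pitch, gain, dist)
    return best
```

Because the gain is solved in closed form, each candidate costs only a few inner products, which is why the search can afford to visit every pitch candidate.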

[0038] The noise source extraction unit 20 extracts and encodes the noise source component. The extraction uses an excitation codebook search over a multipulse excitation codebook, which consists of a plurality of excitation code vectors of subframe length, each made up of a plurality of pulses.

[0039] The target signal construction unit 22 generates the target signal vector for the noise source search by subtracting from the subframe the synthesized speech vector that is synthesized from the adaptive code vector of that subframe and the formant parameters of the subframe.
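As a small illustration, the subtraction might look like this. The explicit `gain` argument is an assumption: the text only says the synthesized speech vector is subtracted, and scaling it by the adaptive codebook gain first is one plausible reading.

```python
def noise_search_target(subframe, adaptive_synth, gain=1.0):
    """Target for the noise source search: subframe minus the (scaled) adaptive contribution."""
    if len(subframe) != len(adaptive_synth):
        raise ValueError("vectors must have the same length")
    return [s - gain * y for s, y in zip(subframe, adaptive_synth)]
```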

[0040] The pitch periodization unit 24 periodizes each excitation code vector in accordance with the pitch period extracted for the subframe. Let M be the subframe length, α the pitch period of the subframe, and P the pulse position (0 ≤ P < M); the pitch period α may be an integer or a non-integer. When P < α < M, the pulse is pitch-periodized as shown in FIG. 2. For example, when α is an integer, pulses are placed at the positions P + nα (n is a positive integer such that P + nα < M).
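For the integer case, the periodization can be sketched as follows. The function name and the dense list representation of the code vector are illustrative assumptions.

```python
def periodize_integer(pulse_pos, amplitude, pitch, subframe_len):
    """Place the pulse at P and repeat it at P + n*pitch while the position is < M."""
    if pitch <= 0:
        raise ValueError("pitch must be a positive integer")
    vec = [0.0] * subframe_len
    pos = pulse_pos
    while pos < subframe_len:
        vec[pos] += amplitude
        pos += pitch
    return vec
```

For example, with P = 1, α = 3 and M = 8, pulses land at positions 1, 4 and 7.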

[0041] When α is a non-integer, P + nα may also be a non-integer, so that a pulse has to be placed at a non-integer sample position. This is always the case at least for n = 1. Therefore, the pulse is approximated by a triangular wave and interpolated to produce a plurality of pulses, and these pulses are used in place of the pulse at the non-integer sample position (see FIG. 3).

[0042] Taking n = 1 as an example, consider a triangular waveform whose apex is at P + α and whose base is 2 samples long. Let β (0 < β < 1) be the fractional part of the pitch period α. Of the two integer sample positions on either side of P + α, a pulse of (1 − β) times the original height is placed at P + α − β, and a pulse of β times the original height is placed at P + α − β + 1. By approximating the pulse with a triangular waveform and interpolating in this way, a pulse that would have to be placed at a non-integer sample position can be replaced approximately by a small number of pulses (see FIG. 4).
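The two weights follow from evaluating a triangle of unit height and base 2, centred on the non-integer position, at the two neighbouring integer positions. A short derivation, in which Λ, t_0, k and A (the height of the original pulse) are notation introduced here for illustration and P is assumed to be an integer:

```latex
\[
\Lambda(t) = \max\bigl(0,\; 1 - |t|\bigr), \qquad
t_0 = P + \alpha, \qquad
k = t_0 - \beta = \lfloor t_0 \rfloor
\]
\[
\begin{aligned}
A\,\Lambda(k - t_0)     &= A\,\bigl(1 - |-\beta|\bigr) = A\,(1 - \beta), \\
A\,\Lambda(k + 1 - t_0) &= A\,\bigl(1 - (1 - \beta)\bigr) = A\,\beta .
\end{aligned}
\]
```

The two heights sum to A, so the split preserves the total pulse amplitude, and the nearer integer position receives the larger share.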

[0043] The distance calculation unit 26 performs, for each pitch-periodized multipulse excitation code vector, the same distance calculation as in the adaptive codebook search: it calculates the minimum distance between the target signal vector and the synthesized speech vector of the excitation code vector when the latter is multiplied by whatever gain brings it closest to the target. In this case, the synthesized speech vector of the excitation code vector may be orthogonalized against the synthesized speech vector of the adaptive code vector before the distance is calculated.
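The optional orthogonalization can be sketched as a single Gram-Schmidt step. This is an assumed realization; the specification does not state how the orthogonalization is carried out.

```python
def orthogonalize(vec, ref):
    """Remove from `vec` its component along `ref` (one Gram-Schmidt step)."""
    rr = sum(r * r for r in ref)
    if rr == 0.0:
        return list(vec)
    coef = sum(v * r for v, r in zip(vec, ref)) / rr
    return [v - coef * r for v, r in zip(vec, ref)]
```

Here `vec` would be the synthesized speech vector of an excitation code vector and `ref` that of the adaptive code vector, so that the excitation search only accounts for what the adaptive contribution cannot already represent.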

[0044] The optimum vector determination unit 28 determines the excitation code vector with the smallest minimum distance. The code assigned to the determined excitation code vector is transmitted or stored.

[0045] The gain quantization unit 30 quantizes the gain components. Methods for this include scalar quantization, in which the gain component of the adaptive code vector and the gain component of the noise code vector are quantized separately, and vector quantization, in which the two are quantized so as to be optimized jointly. The quantized gain components are transmitted or stored.
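One way to picture the vector quantization variant is an exhaustive search over a codebook of gain pairs, choosing the pair that minimizes the reconstruction error. The function name, the codebook layout and the plain squared-error criterion are assumptions made for this sketch.

```python
def quantize_gains(target, synth_adaptive, synth_noise, gain_codebook):
    """Return (index, error) of the (adaptive gain, noise gain) pair with least squared error.

    gain_codebook is a hypothetical list of (g_adaptive, g_noise) pairs.
    """
    best_index, best_err = -1, float("inf")
    for idx, (ga, gn) in enumerate(gain_codebook):
        err = sum((t - ga * a - gn * c) ** 2
                  for t, a, c in zip(target, synth_adaptive, synth_noise))
        if err < best_err:
            best_index, best_err = idx, err
    return best_index, best_err
```

Only the selected index would then be transmitted or stored, since the decoder is assumed to hold the same codebook.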

[0046] The operation of the above configuration will now be described. FIGS. 5 and 6 are flowcharts showing two operation examples of the encoding process for a digital speech input signal. Processing common to both is described together. First, the frame division unit 12 divides the digital speech input signal into processing units called frames, and the DC component is removed by passing the signal through a high-pass filter after (or, alternatively, before) the frame division (steps S1, S21).
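A minimal sketch of the DC removal and of the division into frames and subframes. The first-order filter form, its coefficient, and the choice to drop an incomplete trailing block are assumptions; the specification does not give the filter design or the frame sizes.

```python
def remove_dc(signal, alpha=0.95):
    """First-order high-pass filter: y[n] = x[n] - x[n-1] + alpha * y[n-1]."""
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in signal:
        y = x - prev_x + alpha * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def split_blocks(signal, size):
    """Split into consecutive blocks of `size` samples, dropping any incomplete tail."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]
```

The same `split_blocks` helper can be applied twice, first to obtain frames and then to divide each frame into subframes.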

[0047] The subframe division unit 14 then further divides each frame into processing units called subframes (steps S2, S22). Next, i is set to 0 in step S3 (S23), where i is a counter for checking the number of subframes, and the formant parameters are extracted in step S4 (S24). That is, the formant parameter extraction unit 16 analyzes the divided frame and extracts the formant parameters. The extraction uses linear prediction analysis, in which the autocorrelation function of the frame is calculated and the LPC (linear prediction) coefficients are obtained by the Levinson-Durbin recursion.
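The linear prediction analysis named above can be sketched as follows. The sign convention A(z) = 1 + Σ a_k z^-k and the function names are choices made for this illustration; windowing and the prediction order actually used are not shown.

```python
def autocorrelation(frame, order):
    """Autocorrelation r[0..order] of one frame."""
    return [sum(frame[n] * frame[n - k] for n in range(k, len(frame)))
            for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve the normal equations recursively.

    Returns (a, err): a[k-1] is the coefficient a_k of A(z) = 1 + sum a_k z^-k,
    and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        if err == 0.0:
            break  # degenerate (e.g. all-zero) frame
        acc = r[i] + sum(a[k] * r[i - k] for k in range(1, i))
        refl = -acc / err  # reflection coefficient of stage i
        new_a = a[:]
        for k in range(1, i):
            new_a[k] = a[k] + refl * a[i - k]
        new_a[i] = refl
        a = new_a
        err *= 1.0 - refl * refl
    return a[1:], err
```

For the autocorrelation sequence 1, 0.5, 0.25 of a first-order process, the recursion yields a_1 = −0.5 and a vanishing second coefficient.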

[0048] The formant parameters of the frame are then quantized and transmitted or stored. Quantization methods for the formant parameters include scalar quantization, vector quantization, multi-stage vector quantization, and split vector quantization; the formant parameters are quantized using a representation with high quantization efficiency such as LSP or LSF.

[0049] Next, the formant parameters of each subframe are calculated from the quantized formant parameters of the frame. They can be obtained by interpolation from the quantized formant parameters of the current and past frames, using linear interpolation or quadratic interpolation.

[0050] Next, the pitch period extraction unit 18 extracts a pitch period for each subframe (steps S5, S25). Here, an adaptive codebook search is used as the extraction method. For the adaptive codebook search, a predetermined number of pitch candidates, consisting of integer pitches that are integer multiples of the sampling interval and non-integer pitches, is prepared in advance, and the following processing is performed for every candidate.

[0051] First, an adaptive code vector is generated by cutting one pitch length out of the excitation vector accumulated up to the immediately preceding subframe and repeating it until the subframe length is reached. A synthesized speech vector is then generated by applying to this adaptive code vector the linear filter constructed from the formant parameters of the subframe. The minimum distance between the subframe and this synthesized speech vector, scaled by the gain that brings it closest to the subframe, is then calculated. Here, perceptual weighting is applied in this calculation so that the perceptual error is minimized.

[0052] This processing is performed for every pitch among the candidates. The adaptive code vector and pitch period with the smallest minimum distance are selected, and the code assigned to the pitch period of the subframe is transmitted or stored.

[0053] Next, the noise source extraction unit 20 extracts and encodes the noise source component. The extraction uses an excitation codebook search over a multipulse excitation codebook consisting of a plurality of excitation code vectors of subframe length, each made up of a plurality of pulses.

[0054] First, in the target signal construction process of step S6 (S26), the target signal construction unit 22 generates the target signal vector for the noise source search by subtracting from the subframe the synthesized speech vector synthesized from the adaptive code vector of the subframe and the formant parameters of the subframe. Then, in step S7 (S27), j is set to 0, where j is a counter for checking the number of pulses, and the process moves to step S8 (S28).

[0055] The pitch periodization unit 24 periodizes each excitation code vector in accordance with the pitch period extracted for the subframe. Here, M is the subframe length (number), α is the pitch period of the subframe, and P is the pulse position (0 ≤ P < M); the pitch period α may be an integer or a non-integer.

[0056] In FIG. 5, step S8 determines whether the pulse falls on a non-integer sample position. If it does not, a single pulse is placed (step S9); if it does, a plurality of pulses is placed (step S10).

[0057] In FIG. 6, the pulse is pitch-periodized when P < α < M (step S28). When α is an integer, pulses are placed at the positions P + nα (n is a positive integer such that P + nα < M) (see FIG. 2). When α is a non-integer, P + nα may be a non-integer, so that a pulse has to be placed at a non-integer sample position (step S29). This is always the case at least for n = 1.

[0058] Here, the pulse is approximated by a triangular wave and interpolated to produce a plurality of pulses, and these pulses are used in place of the pulse at the non-integer sample position (see FIG. 3). Taking n = 1 as an example, consider a triangular waveform whose apex is at P + α and whose base is 2 samples long. Let β (0 < β < 1) be the fractional part of the pitch period α. Of the two integer sample positions on either side of P + α, a pulse of (1 − β) times the original height is placed at P + α − β (step S30), and a pulse of β times the original height is placed at P + α − β + 1 (step S31).
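Extending the n = 1 example to every repetition inside the subframe gives the following sketch. It assumes an integer starting position P, keeps the original pulse at P, and simply drops any split pulse that would fall at or beyond the subframe boundary; the function name and that boundary handling are assumptions, not taken from the specification.

```python
import math


def periodize_fractional(pulse_pos, amplitude, pitch, subframe_len):
    """Pitch-periodize one pulse for an integer or non-integer pitch.

    A repetition at a non-integer position t is replaced by two pulses:
    (1 - beta) * amplitude at floor(t) and beta * amplitude at floor(t) + 1,
    where beta is the fractional part of t.
    """
    if pitch <= 0:
        raise ValueError("pitch must be positive")
    vec = [0.0] * subframe_len
    n = 0
    while pulse_pos + n * pitch < subframe_len:
        pos = pulse_pos + n * pitch
        lower = math.floor(pos)
        beta = pos - lower
        vec[lower] += (1.0 - beta) * amplitude
        if beta > 0.0 and lower + 1 < subframe_len:
            vec[lower + 1] += beta * amplitude
        n += 1
    return vec
```

With an integer pitch, β is always 0 and the function reduces to placing single pulses at P + nα, so the same routine covers both branches of the flowcharts.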

[0059] By approximating the pulse with a triangular waveform and interpolating in this way, even when a pulse has to be placed at a non-integer sample position (steps S28 → S29), it can be replaced approximately by a small number of pulses (see FIG. 4).

[0060] Next, the distance calculation unit 26 performs, for each pitch-periodized multipulse excitation code vector, the same distance calculation as in the adaptive codebook search described above (steps S11, S32): it calculates the minimum distance between the target signal vector and the synthesized speech vector of the excitation code vector when the latter is multiplied by whatever gain brings it closest to the target. The synthesized speech vector of the excitation code vector can also be orthogonalized against the synthesized speech vector of the adaptive code vector before the distance is calculated.

[0061] Next, the optimum vector determination unit 28 determines the excitation code vector with the smallest minimum distance, and the code assigned to the determined excitation code vector is transmitted or stored.

[0062] The gain quantization unit 30 then quantizes the gain components. Here, the gain components are obtained by vector quantization, in which the gain components of both the adaptive code vector and the noise code vector are optimized jointly and quantized, and the quantized gain components are transmitted or stored (steps S12, S33).

[0063] In step S13 (S34), it is determined whether j = N − 1 (N is the number of pulses). If j ≠ N − 1, j is incremented (j = j + 1) in step S14 (S35) and the process returns to step S8 (S28); if j = N − 1, the process moves to step S15 (S36).

[0064] In step S15 (S36), if i ≠ M − 1 (M is the number of subframes), i is incremented (i = i + 1) in step S16 (S37) and the process returns to step S5 (S25); if i = M − 1, the process moves to step S17 (S38). In step S17 (S38), if the frame being processed is not the last frame, the process returns to the first step S1 (S21) and the above processing is repeated; if it is the last frame, the process ends.

[0065] As described above, according to this embodiment, when a pulse has to be placed at a non-integer sample position during pitch periodization in CELP coding using an algebraic excitation codebook, the pulse is approximated by a triangular wave and converted by interpolation into a combination of pulses at integer sample positions. As a result, the difference between the pitch period of the adaptive code vector and that of the excitation code vector is reduced, and sound quality is improved, without greatly increasing the amount of computation required for the excitation codebook search.

[0066] The speech encoding method described in the above embodiment can be realized by executing a speech encoding algorithm prepared in advance (or a program containing it) on a computer such as a personal computer or a workstation. The speech encoding algorithm is recorded on a computer-readable recording medium such as a hard disk, floppy disk, CD-ROM, MO, or DVD, and is executed by being read from the recording medium by the computer. The algorithm can also be distributed via such a recording medium or via a network such as the Internet.

[0067]

[Effects of the Invention] As described above, according to the invention of claim 1, the frame dividing means divides a digital speech signal into frames; the subframe dividing means divides each frame into shorter subframes; the formant parameter extracting means extracts and encodes formant parameters from the frame; the pitch period extracting means extracts and encodes the pitch period of each subframe using the subframe and the formant parameters; and the noise source extracting means extracts and encodes the noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and a multipulse codebook consisting of a plurality of code vectors of subframe length, each made up of a plurality of pulses. When the noise source extracting means has to place a pulse at a non-integer sample position, pulses are placed at a plurality of integer sample positions on either side of that non-integer sample position. Noise source extraction can therefore be performed with high accuracy, yielding a speech encoding device capable of higher sound quality.

[0068] According to the invention of claim 2, in the invention of claim 1, when the noise source extracting means has to place a pulse at a non-integer sample position α whose fractional part is β (0 < β < 1), a pulse with (1 − β) times the pulse amplitude is placed at the largest integer sample position below α, namely α − β, and a pulse with β times the pulse amplitude is placed at the smallest integer sample position above α, namely α − β + 1. A speech encoding device is thus obtained that performs noise source extraction with high accuracy and achieves higher sound quality with a simpler configuration.

[0069] According to the invention of claim 3, the frame dividing means divides a digital speech signal into frames; the subframe dividing means divides each frame into shorter subframes; the formant parameter extracting means extracts and encodes formant parameters from the frame; the pitch period extracting means extracts and encodes the pitch period of each subframe using the subframe and the formant parameters; and the noise source extracting means extracts and encodes the noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and a multipulse codebook consisting of a plurality of code vectors of subframe length, each made up of a plurality of pulses. The pitch period extracting means extracts not only integer pitch periods but also non-integer pitch periods, and the noise source extracting means includes pitch periodizing means that periodizes the code vectors in accordance with the pitch period. When the pitch periodizing means has to place a pulse at a non-integer sample position, pulses are placed at a plurality of integer sample positions on either side of that non-integer sample position. Pitch periodization of a multipulse excitation for non-integer pitches can therefore be performed with high accuracy, yielding a speech encoding device capable of higher sound quality.

[0070] According to the invention of claim 4, in the invention of claim 3, when the pitch periodizing means has to place a pulse at a non-integer sample position α whose fractional part is β (0 < β < 1), a pulse with (1 − β) times the pulse amplitude is placed at the largest integer sample position below α, namely α − β, and a pulse with β times the pulse amplitude is placed at the smallest integer sample position above α, namely α − β + 1. A speech encoding device is thus obtained that performs pitch periodization of a multipulse excitation for non-integer pitches with high accuracy and achieves higher sound quality with a simpler configuration.

[0071] According to the invention of claim 5, the frame dividing step divides a digital speech signal into frames; the subframe dividing step divides each frame into shorter subframes; the formant parameter extracting step extracts and encodes formant parameters from the frame; the pitch period extracting step extracts and encodes the pitch period of each subframe using the subframe and the formant parameters; and the noise source extracting step extracts and encodes the noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and a multipulse codebook consisting of a plurality of code vectors of subframe length, each made up of a plurality of pulses. When a pulse has to be placed at a non-integer sample position in the noise source extracting step, pulses are placed at a plurality of integer sample positions on either side of that non-integer sample position. Noise source extraction can therefore be performed with high accuracy, yielding a speech encoding method capable of higher sound quality.

[0072] According to the invention of claim 6, in the invention of claim 5, when a pulse has to be placed in the noise source extracting step at a non-integer sample position α whose fractional part is β (0 < β < 1), a pulse with (1 − β) times the pulse amplitude is placed at the largest integer sample position below α, namely α − β, and a pulse with β times the pulse amplitude is placed at the smallest integer sample position above α, namely α − β + 1. A speech encoding method is thus obtained that performs noise source extraction with high accuracy and achieves higher sound quality with a simpler configuration.

[0073] According to the invention of claim 7, the frame dividing step divides a digital speech signal into frames; the subframe dividing step divides each frame into shorter subframes; the formant parameter extracting step extracts and encodes formant parameters from the frame; the pitch period extracting step extracts and encodes the pitch period of each subframe using the subframe and the formant parameters; and the noise source extracting step extracts and encodes the noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and a multipulse codebook consisting of a plurality of code vectors of subframe length, each made up of a plurality of pulses. The pitch period extracting step extracts not only integer pitch periods but also non-integer pitch periods, and the noise source extracting step includes a pitch periodizing step that periodizes the code vectors in accordance with the pitch period. When a pulse has to be placed at a non-integer sample position in the pitch periodizing step, pulses are placed at a plurality of integer sample positions on either side of that non-integer sample position. Pitch periodization of a multipulse excitation for non-integer pitches can therefore be performed with high accuracy, yielding a speech encoding method capable of higher sound quality.

[0074] According to the invention of claim 8, in the invention of claim 7, when a pulse has to be placed in the pitch periodizing step at a non-integer sample position α whose fractional part is β (0 < β < 1), a pulse with (1 − β) times the pulse amplitude is placed at the largest integer sample position below α, namely α − β, and a pulse with β times the pulse amplitude is placed at the smallest integer sample position above α, namely α − β + 1. A speech encoding method is thus obtained that performs pitch periodization of a multipulse excitation for non-integer pitches with high accuracy and achieves higher sound quality with a simpler configuration.

[0075] According to the invention of claim 9, the frame dividing step divides a digital speech signal into frames; the subframe dividing step divides each frame into shorter subframes; the formant parameter extracting step extracts and encodes formant parameters from the frame; the pitch period extracting step extracts and encodes the pitch period of each subframe using the subframe and the formant parameters; and the noise source extracting step extracts and encodes the noise source component of the subframe using the subframe, the formant parameters of the frame, the pitch period, and a multipulse codebook consisting of a plurality of code vectors of subframe length, each made up of a plurality of pulses. The pitch period extracting step extracts not only integer pitch periods but also non-integer pitch periods, and the noise source extracting step includes a pitch periodizing step that periodizes the code vectors in accordance with the pitch period. When a pulse has to be placed at a non-integer sample position in the pitch periodizing step, pulses are placed at a plurality of integer sample positions on either side of that non-integer sample position, so that pitch periodization of a multipulse excitation for non-integer pitches can be performed with high accuracy and higher sound quality can be achieved. Because a speech encoding algorithm with these properties is recorded on a recording medium, the algorithm becomes machine-readable, and a recording medium is obtained with which these operations can be realized by a computer.

[0076] According to the tenth aspect of the invention, in the invention of the ninth aspect, when the pitch periodization step needs to place a pulse at a non-integer sample position α whose fractional part is β (0 < β < 1), a pulse having an amplitude of (1 - β) times the pulse amplitude is placed at the largest integer sample position smaller than α, namely (α - β), and a pulse having an amplitude of β times the pulse amplitude is placed at the smallest integer sample position larger than α, namely (α - β + 1). Pitch periodization of a multi-pulse sound source for a non-integer pitch can therefore be performed with high accuracy using a simpler configuration, improving sound quality. Because this speech encoding algorithm is recorded on a recording medium, it becomes machine-readable, and a recording medium is obtained with which these operations can be realized by a computer.
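
The two-pulse substitution of the tenth aspect can be written out as a short sketch. The function name `split_pulse` and the list-of-pairs return format are assumptions made for illustration only; the weights and positions follow the text above.

```python
import math


def split_pulse(alpha, amplitude):
    """Replace a pulse at position alpha by pulses at integer positions.

    Returns a list of (position, amplitude) pairs. For a non-integer alpha
    with fractional part beta, the pulse at (alpha - beta) gets
    (1 - beta) * amplitude and the pulse at (alpha - beta + 1) gets
    beta * amplitude.
    """
    lower = math.floor(alpha)   # alpha - beta
    beta = alpha - lower        # fractional part, 0 <= beta < 1
    if beta == 0:
        return [(lower, amplitude)]
    return [(lower, (1 - beta) * amplitude), (lower + 1, beta * amplitude)]


print(split_pulse(12.25, 2.0))
# prints [(12, 1.5), (13, 0.5)]
```

The two amplitudes always sum to the original pulse amplitude, and their amplitude-weighted mean position is (α - β)(1 - β) + (α - β + 1)β = α, so the pair keeps both the total amplitude and the intended position of the original pulse.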

[Brief description of the drawings]

FIG. 1 is a block diagram showing a configuration example of the speech encoding device according to the present embodiment.

FIG. 2 is an explanatory diagram of pitch periodization of pulses when P < α < M.

FIG. 3 is a diagram explaining how, when α is a non-integer and a pulse must be placed at a non-integer sample position, the pulse is approximated by a triangular wave and interpolated to generate a plurality of pulses that substitute for the pulse at the non-integer sample position.

FIG. 4 is a diagram explaining that, by approximating a pulse with a triangular waveform and interpolating it, a small number of pulses can approximately substitute for the pulse even when it must be placed at a non-integer sample position.

FIG. 5 is a flowchart showing an operation example of the encoding process for a digital speech input signal.

FIG. 6 is a flowchart showing another operation example of the encoding process for a digital speech input signal.

[Explanation of symbols]

10 Speech encoding device
12 Frame division unit
14 Subframe division unit
16 Formant parameter extraction unit
18 Pitch period extraction unit
20 Noise source extraction unit
22 Target signal construction unit
24 Pitch periodization unit
26 Distance calculation unit
28 Optimal vector determination unit
20 Gain quantization unit

Claims

[Claims]

1. A speech encoding device for encoding a digital speech signal, comprising: a frame dividing unit that divides the digital speech signal into frames; a subframe dividing unit that divides the frame into shorter subframes; a formant parameter extracting unit that extracts and encodes a formant parameter from the frame; a pitch period extracting unit that extracts and encodes a pitch period of the subframe using the subframe and the formant parameter; and noise source extracting means for extracting and encoding a noise source component of the subframe using the subframe, the formant parameter, the pitch period, and a multi-pulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein, when a pulse needs to be placed at a non-integer sample position, the noise source extracting means places pulses at a plurality of integer sample positions on either side of the non-integer sample position.

2. The speech encoding device according to claim 1, wherein, when a pulse needs to be placed at a non-integer sample position α whose fractional part is β (0 < β < 1), the noise source extracting means places a pulse having an amplitude of (1 - β) times the pulse amplitude at the largest integer sample position smaller than α, namely (α - β), and a pulse having an amplitude of β times the pulse amplitude at the smallest integer sample position larger than α, namely (α - β + 1).

3. A speech encoding device for encoding a digital speech signal, comprising: a frame dividing unit that divides the digital speech signal into frames; a subframe dividing unit that divides the frame into shorter subframes; a formant parameter extracting unit that extracts and encodes a formant parameter from the frame; a pitch period extracting unit that extracts and encodes a pitch period of the subframe using the subframe and the formant parameter; and noise source extracting means for extracting and encoding a noise source component of the subframe using the subframe, the formant parameter, the pitch period, and a multi-pulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein the pitch period extracting unit extracts not only an integer pitch period but also a non-integer pitch period, the noise source extracting means includes pitch periodizing means for periodizing the code vector in accordance with the pitch period, and, when a pulse needs to be placed at a non-integer sample position, the pitch periodizing means places pulses at a plurality of integer sample positions on either side of the non-integer sample position.

4. The speech encoding device according to claim 3, wherein, when a pulse needs to be placed at a non-integer sample position α whose fractional part is β (0 < β < 1), the pitch periodizing means places a pulse having an amplitude of (1 - β) times the pulse amplitude at the largest integer sample position smaller than α, namely (α - β), and a pulse having an amplitude of β times the pulse amplitude at the smallest integer sample position larger than α, namely (α - β + 1).

5. A speech encoding method for encoding a digital speech signal, comprising: a frame division step of dividing the digital speech signal into frames; a subframe division step of dividing the frame into shorter subframes; a formant parameter extraction step of extracting and encoding a formant parameter from the frame; a pitch period extraction step of extracting and encoding a pitch period of the subframe using the subframe and the formant parameter; and a noise source extraction step of extracting and encoding a noise source component of the subframe using the subframe, the formant parameter, the pitch period, and a multi-pulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein, when a pulse needs to be placed at a non-integer sample position in the noise source extraction step, pulses are placed at a plurality of integer sample positions on either side of the non-integer sample position.

6. The speech encoding method according to claim 5, wherein, when a pulse needs to be placed at a non-integer sample position α whose fractional part is β (0 < β < 1) in the noise source extraction step, a pulse having an amplitude of (1 - β) times the pulse amplitude is placed at the largest integer sample position smaller than α, namely (α - β), and a pulse having an amplitude of β times the pulse amplitude is placed at the smallest integer sample position larger than α, namely (α - β + 1).

7. A speech encoding method for encoding a digital speech signal, comprising: a frame division step of dividing the digital speech signal into frames; a subframe division step of dividing the frame into shorter subframes; a formant parameter extraction step of extracting and encoding a formant parameter from the frame; a pitch period extraction step of extracting and encoding a pitch period of the subframe using the subframe and the formant parameter; and a noise source extraction step of extracting and encoding a noise source component of the subframe using the subframe, the formant parameter, the pitch period, and a multi-pulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein the pitch period extraction step extracts not only an integer pitch period but also a non-integer pitch period, the noise source extraction step includes a pitch periodization step of periodizing the code vector in accordance with the pitch period, and, when a pulse needs to be placed at a non-integer sample position in the pitch periodization step, pulses are placed at a plurality of integer sample positions on either side of the non-integer sample position.

8. The speech encoding method according to claim 7, wherein, when a pulse needs to be placed at a non-integer sample position α whose fractional part is β (0 < β < 1) in the pitch periodization step, a pulse having an amplitude of (1 - β) times the pulse amplitude is placed at the largest integer sample position smaller than α, namely (α - β), and a pulse having an amplitude of β times the pulse amplitude is placed at the smallest integer sample position larger than α, namely (α - β + 1).

9. A computer-readable recording medium on which is recorded a speech encoding algorithm for encoding a digital speech signal, the algorithm comprising: a frame division step of dividing the digital speech signal into frames; a subframe division step of dividing the frame into shorter subframes; a formant parameter extraction step of extracting and encoding a formant parameter from the frame; a pitch period extraction step of extracting and encoding a pitch period of the subframe using the subframe and the formant parameter; and a noise source extraction step of extracting and encoding a noise source component of the subframe using the subframe, the formant parameter, the pitch period, and a multi-pulse codebook composed of a plurality of subframe-length code vectors each composed of a plurality of pulses, wherein the pitch period extraction step extracts not only an integer pitch period but also a non-integer pitch period, the noise source extraction step includes a pitch periodization step of periodizing the code vector in accordance with the pitch period, and, when a pulse needs to be placed at a non-integer sample position in the pitch periodization step, pulses are placed at a plurality of integer sample positions on either side of the non-integer sample position.

10. The computer-readable recording medium on which a speech encoding algorithm is recorded according to claim 9, wherein, when a pulse needs to be placed at a non-integer sample position α whose fractional part is β (0 < β < 1) in the pitch periodization step, a pulse having an amplitude of (1 - β) times the pulse amplitude is placed at the largest integer sample position smaller than α, namely (α - β), and a pulse having an amplitude of β times the pulse amplitude is placed at the smallest integer sample position larger than α, namely (α - β + 1).