JP3404350B2

JP3404350B2 - Speech coding parameter acquisition method, speech decoding method and apparatus

Info

Publication number: JP3404350B2
Application number: JP2000060932A
Authority: JP
Inventors: 照夫麓; 佐々木誠司
Original assignee: Matsushita Communication Industrial Co Ltd
Current assignee: Panasonic Mobile Communications Co Ltd
Priority date: 2000-03-06
Filing date: 2000-03-06
Publication date: 2003-05-06
Anticipated expiration: 2020-03-06
Also published as: JP2001249698A

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech coding parameter acquisition method and apparatus that digitize a speech signal and acquire, at predetermined time intervals, speech coding parameters representing its characteristics, and to a speech decoding method and apparatus that synthesize the original speech signal from those speech coding parameters. It is suitable for digital mobile telephones, digital voice storage devices, and similar equipment that encode the speech coding parameters for transmission or storage, restore them from the transmission or storage destination when needed, and synthesize a speech signal from the restored parameters to convey speech.

[0002]

[Prior Art] A digitized speech signal can be subjected to a variety of digital signal processing such as data compression, error handling, and multiplexing, so digitized speech is widely used not only in fixed and mobile telephones but also in multimedia systems that use speech. Digitizing an analog speech signal generally requires sampling at a frequency of at least twice the input speech bandwidth and quantizing with a step small enough to be inaudible, so the digital signal needs a wider transmission bandwidth than the analog signal. For this reason, a digitized speech signal is compressed with various coding and modulation schemes according to the required speech quality. Efficient compression is possible by actively exploiting the characteristics of speech. For example, adaptive differential pulse code modulation (ADPCM) is a waveform coding scheme that exploits the periodicity of the speech waveform and the logarithmic characteristic of human auditory sensitivity. It compresses 128 kbps digital speech to about 32 kbps with speech quality indistinguishable from that before compression, and is used in telephone trunk transmission and in the PHS system. Because waveform coding represents each sampling point with at least 1 bit, the speech coding rate cannot in principle be reduced below 8 kbps when the sampling frequency is 8 kHz.
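The rates quoted above follow from multiplying the sampling frequency by the number of bits spent per sample. The short sketch below reproduces that arithmetic. The 16-bit and 4-bit sample sizes are assumptions chosen to match the 128 kbps and 32 kbps figures at 8 kHz; the text itself does not state them.

```python
def bit_rate_kbps(sampling_hz: int, bits_per_sample: float) -> float:
    """Bit rate of a waveform coder that spends bits_per_sample on every sample."""
    return sampling_hz * bits_per_sample / 1000.0


FS_HZ = 8000  # sampling frequency used in the text's 8 kbps example

pcm_rate = bit_rate_kbps(FS_HZ, 16)    # assumed 16 bits/sample -> 128 kbps
adpcm_rate = bit_rate_kbps(FS_HZ, 4)   # assumed 4 bits/sample  -> 32 kbps
floor_rate = bit_rate_kbps(FS_HZ, 1)   # 1 bit/sample lower bound -> 8 kbps
```

Going below the 1 bit per sample floor requires describing the speech by model parameters per segment rather than by individual samples, which is the direction the following paragraphs take.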

[0003] To obtain a lower speech coding rate, there are schemes based on code-excited linear prediction (CELP), which divide the speech into segments of a predetermined time interval and transmit speech synthesis parameters and a residual excitation signal for each segment. The VSELP and PSI-CELP schemes used in Japanese mobile radio telephones achieve a low coding rate by encoding two things: linear prediction coefficients (LPC), obtained by linear prediction analysis of the speech signal at 20 msec or 40 msec intervals and approximating the human vocal tract filter characteristic, and a residual excitation signal from which a waveform perceptually close to the input speech can be synthesized. To encode the residual excitation signal efficiently, a codebook holding multiple residual excitation waveforms is prepared, and the codebook entry number and gain are transmitted. The details are given in the standard RCR-STD27F of the Association of Radio Industries and Businesses. With a well-designed codebook of appropriate size, CELP-based schemes have achieved speech coding rates down to about 3 to 4 kbps.
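The idea of transmitting only a codebook entry number and a gain can be illustrated with a least-squares codebook search. This is a minimal sketch, not the procedure of VSELP, PSI-CELP, or RCR-STD27F: real CELP coders pass each entry through the synthesis filter and a perceptual weighting filter before comparison, which is omitted here. The function name `search_codebook` and the toy codebook are hypothetical.

```python
def search_codebook(target, codebook):
    """Return (index, gain) of the entry that best matches target in the least-squares sense."""
    best_index, best_gain, best_error = -1, 0.0, float("inf")
    target_energy = sum(t * t for t in target)
    for index, entry in enumerate(codebook):
        energy = sum(c * c for c in entry)
        if energy == 0.0:
            continue  # an all-zero entry cannot be scaled to match anything
        correlation = sum(t * c for t, c in zip(target, entry))
        gain = correlation / energy                        # optimal gain for this entry
        error = target_energy - correlation * correlation / energy
        if error < best_error:
            best_index, best_gain, best_error = index, gain, error
    return best_index, best_gain


# Toy codebook of three residual waveforms (illustrative values only).
codebook = [
    [1.0, 0.0, -1.0, 0.0],
    [0.5, 0.5, -0.5, -0.5],
    [1.0, 1.0, 1.0, 1.0],
]
target = [1.0, 1.0, -1.0, -1.0]
index, gain = search_codebook(target, codebook)
```

Only `index` and `gain` would need to be sent for this segment, which is why a well-sized codebook lowers the bit rate.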

[0004] To obtain a still lower speech coding rate, there are schemes that transmit only speech synthesis parameters and do not use the excitation codebook of the CELP schemes above. FS-1015, a standard speech coding scheme of the U.S. Department of Defense, is an LPC vocoder that encodes and decodes speech from the speech synthesis parameters of pitch frequency, LPC coefficients, root-mean-square amplitude, and voiced/unvoiced decision information, and achieves a coding rate of 2.4 kbps. Although this scheme actively exploits the characteristics of speech, it produces a synthetic-sounding quality, and the decoded speech quality degrades markedly under background noise in particular. The IMBE (Improved Multiband Excitation) scheme, used in some satellite mobile telephones, transforms each speech time segment into the frequency domain and encodes the speech as the speech pitch, the speech harmonic amplitudes, and voiced/unvoiced information for each of the frequency bands into which the frequency range is divided. Because a voiced model or an unvoiced model is selected for each band of a speech segment before synthesis, the synthesized speech is reported to degrade little even under background noise or with mixed speech, and to be superior to the LPC vocoder.

[0005] FIG. 14 shows the configuration of a general speech coding transmission apparatus. The speech coding parameter extraction unit 302 divides the sampled and quantized digital speech signal input from the speech input terminal 301 into segments of a predetermined time interval and extracts speech coding parameters for each segment. The parameters to be extracted are determined by the coding scheme. In the IMBE scheme, for example, they are the speech pitch, the amplitudes of the speech harmonics, and the voiced/unvoiced information of each frequency band. The parameter encoding unit 303 encodes the extracted parameters efficiently to reduce the amount of code and sends them to the transmission path 305 through the transmitting unit 304. The parameter decoding unit 307 decodes the code received by the receiving unit 306 to restore the speech coding parameters. The speech synthesis unit 308 then creates synthesized speech by the inverse of the operation of the speech coding parameter extraction unit and outputs a digital speech signal from the speech output terminal 309.

[0006] FIG. 15 shows the internal configuration of the speech coding parameter extraction unit 302 for the IMBE scheme. The digital input speech signal 301 is input to the fundamental frequency estimation unit 401, which estimates the fundamental frequency of the speech. There are many methods for estimating the fundamental frequency, such as detecting the peak of the autocorrelation function or of the cepstrum (the inverse Fourier transform of the logarithm of the frequency spectrum). They are described, for example, in Furui, "Digital Speech Processing", Tokai University Press, September 25, 1985. The frequency spectrum calculation unit 402 performs frequency analysis on a finite-length speech segment cut out by a window function such as a Hamming window and obtains the speech frequency spectrum. The fundamental frequency correction unit 403 applies the A-b-S (Analysis-by-Synthesis) technique in a frequency range near the fundamental frequency estimated by the fundamental frequency estimation unit 401, and obtains a corrected fundamental frequency ωo that minimizes the error between a synthesized spectrum and the speech frequency spectrum calculated by the frequency spectrum calculation unit 402. Based on the corrected fundamental frequency ωo, the voiced strength calculation unit 404 divides the frequency range into multiple frequency bands (frequency intervals) k (k = 1, 2, ..., K), calculates for each band the error between the synthesized spectrum and the speech frequency spectrum, and outputs the voiced/unvoiced information V[k] by threshold decision. Using V[k], the spectral envelope calculation unit 405 outputs the spectral envelope magnitude |A(ω)|: in voiced bands, the amplitude of each harmonic obtained by the A-b-S technique, and in unvoiced bands, the root-mean-square (RMS) value of the frequency spectrum within the frequency band of each harmonic.
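Of the estimation methods mentioned for unit 401, the autocorrelation peak is the simplest to sketch. The example below is an illustration under assumptions, not the estimator of the IMBE scheme: it searches integer lags only, uses a normalized autocorrelation, and prefers the shortest lag among near-equal peaks to avoid picking a multiple of the true period. The 70 Hz to 400 Hz search range is taken from the range of speech fundamental frequencies given later in the text.

```python
import math


def estimate_pitch_period(x, fs_hz, f_min_hz=70.0, f_max_hz=400.0):
    """Return the integer lag (in samples) with the highest normalized autocorrelation."""
    lag_min = int(fs_hz / f_max_hz)
    lag_max = int(fs_hz / f_min_hz)
    best_lag, best_score = lag_min, -1.0
    for lag in range(lag_min, lag_max + 1):
        span = len(x) - lag
        cross = sum(x[n] * x[n + lag] for n in range(span))
        e0 = sum(x[n] ** 2 for n in range(span))
        e1 = sum(x[n + lag] ** 2 for n in range(span))
        score = cross / math.sqrt(e0 * e1) if e0 > 0.0 and e1 > 0.0 else 0.0
        # The small margin keeps the shortest lag when a multiple scores equally well.
        if score > best_score + 1e-6:
            best_lag, best_score = lag, score
    return best_lag


FS_HZ = 8000.0
# Test signal: 200 Hz fundamental with two harmonics, 400 samples long.
signal = [
    sum(math.sin(2.0 * math.pi * h * 200.0 * n / FS_HZ) / h for h in (1, 2, 3))
    for n in range(400)
]
period = estimate_pitch_period(signal, FS_HZ)
f0_hz = FS_HZ / period
```

An integer-lag search of this kind is exactly what limits the pitch resolution, which motivates the finer search around the initial estimate performed by unit 403.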

[0007] FIG. 16 shows the internal configuration of the speech synthesis unit 308 for the IMBE scheme. As the figure shows, the speech synthesis unit 308 is broadly divided into a voiced speech synthesis unit 508 and an unvoiced speech synthesis unit 509. In the voiced speech synthesis unit 508, the harmonic source unit 501 uses the voiced/unvoiced information V[k] and the fundamental frequency ωo to drive, in the frequency intervals determined to be voiced, sources at the fundamental frequency ωo and its harmonics with amplitudes corresponding to the spectral envelope |A(ω)|, generating multiple source signals. The harmonic addition unit 502 sums the source signals generated by the harmonic source unit 501 and produces the speech signal corresponding to the voiced bands. In the unvoiced speech synthesis unit 509, the noise source unit 503 generates white noise, and the frequency transform unit 504 processes it with a suitable window function and then converts it into a frequency-domain signal by Fourier transform (FFT). The noise extraction unit 505 takes, from the frequency-domain white noise, the white noise spectrum of the frequency bands designated unvoiced by V[k], and adjusts the amplitude of each spectral component to match the per-band power of the spectral envelope |A(ω)|. The inverse frequency transform unit 506 converts the frequency spectrum of the noise intervals corresponding to the unvoiced bands into a speech waveform by inverse Fourier transform (IFFT). The addition unit 507 adds the voiced speech waveform from the harmonic addition unit 502 of the voiced speech synthesis unit 508 and the unvoiced speech waveform converted to a time-domain waveform by the inverse frequency transform unit 506 of the unvoiced speech synthesis unit 509, yielding the final synthesized speech containing both voiced and unvoiced sound. The IMBE scheme is described in detail in "Multiband Excitation Vocoder", IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 36, no. 8, August 1988, pp. 1223-1235. Thus, analysis-synthesis speech coding schemes such as IMBE, which extract and encode speech coding parameters based on a speech synthesis model, have been proposed as a way to digitize speech and achieve low-bit-rate speech coding.
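The voiced path of FIG. 16 (units 501 and 502) amounts to summing sinusoids at multiples of the fundamental frequency, skipping harmonics that fall in unvoiced bands. The sketch below shows that idea only. The grouping of three harmonics per band, the zero initial phase, and the function name `synthesize_voiced` are assumptions for illustration; phase continuity between frames is not handled.

```python
import math


def synthesize_voiced(f0_hz, amplitudes, voiced, fs_hz, n_samples, harmonics_per_band=3):
    """Sum cosines at harmonics of f0_hz, using only harmonics inside voiced bands."""
    out = [0.0] * n_samples
    for i, amplitude in enumerate(amplitudes):
        harmonic = i + 1
        band = min(i // harmonics_per_band, len(voiced) - 1)
        if not voiced[band]:
            continue  # this harmonic belongs to an unvoiced band
        omega = 2.0 * math.pi * harmonic * f0_hz / fs_hz
        for n in range(n_samples):
            out[n] += amplitude * math.cos(omega * n)
    return out


# Harmonics 1 to 3 lie in band 0 (voiced); harmonic 4 lies in band 1 (unvoiced).
frame = synthesize_voiced(200.0, [1.0, 0.5, 0.25, 0.8], [True, False], 8000.0, 160)
silent = synthesize_voiced(200.0, [1.0, 0.5, 0.25, 0.8], [False, False], 8000.0, 160)
```

The unvoiced bands would be filled by the noise path (units 503 to 506) and the two waveforms added, as unit 507 does.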

[0008]

[Problems to Be Solved by the Invention] As described above, analysis-synthesis speech coding is effective for low-bit-rate speech coding. However, because speech is synthesized from speech synthesis parameters alone, without a residual excitation signal, some coding schemes tend to produce a synthetic-sounding quality. The IMBE scheme, an analysis-synthesis scheme for low-bit-rate speech coding, divides the input speech into segments to cut out speech frames, divides the frequency range of each frame into multiple bands, and decides whether the frequency components in each band are voiced or unvoiced. It then sets a voiced synthesis model or an unvoiced synthesis model for each band and sums their outputs to obtain the synthesized speech. This improves synthesized speech quality for voiced sounds that contain background noise and for frames in which voiced and unvoiced components are mixed. The accuracy with which the coding parameters, namely the speech fundamental frequency (speech pitch), the voiced/unvoiced information, and the spectral envelope information, are estimated is important in determining the quality of the reproduced speech. The fundamental frequency can be obtained by the autocorrelation method mentioned above, but the IMBE scheme describes a method that evaluates an autocorrelation function extended to integer multiples of the pitch and estimates the fundamental frequency with 1/2-pitch accuracy. The spectral envelope is estimated by the A-b-S technique described above, using the extracted fundamental frequency ωo and the frequency spectrum of the frequency analysis window. Because the accuracy of the estimated fundamental frequency is insufficient for estimating the spectral envelope, the scheme searches the vicinity of the estimated fundamental frequency with 1/4-pitch accuracy while estimating the spectral envelope at the same time.

[0009] The procedure for obtaining this spectral envelope is as follows. First, the signal s(n) of the input speech segment is limited to the range of −110 to 110 samples by the frequency analysis window w_R(n), and the frequency spectrum S_w(m) is then obtained by a 256-point FFT according to equation (1).

[Equation 1]

Next, the spectrum S_w(m) of the speech segment is approximated by a sum of spectra, each centered on the L-th harmonic of the fundamental frequency ωo (L = 1, 2, ..., Lmax; (Lmax + 0.5)·ωo < 2π), each having the spread of the frequency spectrum E_w(ω) of the frequency analysis window w_R(n), and each having the envelope value A_L given by equation (2). The envelope value A_L of each individual harmonic is obtained in this way.

[Equation 2]
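The first step, windowing the segment to −110..110 samples and evaluating a 256-point transform, can be sketched directly from the description. The equation images are not reproduced in this text, so the form used here, S_w(m) = Σ s(n)·w_R(n)·exp(−j2πmn/256), is an assumption based on the wording, and the choice of a Hamming window for w_R(n) is likewise assumed. A plain DFT sum stands in for the FFT.

```python
import cmath
import math

N_FFT = 256        # transform size given in the text
HALF_WIDTH = 110   # window covers n = -110 .. 110


def analysis_window(n):
    """Assumed Hamming window centered on n = 0 (the text only names it w_R)."""
    if abs(n) > HALF_WIDTH:
        return 0.0
    return 0.54 + 0.46 * math.cos(math.pi * n / HALF_WIDTH)


def windowed_spectrum(signal, m):
    """Assumed form of equation (1): DFT bin m of the windowed segment."""
    return sum(
        signal(n) * analysis_window(n) * cmath.exp(-2j * math.pi * m * n / N_FFT)
        for n in range(-HALF_WIDTH, HALF_WIDTH + 1)
    )


def tone(n):
    """Test input: a cosine placed exactly on bin 32 (1 kHz at 8 kHz sampling)."""
    return math.cos(2.0 * math.pi * 32 * n / N_FFT)


magnitudes = [abs(windowed_spectrum(tone, m)) for m in range(N_FFT // 2 + 1)]
peak_bin = max(range(len(magnitudes)), key=magnitudes.__getitem__)
```

Each harmonic of a voiced segment appears in S_w(m) as a copy of the window spectrum E_w(ω) centered on that harmonic, which is what the approximation by envelope values A_L relies on.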

[0010] At this point, while ωo is varied with 1/4-pitch accuracy, the envelope value A_L of each harmonic is calculated by the least-squares method, and the spectral error evaluation value E(ωo) is computed from the resulting ωo and A_L.

[Equation 3]

The ωo that minimizes E(ωo) is taken as the corrected fundamental frequency, and the harmonic amplitudes A_L at that point are taken as the spectral envelope values. The voiced strength of each frequency band (a_l to b_l) is estimated by a threshold decision on the relative spectral error Dk given by equation (4).

[Equation 4]
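The least-squares fit and the relative error can be sketched as follows. Since equations (2) to (4) are not reproduced in this text, the forms used are assumptions in the spirit of the cited multiband excitation paper: for each harmonic band, A_L = Σ S_w·E_w* / Σ |E_w|², and the relative error is the residual energy divided by the band energy. The sketch treats each harmonic as its own band, stops at the Nyquist frequency π, and assumes a Hamming analysis window; all function names are hypothetical.

```python
import cmath
import math
import random

N_FFT = 256
HALF_WIDTH = 110
SAMPLE_INDEX = range(-HALF_WIDTH, HALF_WIDTH + 1)


def window(n):
    """Assumed Hamming analysis window centered on n = 0."""
    return 0.54 + 0.46 * math.cos(math.pi * n / HALF_WIDTH)


def transform(values, omega):
    """Fourier transform at angular frequency omega of a sequence indexed -110..110."""
    return sum(v * cmath.exp(-1j * omega * n) for n, v in zip(SAMPLE_INDEX, values))


def fit_harmonic_bands(segment, omega0):
    """Return (|A_L| per harmonic, relative spectral error per harmonic band)."""
    w = [window(n) for n in SAMPLE_INDEX]
    sw = [s * wn for s, wn in zip(segment, w)]
    bins_per_rad = N_FFT / (2.0 * math.pi)
    amplitudes, errors = [], []
    harmonic = 1
    while (harmonic + 0.5) * omega0 < math.pi:
        lo = math.ceil((harmonic - 0.5) * omega0 * bins_per_rad)
        hi = math.ceil((harmonic + 0.5) * omega0 * bins_per_rad)
        spec = [transform(sw, m / bins_per_rad) for m in range(lo, hi)]
        # Window spectrum shifted so that it is centered on this harmonic.
        shape = [transform(w, m / bins_per_rad - harmonic * omega0) for m in range(lo, hi)]
        a = sum(x * e.conjugate() for x, e in zip(spec, shape)) / sum(abs(e) ** 2 for e in shape)
        residual = sum(abs(x - a * e) ** 2 for x, e in zip(spec, shape))
        amplitudes.append(abs(a))
        errors.append(residual / sum(abs(x) ** 2 for x in spec))
        harmonic += 1
    return amplitudes, errors


OMEGA0 = 2.0 * math.pi * 250.0 / 8000.0  # 250 Hz fundamental at 8 kHz sampling
voiced_segment = [
    sum(math.cos(h * OMEGA0 * n + 0.3 * h) for h in range(1, 16)) for n in SAMPLE_INDEX
]
rng = random.Random(0)
noise_segment = [rng.gauss(0.0, 1.0) for _ in SAMPLE_INDEX]

voiced_amps, voiced_errors = fit_harmonic_bands(voiced_segment, OMEGA0)
_, noise_errors = fit_harmonic_bands(noise_segment, OMEGA0)
mean_noise_error = sum(noise_errors) / len(noise_errors)
```

A harmonic signal is explained almost entirely by the fitted window-shaped spectra, so its relative error is near zero, while noise leaves most of the band energy unexplained. Thresholding the error separates the two cases, and repeating the fit over candidate values of ωo and keeping the one with the smallest total error gives the corrected fundamental frequency.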

[0011] We now consider how the step size used when searching for ωo, and the resulting error from the actual speech fundamental frequency, affect the decision result. The speech fundamental frequency varies between individuals and between men and women. For men it is centered on about 125 Hz, women have about twice that fundamental frequency, and overall it lies in the range of 70 Hz to 400 Hz. An error in the evaluated fundamental frequency is magnified L times at the L-th harmonic. Table 1 shows the frequency error Δf caused by a pitch error ΔP in the speech fundamental frequency ωo (= 2πfo), and the frequency error Δf(2 kHz) in the harmonic region near 2 kHz, calculated by equation (5). Here fs is the sampling frequency of the speech segment.

[Equation 5]

[0012]

[Table 1]

[0013] As Table 1 shows, when the estimation error ΔP of the actual speech fundamental frequency (speech pitch) is 1 pitch, the frequency error of harmonics near 2 kHz reaches ±25 to ±75 Hz, extending to and beyond the spectral spacing of 8000/256 = 31.25 Hz obtained with 256-point FFT analysis. At fo = 300 Hz, the harmonic spectral error near 2 kHz is 38 Hz when ΔP is 0.5 pitch, and falls to 19 Hz only when ΔP is 0.25. FIG. 17 shows the spectral evaluation error calculated by equation (4) for a 256-point FFT with a Hamming window as the frequency analysis window. For example, a typical female speaker with a fundamental frequency fo = 275 Hz has a fundamental pitch Pi = 29. If the pitch is estimated as Pi = 28, an error of −1 pitch, the estimated fundamental frequency is fo = 8000/(29 − 1) = 285.7 Hz, the fundamental frequency error is Δfo = 10.7 Hz, and from FIG. 17 the normalized spectral error is Dk = 0.1. With an estimated pitch Pi = 27, the estimation error is −2 pitch, the fundamental frequency error is 21 Hz, and Dk = 0.3, which strongly affects the voiced/unvoiced decision based on the normalized spectral error. For the harmonics, the fundamental frequency error is further magnified by the harmonic order. With a 1/4-pitch error, for example, the estimated fundamental frequency is 8000/(29 − 0.25) = 278.26 Hz and the fundamental frequency error is Δfo = 3.26 Hz, but near 2 kHz this is magnified 2000/275 times to a frequency error of 23.7 Hz. From FIG. 17 the normalized spectral error then grows from about 0.01 to 0.35 or more, causing voiced/unvoiced decision errors. The voiced/unvoiced information and the spectral envelope information are parameters that characterize the whole speech segment, and as already noted, errors in estimating them have a large effect on the quality of the coded speech.
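The figures in this worked example can be checked with a few lines of arithmetic. The sketch below follows the text's own convention: the nominal fundamental frequency is 275 Hz, the nominal pitch is 29 samples at fs = 8000 Hz, the estimated frequency is fs divided by the erroneous pitch, and the error near 2 kHz is the fundamental error scaled by 2000/275. The function name `pitch_error_effect` is hypothetical.

```python
FS_HZ = 8000.0
N_FFT = 256


def pitch_error_effect(nominal_f0_hz, nominal_pitch, pitch_error, at_hz=2000.0):
    """Return (estimated f0, fundamental error, error near at_hz) for a pitch error in samples."""
    estimated_f0 = FS_HZ / (nominal_pitch - pitch_error)
    delta_f0 = estimated_f0 - nominal_f0_hz
    # The error grows in proportion to the harmonic order, about at_hz / f0 near at_hz.
    delta_at = delta_f0 * at_hz / nominal_f0_hz
    return estimated_f0, delta_f0, delta_at


bin_spacing_hz = FS_HZ / N_FFT                      # 31.25 Hz
one_pitch = pitch_error_effect(275.0, 29, 1.0)      # 285.7 Hz, 10.7 Hz error
two_pitch = pitch_error_effect(275.0, 29, 2.0)      # about 21 Hz error
quarter_pitch = pitch_error_effect(275.0, 29, 0.25) # 278.26 Hz, 3.26 Hz, 23.7 Hz near 2 kHz
```

Even the quarter-pitch case leaves an error near 2 kHz that is a large fraction of the 31.25 Hz bin spacing, which is why a fixed search step cannot guarantee a reliable voiced/unvoiced decision across the full range of fundamental frequencies.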

[0014] In speech decoding in the IMBE scheme, as shown in FIG. 16, unvoiced sound is synthesized by frequency transforming (FFT) a random noise source, extracting only the unvoiced frequency range indicated by the speech coding parameters, and then applying the inverse frequency transform (IFFT). This requires two frequency transform stages and has the drawback of a heavy computational load, particularly when the update period of the speech segments is shortened to raise the coded speech quality.

[0015] Accordingly, an object of the present invention is to provide a speech coding method and apparatus that can determine voiced strength with high accuracy regardless of changes in the speech fundamental frequency and that are robust to errors caused by spectral noise. A further object is to provide a speech decoding method and apparatus with a small computational load.

[0016]

[Means for Solving the Problems] To achieve the above objects, the speech coding parameter acquisition method of the present invention acquires speech coding parameters from speech segments extracted from a digitized speech signal at a fixed repetition period and with a predetermined segment length. The method includes the steps of: acquiring a speech fundamental frequency from the speech segment; acquiring a first frequency spectrum from a variable-length segment extracted from the speech signal by a variable-length adaptive window determined by the speech fundamental frequency; acquiring a second frequency spectrum from a fixed-length segment extracted from the speech signal by a fixed-length window; dividing the first frequency spectrum into multiple frequency bands; determining the voiced strength of each frequency band from the frequency spectrum power of the first frequency spectrum, the frequency spectrum power of each frequency band, the number of harmonics contained in each frequency band, and the harmonic amplitude and harmonic bandwidth of each harmonic; and calculating, from the second frequency spectrum, the spectral power of each harmonic band, where the harmonic bands are obtained by dividing the spectrum so that each band is centered on an integer multiple of the speech fundamental frequency and has a bandwidth equal to the speech fundamental frequency. The length of the variable-length adaptive window is determined by the relationship between the bandwidth of the frequency spectrum distribution of the adaptive window and the speech fundamental frequency. The variable-length adaptive window is a Hamming window with a length of at least four times the period corresponding to the speech fundamental frequency.
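One way to read the four-period rule is through the main lobe of the Hamming window. A Hamming window of M samples has a main lobe roughly 8π/M rad wide between nulls, or 4·fs/M in Hz, so requiring the main lobe to be no wider than the fundamental frequency gives M ≥ 4·fs/fo, that is, four pitch periods. This reading is an assumption used for illustration; the text states only that the length follows from the relationship between the window's spectral bandwidth and the fundamental frequency. The function names below are hypothetical.

```python
import math


def adaptive_window_length(f0_hz, fs_hz=8000.0, periods=4):
    """Window length in samples covering at least `periods` pitch periods."""
    return math.ceil(periods * fs_hz / f0_hz)


def hamming(length):
    """Standard Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1)) for n in range(length)]


def mainlobe_width_hz(length, fs_hz=8000.0):
    """Approximate null-to-null main lobe width of a Hamming window, in Hz."""
    return 4.0 * fs_hz / length


length_200 = adaptive_window_length(200.0)   # 160 samples for a 200 Hz voice
length_125 = adaptive_window_length(125.0)   # 256 samples for a 125 Hz voice
window_200 = hamming(length_200)
```

With the length tied to the pitch period in this way, adjacent harmonics stay separated by about one main lobe width regardless of the speaker's fundamental frequency, which a fixed-length window cannot ensure for low-pitched voices.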

[0017] The speech decoding method of the present invention synthesizes speech from speech coding parameters consisting of: the speech fundamental frequency of a speech segment extracted from a digitized speech signal at a fixed repetition period; the spectral power of each harmonic band, obtained by dividing the frequency spectrum of the speech segment so that each band is centered on an integer multiple of the speech fundamental frequency and has a bandwidth equal to the speech fundamental frequency; and decision information indicating whether each of the frequency bands into which the frequency spectrum of the speech segment is divided is voiced or unvoiced. In frequency bands whose decision information indicates voiced, the method generates a group of sinusoids whose center frequencies are integer multiples of the speech fundamental frequency and whose amplitudes are equivalent to the spectral power of the corresponding harmonic bands. In frequency bands whose decision information indicates unvoiced, a centrally symmetric random sequence and a centrally antisymmetric random sequence are regarded as the real part and the imaginary part of the frequency spectrum sequence of a noise signal. The interval corresponding to the frequency band is extracted from the two random sequences, the amplitude is adjusted to equal the spectral power of the corresponding harmonic band, and the real part is obtained by inverse Fourier transform to give an unvoiced frame signal. Unvoiced speech is generated by linear interpolation between the unvoiced frame signal of the previous segment and the unvoiced frame signal just obtained, and is then added to the generated group of sinusoids to obtain the synthesized speech.
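The unvoiced path described here avoids the forward transform: a spectrum whose real part is symmetric and whose imaginary part is antisymmetric about the center is conjugate symmetric, so its inverse transform is a real signal. The sketch below illustrates that property with a small 64-point inverse DFT. The transform size, the uniform random values, the way the target power is defined, and the cross-fade used for the linear interpolation are all assumptions; the function names are hypothetical.

```python
import cmath
import math
import random


def noise_spectrum(n_fft, rng):
    """Random spectrum with symmetric real part and antisymmetric imaginary part."""
    spectrum = [0j] * n_fft
    for k in range(1, n_fft // 2):
        re = rng.uniform(-1.0, 1.0)
        im = rng.uniform(-1.0, 1.0)
        spectrum[k] = complex(re, im)
        spectrum[n_fft - k] = complex(re, -im)   # mirrored bin: same real, negated imaginary
    return spectrum


def select_band(spectrum, lo, hi, target_power):
    """Keep bins lo..hi-1 (and their mirrors) and scale them to the target power."""
    n_fft = len(spectrum)
    band = [0j] * n_fft
    for k in range(lo, hi):
        band[k] = spectrum[k]
        band[n_fft - k] = spectrum[n_fft - k]
    power = sum(abs(x) ** 2 for x in band)
    scale = math.sqrt(target_power / power) if power > 0.0 else 0.0
    return [x * scale for x in band]


def inverse_dft(spectrum):
    """Plain inverse DFT (an IFFT would be used in practice)."""
    n_fft = len(spectrum)
    return [
        sum(x * cmath.exp(2j * math.pi * k * t / n_fft) for k, x in enumerate(spectrum)) / n_fft
        for t in range(n_fft)
    ]


def interpolate_frames(previous, current):
    """Linear cross-fade from the previous unvoiced frame to the current one."""
    n = len(current)
    return [((n - t) * previous[t] + t * current[t]) / n for t in range(n)]


rng = random.Random(0)
N_FFT = 64
band = select_band(noise_spectrum(N_FFT, rng), 8, 16, target_power=4.0)
samples = inverse_dft(band)
current_frame = [x.real for x in samples]
previous_frame = [0.0] * N_FFT
unvoiced = interpolate_frames(previous_frame, current_frame)
```

Because the noise is generated directly in the frequency domain, only the inverse transform is needed per frame, which is where the reduction in computational load relative to the two-stage approach of FIG. 16 comes from.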

[0018] The speech coding parameter acquisition apparatus of the present invention acquires speech coding parameters from speech segments extracted from a digitized speech signal at a fixed repetition period and with a predetermined segment length. The apparatus has: means for acquiring a speech fundamental frequency from the speech segment; means for acquiring a first frequency spectrum from a variable-length segment extracted from the speech signal by a variable-length adaptive window determined by the speech fundamental frequency; means for acquiring a second frequency spectrum from a fixed-length segment extracted from the speech signal by a fixed-length window; means for dividing the first frequency spectrum into multiple frequency bands; means for determining the voiced strength of each frequency band from the frequency spectrum power of the first frequency spectrum, the frequency spectrum power of each frequency band, the number of harmonics contained in each frequency band, and the harmonic amplitude and harmonic bandwidth of each harmonic; and means for calculating, from the second frequency spectrum, the spectral power of each harmonic band, where the harmonic bands are obtained by dividing the spectrum so that each band is centered on an integer multiple of the speech fundamental frequency and has a bandwidth equal to the speech fundamental frequency.

【００１９】さらにまた、本発明の音声復号装置は、デ
ジタル化された音声信号を、ある一定の繰り返し周期で
抜き取った音声セグメントの音声基本周波数と、該音声
セグメントの周波数スペクトルを音声基本周波数の整数
倍を中心としてその周波数帯域幅が音声基本周波数にな
る様に分割した各ハーモニクス帯域のスペクトルパワー
と、前記音声セグメントの周波数スペクトルを複数の周
波数バンドに分割した各周波数バンドが有声音か無声音
かを判別した判別情報からなる音声符号化パラメータに
よって音声を合成する音声復号装置であって、前記判別
情報が有声を示す前記周波数バンドでは、その中心周波
数が前記音声基本周波数の整数倍の周波数を持ち、且
つ、対応する前記ハーモニクス帯域のスペクトルパワー
と同等になる振幅を持った正弦波群を生成する手段、中
心対称ランダム系列と中心反対称ランダム系列の雑音信
号を発生する手段、前記２つのランダム系列から前記判
別情報が無声を示す前記周波数バンドに対応する区間を
抽出する手段、抽出したランダム系列の雑音信号を、そ
のスペクトルパワーが前記判別情報が無声を示す前記周
波数バンドに対応するハーモニクス帯域のスペクトルパ
ワーと同じになる様に振幅調整する手段、該振幅調整さ
れたランダム系列の雑音信号を逆フーリエ変換し、無声
フレーム信号を生成する手段、１つ前のセグメントの無
声フレーム信号と今回の無声フレーム信号を線形補間す
ることにより無声音声を生成する手段、および、前記生
成された正弦波群と生成された無声音声を加算する手段
を有するものである。Furthermore, the speech decoding apparatus of the present invention synthesizes speech from speech coding parameters consisting of: the speech fundamental frequency of a speech segment extracted from a digitized speech signal at a fixed repetition period; the spectral power of each harmonic band obtained by dividing the frequency spectrum of the speech segment so that each band is centered on an integer multiple of the speech fundamental frequency and has a bandwidth equal to the speech fundamental frequency; and discrimination information indicating whether each of a plurality of frequency bands into which the frequency spectrum of the speech segment is divided is voiced or unvoiced. The apparatus comprises: means for generating, in each frequency band whose discrimination information indicates voiced, a group of sine waves whose frequencies are integer multiples of the speech fundamental frequency and whose amplitudes correspond to the spectral power of the corresponding harmonic bands; means for generating noise signals consisting of a centrally symmetric random sequence and a centrally antisymmetric random sequence; means for extracting from the two random sequences the sections corresponding to the frequency bands whose discrimination information indicates unvoiced; means for adjusting the amplitude of the extracted random-sequence noise signal so that its spectral power equals the spectral power of the harmonic bands corresponding to those unvoiced frequency bands; means for applying an inverse Fourier transform to the amplitude-adjusted random-sequence noise signal to generate an unvoiced frame signal; means for generating unvoiced speech by linearly interpolating between the unvoiced frame signal of the preceding segment and the current unvoiced frame signal; and means for adding the generated group of sine waves and the generated unvoiced speech.

【００２０】[0020]

【発明の実施の形態】本発明の音声符号化パラメータの
取得方法、音声復号方法および装置は、例えば、音声符
号化、特に低ビットレートの音声符号化での音声符号化
パラメータを安定に推定する方法および装置、さらには
推定した音声符号化パラメータによって音声復号する方
法および装置に組み込み使用することができるが、ここ
では、前記図１４に示した音声符号化伝送装置の音声符
号化パラメータ抽出部３０２、および音声合成部３０８
に本発明を適応した場合を例にとって説明する。また、
本発明は、種々の音声符号化方式に適用することが可能
であるが、ここでは、ＩＭＢＥ方式に適用した場合を例
にとって説明する。BEST MODE FOR CARRYING OUT THE INVENTION The speech coding parameter acquisition method, speech decoding method, and apparatus of the present invention can be incorporated into, for example, methods and apparatus that stably estimate speech coding parameters in speech coding, particularly low-bit-rate speech coding, and methods and apparatus that decode speech from the estimated speech coding parameters. Here, the case where the present invention is applied to the speech coding parameter extraction unit 302 and the speech synthesis unit 308 of the speech coding transmission apparatus shown in FIG. 14 is described as an example. The present invention can also be applied to various speech coding schemes; here, the case where it is applied to the IMBE scheme is described as an example.

【００２１】図１は本発明の音声符号化パラメータの取
得方法が適用された音声符号化パラメータ抽出部のブロ
ック構成図である。この図に示すように、本発明の音声
符号化パラメータ抽出部は、入力音声信号３０１からそ
の基本周波数ωoを推定する基本周波数推定部４０１、
入力音声信号フレームを周波数分析して得た周波数スペ
クトルを複数の周波数バンドに分割し、各バンドごとに
その有声／無声を示す有声強度情報Ｖ[k]を出力する有
声強度計算部４０４、および、入力音声信号を固定長の
窓を用いて周波数分析し、スペクトル包絡|Ｂ(ω)|を計
算するスペクトル包絡計算部４０５の３つの部分から構
成されている。FIG. 1 is a block diagram of a speech coding parameter extraction unit to which the speech coding parameter acquisition method of the present invention is applied. As shown in this figure, the speech coding parameter extraction unit consists of three parts: a fundamental frequency estimation unit 401 that estimates the fundamental frequency ωo from the input speech signal 301; a voiced strength calculation unit 404 that divides the frequency spectrum obtained by frequency analysis of an input speech frame into a plurality of frequency bands and outputs, for each band, voiced strength information V[k] indicating whether the band is voiced or unvoiced; and a spectral envelope calculation unit 405 that performs frequency analysis of the input speech signal using a fixed-length window and calculates the spectral envelope |B(ω)|.

【００２２】ここで、本発明の音声符号化パラメータの
取得方法においては、従来方式のように合成音声と入力
音声の周波数スペクトル誤差を有声・無声の評価値とす
ることはせず、入力音声の周波数スペクトルのある周波
数バンドに含まれる音声のハーモニクス振幅を入力音声
スペクトル振幅から計測して、そのハーモニクス振幅を
有声強度あるいは有声／無声の判定の評価値としてい
る。そして、前記入力音声の周波数スペクトルを計測す
るにあたり、スペクトル分析窓の幅を入力音声の推定基
本周波数に応じて適応的に調節する事で周波数分解能を
調節し、むやみに時間分解能を低下する事なく、必要な
周波数分解能を得る手法を取っている。また、各周波数
バンドに含まれるハーモニクスの数を計測し、そのハー
モニクス数をもう一つの評価値として、期待されるハー
モニクス数にどれだけ近いかを判定する。更に、各ハー
モニクスの周波数の幅（ハーモニクス幅）を計測して前
記スペクトル分析窓により期待されるハーモニクス幅に
どれだけ近いかを判定することにより、判定の確実性を
向上させている。さらにまた、入力音声のパワー（エネ
ルギー）が小さい場合は無声であるとの知見から、入力
音声の周波数スペクトルパワーさらには各周波数バンド
の音声周波数スペクトルパワーも評価値に加えるように
している。また、スペクトル包絡の抽出にあたっては、
前記スペクトル分析窓と分析長の異なる固定長の窓を用
いた第２の周波数分析により入力音声の周波数スペクト
ルを取りだし、推定音声基本周波数の整数倍の周波数間
隔毎に音声基本周波数幅の領域にあるスペクトルパワー
の平方根として抽出している。In the speech coding parameter acquisition method of the present invention, the frequency spectrum error between the synthesized speech and the input speech is not used as the voiced/unvoiced evaluation value, as it is in the conventional method. Instead, the harmonic amplitudes of the speech contained in a given frequency band are measured from the spectral amplitude of the input speech and used as the evaluation value for the voiced strength or the voiced/unvoiced decision. When the frequency spectrum of the input speech is measured, the width of the spectrum analysis window is adjusted adaptively according to the estimated fundamental frequency of the input speech, so that the necessary frequency resolution is obtained without needlessly sacrificing time resolution. In addition, the number of harmonics contained in each frequency band is measured and used as a second evaluation value, by judging how close it is to the expected number of harmonics. Furthermore, the frequency width of each harmonic (the harmonic width) is measured, and the reliability of the decision is improved by judging how close it is to the harmonic width expected from the spectrum analysis window. Moreover, based on the observation that input speech with low power (energy) is unvoiced, the frequency spectrum power of the input speech and the spectrum power of each frequency band are also added to the evaluation values. The spectral envelope is extracted by a second frequency analysis that uses a fixed-length window whose analysis length differs from that of the spectrum analysis window; each envelope value is taken as the square root of the spectral power in a region one fundamental frequency wide, located at each integer multiple of the estimated speech fundamental frequency.

【００２３】この理由について、図３、図４、図５を用
いてさらに説明する。図３はほとんど有声音声で出来て
いる音声セグメントの周波数スペクトル振幅値（対数
値）の例である。横軸は２５６点の高速離散フーリエ変
換（ＦＦＴ）した場合の離散周波数である。この図に示
すように、スペクトル振幅にはある一定の間隔で適度の
幅を持った明瞭な高調波スペクトルが観測されており、
その対数振幅や幅も広範囲の周波数にわたり安定な振幅
を持っている。この事から、ある周波数バンド内のハー
モニクス振幅とその数は、基本周波数ωoの推定誤差の
影響を受けずに計測できる事が予想できる。また、図４
は無声音声が多い音声セグメントの周波数スペクトル振
幅値（対数値）の例である。この場合は、定められた周
波数バンド内でのハーモニクス振幅やハーモニクス幅は
小さく、また一定レベル以上のハーモニクスの数も少な
くなっている事が読みとれ、その値は基本周波数ωoの
推定誤差Δωoの影響をあまり受けない事も読み取れ
る。以上の考察により、有声／無声の判定に、周波数ス
ペクトル振幅対数値から計測したハーモニクス振幅、あ
る閾値以上の振幅を持った有効なハーモニクス数、ハー
モニクスの幅、さらには入力音声のパワー、周波数バン
ドの音声パワーを判定評価に使用するようにしている。The reason for this is explained further with reference to FIGS. 3, 4, and 5. FIG. 3 shows an example of the frequency spectrum amplitude (logarithmic value) of a speech segment that consists almost entirely of voiced speech. The horizontal axis is the discrete frequency of a 256-point fast Fourier transform (FFT). As the figure shows, clear harmonic peaks of moderate width appear at regular intervals in the spectral amplitude, and their logarithmic amplitudes and widths remain stable over a wide frequency range. This suggests that the harmonic amplitudes within a frequency band, and their number, can be measured without being affected by the estimation error of the fundamental frequency ωo. FIG. 4 shows an example of the frequency spectrum amplitude (logarithmic value) of a speech segment that is mostly unvoiced. In this case the harmonic amplitudes and harmonic widths within a given frequency band are small, the number of harmonics above a given level is also small, and these values are hardly affected by the estimation error Δωo of the fundamental frequency ωo. Based on these observations, the voiced/unvoiced decision uses the harmonic amplitudes measured from the logarithmic spectral amplitude, the number of effective harmonics whose amplitude exceeds a threshold, the harmonic widths, the power of the input speech, and the speech power of each frequency band.

【００２４】また、周波数分析窓の長さ（時間範囲）Ｔ
_w(sec)又はＬ_w(サンプル)と音声基本周波数ｆo（Hz）又
はピッチＰ(サンプル)の関係を考察すると、図５の様に
スペクトル振幅には基本周波数の整数倍にハーモニクス
中心が現れ、周波数分析窓がハミング窓の場合には各ハ
ーモニクスの帯域幅は４／Ｔ_wになる。従って、ハーモ
ニクスの谷が隣のハーモニクスの谷より中心に侵入しな
い事を条件として、式（６）により周波数分析窓の長さ
を決める。Next, consider the relationship between the length (time span) of the frequency analysis window, T_w (sec) or L_w (samples), and the speech fundamental frequency fo (Hz) or pitch P (samples). As shown in FIG. 5, harmonic peaks appear in the spectral amplitude centered on integer multiples of the fundamental frequency, and when the frequency analysis window is a Hamming window, the bandwidth of each harmonic is 4/T_w. The length of the frequency analysis window is therefore determined by equation (6), on the condition that the valley of one harmonic does not extend past the valley of the adjacent harmonic toward that harmonic's center.

【数６】 [Equation 6]

【００２５】なお、周波数分析窓の長さは、式（６）に
より基本ピッチの４倍として基本周波数に比例して変化
させても良いが、実用的には基本ピッチにより何段階か
に分類して設定しても良い。例えば、ピッチが２０増加
する毎に切り替えて、式（７）により第１の周波数分析
窓をピッチの変化に応じて設定しても良い。The length of the frequency analysis window may be set by equation (6) to four times the basic pitch, so that it varies with the fundamental frequency; in practice, however, it may instead be set in several steps according to the basic pitch. For example, the first frequency analysis window may be set according to the pitch by equation (7), switching each time the pitch increases by 20.

【数７】ここでceil(x)はｘを超える最小の整数を与える関数で
あり、また、分析窓長は中心対称の奇数で且つ分析窓長
がピッチ範囲より想定される長さ以外になる事を防止す
るため制限をしている。図６に、抽出した基本ピッチＰ
に対するＬ_wの設定例を示す。また、窓関数として式
（８）に示す適応ハミング窓を用いた場合の窓関数値ｗ
_s(n)の計算結果を図７に示す。（ただしＦＦＴ段数Ｍ＝
512として計算した）[Equation 7] Here, ceil(x) is the function that returns the smallest integer not less than x. The analysis window length is constrained to be odd, so that the window is centrally symmetric, and is limited so that it cannot take a value outside the range expected from the pitch range. FIG. 6 shows an example of setting L_w for the extracted basic pitch P. FIG. 7 shows the calculated window function values w_s(n) when the adaptive Hamming window of equation (8) is used as the window function (calculated with an FFT size of M = 512 points).
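The window-length rules described above can be sketched as follows. This is a minimal illustration under stated assumptions: equations (6) to (8) are not reproduced in this text, so the continuous rule (four pitch periods, forced odd) follows the prose, while the stepped rule, its limits `min_len` and `max_len`, and the use of a standard Hamming window are hypothetical stand-ins and not the patent's own formulas.

```python
import math


def window_length_continuous(pitch: float) -> int:
    """Window of four pitch periods (in samples), forced to an odd length."""
    length = math.ceil(4.0 * pitch)
    return length if length % 2 == 1 else length + 1


def window_length_stepped(pitch: float, step: int = 20,
                          min_len: int = 81, max_len: int = 511) -> int:
    """Hypothetical stepped rule: round the pitch up to a multiple of `step`,
    take four periods, add one sample to make the length odd, then clamp.
    `min_len` and `max_len` are assumed limits and must themselves be odd."""
    quantized = step * math.ceil(pitch / step)
    length = 4 * quantized + 1
    return max(min_len, min(max_len, length))


def hamming(length: int) -> list[float]:
    """Standard symmetric Hamming window (assumed form of the adaptive window)."""
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


if __name__ == "__main__":
    for pitch in (25, 40, 75, 120):
        print(pitch, window_length_continuous(pitch), window_length_stepped(pitch))
```

An odd length gives the window a single center sample, which matches the central-symmetry requirement stated in the text.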

【数８】 [Equation 8]

【００２６】さらに、本発明では、基本周波数ｆoの間
に何本のスペクトル本数ｐを設定するかで必要なＦＦＴ
段数のＭを決める事が出来る。例えば標本化周波数が８
kHzの場合には式（９）により決定できる。Further, in the present invention, the required FFT size M can be determined from the number p of spectral lines to be placed within one fundamental frequency interval fo. For example, when the sampling frequency is 8 kHz, M can be determined by equation (9).

【数９】ここで、最小の基本周波数を60Hzとした場合では、ｐ＝
４の場合にはＭ＝５３３になり、５１２段程度のＦＦＴ
段数が必要である事がわかる。[Equation 9] Here, when the minimum fundamental frequency is 60 Hz and p = 4, M = 533, which shows that an FFT size of about 512 points is required.
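The arithmetic behind this figure can be checked with a short sketch. It assumes that equation (9), which is not reproduced in this text, has the form M = p · fs / fo; that form is consistent with the quoted result M = 533 for fs = 8 kHz, fo = 60 Hz, and p = 4. The function names are illustrative only.

```python
import math


def required_fft_size(fs_hz: float, f0_min_hz: float, lines_per_f0: int) -> float:
    """FFT size that places `lines_per_f0` spectral lines in each interval
    of width f0 (assumed form of equation (9))."""
    return lines_per_f0 * fs_hz / f0_min_hz


def nearest_power_of_two(x: float) -> int:
    """Power of two closest to x on a logarithmic scale."""
    return 2 ** round(math.log2(x))


if __name__ == "__main__":
    m = required_fft_size(8000.0, 60.0, 4)
    print(round(m), nearest_power_of_two(m))
```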

【００２７】以上のように、本発明の音声符号化パラメ
ータ取得方法によれば、音声基本周波数（ピッチ）によ
り適応的に窓サイズが設定される周波数分析窓を用い、
基本周波数の範囲から決定される段数のＦＦＴにより周
波数スペクトルを得ているため、ハーモニクス間のスペ
クトル相互干渉を少なくすることができる。そして、こ
のようにして得た周波数スペクトルから、ハーモニクス
振幅、ハーモニクス数、ハーモニクス幅、フレームエネ
ルギー（フレームパワー）、バンドエネルギー（バンド
パワー）を計測し、各周波数バンド毎の有声／無声情報
を得るようにしているため、有声／無声の判定に詳細ピ
ッチが不要となり、ピッチ誤りに起因する判定誤りの可
能性を減少させることができる。また、スペクトル包絡
情報の取得は、前記有声／無声判定とは分離して行うよ
うにし、固定窓サイズのＦＦＴによりハーモニクス帯域
毎のエネルギーの平方根から得るようにしている。した
がって、無声／有声判定の誤りがあったとしてもそれに
影響を受けないスペクトル包絡情報を得ることができ
る。As described above, the speech coding parameter acquisition method of the present invention uses a frequency analysis window whose size is set adaptively according to the speech fundamental frequency (pitch), and obtains the frequency spectrum with an FFT whose size is determined from the range of the fundamental frequency, so mutual interference between the spectra of adjacent harmonics can be reduced. The harmonic amplitudes, number of harmonics, harmonic widths, frame energy (frame power), and band energy (band power) are then measured from the frequency spectrum obtained in this way to derive voiced/unvoiced information for each frequency band. A refined pitch estimate is therefore not needed for the voiced/unvoiced decision, and the likelihood of decision errors caused by pitch errors is reduced. In addition, the spectral envelope information is acquired separately from the voiced/unvoiced decision, from the square root of the energy of each harmonic band of an FFT with a fixed window size. Consequently, spectral envelope information that is unaffected by any voiced/unvoiced decision error can be obtained.

【００２８】図２は、本発明の音声復号方法が適用され
た音声復号装置の一構成例を示すブロック図である。こ
の図に示すように、音声復号装置は、復号された音声パ
ラメータのうちの基本周波数ωoとスペクトル包絡|Ｂ
(ω)|が入力され有声音声を合成する有声音声合成部５
０８、復号された音声符号化パラメータのうちの有声／
無声情報（有声強度情報）Ｖ[k]および前記スペクトル
包絡|Ｂ(ω)|が入力される無声音声合成部５０９、およ
び、加算部５０７から構成されている。ここで、有声音
声合成部５０８は前記図１６に示した従来の音声合成部
５０８と同様の高調波音源部５０１と高調波加算部５０
２から構成されており、高調波音源部５０１は、基本周
波数ωoおよび有声／無声情報に基づいて、該基本周波
数ωoおよび有声とされた周波数バンドに対応するその
高調波信号を発生し、前記スペクトル包絡情報|Ｂ(ω)|
に基づいて、それら各周波数の信号の振幅を制御して、
高調波加算部５０２でそれらを加算する。FIG. 2 is a block diagram showing a configuration example of a speech decoding apparatus to which the speech decoding method of the present invention is applied. As shown in this figure, the speech decoding apparatus consists of a voiced speech synthesis unit 508, which receives the fundamental frequency ωo and the spectral envelope |B(ω)| from the decoded speech coding parameters and synthesizes voiced speech; an unvoiced speech synthesis unit 509, which receives the voiced/unvoiced information (voiced strength information) V[k] and the spectral envelope |B(ω)| from the decoded speech coding parameters; and an adder 507. The voiced speech synthesis unit 508 consists of a harmonic sound source unit 501 and a harmonic adder 502 similar to those of the conventional speech synthesis unit 508 shown in FIG. 16. Based on the fundamental frequency ωo and the voiced/unvoiced information, the harmonic sound source unit 501 generates the fundamental frequency ωo and those of its harmonic signals that correspond to frequency bands judged to be voiced, and controls the amplitude of the signal at each of these frequencies based on the spectral envelope information |B(ω)|; the harmonic adder 502 then adds them.

【００２９】また、無声音声合成部５０９は、対称ラン
ダム系列発生部２０１、反対称ランダム系列発生部２０
２、ランダム系列抽出部２０３、逆周波数変換部２０４
およびフレーム補間部２０５から構成されている。そし
て、有声／無声判別情報Ｖ[k]が無声を示す周波数バン
ドでは、対称ランダム系列発生部２０１で発生される中
心対称ランダム系列と反対称ランダム系列発生部２０２
で発生される中心反対称ランダム系列を雑音信号の周波
数スペクトル系列の実部と虚部と見なし、前記ランダム
系列抽出部２０３において前記２つのランダム系列から
対応する無声の周波数バンドを抽出し、そのパワーを対
応した無声のハーモニクス帯域のパワーと同じになる様
に振幅調整した後、逆周波数変換部２０４で逆フーリエ
変換（ＩＦＦＴ）することによりその実部を得てこれを
無声フレーム信号とし、フレーム補間部２０５で１つ前
の無声フレーム信号とフレーム間で線形補間することに
より無声音声を生成した後、加算器５０７において前記
生成した正弦波群と加算して合成音声を得るようにして
いる。The unvoiced speech synthesis unit 509 consists of a symmetric random sequence generator 201, an antisymmetric random sequence generator 202, a random sequence extraction unit 203, an inverse frequency transform unit 204, and a frame interpolation unit 205. For the frequency bands in which the voiced/unvoiced discrimination information V[k] indicates unvoiced, the centrally symmetric random sequence generated by the symmetric random sequence generator 201 and the centrally antisymmetric random sequence generated by the antisymmetric random sequence generator 202 are regarded as the real part and the imaginary part, respectively, of the frequency spectrum of a noise signal. The random sequence extraction unit 203 extracts the corresponding unvoiced frequency bands from the two random sequences and adjusts their amplitude so that their power equals the power of the corresponding unvoiced harmonic bands. The inverse frequency transform unit 204 then applies an inverse Fourier transform (IFFT), and the real part of the result is taken as the unvoiced frame signal. The frame interpolation unit 205 generates unvoiced speech by linearly interpolating between this signal and the preceding unvoiced frame signal, and the adder 507 adds it to the generated group of sine waves to obtain the synthesized speech.

【００３０】すなわち、本発明の音声復号方法において
は、従来方式の様に雑音音源からＦＦＴにより雑音音源
に対応した周波数スペクトルを作成するのではなく、対
称ランダム系列発生部２０１および反対称ランダム系列
発生部２０２から発生されるランダム雑音シーケンスか
ら、直接、雑音音源に相当する周波数スペクトルを作成
する方法を取っている。そして、その周波数スペクトル
から無声周波数バンドに対応する周波数帯域を抽出し、
逆ＦＦＴによって実時間軸での無声音声を生成した後、
フレーム間で補間重みが１になる線形補間によって必要
なフレーム長さの無声音を合成するようにしている。こ
れにより、１回の逆フーリエ変換のみで無声音声を合成
することが可能となり、演算量を少なくすることが可能
となる。That is, the speech decoding method of the present invention does not create the frequency spectrum of the noise source by applying an FFT to a noise source, as the conventional method does. Instead, it creates a frequency spectrum equivalent to that of a noise source directly from the random noise sequences generated by the symmetric random sequence generator 201 and the antisymmetric random sequence generator 202. The frequency ranges corresponding to the unvoiced frequency bands are extracted from this spectrum, unvoiced speech on the real time axis is generated by an inverse FFT, and unvoiced sound of the required frame length is then synthesized by linear interpolation between frames in which the interpolation weights sum to 1. Unvoiced speech can thus be synthesized with a single inverse Fourier transform, which reduces the amount of computation.
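The frame-to-frame interpolation can be illustrated with a simple cross-fade. This is only a sketch: the text states that the interpolation is linear and that the weights sum to 1, but the exact ramp, the frame alignment, and the function name `crossfade_frames` are assumptions made here for illustration.

```python
def crossfade_frames(previous: list[float], current: list[float]) -> list[float]:
    """Linearly interpolate from the previous unvoiced frame signal to the
    current one. At every sample the two weights add up to exactly 1."""
    if len(previous) != len(current):
        raise ValueError("frames must have the same length")
    n = len(current)
    output = []
    for i in range(n):
        w_current = i / n            # ramps up from 0 toward 1
        w_previous = 1.0 - w_current  # ramps down from 1 toward 0
        output.append(w_previous * previous[i] + w_current * current[i])
    return output


if __name__ == "__main__":
    print(crossfade_frames([0.0] * 4, [4.0] * 4))
```

Because the weights sum to 1, two frames carrying the same signal pass through unchanged, so the interpolation itself introduces no level change at frame boundaries.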

【００３１】ここで、前記ランダムシーケンスの発生に
条件を設定する必要がある。これは、逆ＦＦＴによって
時間軸シーケンスに変換した場合に、虚数部分が発生せ
ず、ＦＦＴスペクトルの全パワーが実時間軸シーケンス
に現れるようにする条件と同じである。この条件は式
（１０）で表現できる。Here, a condition must be imposed on the generation of the random sequences. It is the same as the condition under which, when the spectrum is converted to a time-domain sequence by an inverse FFT, no imaginary part arises and the full power of the FFT spectrum appears in the real time-domain sequence. This condition can be expressed by equation (10).

【数１０】ここで、Ｓw(m)は周波数スペクトルと見なしたランダム
シーケンス、Reは実部、Imは虚部、ｍは周波数スペクト
ルでｍ＝０の時がＤＣ成分を表す。[Equation 10] Here, Sw(m) is the random sequence regarded as a frequency spectrum, Re denotes the real part, Im denotes the imaginary part, and m is the frequency index, with m = 0 corresponding to the DC component.
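The effect of this condition can be demonstrated numerically. The sketch below assumes that equation (10), which is not reproduced in this text, expresses the usual conjugate symmetry, Re Sw(m) = Re Sw(M−m) and Im Sw(m) = −Im Sw(M−m), which matches the description of a centrally symmetric real part and a centrally antisymmetric imaginary part. The function names and the naive inverse DFT are illustrative choices, not the patent's implementation.

```python
import cmath
import random


def hermitian_noise_spectrum(size: int, rng: random.Random) -> list[complex]:
    """Random spectrum with a symmetric real part and an antisymmetric
    imaginary part about the center, so that its inverse DFT is real."""
    real = [0.0] * size
    imag = [0.0] * size
    real[0] = rng.uniform(-1.0, 1.0)          # DC component is purely real
    for m in range(1, size // 2 + 1):
        re, im = rng.uniform(-1.0, 1.0), rng.uniform(-1.0, 1.0)
        real[m], real[size - m] = re, re      # symmetric real part
        imag[m], imag[size - m] = im, -im     # antisymmetric imaginary part
    if size % 2 == 0:
        imag[size // 2] = 0.0                 # Nyquist bin must be real
    return [complex(r, i) for r, i in zip(real, imag)]


def inverse_dft(spectrum: list[complex]) -> list[complex]:
    """Naive O(N^2) inverse DFT, adequate for a small demonstration."""
    size = len(spectrum)
    return [sum(spectrum[m] * cmath.exp(2j * cmath.pi * m * n / size)
                for m in range(size)) / size
            for n in range(size)]


if __name__ == "__main__":
    spectrum = hermitian_noise_spectrum(16, random.Random(0))
    signal = inverse_dft(spectrum)
    print(max(abs(v.imag) for v in signal))   # numerically negligible
```

Since the imaginary part of the time-domain result vanishes, all of the spectral power appears in the real sequence, which is the property the text requires.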

【００３２】前記図１に示した本発明の音声符号化方法
が適用された音声符号化装置についてさらに詳細に説明
する。図１において、音声入力端子３０１から入力され
た８kHz程度の標本化周波数で標本化された音声デジタ
ル信号は、基本周波数推定部４０１に入力され、ここ
で、例えば２０msecの時間間隔毎に一定長の音声セグメ
ント（フレーム）を取り出し、そのセグメント内での音
声基本周波数ωoを推定する。基本周波数の推定方法に
は、自己相関を用いる方法や、ケプストラムを用いる方
法がある事は前述の通りである。The speech coding apparatus shown in FIG. 1, to which the speech coding method of the present invention is applied, is now described in more detail. In FIG. 1, the digital speech signal, sampled at a sampling frequency of about 8 kHz and input from the speech input terminal 301, is supplied to the fundamental frequency estimation unit 401. There, a speech segment (frame) of fixed length is extracted at time intervals of, for example, 20 msec, and the speech fundamental frequency ωo within the segment is estimated. As described above, methods of estimating the fundamental frequency include a method using autocorrelation and a method using the cepstrum.

【００３３】また、前記音声デジタル信号は、有声強度
計算部４０４およびスペクトル包絡計算部４０５にも入
力される。スペクトル包絡計算部４０５において、第２
スペクトル計算部１１１は、該音声セグメントを固定窓
処理部１１０でハミング窓等の窓関数で窓処理した信号
を高速フーリエ変換（ＦＦＴ）することにより離散的な
周波数スペクトル値Ｂ[m]を計算する。デジタル音声入
力信号の標本化周波数をｆsとし、２５６点のＦＦＴを
行った場合、計算される周波数スペクトルＢ[m]は次の
式（１１）で表される周波数間隔ｆd毎に計算される。The digital speech signal is also input to the voiced strength calculation unit 404 and the spectral envelope calculation unit 405. In the spectral envelope calculation unit 405, the second spectrum calculation unit 111 calculates discrete frequency spectrum values B[m] by applying a fast Fourier transform (FFT) to the signal obtained when the fixed window processing unit 110 windows the speech segment with a window function such as a Hamming window. When the sampling frequency of the digital speech input signal is fs and a 256-point FFT is performed, the frequency spectrum B[m] is calculated at the frequency interval fd given by the following equation (11).

【数１１】 [Equation 11]

【００３４】スペクトルパワー計算部１１２は、前記基
本周波数推定部４０１で推定された基本周波数ωoの整
数倍の周波数を中心とし該基本周波数と等しい帯域幅を
有する各ハーモニクス帯域毎に、前記周波数スペクトル
Ｂ[m]の二乗和の平方根を算出し、これをスペクトル包
絡値|Ｂ(ω)|として出力する。For each harmonic band, centered on an integer multiple of the fundamental frequency ωo estimated by the fundamental frequency estimation unit 401 and having a bandwidth equal to the fundamental frequency, the spectral power calculation unit 112 calculates the square root of the sum of squares of the frequency spectrum B[m] and outputs it as the spectral envelope value |B(ω)|.
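The envelope calculation described above can be sketched as follows. The function name `harmonic_band_envelope`, the use of frequencies in Hz instead of ωo, and the half-open bin ranges at the band edges are assumptions made for this illustration; the text specifies only the band center, the band width, and the square root of the sum of squares.

```python
import math


def harmonic_band_envelope(magnitudes: list[float], f0_hz: float,
                           fs_hz: float) -> list[float]:
    """Square root of the summed squared magnitudes in each harmonic band.
    Band k is centered on k * f0 and is f0 wide. `magnitudes` is assumed to
    hold |B[m]| for a full-length FFT, so len(magnitudes) is the FFT size."""
    bin_hz = fs_hz / len(magnitudes)
    envelope = []
    k = 1
    while (k + 0.5) * f0_hz <= fs_hz / 2.0:
        lo = math.ceil((k - 0.5) * f0_hz / bin_hz)   # first bin in band k
        hi = math.ceil((k + 0.5) * f0_hz / bin_hz)   # first bin past band k
        envelope.append(math.sqrt(sum(a * a for a in magnitudes[lo:hi])))
        k += 1
    return envelope


if __name__ == "__main__":
    flat = [1.0] * 256                      # flat 256-point spectrum
    print(harmonic_band_envelope(flat, 250.0, 8000.0)[:3])
```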

【００３５】有声強度計算部４０４は、適応窓処理部１
０１、第１スペクトル計算部１０２、フレームエネルギ
ー計算部１０３、バンドエネルギー計算部１０４、対数
変換部１０５、バンドハーモニクス振幅計算部１０６、
バンドハーモニクス幅計算部１０７、バンドハーモニク
ス数計算部１０８、有声強度判定部１０９により構成さ
れる。The voiced strength calculation unit 404 consists of an adaptive window processing unit 101, a first spectrum calculation unit 102, a frame energy calculation unit 103, a band energy calculation unit 104, a logarithmic conversion unit 105, a band harmonics amplitude calculation unit 106, a band harmonics width calculation unit 107, a band harmonics number calculation unit 108, and a voiced strength determination unit 109.

【００３６】適応窓処理部１０１は、音声信号ｓ(n)に
対し、前記基本周波数推定部４０１で推定された音声基
本周波数ωoから前述した式（６）〜（８）で適応的に
設定した長さのハミング窓で窓処理を行い、第１スペク
トル変換部１０２で式（１２）に示すＦＦＴにより音声
セグメントの周波数スペクトルＡ[m]を得る。ここでＭ
はＦＦＴサンプル数である。The adaptive window processing unit 101 windows the speech signal s(n) with a Hamming window whose length is set adaptively by equations (6) to (8) above from the speech fundamental frequency ωo estimated by the fundamental frequency estimation unit 401, and the first spectrum calculation unit 102 obtains the frequency spectrum A[m] of the speech segment by the FFT shown in equation (12). Here, M is the number of FFT samples.

【数１２】 [Equation 12]

【００３７】フレームエネルギー計算部１０３は、周波
数スペクトルＡ[m]から、前記適応窓によるエネルギー
低下分を補償したフレームの平均エネルギー（「フレー
ムエネルギー」あるいは「フレームパワー」と呼ぶ）Ｅ
fを式（１３）により計算する。ここで、第２項により
窓関数によるエネルギー減少を補償している。The frame energy calculation unit 103 uses equation (13) to calculate, from the frequency spectrum A[m], the average energy Ef of the frame (called the "frame energy" or "frame power"), compensated for the energy reduction caused by the adaptive window. Here, the second term compensates for the energy reduction due to the window function.

【数１３】 [Equation 13]

【００３８】バンドエネルギー計算部１０４は、各周波
数バンド毎の平均エネルギー（「バンドエネルギー」あ
るいは「バンドパワー」と呼ぶ）Ｅb[k]（k=1,...,K）
を計算するものであり、バンドエネルギーＥb[k]は、第
ｋバンドのスペクトル区間を[ａk,ｂk]とすると、次の
式（１４）で表わされる。The band energy calculation unit 104 calculates the average energy Eb[k] (k = 1, ..., K) of each frequency band (called the "band energy" or "band power"). When the spectral interval of the k-th band is [ak, bk], the band energy Eb[k] is given by the following equation (14).

【数１４】ここで、バンドの周波数範囲を基本周波数ωoの３倍に
設定する場合には、ａk，ｂkは、[Equation 14] Here, when the frequency range of the band is set to three times the fundamental frequency ωo, ak and bk are

【数１５】になる。ただし、floor(x)はｘを越えない最大の整数を
示す。[Equation 15] Here, floor(x) denotes the largest integer that does not exceed x.
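A hedged sketch of the band layout and band energy follows. Equations (14) and (15) are not reproduced in this text, so the edge formula below (bands three harmonics wide, with boundaries placed midway between harmonics) and the plain mean of |A[m]|² are assumptions chosen to be consistent with the prose; `band_edges` and `band_energy` are hypothetical names.

```python
import math


def band_edges(k: int, f0_hz: float, fs_hz: float, fft_size: int,
               harmonics_per_band: int = 3) -> tuple[int, int]:
    """Assumed bin range [a_k, b_k] (inclusive) of band k = 1, 2, ...
    Boundaries fall halfway between adjacent harmonics."""
    bins_per_f0 = f0_hz * fft_size / fs_hz
    a_k = math.floor((harmonics_per_band * (k - 1) + 0.5) * bins_per_f0) + 1
    b_k = math.floor((harmonics_per_band * k + 0.5) * bins_per_f0)
    return a_k, b_k


def band_energy(magnitudes: list[float], a_k: int, b_k: int) -> float:
    """Average of |A[m]|^2 over the bins a_k..b_k (assumed form of Eb[k])."""
    count = b_k - a_k + 1
    return sum(x * x for x in magnitudes[a_k:b_k + 1]) / count


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, band_edges(k, 250.0, 8000.0, 512))
```

With this layout consecutive bands are contiguous and do not overlap, since each band starts one bin after the previous band ends.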

【００３９】対数変換部１０５は、前記第１スペクトル
計算部１０２で計算された周波数スペクトル値|Ａ[m]|
の対数値を計算して、対数スペクトル振幅列ＬＡ[m]を
計算する。The logarithmic conversion unit 105 calculates the logarithm of the frequency spectrum values |A[m]| calculated by the first spectrum calculation unit 102 to obtain the logarithmic spectral amplitude sequence LA[m].

【数１６】 [Equation 16]

【００４０】バンドハーモニクス振幅計算部１０６は、
各周波数バンド内のハーモニクス振幅ＡhまたはＢhを計
算する。図８を用いて、ハーモニクス振幅の評価方法に
ついて説明する。ハーモニクス振幅はスペクトル振幅|
Ａ[m]|のデータ列の極大値とその最近傍の極小値の差で
あるが、ハーモニクス振幅が線形で表されている場合に
はその振幅はスペクトル強度に比例して増減する。そこ
で、スペクトル振幅の極大値Ｈ0とその前後の極小値Ｈ
1、Ｈ2との差を極大値Ｈ0で正規化した値Ａh1、Ａh2を
ハーモニクス振幅の評価値とすれば、スペクトル強度に
関係しないハーモニクス強度が評価できる。ここで、Ａ
h1とＡh2の小さい方をハーモニクス振幅評価値Ａhとす
ると、The band harmonics amplitude calculation unit 106 calculates the harmonic amplitude Ah or Bh within each frequency band. The method of evaluating the harmonic amplitude is described with reference to FIG. 8. The harmonic amplitude is the difference between a local maximum of the spectral amplitude sequence |A[m]| and the nearest local minimum, but when it is expressed on a linear scale it increases and decreases in proportion to the spectral intensity. Therefore, if the differences between a local maximum H0 of the spectral amplitude and the local minima H1 and H2 on either side of it are normalized by H0 to give the values Ah1 and Ah2, and these are used as the evaluation values of the harmonic amplitude, the harmonic strength can be evaluated independently of the spectral intensity. Taking the smaller of Ah1 and Ah2 as the harmonic amplitude evaluation value Ah gives

【数１７】となる。または、スペクトル極大値とスペクトル極小値
の比でハーモニクス強度を表したハーモニクス評価値Ｂ
hで評価しても良い。すなわち、[Equation 17] Alternatively, the evaluation may use the harmonic evaluation value Bh, which expresses the harmonic strength as the ratio of the spectral local maximum to the spectral local minimum. That is,

【数１８】このＢh1やＢh2はハーモニクスのピークからの減衰量を
デシベル単位で表したもので、前記図３に示した音声の
スペクトル振幅測定結果からも、スペクトル周波数やス
ペクトル振幅の影響が少ない妥当なハーモニクス強度の
評価単位である事がわかる。[Equation 18] Bh1 and Bh2 express the attenuation from the harmonic peak in decibels. The spectral amplitude measurements of speech shown in FIG. 3 also indicate that this is a reasonable measure of harmonic strength that is little affected by the spectral frequency or the spectral amplitude.
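The two evaluation values can be sketched as below. Equations (17) and (18) are not reproduced in this text, so the code follows the prose: Ah is the smaller of the two differences normalized by the peak, and Bh1 and Bh2 are the peak-to-valley ratios in decibels. How Bh1 and Bh2 are combined is not stated, so the sketch returns both; the function names are illustrative.

```python
import math


def harmonic_amplitude_ah(h0: float, h1: float, h2: float) -> float:
    """Smaller of the normalized peak-to-valley differences Ah1 and Ah2,
    where h0 is the local maximum and h1, h2 are the neighboring minima
    (all linear amplitudes, h0 > 0)."""
    ah1 = (h0 - h1) / h0
    ah2 = (h0 - h2) / h0
    return min(ah1, ah2)


def harmonic_attenuation_db(h0: float, h1: float, h2: float) -> tuple[float, float]:
    """Attenuation from the peak to each neighboring valley in decibels
    (Bh1, Bh2). All inputs must be positive linear amplitudes."""
    return 20.0 * math.log10(h0 / h1), 20.0 * math.log10(h0 / h2)


if __name__ == "__main__":
    print(harmonic_amplitude_ah(10.0, 1.0, 5.0))
    print(harmonic_attenuation_db(10.0, 1.0, 5.0))
```

Both measures are unchanged when h0, h1, and h2 are scaled by the same factor, which is the independence from spectral intensity that the text asks for.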

【００４１】バンドハーモニクス幅計算部１０７は、前
記対数変換部１０５の出力を受けて、前記各スペクトル
振幅極大値の直前の極小値と直後の極小値との間の周波
数間隔をそのハーモニクスの幅として算出する。バンド
ハーモニクス数計算部１０８は、前記対数変換部１０５
の出力を受けて、前記式（１５）で示した周波数バンド
の周波数スペクトル範囲に含まれるハーモニクスの数Ｈ
nを計算する。ハーモニクス数の計算は、ＦＦＴで得ら
れる離散的周波数ａkからｂkまで周波数スペクトル振幅
20log₁₀|Ａ[m]｜とその前後のスペクトル振幅20log₁₀|
Ａ[m-1]｜、20log₁₀|Ａ[m+1]｜を比較し、いずれの値よ
りも多きければｍ番目のスペクトルはスペクトルの極大
点でハーモニクスの中心周波数に最も近いスペクトルで
あると判断する。すなわち、The band harmonics width calculation unit 107 receives the output of the logarithmic conversion unit 105 and calculates, as the width of each harmonic, the frequency interval between the local minimum immediately before and the local minimum immediately after each local maximum of the spectral amplitude. The band harmonics number calculation unit 108 receives the output of the logarithmic conversion unit 105 and calculates the number Hn of harmonics contained in the frequency spectrum range of the frequency band given by equation (15). To count the harmonics, for each discrete FFT frequency from ak to bk, the spectral amplitude 20log₁₀|A[m]| is compared with the neighboring spectral amplitudes 20log₁₀|A[m-1]| and 20log₁₀|A[m+1]|; if it is larger than both, the m-th spectral line is a local maximum of the spectrum and is judged to be the line closest to the center frequency of a harmonic. That is,

【数１９】 [Formula 19]
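The local-maximum test and the width measurement described in this paragraph can be sketched as follows. Equation (19) is not reproduced in this text; the code implements the comparison stated in the prose, and the function names and the rule of walking outward until the amplitude stops decreasing are assumptions made for illustration.

```python
def find_harmonic_peaks(log_amp: list[float], a_k: int, b_k: int) -> list[int]:
    """Indices m in [a_k, b_k] where LA[m] exceeds both of its neighbors."""
    first = max(a_k, 1)
    last = min(b_k, len(log_amp) - 2)
    return [m for m in range(first, last + 1)
            if log_amp[m] > log_amp[m - 1] and log_amp[m] > log_amp[m + 1]]


def harmonic_width(log_amp: list[float], peak: int) -> int:
    """Distance in bins between the local minima on either side of a peak."""
    left = peak
    while left > 0 and log_amp[left - 1] < log_amp[left]:
        left -= 1
    right = peak
    while right < len(log_amp) - 1 and log_amp[right + 1] < log_amp[right]:
        right += 1
    return right - left


if __name__ == "__main__":
    spectrum_db = [0.0, 1.0, 5.0, 1.0, 0.0, 2.0, 6.0, 2.0, 0.0, 1.0]
    peaks = find_harmonic_peaks(spectrum_db, 1, 8)
    print(peaks, [harmonic_width(spectrum_db, p) for p in peaks])
```

The number of harmonics Hn in a band is then simply the length of the list returned by `find_harmonic_peaks`.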

【００４２】ここで、計算されたスペクトル対数値をそ
のまま用いて上記方法により極大値の数を数えると、ス
ペクトル雑音の影響を受けて雑音によるスペクトル極大
値を数えてしまう弊害があるため、予めスペクトル雑音
除去を行い雑音による誤計数を防止するようにしてい
る。このスペクトル雑音除去の方法について図９を参照
して説明する。図９の（Ａ）と（Ｃ）はスペクトル雑音
のある場合を示しており、ｍ＋１とｍ＋２のスペクトル
振幅が逆転している。連続した４本のスペクトルの組に
対してスペクトル振幅の差分の符号が−＋−または＋−
＋の場合には極大値があり、その極大値はそれぞれｍ＋
２番目かｍ＋１番目に現れて、その極大値の振幅はｍ＋
１番目とｍ＋２番目の振幅の差になることがわかる。そ
こで、ｍ＋１番目とｍ＋２番目のスペクトルの差が雑音
レベルを考慮したある閾値より小さければ、ｍ＋１番目
とｍ＋２番目のスペクトル振幅を両者の平均値に置きか
える事により、図９の（Ｂ）と（Ｃ）に示す様にスペク
トル雑音を除去する事が出来る。Here, if the calculated logarithmic spectrum values are used as they are and the local maxima are counted by the above method, spectral noise causes spurious local maxima to be counted. Spectral noise is therefore removed beforehand to prevent such miscounting. The method of removing spectral noise is described with reference to FIG. 9. FIG. 9(A) and (C) show cases with spectral noise, in which the spectral amplitudes at m+1 and m+2 are reversed. When the signs of the differences between the spectral amplitudes of four consecutive spectral lines are -+- or +-+, a local maximum exists at position m+2 or m+1, respectively, and its amplitude is the difference between the amplitudes at m+1 and m+2. Therefore, if the difference between the (m+1)-th and (m+2)-th spectral amplitudes is smaller than a threshold that takes the noise level into account, replacing both amplitudes with their average removes the spectral noise, as shown in FIG. 9(B) and (C).
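The smoothing rule can be sketched as a single pass over the logarithmic spectrum. The function name, the left-to-right in-place update, and the threshold value used in the example are assumptions; the sign patterns and the averaging of the two middle values follow the description above.

```python
def remove_spectral_noise(log_amp: list[float], threshold_db: float) -> list[float]:
    """For each group of four consecutive values whose successive differences
    have signs (-, +, -) or (+, -, +), replace the two middle values by their
    mean when they differ by less than `threshold_db`."""
    out = list(log_amp)
    for m in range(len(out) - 3):
        d0 = out[m + 1] - out[m]
        d1 = out[m + 2] - out[m + 1]
        d2 = out[m + 3] - out[m + 2]
        zigzag = (d0 < 0 < d1 and d2 < 0) or (d0 > 0 > d1 and d2 > 0)
        if zigzag and abs(d1) < threshold_db:
            mean = 0.5 * (out[m + 1] + out[m + 2])
            out[m + 1] = mean
            out[m + 2] = mean
    return out


if __name__ == "__main__":
    noisy = [10.0, 8.0, 8.5, 6.0, 4.0]       # small bump on a falling slope
    print(remove_spectral_noise(noisy, threshold_db=1.0))
```

A genuine harmonic peak is left untouched because its middle difference exceeds the threshold.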

【００４３】有声強度判定部１０９は、前記フレームエ
ネルギー計算部１０３、バンドエネルギー計算部１０
４、対数変換部１０５、バンドハーモニクス振幅計算部
１０６、バンドハーモニクス幅計算部１０７およびバン
ドハーモニクス数計算部１０８で算出された、フレーム
エネルギー（フレームパワー）Ｅf、バンドエネルギー
（バンドパワー）Ｅb[k]、ハーモニクス振幅Ｈpw[n]
[0]、ハーモニクス幅Ｈpw[n][1]、ハーモニクス数Ｈnの
各パラメータを用いて、バンド毎の有声強度Ｖ[k]を計
算し出力する。ここで、Ｈpw[n][0]はその周波数バンド
におけるハーモニクスの振幅（ＡhあるいはＢh）の上位
ｎ個までの振幅で、Ｈpw[n][1]はそれに対応するハーモ
ニクス幅を表している。The voiced strength determination unit 109 calculates and outputs the voiced strength V[k] of each band using the following parameters, calculated by the frame energy calculation unit 103, band energy calculation unit 104, logarithmic conversion unit 105, band harmonics amplitude calculation unit 106, band harmonics width calculation unit 107, and band harmonics number calculation unit 108: the frame energy (frame power) Ef, the band energy (band power) Eb[k], the harmonic amplitudes Hpw[n][0], the harmonic widths Hpw[n][1], and the number of harmonics Hn. Here, Hpw[n][0] holds the n largest harmonic amplitudes (Ah or Bh) in the frequency band, and Hpw[n][1] holds the corresponding harmonic widths.

【００４４】この有声強度Ｖ[k]は、入力パラメータを
閾値判定して得られる２値の有声／無声の判定結果でも
良いし、入力パラメータの判定値の重み付き加算による
多値レベルを持った判定結果でも良い。あるいは、入力
パラメータの判定値の重み付き加算結果を閾値判定して
得られる２値の判定結果であっても良い。有声強度Ｖ
[k]として２値の判定結果を用いる場合は、各バンド毎
に有声か無声かを切り替えて音声合成を行うこととな
る。多値の判定結果（例えば、0.0〜1.0の範囲の値をと
る）の場合には、個々のバンド毎に合成した有声と無声
の合成音声を重みつき加算合成して最終合成音声を生成
すればよい。The voiced strength V[k] may be a binary voiced/unvoiced decision obtained by thresholding the input parameters, or a multi-level decision obtained by a weighted sum of the decision values of the input parameters. Alternatively, it may be a binary decision obtained by thresholding that weighted sum. When a binary decision is used as the voiced strength V[k], speech is synthesized by switching between voiced and unvoiced for each band. When a multi-level decision is used (for example, a value in the range 0.0 to 1.0), the final synthesized speech is generated by a weighted sum of the voiced and unvoiced speech synthesized for each band.
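The three forms of V[k] and the weighted synthesis can be illustrated as follows. The text does not give the weights, the thresholds, or the exact mixing rule, so everything specific here is an assumption: the per-parameter decision values in `scores`, the weights, and the choice of 1 − V[k] as the weight of the unvoiced component.

```python
from typing import Optional


def voiced_strength(scores: list[float], weights: list[float],
                    threshold: Optional[float] = None) -> float:
    """Weighted sum of per-parameter decision values. With `threshold` set,
    the sum is reduced to a binary decision (1.0 voiced, 0.0 unvoiced)."""
    total = sum(w * s for w, s in zip(weights, scores))
    if threshold is None:
        return total
    return 1.0 if total >= threshold else 0.0


def mix_bands(voiced: list[list[float]], unvoiced: list[list[float]],
              strengths: list[float]) -> list[float]:
    """Sum over bands of V[k] * voiced_k + (1 - V[k]) * unvoiced_k
    (assumed mixing rule for multi-level strengths)."""
    length = len(voiced[0])
    output = [0.0] * length
    for v_sig, u_sig, v in zip(voiced, unvoiced, strengths):
        for i in range(length):
            output[i] += v * v_sig[i] + (1.0 - v) * u_sig[i]
    return output


if __name__ == "__main__":
    v = voiced_strength([1.0, 0.0], [0.6, 0.4])
    print(v, voiced_strength([1.0, 0.0], [0.6, 0.4], threshold=0.5))
```

With binary strengths the mixing rule reduces to selecting either the voiced or the unvoiced signal in each band, which matches the switching behavior described for the binary case.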

【００４５】 FIGS. 10 and 11 are flowcharts showing the processing of the voiced strength calculation unit 404 in FIG. 1. When the voiced strength calculation starts, step 1401 receives the fundamental frequency ωo and the frequency spectrum amplitude |A[m]|, and step 1402 stores them in a data area. The fundamental frequency ωo is used here only to determine the number of bands and the frequency range of each band; it is not used directly in the voiced strength decision. Step 1403 determines the number of bands K. When each band is set to contain h harmonics, K is calculated by

【数２０】 [Equation 20]

where ceil(x) denotes the smallest integer greater than or equal to x. For example, the system is designed with h of about 3 and the number of bands K is calculated accordingly. Once h and ωo are determined, equation (15) above gives, for each band number k = 1, 2, ..., K, the frequency range [ak, bk] of the FFT spectrum that falls within the band.
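As a rough illustration of step 1403, the sketch below computes a band count with ceil(). The equation itself is not reproduced in this text, so the form used here (number of harmonics below the Nyquist frequency, divided by h and rounded up) is an assumption, as are the function name and the use of frequencies in Hz instead of the normalized ωo.

```python
import math


def band_count(f0_hz, fs_hz=8000, h=3):
    """Number of bands K when each band holds h harmonics (assumed form)."""
    # Harmonics of f0 that fit below the Nyquist frequency fs/2.
    n_harmonics = int((fs_hz / 2) // f0_hz)
    # ceil(x): smallest integer greater than or equal to x.
    return math.ceil(n_harmonics / h)
```

For a 200 Hz fundamental at 8000 Hz sampling, 20 harmonics fit below 4000 Hz, so this sketch gives 7 bands for h = 3.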

【００４６】 In step 1404, the frame power Ef and the band powers Eb[k] (k = 1, 2, ..., K) are calculated from equations (13) and (14) above.

【数２１】 [Equation 21]

【数２２】 [Equation 22]

Next, in step 1405, the logarithmic amplitude LA[m] is calculated by taking the logarithm of the spectral amplitude |A[m]| and converting it to decibels.

【数２３】 [Equation 23]

Next, in step 1406, spectral noise removal is performed. The processing flow of this spectral noise removal (steps 1421 to 1428) is described later.
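Steps 1404 and 1405 can be sketched as follows. Equations (13), (14), and the decibel conversion are not reproduced in this text, so the sums of squared magnitudes, the 20·log10 convention, and the small floor that avoids log(0) are assumptions of this sketch; the function names are hypothetical.

```python
import math


def frame_power(amp):
    """Frame power Ef as the sum of squared spectral amplitudes (assumed)."""
    return sum(a * a for a in amp)


def band_power(amp, ak, bk):
    """Band power Eb[k] over the inclusive bin range [ak, bk] (assumed)."""
    return sum(a * a for a in amp[ak:bk + 1])


def log_amplitude(amp, floor=1e-12):
    """Log amplitude LA[m] in decibels; 'floor' guards against log10(0)."""
    return [20.0 * math.log10(max(a, floor)) for a in amp]
```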

【００４７】 Next, the voiced strength V[k] is decided. First, in step 1407, a frame whose overall power (frame power) Ef is smaller than a predetermined threshold Th0 has little speech power and is regarded as a noise region, so step 1416 sets all bands to unvoiced and the process ends without entering the band loop. For a frame whose frame power Ef exceeds Th0, the band loop of steps 1408 to 1415 is entered. In this band loop, step 1409 first evaluates the power Eb[k] of the frequency band. If it is at or below a predetermined threshold Th1, the band is judged to have little energy and is set to unvoiced, V[k] = 0 (step 1414). If it is larger than Th1, step 1410 calculates the harmonic amplitudes Hpw[n][0], the harmonic widths Hpw[n][1], and the number of harmonics Hn of the band. In the flowchart, the harmonic amplitude Hpw[n][0] and the harmonic width Hpw[n][1] are written together as Hpw[n][2]. The processing flow for calculating the harmonic amplitude, harmonic width, and number of harmonics in step 1410 (steps 1430 to 1450) is described later.

【００４８】 Next, step 1411 evaluates the number of harmonics Hn. If it deviates from the configured number of harmonics per band h beyond the allowed range (at or below threshold Th20, or at or above threshold Th21), the band is judged unvoiced, V[k] = 0 (step 1414). For example, when h is set to 3, a band with 2 or fewer, or 4 or more, harmonics is judged unvoiced. Next, step 1412 evaluates the harmonic amplitude Hpw[n][0] and the harmonic width Hpw[n][1]. If they are smaller than the predetermined thresholds Th3 and Th6 respectively, the band is judged unvoiced because its harmonic amplitude is small or its harmonic width is narrow (step 1414). The harmonic width threshold is set adaptively according to the window function set by the adaptive window processing unit 101. For example, for the adaptive Hamming window of equation (8), it is reasonable to relate the harmonic width to the spectral width of the main lobe, expressed as the distance between the first valleys on the positive and negative sides of the adaptive Hamming window spectrum. The main lobe width Mw of the Hamming window is calculated from the window length L_w and the FFT length M by equation (21), so Th6 is set to a practical threshold related to this value.

【数２４】 [Equation 24]

Similarly, Th3 is related to the first attenuation of the Hamming window. When the fundamental frequency and the window length of the adaptive window processing unit satisfy the condition of equation (6) above, a practical value is set based on the attenuation at the first valley of the Hamming window. A band that has not been judged unvoiced by the above tests is set as a voiced band (V[k] = 1) in step 1413. When this operation has been performed for every band up to K bands and each voiced strength V[k] has been set, step 1417 ends the processing of the voiced strength calculation unit 404.
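The decision flow of steps 1407 to 1417 can be sketched as follows. This is a simplified illustration, not the flowchart itself: the per-band harmonic measurements are passed in precomputed, the threshold values are hypothetical, and requiring every recorded harmonic to pass both the amplitude and width tests is an assumption, since the text does not say how the top n entries are combined.

```python
def decide_bands(ef, eb, harmonics, th):
    """Binary voiced strength V[k] per band.

    ef:        frame power Ef.
    eb:        list of band powers Eb[k].
    harmonics: per band, a tuple (Hn, [(Ha, Hw), ...]) of the harmonic count
               and the recorded amplitudes/widths (assumed data layout).
    th:        dict of thresholds Th0, Th1, Th20, Th21, Th3, Th6
               (values are hypothetical).
    """
    if ef < th["Th0"]:
        # Step 1416: low-power frame, all bands unvoiced.
        return [0] * len(eb)
    v = []
    for k in range(len(eb)):
        hn, top = harmonics[k]
        if eb[k] <= th["Th1"]:
            v.append(0)   # step 1409: too little band energy
        elif hn <= th["Th20"] or hn >= th["Th21"]:
            v.append(0)   # step 1411: harmonic count out of range
        elif any(ha < th["Th3"] or hw < th["Th6"] for ha, hw in top):
            v.append(0)   # step 1412: weak or narrow harmonic
        else:
            v.append(1)   # step 1413: voiced band
    return v
```

With h = 3, setting Th20 = 2 and Th21 = 4 reproduces the example in the text, where only bands with exactly three harmonics can be voiced.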

【００４９】 In this way, a threshold decision is made on the frame power Ef (1407). Then, for each band, threshold decisions are made on the band power Eb[k] (1409), on the number of harmonics Hn (1411), and on the harmonic amplitude Hpw[n][0] and harmonic width Hpw[n][1] (1412). A binary (0 or 1) voiced strength V[k] can be determined from these results. As noted earlier, the voiced strength V[k] is not limited to such binary information. A multi-level voiced strength (for example, in the range 0.0 to 1.0) may be calculated by applying a predetermined weight to the result of each threshold decision and adding the weighted results. Alternatively, the weighted sum may itself be compared with a predetermined threshold to give a binary value.

【００５０】 Next, the spectral noise removal subroutine of step 1406 (steps 1421 to 1428) is described. For the logarithmic spectral amplitudes LA[*] received in step 1421, the noise removal loop of steps 1422 to 1427 is entered. This loop checks whether four consecutive points of the frequency spectrum amplitude contain a small local maximum. If they do, the local maximum is averaged with the spectral amplitude closest to it in value, and both amplitudes are replaced by that average, which eliminates the small spectral maximum.

【００５１】 First, step 1423 calculates the differences d1, d2, d3 between four consecutive points and their signs s1, s2, s3. Next, step 1424 determines whether s1 and s3 have the same sign and differ from s2. If so, the local maximum is one of the two middle points. As shown in FIG. 9, the amplitude of the local maximum is given by the absolute value of d2 whether s1 and s3 are both positive or both negative. If step 1425 finds that |d2| is smaller than a predetermined threshold Th4, step 1426 replaces LA[m+1] and LA[m+2] with their average, which smooths away the small maximum. This smoothing is repeated within the band until the last four spectral points have been processed, removing the maxima caused by spectral noise. FIG. 9 also shows that removing a maximum removes the minimum immediately before or after it at the same time.
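The four-point smoothing of steps 1423 to 1426 can be sketched as follows. The function names are hypothetical, and processing the whole array in place from left to right (instead of band by band) is a simplification made for this illustration.

```python
def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def remove_spectral_noise(la, th4):
    """Smooth away small local maxima in the log spectrum LA[m]."""
    out = list(la)
    for m in range(len(out) - 3):
        # Differences between four consecutive points (step 1423).
        d1 = out[m + 1] - out[m]
        d2 = out[m + 2] - out[m + 1]
        d3 = out[m + 3] - out[m + 2]
        s1, s2, s3 = sign(d1), sign(d2), sign(d3)
        # Outer signs agree and differ from the middle one (step 1424),
        # and the bump |d2| is below the threshold Th4 (step 1425).
        if s1 == s3 and s1 != s2 and abs(d2) < th4:
            avg = (out[m + 1] + out[m + 2]) / 2.0
            out[m + 1] = avg   # step 1426: replace both middle points
            out[m + 2] = avg
    return out
```

In the first test case below, the 0.5 dB dip between 10 and 9.5 is flattened, while a 5 dB dip in the same position is left untouched for Th4 = 1.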

【００５２】 Next, the subroutine of step 1410 (steps 1430 to 1450), which calculates the number of harmonics Hn, the harmonic amplitudes Hpw[n][0], and the harmonic widths Hpw[n][1], is described with reference to FIG. 11. First, step 1431 starts processing with the logarithmic spectral amplitude LA[m], the fundamental frequency ωo, the band number k (k = 1, 2, ..., K), and the band spectrum range [ak, bk] as inputs. Step 1432 initializes each of the following to 0: the maximum counter Npk, which counts the local maxima; the minimum counter Nbtm, which counts the local minima; the maximum memory Apk[*], which stores the amplitudes of the maxima; the minimum memory Abtm[*], which stores the amplitudes of the minima; the harmonic amplitude memory Hpw[*][0], which stores the harmonic amplitudes; the harmonic width memory Hpw[*][1], which stores the harmonic bandwidths; and the harmonic counter Hn, which counts the harmonics. The start point mb1 and end point mb2 of the harmonic width are both set to the spectrum start point ak of the band.

【００５３】 Next, step 1433 enters the peak/bottom calculation loop (steps 1433 to 1448). In step 1434, if the logarithmic spectral amplitude LA[m] is larger than both LA[m-1] and LA[m+1], LA[m] is judged to be a local maximum and the process moves to step 1435. Step 1435 checks whether this is the first maximum found in the band. If it is, step 1436 sets the maximum counter Npk and the minimum counter Nbtm to 1, records the maximum LA[m] in the maximum memory Apk[1], and records the initial value LA[ak] in the minimum memory Abtm[1]. If it is not the first, step 1437 increments Npk and records the maximum LA[m] in Apk[Npk].

【００５４】 If step 1434 judges that the point is not a peak, step 1438 then determines whether it is a local minimum, using the same approach as the maximum test of step 1434. If it is a minimum, step 1439 increments the minimum counter Nbtm and records the minimum LA[m] in the minimum memory Abtm[Nbtm]. In addition, for the harmonic width calculation, mb1 is updated to the value of mb2, and mb2 is set to the current spectral frequency m. If both the maximum and minimum tests give No, step 1441 determines whether this is the last pass of the bottom/peak detection loop. If it is, the process goes to step 1442, which determines whether the maximum count Npk and the minimum count Nbtm are equal. If they are equal, step 1440 increments Nbtm, records LA[bk] in Abtm[Nbtm], and, for the harmonic width calculation, updates mb1 to mb2 and sets mb2 to the final spectral frequency bk of the current band. This procedure detects every local maximum and records the minima before and after it.

【００５５】 Next, step 1443 determines, at the point where a minimum has been detected, whether a maximum precedes it. If so, that maximum is treated as a new harmonic and step 1444 calculates its amplitude Ha. In step 1444, Ha is the average of the amplitude differences between the maximum and the minima before and after it. However, when the symmetry of the harmonic amplitude shape is considered important for the decision, Ha may instead be calculated using the minimum value, as shown in equation (18) above. Next, step 1450 calculates the harmonic width Hw, and step 1445 compares Ha with a predetermined threshold Th5. Only when Ha exceeds the threshold is the number of harmonics Hn updated (step 1446) and the top n harmonic amplitudes recorded in Hpw[n][0] together with the harmonic widths in Hpw[n][1] (step 1447). The function maxN(Hpw[n], Ha, Hw) in step 1447 works as follows: when Ha is larger than the smallest value among the elements Hpw[n][0], it replaces that smallest first element (the harmonic amplitude) with Ha and at the same time replaces the second element at the same array index (the harmonic width) with Hw. When all passes of the peak/bottom calculation loop have finished, step 1449 returns the number of harmonics Hn in the band and the top n harmonic amplitudes and widths Hpw[n][2]. This completes the description of the processing of the voiced strength calculation unit 404 using the detailed flowcharts.
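The peak/bottom scan of steps 1433 to 1449 can be condensed into the sketch below. It is a simplified illustration under assumptions: the counters Npk/Nbtm and the maxN function are replaced by a list that is sorted at the end, the band edges are used as the first and last minima, and the harmonic width Hw (whose formula in step 1450 is not given in this text) is assumed to be the distance between the minima on either side of the peak.

```python
def band_harmonics(la, ak, bk, th5, n_top=3):
    """Return (Hn, top-n [(Ha, Hw), ...]) for the inclusive bin range [ak, bk]."""
    found = []
    left = ak        # position of the minimum before the current peak (mb1)
    peak = None      # position of a peak that still awaits its right-hand minimum
    for m in range(ak + 1, bk + 1):
        last = (m == bk)
        is_max = not last and la[m] > la[m - 1] and la[m] > la[m + 1]
        is_min = not last and la[m] < la[m - 1] and la[m] < la[m + 1]
        if is_max:
            peak = m
        elif is_min or last:
            if peak is not None:
                # Average height of the peak above its two neighbouring minima.
                ha = ((la[peak] - la[left]) + (la[peak] - la[m])) / 2.0
                hw = m - left            # assumed width definition
                if ha > th5:
                    found.append((ha, hw))
                peak = None
            left = m
    found.sort(reverse=True)             # largest amplitudes first
    return len(found), found[:n_top]
```

In the test below, the spectrum has three peaks of height 5, 6, and 1; with Th5 = 2 only the first two count as harmonics.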

【００５６】 Next, the unvoiced speech synthesis unit 509 in the speech decoding apparatus of FIG. 2, to which the speech decoding method of the present invention is applied, is described in detail. As described above, the symmetric random sequence generation unit 201 generates a centrally symmetric random sequence, and the antisymmetric random sequence generation unit 202 generates a centrally antisymmetric random sequence. A centrally symmetric random sequence is a random sequence that, viewed from one point in the sequence (the center), is mirror-symmetric in both amplitude and polarity; that is, when the sequence is folded at the center, the left and right halves coincide exactly. A centrally antisymmetric random sequence is a random sequence whose amplitude is mirror-symmetric about the center but whose polarity is inverted. In practice, a random sequence of half the length of the inverse Fourier transform performed by the inverse frequency transform unit 204 (referred to as the inverse FFT length) is generated and copied in reverse order to produce the centrally symmetric random sequence. Likewise, a random sequence of half the inverse FFT length is generated and copied in reverse order with inverted polarity to produce the centrally antisymmetric random sequence.
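The mirroring described above can be sketched as follows, together with a naive inverse DFT that shows why the construction is useful: a spectrum with a symmetric real part and an antisymmetric imaginary part transforms to a purely real time signal. The function names, the uniform random source, and the choice to leave bins 0 and M/2 at zero are assumptions of this sketch.

```python
import cmath
import random


def symmetric_pair(m, rng):
    """Real part mirrored, imaginary part mirrored with inverted polarity."""
    re = [0.0] * m
    im = [0.0] * m
    for k in range(1, m // 2):
        re[k] = rng.uniform(-1.0, 1.0)
        im[k] = rng.uniform(-1.0, 1.0)
        re[m - k] = re[k]      # centrally symmetric
        im[m - k] = -im[k]     # centrally antisymmetric
    return re, im


def inverse_dft(re, im):
    """Naive O(M^2) inverse DFT, adequate for a small demonstration."""
    m = len(re)
    out = []
    for n in range(m):
        acc = 0j
        for k in range(m):
            acc += complex(re[k], im[k]) * cmath.exp(2j * cmath.pi * k * n / m)
        out.append(acc / m)
    return out
```

A production decoder would use an FFT routine instead of the quadratic loop; the loop is used here only to keep the example free of external dependencies.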

【００５７】 The two random sequences generated in this way by the symmetric random sequence generation unit 201 and the antisymmetric random sequence generation unit 202 are supplied to the random sequence extraction unit 203. There, the two sequences are regarded as the real part and the imaginary part of a frequency spectrum sequence. The portion corresponding to each frequency band designated as unvoiced by the voiced/unvoiced information is extracted, and its amplitude is adjusted so that the power of the extracted spectrum equals the power of the unvoiced harmonic band given by the spectral envelope information |B(ω)|. The amplitude-adjusted unvoiced harmonic band spectrum is inverse Fourier transformed by the inverse frequency transform unit 204 into a time-domain signal, which yields a time-axis data sequence, corresponding to the unvoiced frame signal, with as many samples as the inverse FFT length.

【００５８】 The data obtained in this way, for example 256 samples when the inverse FFT length is 256, are input to the frame interpolation unit 205 and interpolated to the number of samples corresponding to the update period of the speech segment (for example, 160 samples for a 20 msec period). This is a linear interpolation of the time-axis data obtained from the previous segment and the time-axis data of the current segment, under the condition that the interpolation weights sum to 1. The unvoiced speech synthesized in this way is supplied to the adder 507 and added to the voiced speech from the voiced speech synthesis unit 508 described above.

【００５９】 FIG. 12 shows the processing flow of the unvoiced speech synthesis described above. First, step 1602 receives the harmonic spectral envelope information |B(ω)|, the speech fundamental frequency ωo, and the per-band voiced/unvoiced information V[k] from the parameter decoding unit, and recomputes the number of bands Kmax by equation (20). The number of harmonics h contained in each band is predetermined by the system. The frame size Fsize is the preset speech segment update interval; with fs = 8000 Hz and a segment update period of 10 msec, Fsize = 80. Step 1603 initializes the real and imaginary parts of the FFT spectrum to 0, 256 elements each when the IFFT length M is 256. Step 1604 initializes the random FFT spectrum generation. It is needed only at system start-up and is not needed during continuous speech playback.

【００６０】 Steps 1605 to 1614 form a loop that is executed once for each band of the frame being processed. The loop reconstructs the unvoiced speech spectrum in the frequency range of each unvoiced band, one harmonic band at a time, and accumulates the results to reconstruct the whole unvoiced spectrum of the frame. Step 1606 sequentially generates a random sequence with half as many elements as the IFFT length. In the IMBE method, for example, the sequence is generated by equation (22), and the same approach may be used in this method. Here, however, two random sequences u[n] are generated, one for the real part and one for the imaginary part, and 53125/2 is subtracted from u[n] to remove the DC component.

【数２５】 [Equation 25]
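A generator of this kind can be sketched as below. Equation (22) is not reproduced in this text; the recurrence used here, u(n+1) = (171·u(n) + 11213) mod 53125, is the form commonly cited for the IMBE noise generator and should be read as an assumption. Only the modulus 53125 and the subtraction of 53125/2 are taken from the text. The seed value and the function name are hypothetical.

```python
def imbe_style_noise(count, seed=3147):
    """Return 'count' pseudo-random values centred on zero."""
    u = seed
    out = []
    for _ in range(count):
        # Linear congruential step (assumed constants, see lead-in).
        u = (171 * u + 11213) % 53125
        # Remove the DC component by subtracting half the modulus.
        out.append(u - 53125 / 2)
    return out
```

Two such sequences, for example from two different seeds or from consecutive stretches of one sequence, would supply the real-part and imaginary-part values.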

【００６１】 Steps 1607 to 1613 form the harmonics loop, which is executed once for each harmonic contained in the band. First, step 1608 calculates the spectrum range [al, bl] of the l-th harmonic in the band by equation (23), and step 1609 extracts that range, u[al, bl], from the random sequence u[n].

【数２６】 [Equation 26]

Here, M is the inverse FFT (IFFT) length.

【００６２】 Next, step 1610 normalizes the extracted spectrum by equation (24) so that its power becomes 1. Here, U(m) is the vector representation of the real-part and imaginary-part random sequences u_real[m] and u_imag[m], and U1(m) is the vector representation of the normalized extracted spectrum u_Real[m] and u_Imag[m].

【数２７】 [Equation 27]

Step 1611 adjusts the amplitude of the extracted spectrum by equation (25), using the harmonic spectral envelope information |B(ω)|, so that the energy in the harmonic band equals the in-band energy of the original speech.

【数２８】 [Equation 28]

The final factor M is the coefficient needed for the output of the M-point IFFT in step 1616 to match the real-time signal level. Next, step 1612 writes the level-adjusted extracted spectrum into the corresponding FFT spectrum buffers S_real[M] and S_imag[M].
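The normalize-then-scale idea of steps 1610 and 1611 can be sketched as follows. Equations (24) and (25) are not reproduced in this text, so this sketch makes several assumptions: power is taken as the sum of squared magnitudes over the extracted bins, the target is a single amplitude value standing in for |B(ω)| on that harmonic, and the additional factor M for the IFFT level is left out. The function name and argument layout are hypothetical.

```python
import math


def normalize_and_scale(u_re, u_im, a, b, target_amp):
    """Scale bins a..b (inclusive) so their total power equals target_amp**2."""
    power = sum(u_re[i] ** 2 + u_im[i] ** 2 for i in range(a, b + 1))
    if power == 0.0:
        # Nothing to scale; return silence for this harmonic band.
        zeros = [0.0] * (b - a + 1)
        return zeros, list(zeros)
    # Dividing by sqrt(power) normalizes to unit power; multiplying by
    # target_amp then sets the band energy to the desired level.
    gain = target_amp / math.sqrt(power)
    s_re = [u_re[i] * gain for i in range(a, b + 1)]
    s_im = [u_im[i] * gain for i in range(a, b + 1)]
    return s_re, s_im
```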

【００６３】 After the above processing has been executed for each band and for each harmonic within each band, the process proceeds to step 1615, which sets the negative-frequency part of the FFT spectrum so that it satisfies the relation of equation (10) above. As a result, all the spectral energy is concentrated in the real part of the time-axis signal obtained by the M-point IFFT in step 1616, and no signal appears in the imaginary part. Step 1617 obtains Fsize samples of decoded unvoiced speech from the M-sample signals of the current frame and the previous frame by inter-frame interpolation with the interpolation function ws(n) of equation (26). Step 1618 then outputs the decoded unvoiced speech, and the adder 507 of FIG. 2 adds it to the speech synthesized separately by the voiced speech synthesis unit 508 to obtain the final synthesized speech. FIG. 13 shows an example of the frame interpolation function ws(n), where L1 is the range over which the interpolation function is constant, L2 is the maximum interpolation range, and the interval from L1 to L2 is the linear interpolation range.

【数２９】 [Equation 29]
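One interpolation function with the shape described for FIG. 13 can be sketched as below. Equation (26) is not reproduced in this text, so the exact form is an assumption: ws(n) is 1 up to L1, falls linearly to 0 at L2, and the two frames are offset by Fsize samples. Under that assumption the weights of the previous and current frames sum to 1 whenever L1 + L2 = Fsize. The function names and the sample alignment of the two frames are hypothetical.

```python
def ws(n, l1, l2):
    """Assumed interpolation weight: flat to L1, linear to zero at L2."""
    x = abs(n)
    if x <= l1:
        return 1.0
    if x >= l2:
        return 0.0
    return (l2 - x) / (l2 - l1)


def interpolate_frames(prev, cur, fsize, l1, l2):
    """Cross-fade Fsize samples from the previous and current frame signals."""
    out = []
    for n in range(fsize):
        w_prev = ws(n, l1, l2)           # previous frame centred at n = 0
        w_cur = ws(n - fsize, l1, l2)    # current frame centred at n = Fsize
        out.append(w_prev * prev[n] + w_cur * cur[n])
    return out
```

For Fsize = 80, choosing L1 = 20 and L2 = 60 satisfies L1 + L2 = Fsize, so a constant input passes through the cross-fade unchanged.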

【００６４】 The above description took as an example the case where the speech coding parameter acquisition method of the present invention is applied to the speech coding parameter extraction unit of a speech coding and transmission apparatus that uses the IMBE method as its speech coding method. However, the speech coding parameter extraction method and apparatus of the present invention are not limited to this. They can be applied in exactly the same way to any method, such as the MELP (Mixed Excitation Linear Prediction) method, that divides the frequency spectrum of one frame into a plurality of frequency bands and makes a voiced/unvoiced decision for each band. Similarly, the description took as an example the case where the invention is applied to the unvoiced speech decoding unit of a speech coding and transmission apparatus that uses the IMBE method as its speech decoding method. However, the unvoiced speech decoding method and apparatus of the present invention are not limited to this either, and can likewise be applied to methods such as MELP that divide the frequency spectrum of one frame into a plurality of frequency bands and make a voiced/unvoiced decision for each band.

【００６５】[0065]

【発明の効果】 [Effects of the Invention] As described above, according to the speech coding parameter acquisition method and apparatus of the present invention, the frequency spectrum is obtained from a speech segment processed with an adaptive variable-length window that is chosen, according to the speech fundamental frequency, so that the harmonic spectra of the speech are separated from one another. This increases the reliability of the detected harmonic amplitude, harmonic width, and number of harmonics. In addition, the voiced strength or voiced/unvoiced information is obtained using the power of the speech segment and the power of each of the frequency bands into which the segment is divided. It is therefore possible, regardless of changes in the speech fundamental frequency, to make a voiced strength decision that is little affected by harmonic noise in regions where the harmonic level is low. Consequently, a speech coding parameter acquisition method that is robust against spectral noise can be provided.
Furthermore, according to the speech decoding method and apparatus of the present invention, the random frequency spectrum used in decoding unvoiced speech is generated directly, without creating it from random noise by an FFT. Of the FFT and IFFT calculations, only the IFFT is therefore needed to synthesize unvoiced speech, and a speech decoding method with a lower computational load than the conventional method can be provided.

[Brief description of drawings]

【図１】 FIG. 1 is a block diagram showing the configuration of a speech coding parameter extraction unit to which the speech parameter acquisition method of the present invention is applied.

【図２】 FIG. 2 is a block diagram showing the configuration of a speech synthesis unit to which the speech decoding method of the present invention is applied.

【図３】 FIG. 3 is a diagram showing an example of the logarithmic spectrum amplitude of a speech segment (voiced part).

【図４】 FIG. 4 is a diagram showing an example of the logarithmic spectrum amplitude of a speech segment (unvoiced part).

【図５】 FIG. 5 is a diagram for explaining the spectral shape of harmonics.

【図６】 FIG. 6 is a diagram showing a setting example of the first frequency analysis window length.

【図７】 FIG. 7 is a diagram showing an example of the shape of the first frequency analysis window.

【図８】 FIG. 8 is a diagram for explaining the harmonic amplitude.

【図９】 FIG. 9 is a diagram for explaining spectral noise removal.

【図１０】 FIG. 10 is a flowchart showing the flow of the voiced strength calculation processing.

【図１１】 FIG. 11 is a flowchart showing the flow of the harmonics calculation processing.

【図１２】 FIG. 12 is a flowchart showing the flow of the unvoiced speech synthesis processing.

【図１３】 FIG. 13 is a diagram for explaining the inter-frame interpolation of unvoiced speech.

【図１４】 FIG. 14 is a diagram for explaining the configuration of a speech coding and transmission apparatus.

【図１５】 FIG. 15 is a block diagram of a conventional speech coding parameter extraction unit.

【図１６】 FIG. 16 is a block diagram of a conventional speech synthesis unit.

【図１７】 FIG. 17 is a diagram for explaining the relationship between the normalized spectrum error and the pitch frequency error.

[Explanation of symbols]

101 Adaptive window processing unit
102 First spectrum calculation unit
103 Frame energy calculation unit
104 Band energy calculation unit
105 Logarithmic conversion unit
106 Band harmonics amplitude calculation unit
107 Band harmonics width calculation unit
108 Band harmonics number calculation unit
109 Voiced strength determination unit
110 Fixed window processing unit
111 Second spectrum calculation unit
112 Spectrum envelope calculation unit
201 Symmetric random sequence generation unit
202 Antisymmetric random sequence generation unit
203 Random sequence extraction unit
204 Inverse frequency transform unit
205 Frame interpolation unit

Continuation of the front page

(56) References: JP-A-7-295593 (JP, A); JP-A-8-272398 (JP, A); JP-A-10-124092 (JP, A); JP-A-7-261796 (JP, A); JP-T-Sho 62-502572 (JP, A)

(58) Fields investigated (Int. Cl.⁷, DB name): G10L 19/02

Claims

(57) [Claims]

1. A speech coding parameter acquisition method for acquiring speech coding parameters from a speech segment obtained by extracting a digitized speech signal with a predetermined segment length at a predetermined repetition period, the method comprising the steps of: acquiring a speech fundamental frequency from the speech segment; acquiring a first frequency spectrum from a variable-length segment obtained by extracting the speech signal with a variable-length adaptive window determined by the speech fundamental frequency; acquiring a second frequency spectrum from a fixed-length segment obtained by extracting the speech signal with a fixed-length window; dividing the first frequency spectrum into a plurality of frequency bands; determining a voiced strength for each of the frequency bands from the frequency spectrum power of the first frequency spectrum, the frequency spectrum power of each frequency band, the number of harmonics contained in each frequency band, the harmonics amplitude of each harmonic, and the harmonics bandwidth; and calculating, from the second frequency spectrum, the spectrum power of each harmonics band, the harmonics bands being divided so that each is centered on a frequency that is an integer multiple of the speech fundamental frequency and has a frequency bandwidth equal to the speech fundamental frequency.

2. The speech coding parameter acquisition method according to claim 1, wherein the length of the variable-length adaptive window is determined by the relationship between the bandwidth of the frequency spectrum distribution of the variable-length adaptive window and the speech fundamental frequency.

3. The speech coding parameter acquisition method according to claim 1, wherein the variable-length adaptive window is a Hamming window having a length of at least four times the period corresponding to the speech fundamental frequency.

4. A speech decoding method for synthesizing speech from speech coding parameters consisting of: a speech fundamental frequency of a speech segment obtained by extracting a digitized speech signal at a fixed repetition period; the spectrum power of each harmonics band obtained by dividing the frequency spectrum of the speech segment so that each band is centered on a frequency that is an integer multiple of the speech fundamental frequency and has a frequency bandwidth equal to the speech fundamental frequency; and discrimination information indicating whether each of a plurality of frequency bands into which the frequency spectrum of the speech segment is divided is voiced or unvoiced, wherein: for a frequency band that the discrimination information indicates as voiced, a sine wave group is generated in which each sine wave has a frequency that is an integer multiple of the speech fundamental frequency and an amplitude equivalent to the spectrum power of the harmonics band centered on that frequency; for a frequency band that the discrimination information indicates as unvoiced, a center-symmetric random sequence and a center-antisymmetric random sequence are regarded as the real part and the imaginary part, respectively, of the frequency spectrum sequence of a noise signal, the section corresponding to the frequency band is extracted from the two random sequences, its amplitude is adjusted to equal the spectrum power of the corresponding harmonics band, and the real part of its inverse Fourier transform is taken as an unvoiced frame signal; unvoiced speech is generated by linearly interpolating between the unvoiced frame signal of the preceding segment and the unvoiced frame signal obtained this time; and the generated sine wave group is added to the unvoiced speech to obtain synthesized speech.

5. A speech coding parameter acquisition apparatus for acquiring speech coding parameters from a speech segment obtained by extracting a digitized speech signal with a predetermined segment length at a predetermined repetition period, the apparatus comprising: means for acquiring a speech fundamental frequency from the speech segment; means for acquiring a first frequency spectrum from a variable-length segment obtained by extracting the speech signal with a variable-length adaptive window determined by the speech fundamental frequency; means for acquiring a second frequency spectrum from a fixed-length segment obtained by extracting the speech signal with a fixed-length window; means for dividing the first frequency spectrum into a plurality of frequency bands; means for determining a voiced strength for each of the frequency bands from the frequency spectrum power of the first frequency spectrum, the frequency spectrum power of each frequency band, the number of harmonics contained in each frequency band, the harmonics amplitude of each harmonic, and the harmonics bandwidth; and means for calculating, from the second frequency spectrum, the spectrum power of each harmonics band, the harmonics bands being divided so that each is centered on a frequency that is an integer multiple of the speech fundamental frequency and has a frequency bandwidth equal to the speech fundamental frequency.

6. A speech decoding apparatus for synthesizing speech from speech coding parameters consisting of: a speech fundamental frequency of a speech segment obtained by extracting a digitized speech signal at a fixed repetition period; the spectrum power of each harmonics band obtained by dividing the frequency spectrum of the speech segment so that each band is centered on a frequency that is an integer multiple of the speech fundamental frequency and has a frequency bandwidth equal to the speech fundamental frequency; and discrimination information indicating whether each of a plurality of frequency bands into which the frequency spectrum of the speech segment is divided is voiced or unvoiced, the apparatus comprising: means for generating, for a frequency band that the discrimination information indicates as voiced, a sine wave group in which each sine wave has a frequency that is an integer multiple of the speech fundamental frequency and an amplitude equivalent to the spectrum power of the harmonics band centered on that frequency; means for generating a noise signal consisting of a center-symmetric random sequence and a center-antisymmetric random sequence; means for extracting, from the two random sequences, the section corresponding to a frequency band that the discrimination information indicates as unvoiced; means for adjusting the amplitude of the extracted random sequence noise signal so that its spectrum power equals the spectrum power of the harmonics band corresponding to that frequency band; means for generating an unvoiced frame signal by inverse Fourier transforming the amplitude-adjusted random sequence noise signal; means for generating unvoiced speech by linearly interpolating between the unvoiced frame signal of the preceding segment and the current unvoiced frame signal; and means for adding the generated sine wave group and the generated unvoiced speech.
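The harmonics band division recited in the claims, in which each band is centered on an integer multiple of the speech fundamental frequency and has a bandwidth equal to that frequency, can be sketched as follows. The sketch assumes a one-sided power spectrum with uniformly spaced bins, treats the band power as the sum of the bin powers inside the band, and keeps only bands that lie entirely below half the sampling rate. These choices and the function name are illustrative assumptions, not details stated in the claims.

```python
def harmonic_band_powers(power_spectrum: list[float], fs_hz: float,
                         n_fft: int, f0_hz: float) -> list[float]:
    """Sum bin powers in bands [(m - 0.5) * f0, (m + 0.5) * f0) for m = 1, 2, ...

    power_spectrum[k] is assumed to hold |X[k]|^2 for bin frequency k * fs / n_fft.
    """
    if f0_hz <= 0 or fs_hz <= 0 or n_fft <= 0:
        raise ValueError("f0, fs and n_fft must be positive")
    bin_hz = fs_hz / n_fft
    nyquist_hz = fs_hz / 2.0
    powers = []
    m = 1
    while (m + 0.5) * f0_hz <= nyquist_hz:
        lo_hz = (m - 0.5) * f0_hz  # lower edge of the m-th harmonics band
        hi_hz = (m + 0.5) * f0_hz  # upper edge (exclusive)
        powers.append(sum(p for k, p in enumerate(power_spectrum)
                          if lo_hz <= k * bin_hz < hi_hz))
        m += 1
    return powers
```

With an assumed sampling rate of 8000 Hz, an 80-point transform (100 Hz bin spacing), and a 400 Hz fundamental frequency, the first band covers 200 Hz up to but not including 600 Hz, so it collects bins 2 through 5.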