JPH03222000A

JPH03222000A - Voice analyzing device using vocal cord sound source wave model

Info

Publication number: JPH03222000A
Application number: JP2018241A
Authority: JP
Inventors: Keiichi Funaki; 舟木　慶一
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1990-01-29
Filing date: 1990-01-29
Publication date: 1991-09-30

Abstract

PURPOSE: To eliminate useless parameter searches, reduce the amount of computation, and shorten the analysis time by setting the range of the sound source parameters in advance by means of a duty ratio when the sound source parameters are derived by an estimating circuit. CONSTITUTION: The pitch period of the speech signal input to a buffer device 102 is derived by an extracting device 103, and according to this pitch period a setting circuit 104 sets the range of the duty ratio between the glottal closed and glottal open intervals of the vocal cord sound source wave. Within the sound source parameter range thus set, an estimating circuit 106 estimates the sound source parameters and an analyzing circuit 105 estimates the vocal tract parameters, while a sound source wave generating circuit 108 generates the vocal cord sound source wave from the sound source parameters. The analysis by circuit 105 is repeated for each set of sound source parameters estimated by circuit 106, and the resulting vocal tract parameters and sound source parameters are output from a port 107. In this way, the amount of computation is reduced and the analysis time is shortened.

Description

DETAILED DESCRIPTION OF THE INVENTION (Field of Industrial Application) The present invention relates to a speech analysis device that simultaneously estimates the sound source parameters and the vocal tract parameters of an input speech signal.

(Prior Art) In speech synthesis, more natural synthesized speech can be produced by driving a vocal tract filter with a vocal cord sound source wave. Such a synthesis method requires an analysis method that estimates the sound source and vocal tract parameters from the speech signal. An analysis method that uses a vocal cord sound source wave model to estimate both the vocal tract characteristics and the sound source characteristics from a speech signal has been proposed by Fujisaki et al. (IEICE Transactions, Vol. J72-D-I, No. 8, pp. 1109-1117, September 1989).

A vocal cord sound source wave model formulates the vocal cord (glottal) wave as a mathematical expression defined by several sound source parameters. Speech analysis using such a model assumes a vocal cord wave generated by the model, aligns the vocal cord wave with the speech signal, and performs AR or ARMA analysis with a known input to obtain the AR or ARMA coefficients that represent the vocal tract characteristics. Among all assumed vocal cord waves, the one that minimizes the analysis error is selected, and the AR or ARMA coefficients estimated with it are taken as the vocal tract parameters; in this way both the vocal cord wave and the vocal tract characteristics are estimated optimally.
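The conventional procedure just described can be sketched as follows. This is a minimal illustration under stated assumptions, not Fujisaki et al.'s method: it substitutes a simple Rosenberg-type pulse (hypothetical parameters `rise` and `fall`) for their glottal model, fits only an AR model (no MA part), assumes the glottal wave and speech are already aligned, and solves the least-squares normal equations directly. All function names are hypothetical.

```python
import math

def rosenberg_pulse(period, rise, fall):
    """One period of a Rosenberg-type glottal pulse (hypothetical stand-in for
    the vocal cord sound source wave model). rise/fall are the lengths in
    samples of the opening and closing phases; the rest is the closed phase."""
    pulse = []
    for n in range(period):
        if n < rise:
            pulse.append(0.5 * (1.0 - math.cos(math.pi * n / rise)))
        elif n < rise + fall:
            pulse.append(math.cos(math.pi * (n - rise) / (2.0 * fall)))
        else:
            pulse.append(0.0)
    return pulse

def solve(a, b):
    """Solve the small dense system a x = b (Gaussian elimination, partial pivoting)."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x

def ar_with_known_input(s, u, order):
    """Least-squares fit of s[n] = sum_k a_k s[n-k] + g u[n] + e[n] with u known.
    Returns ([a_1..a_p, g], residual energy sum e[n]^2)."""
    rows = [([s[n - k] for k in range(1, order + 1)] + [u[n]], s[n])
            for n in range(order, len(s))]
    dim = order + 1
    ata = [[sum(x[i] * x[j] for x, _ in rows) for j in range(dim)] for i in range(dim)]
    aty = [sum(x[i] * y for x, y in rows) for i in range(dim)]
    theta = solve(ata, aty)
    err = sum((y - sum(t * v for t, v in zip(theta, x))) ** 2 for x, y in rows)
    return theta, err

def exhaustive_search(s, period, order, shapes):
    """Conventional, unrestricted search: analyze with every candidate pulse
    shape and keep the one with the smallest analysis error."""
    best = None
    for rise, fall in shapes:
        pulse = rosenberg_pulse(period, rise, fall)
        u = [pulse[n % period] for n in range(len(s))]
        theta, err = ar_with_known_input(s, u, order)
        if best is None or err < best[2]:
            best = ((rise, fall), theta, err)
    return best

# Synthetic check: drive a known AR(2) filter with the (24, 12) pulse.
period = 80
true_pulse = rosenberg_pulse(period, 24, 12)
u_true = [true_pulse[n % period] for n in range(240)]
s = []
for n in range(240):
    v = u_true[n]
    if n >= 1:
        v += 1.3 * s[n - 1]
    if n >= 2:
        v -= 0.6 * s[n - 2]
    s.append(v)
shapes = [(r, f) for r in (16, 20, 24, 28) for f in (8, 12, 16)]
best_shape, theta, err = exhaustive_search(s, period, 2, shapes)
```

Every candidate shape costs one full least-squares analysis, which is why the size of the candidate set drives the total computation.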

(Problem to be Solved by the Invention) In conventional speech analysis methods, the pitch is extracted first, and the sound source parameters and vocal tract parameters are then estimated by assuming vocal cord sound source waves over the entire range of the pitch period. The number of assumed vocal cord sound source waves is therefore very large, so the required amount of computation is large and the analysis takes a long time.

The object of the present invention is to reduce the amount of computation required to estimate both the vocal cord wave and the vocal tract characteristics, and thereby to speed up the analysis.

(Means for Solving the Problem) To solve the above problem, the first invention of the present application provides a speech analysis device using a vocal cord sound source wave model that estimates the sound source parameters and vocal tract parameters of an input speech signal. The device comprises: means for extracting a pitch period; means for setting, from the pitch period extracted by the extracting means, a range of values for the ratio of the open interval to the closed interval of the vocal cord wave; means for estimating the values of the sound source parameters within the ratio range set by the setting means; means for generating a vocal cord sound source wave from the sound source parameters estimated by the estimating means; and means for estimating the vocal tract parameters of the speech signal using the vocal cord sound source wave. The sound source parameter estimating means modifies the sound source parameters within the range of values of the ratio.
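The range-setting step of the first invention reduces to one relation. Assuming, as the translated wording suggests, that the ratio is r = T_open / T_closed and that the open and closed intervals together fill one pitch period T0, then T_open = T0·r/(1+r), which increases with r. The sketch below applies this; the (rise, fall) split of the open interval is a hypothetical parameterization borrowed from a Rosenberg-type pulse, not something the patent specifies.

```python
import math

def open_interval_bounds(pitch_period, ratio_min, ratio_max):
    """Map a range of r = T_open / T_closed to integer bounds on T_open (samples).

    With T_open + T_closed = pitch_period, T_open = pitch_period * r / (1 + r),
    which is increasing in r, so the ratio bounds map directly to length bounds.
    """
    if not 0 < ratio_min <= ratio_max:
        raise ValueError("require 0 < ratio_min <= ratio_max")
    lo = pitch_period * ratio_min / (1.0 + ratio_min)
    hi = pitch_period * ratio_max / (1.0 + ratio_max)
    return math.ceil(lo), math.floor(hi)

def within_range(rise, fall, bounds):
    """True if a hypothetical (rise, fall) parameter pair has an admissible open interval."""
    lo, hi = bounds
    return lo <= rise + fall <= hi

bounds = open_interval_bounds(80, 0.5, 1.5)   # (27, 48)
```

Parameter pairs that fail `within_range` are never handed to the estimator, which is the restriction the claim describes.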

To solve the above problem, the second invention of the present application provides a speech analysis device using a vocal cord sound source wave model that estimates the sound source parameters and vocal tract parameters of an input speech signal. The device comprises: means for extracting a pitch period; means for setting, from the pitch period extracted by the extracting means, a range of values for the ratio of the open interval to the closed interval of the vocal cord wave; means for storing sound source codes, which consist of a finite number of combinations obtained by quantizing the sound source parameters within a certain range; means for setting a range of sound source codes from the range of open-to-closed ratio values set by the setting means; means for selecting sound source codes from the storing means within the range set by the sound source code range setting means; means for generating the vocal cord wave of each sound source code selected by the selecting means; means for analyzing the vocal tract parameters of the speech signal using the vocal cord wave generated by the vocal cord wave generating means; and means for determining, among the sound source codes selected by the selecting means, the sound source code that minimizes the analysis error of the analyzing means.
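The codebook and code-range selection of the second invention can be sketched as below. The quantization grid, the (rise, fall) parameterization, and the open-interval bounds 27 to 48 samples are illustrative assumptions (the bounds follow from T_open = T0·r/(1+r) with T0 = 80 samples and r in [0.5, 1.5]); none of these numbers come from the patent, and the function names are hypothetical.

```python
from itertools import product

def build_codebook(rise_values, fall_values):
    """Sound source codebook: every combination of the quantized parameter
    values; code index i maps to one (rise, fall) pair."""
    return list(product(rise_values, fall_values))

def select_codes(codebook, open_min, open_max):
    """Code range setting + selection: indices of codes whose open interval
    (rise + fall, in samples) lies within [open_min, open_max]."""
    return [i for i, (rise, fall) in enumerate(codebook)
            if open_min <= rise + fall <= open_max]

codebook = build_codebook(range(4, 64, 4), range(2, 34, 2))   # 15 x 16 = 240 codes
selected = select_codes(codebook, 27, 48)
print(len(selected), "of", len(codebook), "codes are analyzed")
```

With this particular grid only 84 of the 240 codes fall inside the range; the actual saving depends entirely on the codebook design and the ratio range.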

(Operation) In the first invention, when the sound source parameters are obtained by a sound source parameter estimating circuit based on, for example, the steepest descent method, setting the range of the sound source parameters in advance by means of the duty ratio considerably reduces wasted parameter searches. Because the number of parameter combinations is smaller, convergence to the optimal solution is faster and the analysis time is shorter.
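The effect of a pre-set range on an iterative estimator can be illustrated with a simple search. The patent names steepest descent or hill climbing; this sketch assumes integer, coordinate-wise hill climbing over a box of parameter bounds, and it skips candidates outside the box without evaluating them. The quadratic `err` is a toy placeholder for the analysis error of circuit 105, and all names are hypothetical.

```python
def bounded_hill_climb(error, start, lower, upper, max_iter=1000):
    """Integer coordinate-wise hill climbing that never leaves the box
    lower[i] <= x[i] <= upper[i]. Returns (best point, its error, evaluations)."""
    x = [min(max(v, lo), hi) for v, lo, hi in zip(start, lower, upper)]
    best = error(x)
    evals = 1
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for step in (-1, 1):
                cand = x[:]
                cand[i] += step
                if not lower[i] <= cand[i] <= upper[i]:
                    continue  # outside the pre-set range: not evaluated at all
                e = error(cand)
                evals += 1
                if e < best:
                    x, best, improved = cand, e, True
        if not improved:
            break
    return tuple(x), best, evals

def err(p):
    # Toy stand-in for the analysis error, minimized at (20, 10).
    return (p[0] - 20) ** 2 + (p[1] - 10) ** 2

inside = bounded_hill_climb(err, (30, 5), (15, 5), (40, 15))
clipped = bounded_hill_climb(err, (30, 5), (25, 5), (40, 15))
```

When the unconstrained optimum lies outside the range (the `clipped` case), the search stops at the boundary instead of wandering into parameter values the range rules out.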

In the second invention, setting the range of the sound source parameters considerably reduces the number of codes to be searched, so the amount of computation required for analysis decreases and the analysis time is shortened.

It is known from physiological observations that the duty ratio itself varies with the pitch period. Taking this into account, the duty ratio is also controlled according to the pitch period.
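Controlling the ratio range by pitch period could be implemented as a lookup with interpolation. The patent gives no numbers, so the anchor table below is purely a placeholder: its values, and even the direction in which the range moves with pitch period, are assumptions for illustration only. The function name is hypothetical.

```python
from bisect import bisect_right

# Hypothetical anchor table: (pitch period in samples, min ratio, max ratio).
# Placeholder values only; the patent does not specify this mapping.
ANCHORS = [(40, 0.8, 2.0), (80, 0.5, 1.5), (160, 0.3, 1.0)]

def ratio_range(pitch_period, anchors=ANCHORS):
    """Piecewise-linear interpolation of the allowed open/closed ratio range
    as a function of pitch period, clamped at the ends of the table."""
    if pitch_period <= anchors[0][0]:
        return anchors[0][1:]
    if pitch_period >= anchors[-1][0]:
        return anchors[-1][1:]
    k = bisect_right([a[0] for a in anchors], pitch_period)
    (p0, lo0, hi0), (p1, lo1, hi1) = anchors[k - 1], anchors[k]
    w = (pitch_period - p0) / (p1 - p0)
    return lo0 + w * (lo1 - lo0), hi0 + w * (hi1 - hi0)
```

A measured table from physiological data would replace `ANCHORS`; the interpolation itself is only one simple way to make the range a function of pitch period.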

(Embodiment) An embodiment of the first invention is shown in FIG. 1. First, the operation of each module is described. The buffer device 102 stores the speech signal. The pitch extraction device 103 extracts the pitch period from the speech signal; a conventional device is used. The sound source parameter setting circuit 104 sets, from the pitch period extracted by 103, the duty ratio between the glottal closed interval and the glottal open interval of the vocal cord sound source wave, and thereby sets the variable range of the sound source parameters. The sound source wave generating circuit 108 generates a vocal cord sound source wave from the sound source parameters. The analysis circuit 105 performs speech analysis using the vocal cord sound source wave generated by 108 and estimates the vocal tract parameters; a conventional analysis circuit is used for 105. The sound source parameter estimation circuit 106 estimates the sound source parameters using a conventional estimation algorithm such as the steepest descent method or the hill-climbing method.
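The pitch extraction device 103 is described only as conventional. One common textbook choice is to pick the lag that maximizes the normalized autocorrelation; the sketch below assumes that method, a single analysis frame, and hypothetical names, and is not the device the patent used.

```python
import math

def autocorr_pitch(x, min_lag, max_lag):
    """Return the lag in [min_lag, max_lag] (samples) that maximizes the
    normalized autocorrelation of the frame x."""
    best_lag, best_r = None, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        num = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        den = math.sqrt(sum(v * v for v in x[:len(x) - lag]) *
                        sum(v * v for v in x[lag:]))
        r = num / den if den > 0 else 0.0
        if r > best_r:
            best_lag, best_r = lag, r
    return best_lag

# Periodic test frame: one 80-sample period repeated exactly.
one_period = [math.sin(2 * math.pi * k / 80) + 0.4 * math.sin(6 * math.pi * k / 80 + 1.0)
              for k in range(80)]
x = [one_period[n % 80] for n in range(400)]
pitch = autocorr_pitch(x, 40, 120)
```

Limiting the lag search to [min_lag, max_lag] keeps the estimator from locking onto multiples of the true period.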

Next, the overall flow of the first embodiment is described. A speech signal is input from signal line 101 and stored in the buffer device 102. The pitch extraction device 103 obtains the pitch period of the speech signal, and according to the pitch period the setting circuit 104 sets the range of values of the duty ratio between the glottal closed interval and the glottal open interval of the vocal cord sound source wave. Within the sound source parameter range set by 104, the sound source parameter estimation circuit 106 estimates the sound source parameters and the analysis circuit 105 estimates the vocal tract parameters; at that time, 108 generates the vocal cord sound source wave from the sound source parameters. The analysis by circuit 105 is repeated for each set of sound source parameters estimated by circuit 106, and the vocal tract parameters and sound source parameters finally obtained are output from port 107 as the analysis result.
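The data flow of FIG. 1 can be summarized as a small driver in which every block is passed in as a callable. This is a structural sketch only: the function and parameter names are hypothetical, and the stubs in the usage part stand in for real pitch extraction, waveform generation, and analysis.

```python
def embodiment1(speech, extract_pitch, set_range, generate, analyze, estimate):
    """FIG. 1 data flow with each block passed in as a callable.

    extract_pitch(speech) -> pitch period                      # 103
    set_range(pitch) -> (lower, upper) source-parameter bounds  # 104
    generate(params, pitch, n) -> source wave of n samples      # 108
    analyze(speech, wave) -> (tract_params, error)              # 105
    estimate(error_fn, lower, upper) -> best source params      # 106
    Returns (tract_params, source_params), i.e. what port 107 outputs.
    """
    pitch = extract_pitch(speech)
    lower, upper = set_range(pitch)
    results = {}

    def error_fn(params):
        # One pass of the loop: 106 proposes params, 108 generates, 105 analyzes.
        key = tuple(params)
        if key not in results:
            results[key] = analyze(speech, generate(key, pitch, len(speech)))
        return results[key][1]

    best = tuple(estimate(error_fn, lower, upper))
    error_fn(best)  # make sure the winning parameters have been analyzed
    return results[best][0], best

def exhaustive(error_fn, lower, upper):
    # Stand-in for circuit 106: try every integer value in a 1-D range.
    return min(([k] for k in range(lower[0], upper[0] + 1)), key=error_fn)

tract, source = embodiment1(
    speech=[0.0] * 160,
    extract_pitch=lambda s: 80,
    set_range=lambda pitch: ((1,), (10,)),
    generate=lambda params, pitch, n: [float(params[0])] * n,
    analyze=lambda s, w: ("tract-for-%g" % w[0], abs(w[0] - 7.0)),
    estimate=exhaustive,
)
```

Passing the range from 104 into the estimator, rather than filtering afterwards, is what lets the search skip out-of-range parameters entirely.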

An embodiment of the second invention is shown in FIG. 2. First, the operation of each module is described. The buffer device 102, the pitch extraction device 103, and the analysis device 105 are the same as in FIG. 1. The code range setting circuit 201 sets, from the pitch period obtained by 103, the duty ratio between the glottal closed interval and the glottal open interval of the vocal cord wave. The code control circuit 202 selects, from the sound source codes 204, the codes that satisfy the code range set by 201, and sends the code information to the sound source wave generating circuit 108. When the analysis with all codes in the range is finished, 202 sends analysis-completion information to the code determination circuit 203. After the analysis with all codes selected by 202 is finished, the code determination circuit 203 determines the code that minimizes an evaluation function such as the analysis error.

Next, the overall flow of the second embodiment is described. The speech signal input from signal line 101 is stored in the buffer device 102, and the pitch extraction circuit 103 extracts the pitch period of the speech signal. Based on the pitch period, 201 sets the duty ratio and the code range. According to the set code range, 202 selects the codes, 108 generates the vocal cord sound source wave for each selected code, and 105 performs the analysis. After the analysis with all selected codes is finished, the code determination circuit 203 selects the optimal code, and the vocal tract parameters and the code corresponding to the optimal code are output from port 107 as the analysis result.
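The FIG. 2 loop (code control 202, generation 108, analysis 105, code determination 203) can be sketched as follows. The callables and names are hypothetical stand-ins, and the `selected` list is assumed to be the output of the code range setting in 201.

```python
def embodiment2(speech, codebook, selected, generate, analyze):
    """FIG. 2 flow: 202 walks the selected code indices, 108 generates each
    code's source wave, 105 analyzes it, and after the last code 203 keeps
    the code with the smallest analysis error."""
    results = []
    for idx in selected:                                   # 202: code control
        wave = generate(codebook[idx], len(speech))        # 108: source wave
        tract, err = analyze(speech, wave)                 # 105: analysis
        results.append((err, idx, tract))
    if not results:
        raise ValueError("no code satisfies the code range")
    err, idx, tract = min(results, key=lambda r: r[0])     # 203: code decision
    return tract, idx, err                                 # output at port 107

codebook = [(10, 5), (20, 10), (12, 6), (30, 3)]
selected = [0, 2, 3]  # code 1 assumed to fall outside the code range
tract, code, err = embodiment2(
    speech=[0.0] * 160,
    codebook=codebook,
    selected=selected,
    generate=lambda c, n: [float(c[0] + c[1])] * n,
    analyze=lambda s, w: ("tract-for-%g" % w[0], abs(w[0] - 33.0)),
)
```

Codes outside the range are never generated or analyzed, so the cost of the loop scales with `len(selected)` rather than with the full codebook.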

(Effects of the Invention) As described above, the present invention reduces the amount of computation to one-fourth or less when performing analysis using a vocal cord sound source wave model. In addition, vocal cord waves that cannot actually occur are no longer estimated, which improves the reliability of the analysis.

[Brief explanation of drawings]

FIG. 1 is a block diagram showing an embodiment of the first invention of the present application, and FIG. 2 is a block diagram showing an embodiment of the second invention of the present application.

101 ... speech input port
102 ... buffer device
103 ... pitch extraction device
104 ... sound source parameter setting circuit
105 ... analysis circuit
106 ... sound source parameter estimation circuit
107 ... analysis result output port
108 ... sound source wave generating circuit
201 ... code range setting circuit
202 ... code control circuit
203 ... code determination circuit
204 ... sound source codes

Claims

[Claims]

(1) A speech analysis device using a vocal cord sound source wave model, which estimates the sound source parameters and vocal tract parameters of an input speech signal, comprising: means for extracting a pitch period; means for setting a range of values of the ratio of the open interval to the closed interval of the vocal cord wave, the range being set from the pitch period extracted by the extracting means; means for estimating the values of the sound source parameters within the ratio range set by the setting means; means for generating a vocal cord sound source wave from the sound source parameters estimated by the estimating means; and means for estimating the vocal tract parameters of the speech signal using the vocal cord sound source wave; wherein the sound source parameter estimating means modifies the sound source parameters within the range of values of the ratio.

(2) A speech analysis device using a vocal cord sound source wave model, which estimates the sound source parameters and vocal tract parameters of an input speech signal, comprising: means for extracting a pitch period; means for setting a range of values of the ratio of the open interval to the closed interval of the vocal cord wave, the range being set from the pitch period extracted by the extracting means; means for storing sound source codes consisting of a finite number of combinations obtained by quantizing the sound source parameters within a certain range; means for setting a range of sound source codes from the range of values of the ratio of the open interval to the closed interval set by the setting means; means for selecting the sound source codes from the storing means within the range set by the sound source code range setting means; means for generating the vocal cord wave of each sound source code selected by the selecting means; means for analyzing the vocal tract parameters of the speech signal using the vocal cord wave generated by the vocal cord wave generating means; and means for determining, among the sound source codes selected by the selecting means, the sound source code that minimizes the analysis error of the analyzing means.