JP2655902B2

JP2655902B2 - Voice feature extraction device

Info

Publication number: JP2655902B2
Application number: JP1023377A
Authority: JP
Inventors: 憲治坂本; 耕市山口
Original assignee: Consejo Superior de Investigaciones Cientificas CSIC
Current assignee: Consejo Superior de Investigaciones Cientificas CSIC
Priority date: 1989-02-01
Filing date: 1989-02-01
Publication date: 1997-09-24
Anticipated expiration: 2012-09-24
Also published as: JPH02203396A

Description

<Field of Industrial Application> The present invention relates to a speech feature extraction device that extracts, from input speech, feature quantities that do not depend on the speaker or the language.

<Prior Art> Conventionally, a speech recognition device recognizes input speech as follows. A feature extraction unit performs frequency analysis on the input speech signal and extracts, in advance, the feature quantities of the phonemes to be recognized. These feature quantities are stored in a storage unit as the standard pattern of each phoneme. Each word is then expressed as a sequence of these phoneme standard patterns. For each word, this sequence is stored in the storage unit together with the word's phoneme string, forming a word dictionary. When unknown speech is input, the feature extraction unit extracts its feature quantities frame by frame in the same way. For each frame, the device computes the similarity between the frame's feature quantities and each phoneme standard pattern in the storage unit. The phoneme whose standard pattern is most similar becomes the phoneme of that frame. Repeating this for each successive frame represents the unknown speech as a phoneme sequence. Finally, the device computes the similarity between this phoneme sequence and the phoneme standard pattern sequence of each word in the word dictionary. The word with the most similar sequence is taken as the word of the input speech.
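The conventional two-stage scheme (per-frame nearest phoneme template, then nearest dictionary word) can be sketched as follows. This is a minimal illustration with toy feature vectors. The patent does not name the similarity measures, so the following are all assumptions chosen for clarity:

- Euclidean distance for frame-to-template similarity.
- Merging runs of identical frame labels into one phoneme.
- Levenshtein edit distance for sequence-to-word similarity.

```python
import math


def nearest_phoneme(frame, templates):
    """Return the phoneme whose standard pattern is closest to `frame` (Euclidean distance, assumed)."""
    return min(templates, key=lambda p: math.dist(frame, templates[p]))


def frames_to_phonemes(frames, templates):
    """Label every frame, then merge runs of identical labels into a single phoneme (assumed)."""
    sequence = []
    for frame in frames:
        label = nearest_phoneme(frame, templates)
        if not sequence or sequence[-1] != label:
            sequence.append(label)
    return sequence


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (stand-in for the unspecified similarity)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def recognize(frames, templates, dictionary):
    """Return the dictionary word whose phoneme string is closest to the decoded sequence."""
    decoded = frames_to_phonemes(frames, templates)
    return min(dictionary, key=lambda word: edit_distance(decoded, dictionary[word]))
```

Because the templates are fixed phoneme patterns, any speaker- or context-dependent shift in a frame's features moves it toward the wrong template. That is the weakness discussed below.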

A method has also been proposed in which an articulatory model that directly represents the vocal tract shape is defined from the structure of the articulatory organs. The articulatory state is then estimated from the speech wave by model matching (Shirai and Honda, "Estimation of articulatory parameters from speech waves," Transactions of the Institute of Electronics and Communication Engineers of Japan, Vol. J61-A, No. 5, May 1978). The method assumes a given nonlinear functional model that converts articulatory parameters, which represent the positions of the articulatory organs, into acoustic parameters describing the speech spectrum. Working in the reverse direction, it obtains the articulatory parameters from the acoustic parameters by solving a nonlinear optimization problem. In other words, it chooses the articulatory parameters that minimize the fitting error of the functional model, and thereby estimates the articulatory state.

<Problems to Be Solved by the Invention> The conventional recognition methods described above have the following problems. In the method based on phoneme standard patterns, the phoneme feature quantities extracted by the feature extraction unit vary even when the same phoneme symbol is uttered. They vary with physiological differences between speakers (for example, differences in vocal tract length). For vowels within words, they also vary with coarticulation from the preceding and following phonetic context. When recognition uses such feature quantities, speech uttering the same phoneme symbol may be judged to be a different phoneme and then be rejected or misrecognized, so high recognition performance cannot be obtained. The problem arises because recognition relies on phoneme feature quantities that vary with the speaker and the phonetic context.

To solve this problem, it is therefore necessary to extract feature quantities that do not vary when the speaker or the phonetic context changes. That is, the features must be inherent in speech production itself and independent of the speaker and the language.

In the method that estimates articulatory parameters from acoustic parameters with an articulatory model, the uniqueness of the solution and the stability of convergence are problems. Shirai et al. therefore obtain the solution by three measures:

- incorporating into the evaluation function constraints on the range of variation of the articulatory parameters and on continuity between analysis frames;
- taking the characteristics of the functional model into account as a criterion for separating the vocal tract characteristics;
- setting appropriate initial values and limits on the parameter values.

This way of finding the solution is complicated, however. Processing takes a long time, the method can be applied only to stable portions such as vowel centers, and it can be adapted only to specific speakers.

An object of the present invention is therefore to provide a speech feature extraction device that can extract the articulation position by simple processing. The articulation position is the position of the constriction formed in the vocal tract. It is a feature quantity inherent in speech production that does not depend on the speaker or the language.

<Means for Solving the Problems> To achieve the above object, the present invention provides a speech feature extraction device that performs frequency analysis of input speech and extracts speech feature quantities from the resulting frequency components. The device comprises:

- a vowel/consonant section determination unit that determines the vowel sections and consonant sections of the input speech;
- a conversion table for converting at least two frequency components of a vowel into an articulation position, containing element values associated with a plurality of single vowels whose utterance contents are known;
- an articulation position extraction unit that obtains the articulation position of a target vowel whose utterance content is known.

The articulation position extraction unit has two calculation means.

First articulation position calculation means handles a target vowel of known utterance content that the vowel/consonant section determination unit has judged to be a vowel section. It obtains the target vowel's position on the conversion table from the two frequency components of the single vowels and their table positions. It then calculates the target vowel's articulation position from the element value at that table position.

Second articulation position calculation means handles a target vowel of known utterance content that has been judged to be a vowel section but whose articulation position is not calculated by the first means. It calculates that vowel's articulation position from its frequency components according to a predetermined algorithm. The calculation uses the frequency components of a plurality of single vowels whose utterance contents and articulation positions are known.

The device thereby extracts, from the frequency components of the speech, the articulation position, a speech feature quantity that does not depend on the speaker or the language.

<Operation> When speech is input, it is frequency-analyzed to obtain frequency components. The vowel/consonant section determination unit then determines whether each part of the input speech is a vowel section or a consonant section.

Next, consider a target vowel of known utterance content that the vowel/consonant section determination unit has judged to be a vowel section. The first articulation position calculation means of the articulation position extraction unit obtains this vowel's position on the conversion table. It does so from at least two frequency components of a plurality of single vowels of known utterance content and from their positions on the conversion table. It then calculates the target vowel's articulation position from the element value at the obtained table position.

Some target vowels of known utterance content are judged to be vowel sections but do not have their articulation position calculated by the first means. For these, the second articulation position calculation means calculates the articulation position from the vowel's frequency components according to a predetermined algorithm. The calculation is based on the frequency components of a plurality of single vowels whose utterance contents and articulation positions are known.

In this way, articulation positions that do not depend on the speaker or the language are extracted from the frequency components of the speech.

<Embodiments> The present invention is described in detail below with reference to the illustrated embodiments.

FIG. 1 is a block diagram of a speech recognition apparatus according to the present invention. A speech signal from a microphone 1 is amplified by an amplifier 2 and fed to an acoustic analysis unit 3. The acoustic analysis unit 3 performs frequency analysis of the input speech signal in one of two ways: with a bank of band-pass filters (BPFs), or by a fast Fourier transform of windowed speech waveform data.
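Below is a minimal sketch of the FFT option of the acoustic analysis unit 3, assuming NumPy. It applies a Hamming window to one frame of samples and returns the magnitude spectrum with the frequency of each bin. The window type, frame length and sample rate are illustrative assumptions, since the text says only that the waveform is windowed before the FFT. Formant extraction from this spectrum (for example by peak picking or linear prediction) is a separate step and is not shown.

```python
import numpy as np


def frame_spectrum(frame, sample_rate):
    """Magnitude spectrum of one Hamming-windowed frame via the real FFT.

    Returns (freqs, magnitude): the bin frequencies in Hz and |FFT| of the windowed frame.
    The Hamming window is an assumed choice; the source only says the data is windowed.
    """
    frame = np.asarray(frame, dtype=float)
    windowed = frame * np.hamming(len(frame))
    magnitude = np.abs(np.fft.rfft(windowed))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    return freqs, magnitude
```

For a 40 ms frame at 10 kHz (400 samples), the bins are 25 Hz apart, which is fine enough to locate the first three formants of a vowel.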

A vowel/consonant section determination unit 4 determines the vowel sections and consonant sections of the speech signal. It does so with reference to the power, the spectral change and similar properties of the input speech.

For a section judged to be a vowel section, an articulation position extraction unit 5 according to the present invention extracts the vowel's articulation position. It uses conversion formulas and rules for calculating articulation positions from the frequency components of the speech, which it reads from a conversion formula storage unit 6.

For a section judged to be a consonant section, a consonant pattern conversion unit 7 matches the section's frequency components against consonant patterns stored in a consonant pattern storage unit 8 and outputs consonant pattern candidates.

The input speech is thus converted into a time series of vowel articulation positions and consonant patterns.
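The text says only that unit 4 refers to the power and spectral change of the input speech. As one hedged illustration of such a rule, the sketch below, assuming NumPy, labels a frame as a vowel ("V") when two conditions hold:

- its power is at least a fraction `power_ratio` of the loudest frame's power;
- its level-normalized spectrum differs from the previous frame's by at most `flux_limit`.

All other frames are labeled as consonants ("C"). Both thresholds and the decision rule are hypothetical and are not taken from the patent.

```python
import numpy as np


def classify_frames(frames, power_ratio=0.3, flux_limit=0.5):
    """Label each frame 'V' (vowel-like) or 'C' (consonant-like) from power and spectral change.

    frames: 2-D array-like, one row of samples per frame.
    power_ratio, flux_limit: hypothetical thresholds chosen for illustration.
    """
    frames = np.asarray(frames, dtype=float)
    power = np.sum(frames ** 2, axis=1)
    spectra = np.abs(np.fft.rfft(frames * np.hamming(frames.shape[1]), axis=1))
    # Normalize each spectrum to unit length so that spectral change ignores overall level.
    spectra /= np.linalg.norm(spectra, axis=1, keepdims=True) + 1e-12
    labels = []
    for k in range(len(frames)):
        flux = 0.0 if k == 0 else float(np.linalg.norm(spectra[k] - spectra[k - 1]))
        loud = power[k] >= power_ratio * power.max()
        labels.append("V" if loud and flux <= flux_limit else "C")
    return labels
```

A steady, loud periodic sound passes both tests. Quiet frames, and loud frames whose spectrum jumps (for example at a burst or frication onset), are treated as consonantal.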

For each known word, a standard pattern is obtained in advance by the same procedure and stored in a standard pattern storage unit 10. A pattern matching unit 9 calculates the similarity between the time series obtained from the input speech and each word's standard pattern, and recognizes the word from the results. The recognition result is displayed on a result display unit 11.

The present invention concerns the vowel articulation position extraction performed by the articulation position extraction unit 5. Consider a vowel of known utterance content in an input word (hereinafter, an "in-word vowel"). The invention calculates this vowel's articulation position from the frequency components of its speech according to a predetermined algorithm.

A first embodiment of the present invention is described in detail below.

<First Embodiment> As described above, this embodiment calculates the articulation position of an in-word vowel from the frequency components of its speech according to a predetermined algorithm. The calculation is based on the frequency components of single vowels whose utterance contents and articulation positions are known. The single vowels used here are the Japanese vowels /a/, /i/, /u/, /e/ and /o/.

FIG. 2 shows the articulation positions of various vowels. x represents the front-back position of articulation, with larger values further front. y represents the high-low position, with larger values lower. In the figure, the circled katakana are the Japanese vowels. Hereinafter, an articulation position is represented by coordinates (x, y) in the range

$1 \le x, y \le 7$ (x, y integers).

In this embodiment, each articulation position is assumed to lie on a grid point of these coordinates. This is reasonable given the precision of human auditory perception. The articulation positions of the single vowels are therefore set as follows:

- /a/: (2, 7)
- /i/: (6, 2)
- /u/: (2, 2)
- /e/: (5, 4)
- /o/: (1, 4)

The articulation position of each in-word vowel is expressed in the coordinates (x, y) relative to these single-vowel positions.
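The coordinate system above can be captured in a small lookup. The table holds the five single-vowel positions given in the text. The `snap_to_grid` helper is a hypothetical addition: the text says only that positions lie on integer grid points in 1..7. Half-up rounding and clamping are assumed rules for turning a continuous estimate into such a point.

```python
import math

# Articulation positions (x, y) of the Japanese single vowels, as given in the first embodiment.
# x: front-back position (larger = further front); y: high-low position (larger = lower).
SINGLE_VOWEL_POSITION = {
    "a": (2, 7),
    "i": (6, 2),
    "u": (2, 2),
    "e": (5, 4),
    "o": (1, 4),
}

GRID_MIN, GRID_MAX = 1, 7  # 1 <= x, y <= 7, integers


def snap_to_grid(x, y):
    """Hypothetical helper: round half-up to the nearest grid point and clamp to 1..7."""
    def snap(v):
        return max(GRID_MIN, min(GRID_MAX, math.floor(v + 0.5)))
    return snap(x), snap(y)
```

For example, /i/ (x = 6) lies further front than /u/ (x = 2), and /a/ (y = 7) lies lower than /i/ (y = 2).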

Next, we describe how a vowel's articulation position relates to the frequency components of the vowel uttered at that position.

FIG. 3 shows the ranges of the first formant frequency (hereinafter F(1)) and the second formant frequency (hereinafter F(2)) of the Japanese vowels in FIG. 2, separately for male and female speakers. FIG. 4 shows the relationship between F(1) and F(2) when a particular speaker utters various vowels other than the Japanese vowels in FIG. 2.

FIGS. 2 to 4 show that, in general, F(1) tracks y (F(1) rises as y increases) and F(2) tracks x (F(2) rises as x increases). For some vowels (/i/ and /e/), changes in the third formant frequency (hereinafter F(3)) also affect x and y. The articulation position of an in-word vowel is therefore estimated from these relationships and from the vowel's frequency components.

The method for estimating the articulation position of an in-word vowel is described concretely below.

As explained with reference to FIG. 1, the acoustic analysis unit 3 and the vowel/consonant section determination unit 4 process the input speech waveform in advance. They segment it into vowel sections and consonant sections, label the sections, and perform acoustic analysis to extract the formant frequencies. This embodiment uses the formant frequencies of the phoneme sections that carry vowel labels.

FIG. 5 is a flowchart of the operation, executed by the articulation position extraction unit 5 of FIG. 1, that calculates the articulation position of one in-word vowel.

In step S1, the formant frequencies of a section that the vowel/consonant section determination unit 4 has judged to be a vowel section are input. The label attached to those formant frequencies (that is, the utterance content) is identified. Depending on the label, the process proceeds to one of steps S2 to S6.
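Step S1 is a dispatch on the vowel label. A minimal sketch, assuming each per-vowel routine (steps S2 to S6) is a function from the section's formants (F1, F2, F3) to an articulation position (x, y). The names `articulation_position` and `routines` are hypothetical. Rejecting an unknown label with an error is likewise an assumption; the flowchart has a branch only for the five vowels.

```python
from typing import Callable, Dict, Tuple

Formants = Tuple[float, float, float]  # (F1, F2, F3) in Hz
Position = Tuple[float, float]  # articulation position (x, y)
Routine = Callable[[Formants], Position]


def articulation_position(label: str, formants: Formants, routines: Dict[str, Routine]) -> Position:
    """Step S1 of FIG. 5: route a labelled vowel section to its per-vowel routine (S2 to S6).

    `routines` maps each vowel label ("a", "i", "u", "e", "o") to its calculation routine.
    """
    if label not in routines:
        raise ValueError(f"no articulation routine for vowel label {label!r}")
    return routines[label](formants)
```

Keeping the routines in a mapping mirrors the flowchart. Each vowel has its own branch, and the branches share nothing except their input.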

Each of steps S2 to S6 executes the articulation position calculation routine for one vowel, described in detail later. The routine then ends the articulation position calculation for that vowel.

- Step S2: routine for /a/.
- Step S3: routine for /i/.
- Step S4: routine for /u/.
- Step S5: routine for /e/.
- Step S6: routine for /o/.

The articulation position calculation routines executed in steps S2 to S6 are described in more detail below.

(A) Articulation position calculation routine for the vowel /a/

Near the articulation position of the single vowel /a/, F(1) and F(2) change nonlinearly as the articulation position changes. A table is therefore prepared for converting the F(1) and F(2) values of an in-word /a/ directly into an articulation position (hereinafter, the conversion table; Table 1 gives an example). The conversion table is stored in the conversion formula storage unit 6.

The conversion table was obtained by having various speakers utter vowels at various articulation positions and examining the relationship between articulation position and formants. The coordinates of a single vowel on the conversion table (hereinafter, its table position) are written (I, J):

- /a/: (8, 11)
- /e/: (2, 4)
- /o/: (2, 15)

The direction in which I increases or decreases corresponds to the direction of change of F(1), and hence of y. The direction in which J increases or decreases corresponds to the direction of change of F(2), and hence of x.

With this conversion table, the articulation position of an in-word /a/ is calculated from its F(1) and F(2) as follows.

If F(2) of the in-word /a/ is higher than F(2) of the single vowel /a/, its articulation position is shifted toward that of the single vowel /e/. In that case, F(1) of the in-word /a/ is normalized by F(1) of the single vowels /a/ and /e/ to obtain I of its table position (I, J). Likewise, F(2) of the in-word /a/ is normalized by F(2) of the single vowels /a/ and /e/ to obtain J.

If F(2) of the in-word /a/ is lower than F(2) of the single vowel /a/, its articulation position is shifted toward that of the single vowel /o/. In that case, F(1) and F(2) of the in-word /a/ are normalized by the corresponding formants of the single vowels /a/ and /o/ to obtain I and J respectively.

The table position (I, J) of the in-word /a/ is calculated in this way.
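The text describes the normalization only as a three-term proportion (given explicitly in steps S12 and S14 of FIG. 6). One natural reading, adopted here as an assumption, is linear interpolation. The in-word formant lies between the two single-vowel formants in the same proportion as the unknown index lies between the two known table indices. The helper name `interpolate_index` is hypothetical.

```python
def interpolate_index(f_word, f_single_a, f_single_other, index_a, index_other):
    """Map a formant onto a table index, reading F_a : F : F_other = idx_a : idx : idx_other
    as linear interpolation (an assumption; the text says only that the formant is "normalized").
    """
    if f_single_other == f_single_a:
        raise ValueError("the two single-vowel formants must differ")
    t = (f_word - f_single_a) / (f_single_other - f_single_a)
    return index_a + t * (index_other - index_a)
```

With the single /a/ at I = 8 and the single /e/ at I = 2, an in-word F(1) halfway between their F(1) values gives I = 5.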

Next, the value of the conversion table at the calculated table position (I, J) is read from the table; this value is hereinafter called TABLE(I, J). The articulation position (x, y) of the in-word /a/ is then calculated from TABLE(I, J) using the following relational expression (1) between TABLE(I, J) and (x, y).

[N] denotes the largest integer not exceeding N.

FIG. 6 is a flowchart of the articulation position calculation routine for an in-word /a/ (step S2 of FIG. 5). The variables used in the following descriptions of the per-vowel routines are defined below.

- $F^{V}(n)$ ($V = a, i, u, e, o$; $n = 1, 2, 3$): $n$-th formant frequency of the in-word vowel $V$
- $F^{V}_{\mathrm{lV}}(n)$ ($V = a, i, u, e, o$; $n = 1, 2, 3$): $n$-th formant frequency of the single vowel $V$
- $(I, J)$: table position of the in-word vowel $V$
- $(I_V, J_V)$ ($V = a, i, u, e, o$): table position of the single vowel $V$

The articulation position calculation routine for an in-word /a/ is now described in detail with reference to FIG. 6.

In step S11, it is determined whether $F^{a}(2)$ is higher than $F^{a}_{\mathrm{lV}}(2)$. If it is, the process proceeds to step S12; otherwise it proceeds to step S14.

In step S12, I is calculated by substituting $F^{a}_{\mathrm{lV}}(1)$, $F^{a}(1)$, $F^{e}_{\mathrm{lV}}(1)$, $I_a$ and $I_e$ into

$$F^{a}_{\mathrm{lV}}(1) : F^{a}(1) : F^{e}_{\mathrm{lV}}(1) = I_a : I : I_e,$$

and J is calculated by substituting $F^{a}_{\mathrm{lV}}(2)$, $F^{a}(2)$, $F^{e}_{\mathrm{lV}}(2)$, $J_a$ and $J_e$ into

$$F^{a}_{\mathrm{lV}}(2) : F^{a}(2) : F^{e}_{\mathrm{lV}}(2) = J_a : J : J_e.$$

Here, the table position (8, 11) of the single vowel /a/ gives $I_a = 8$ and $J_a = 11$. The table position (2, 4) of the single vowel /e/ gives $I_e = 2$ and $J_e = 4$. The values of $F^{a}_{\mathrm{lV}}(1)$, $F^{a}_{\mathrm{lV}}(2)$, $F^{a}(1)$, $F^{a}(2)$, $F^{e}_{\mathrm{lV}}(1)$ and $F^{e}_{\mathrm{lV}}(2)$ are those extracted by the acoustic analysis unit 3, as described above.

In step S13, TABLE(I, J) is read from the conversion table at the table position (I, J) calculated in step S12. The articulation position (x, y) is then calculated from TABLE(I, J) by equation (1), and the routine for the in-word /a/ ends.

In step S14, I is calculated from

$$F^{a}_{\mathrm{lV}}(1) : F^{a}(1) : F^{o}_{\mathrm{lV}}(1) = I_a : I : I_o,$$

and J is calculated from

$$F^{a}_{\mathrm{lV}}(2) : F^{a}(2) : F^{o}_{\mathrm{lV}}(2) = J_a : J : J_o.$$

Here, the table position (2, 15) of the single vowel /o/ gives $I_o = 2$ and $J_o = 15$.

In step S15, TABLE(I, J) is read from the conversion table at the table position (I, J) calculated in step S14. The articulation position (x, y) is then calculated from TABLE(I, J) by equation (1), and the routine for the in-word /a/ ends.
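Steps S11, S12 and S14 can be combined into one function that returns the table position (I, J) of an in-word /a/. This is a minimal sketch under the linear-interpolation reading of the three-term proportions, which is an assumption. The single-vowel table positions are the ones given in the text. The contents of the conversion table (Table 1) and the form of equation (1) are not given in this text, so the sketch stops at (I, J) and does not perform the TABLE lookup of steps S13 and S15. Function names are hypothetical.

```python
# Table positions (I, J) of the single vowels, as given in the text.
TABLE_POSITION = {"a": (8, 11), "e": (2, 4), "o": (2, 15)}


def _lerp(f, f_a, f_other, idx_a, idx_other):
    # Linear reading of F_lV^a : F : F_lV^other = idx_a : idx : idx_other (an assumption).
    return idx_a + (f - f_a) / (f_other - f_a) * (idx_other - idx_a)


def table_position_a(word_f, single_f):
    """Steps S11, S12 and S14 of FIG. 6: table position (I, J) of an in-word /a/.

    word_f:   (F1, F2) of the in-word /a/, in Hz.
    single_f: {"a": (F1, F2), "e": (F1, F2), "o": (F1, F2)} for the speaker's single vowels.
    """
    # S11: F2 above that of the single /a/ means a shift toward /e/ (S12); otherwise toward /o/ (S14).
    other = "e" if word_f[1] > single_f["a"][1] else "o"
    (i_a, j_a), (i_other, j_other) = TABLE_POSITION["a"], TABLE_POSITION[other]
    i = _lerp(word_f[0], single_f["a"][0], single_f[other][0], i_a, i_other)
    j = _lerp(word_f[1], single_f["a"][1], single_f[other][1], j_a, j_other)
    return i, j
```

The result is generally non-integer. How it is mapped onto a table cell before lookup is left open here, because that rule belongs to the unreproduced table and equation (1).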

(B) Articulation position calculation routine for the vowel /i/

The vertical articulation position of /i/ (that is, the value of y) is determined mainly by whether F(1) is high or low. The y value of an in-word /i/ is therefore found by comparing its F(1) with two thresholds, "BNDIE1" and "BNDIE2". These thresholds are determined from F(1) of the single vowel /i/ and F(1) of the single vowel /e/.
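The text states that BNDIE1 and BNDIE2 depend on the single-vowel F(1) of /i/ and /e/, but it does not give the rule. The sketch below places the two thresholds at one third and two thirds of the way from F(1) of /i/ to F(1) of /e/, and maps the three resulting bands to y = 2, 3, 4. These are the y values of the single /i/, the intermediate grid row, and the single /e/. Both the threshold placement and the band-to-y mapping are hypothetical, for illustration only.

```python
def y_for_i(f1_word, f1_single_i, f1_single_e):
    """y of an in-word /i/ from its F1 (HYPOTHETICAL thresholds, not the patent's BNDIE rule).

    Assumes F1 of the single /e/ is higher than F1 of the single /i/, as for typical speakers.
    """
    span = f1_single_e - f1_single_i
    bndie1 = f1_single_i + span / 3  # hypothetical placement of BNDIE1
    bndie2 = f1_single_i + 2 * span / 3  # hypothetical placement of BNDIE2
    if f1_word < bndie1:
        return 2  # as high as the single /i/ (y = 2)
    if f1_word < bndie2:
        return 3
    return 4  # as low as the single /e/ (y = 4)
```

Whatever the actual threshold rule, the structure is the same: a higher F(1) means a lower tongue position and therefore a larger y.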

The front-back articulation position of an in-word /i/ (that is, the value of x) is calculated differently depending on the value of y determined above.

When y ≤ 2, the articulation position of the in-word /i/ is shifted toward that of the single vowel /u/. As noted above, F(3) is involved in changes of the articulation position (x, y) of /i/, so x is calculated from whether F(3) is high or low. Consider the articulation position of /i/ moving continuously from that of the single vowel /i/ toward that of the single vowel /u/. Over this movement, F(3) falls toward F(2) until it overlaps F(2) or comes closest to it, while F(2) hardly changes. This range is the /i/ region. As the articulation position moves further back, it enters the /u/ region. There F(3) changes little while F(2) falls, so F(2) moves away from F(3) again.

Therefore, in the /i/ region with y ≤ 2, the value x for the vowel /i/ in the word is found by comparing its F(3) with the threshold "BNDX". BNDX is determined from two quantities:

- F(3) of the single vowel /i/.
- $F_{jx}$, the value of F(3) at the articulation position x = jx (= 5) where F(2) and F(3) of the vowel /i/ overlap.

On the other hand, when y > 2, the articulation position of the vowel /i/ in the word is shifted toward that of the single vowel /e/. If the articulation position moves continuously from that of the single vowel /i/ to that of the single vowel /e/, both F(2) and F(3) fall. The changes in F(2) and F(3) are therefore assumed to contribute to the change in articulation position in the ratio 2:1. The articulation position x of the vowel /i/ in the word is then calculated by normalizing its F(2) and F(3) against the F(2) and F(3) values of the single vowels /i/ and /e/.

FIG. 7 is a flowchart of the routine, called from the flowchart of FIG. 5, that calculates the articulation position of the vowel /i/ in a word. This routine is described in detail below with reference to FIG. 7.

In step S21, it is determined whether $F^{i}(1)$, the F(1) value of the vowel /i/ in the word, is below 250 Hz. If it is, the process proceeds to step S22; otherwise, it proceeds to step S23.

In step S22, the articulation position y is set to y = 1, and the process proceeds to step S28.

In step S23, it is determined whether $F^{i}(1)$ is smaller than "BNDIE1". If it is, the process proceeds to step S24; otherwise, it proceeds to step S25.

Here, the threshold "BNDIE1" is set as follows, where $F^{v}_{\mathrm{lV}}(n)$ denotes F(n) of the single vowel /v/:

$$\mathrm{BNDIE1} = \frac{5F^{i}_{\mathrm{lV}}(1) + F^{e}_{\mathrm{lV}}(1)}{6}$$

In step S24, the articulation position y is set to y = 2, and the process proceeds to step S28.

In step S25, it is determined whether $F^{i}(1)$ is smaller than "BNDIE2". If it is, the process proceeds to step S26; otherwise, it proceeds to step S27.

Here, the threshold "BNDIE2" is set as follows:

$$\mathrm{BNDIE2} = \frac{F^{i}_{\mathrm{lV}}(1) + 2F^{e}_{\mathrm{lV}}(1)}{3}$$

In step S26, the articulation position y is set to y = 3, and the process proceeds to step S28.

In step S27, the articulation position y is set to y = 4, and the process proceeds to step S28.
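
Steps S21 to S27 amount to a short cascade of comparisons on F(1). The following is a minimal Python sketch of that cascade, not the patent's implementation. The function names and the example formant values are hypothetical. Only the 250 Hz bound and the BNDIE1/BNDIE2 formulas come from the text.

```python
def bndie1(f1_i_single: float, f1_e_single: float) -> float:
    """BNDIE1 = (5 * F1 of single /i/ + F1 of single /e/) / 6."""
    return (5.0 * f1_i_single + f1_e_single) / 6.0


def bndie2(f1_i_single: float, f1_e_single: float) -> float:
    """BNDIE2 = (F1 of single /i/ + 2 * F1 of single /e/) / 3."""
    return (f1_i_single + 2.0 * f1_e_single) / 3.0


def y_for_word_i(f1_word: float, f1_i_single: float, f1_e_single: float) -> int:
    """Steps S21-S27: y coordinate of a word-internal /i/ from its F1 (Hz)."""
    if f1_word < 250.0:                                # S21 -> S22
        return 1
    if f1_word < bndie1(f1_i_single, f1_e_single):     # S23 -> S24
        return 2
    if f1_word < bndie2(f1_i_single, f1_e_single):     # S25 -> S26
        return 3
    return 4                                           # S27


if __name__ == "__main__":
    # Hypothetical single-vowel F1 values: /i/ = 300 Hz, /e/ = 480 Hz,
    # giving BNDIE1 = 330 Hz and BNDIE2 = 420 Hz.
    for f1 in (240.0, 320.0, 400.0, 450.0):
        print(f1, y_for_word_i(f1, 300.0, 480.0))
```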

In step S28, it is determined whether the articulation position y set in step S22, S24, S26, or S27 is 2 or less. If it is, the process proceeds to step S29; otherwise, it proceeds to step S34.

In step S29, it is determined whether $F^{i}(3)$ is smaller than "BNDX". If it is, the process proceeds to step S30; otherwise, it proceeds to step S31.

Here, the threshold "BNDX" is set as follows:

$$\mathrm{BNDX} = \frac{F^{i}_{\mathrm{lV}}(3) + F_{jx}}{2}, \qquad F_{jx} = \frac{F^{i}_{\mathrm{lV}}(2) + F^{u}_{\mathrm{lV}}(3)}{2}$$

In step S30, the articulation position x is set to x = 5, and the articulation position calculation routine for the vowel /i/ in the word ends.

In step S31, x is calculated from the proportion $F_{jx} : F^{i}(3) : F^{i}_{\mathrm{lV}}(3) = x_j : x : x_i$.

Here, $x_i = 6$ because the single vowel /i/ has articulation position (x, y) = (6, 2). Also, $x_j = 5$. This is the articulation position x = jx = 5 at which F(2) and F(3) overlap when the vowel /i/ moves continuously from the position of the single vowel /i/ to that of the single vowel /u/.

In step S32, it is determined whether the articulation position x calculated in step S31 is smaller than 6. If it is, the process proceeds to step S33; otherwise, the articulation position calculation routine for the vowel /i/ in the word ends.

In step S33, the articulation position x is set to x = 6, and the articulation position calculation routine for the vowel /i/ in the word ends.
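
Steps S29 to S33 can be sketched in Python as follows. The proportion $a : b : c = p : q : r$ is read here as a linear mapping that sends a to p and c to r. That reading is an interpretation of the notation, not something the text spells out. The function names and example frequencies are hypothetical. The clamp in S32/S33 follows the text as written: any x below 6 is raised to 6.

```python
def ratio_interp(a: float, b: float, c: float, p: float, r: float) -> float:
    """Solve a : b : c = p : q : r for q, read as a linear map with a -> p and c -> r."""
    return p + (r - p) * (b - a) / (c - a)


X_I = 6.0  # x of the single vowel /i/, whose position is (6, 2)
X_J = 5.0  # x = jx, where F2 and F3 of /i/ overlap


def x_for_word_i_low_y(f3_word: float, f2_i_single: float,
                       f3_i_single: float, f3_u_single: float) -> float:
    """Steps S29-S33 (used when y <= 2); all frequencies in Hz."""
    f_jx = (f2_i_single + f3_u_single) / 2.0        # F_jx
    bndx = (f3_i_single + f_jx) / 2.0               # BNDX
    if f3_word < bndx:                              # S29 -> S30
        return X_J
    x = ratio_interp(f_jx, f3_word, f3_i_single, X_J, X_I)  # S31
    if x < 6.0:                                     # S32 -> S33
        x = 6.0
    return x


if __name__ == "__main__":
    # Hypothetical single-vowel values: F2(/i/) = 2200, F3(/i/) = 3000,
    # F3(/u/) = 2400 Hz, so F_jx = 2300 Hz and BNDX = 2650 Hz.
    for f3 in (2500.0, 2800.0, 3140.0):
        print(f3, x_for_word_i_low_y(f3, 2200.0, 3000.0, 2400.0))
```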

In step S34, x is calculated from the proportion

$$\{2F^{i}_{\mathrm{lV}}(2) + F^{i}_{\mathrm{lV}}(3)\} : \{2F^{i}(2) + F^{i}(3)\} : \{2F^{e}_{\mathrm{lV}}(2) + F^{e}_{\mathrm{lV}}(3)\} = x_i : x : x_e.$$

Here, $x_i = 6$ because the single vowel /i/ has articulation position (x, y) = (6, 2). Likewise, $x_e = 5$ because the single vowel /e/ has articulation position (x, y) = (5, 4).

In step S35, it is determined whether the value of x calculated in step S34 is smaller than 4. If it is, the process proceeds to step S36; otherwise, the articulation position calculation routine for the vowel /i/ in the word ends.

In step S36, the articulation position x is set to x = 4, and the articulation position calculation routine for the vowel /i/ in the word ends.
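
Steps S34 to S36 weight F(2) twice as heavily as F(3) before normalizing between the single vowels /i/ and /e/. A minimal Python sketch of this follows. It reads the proportion as linear interpolation, which is an interpretation. The function names and example values are hypothetical.

```python
def ratio_interp(a: float, b: float, c: float, p: float, r: float) -> float:
    """Solve a : b : c = p : q : r for q, read as a linear map with a -> p and c -> r."""
    return p + (r - p) * (b - a) / (c - a)


X_I, X_E = 6.0, 5.0  # x of the single vowels /i/ (6, 2) and /e/ (5, 4)


def weighted(f2: float, f3: float) -> float:
    """Combine F2 and F3 with the 2:1 contribution ratio stated in the text."""
    return 2.0 * f2 + f3


def x_for_word_i_high_y(f2_word: float, f3_word: float,
                        i_single: tuple, e_single: tuple) -> float:
    """Steps S34-S36 (used when y > 2). i_single, e_single are (F2, F3) in Hz."""
    x = ratio_interp(weighted(*i_single), weighted(f2_word, f3_word),
                     weighted(*e_single), X_I, X_E)   # S34
    if x < 4.0:                                       # S35 -> S36
        x = 4.0
    return x


if __name__ == "__main__":
    # Hypothetical single-vowel values (Hz): /i/ = (2200, 3000), /e/ = (1900, 2600).
    print(x_for_word_i_high_y(2050.0, 2800.0, (2200.0, 3000.0), (1900.0, 2600.0)))
```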

(C) Articulation position calculation routine for the vowel /u/

The articulation position y of the vowel /u/ can be obtained from whether F(1) is high or low. There are two cases:

- F(1) of the vowel /u/ in a word is higher than F(1) of the single vowel /u/. The articulation position is then shifted toward that of the single vowel /o/. y is calculated by normalizing the word vowel's F(1) against F(1) of the single vowels /u/ and /o/.
- F(1) of the vowel /u/ in a word is lower than F(1) of the single vowel /u/. The articulation position y is then shifted toward y = 1. y is calculated by normalizing the word vowel's F(1) against F(1) of the single vowel /u/ and the F(1) value corresponding to y = 1.

On the other hand, the articulation position x of the vowel /u/ can be obtained from whether F(2) is high or low. Again there are two cases:

- F(2) of the vowel /u/ in a word is higher than F(2) of the single vowel /u/. The articulation position is then shifted toward that of the single vowel /i/. x is calculated by normalizing the word vowel's F(2) against F(2) of the single vowel /u/ and the value F(2) = $F_{jx}$ at the articulation position x = jx (= 5) described above.
- F(2) of the vowel /u/ in a word is lower than F(2) of the single vowel /u/. The articulation position is then shifted toward that of the single vowel /o/. x is calculated by normalizing the word vowel's F(2) against F(2) of the single vowels /u/ and /o/.

FIG. 8 is a flowchart of the routine, called from the flowchart of FIG. 5, that calculates the articulation position of the vowel /u/ in a word. This routine is described below with reference to FIG. 8.

In step S41, two conditions are checked: $F^{o}_{\mathrm{lV}}(1)$ is greater than $F^{u}_{\mathrm{lV}}(1)$, and $F^{u}(1)$ is greater than $F^{u}_{\mathrm{lV}}(1)$. If both hold, the process proceeds to step S42; otherwise, it proceeds to step S43.

In step S42, y is calculated from the proportion $F^{u}_{\mathrm{lV}}(1) : F^{u}(1) : F^{o}_{\mathrm{lV}}(1) = y_u : y : y_o$.

Here, $y_u = 2$ because the single vowel /u/ has articulation position (x, y) = (2, 2). Likewise, $y_o = 4$ because the single vowel /o/ has articulation position (x, y) = (1, 4).

In step S43, y is calculated from the proportion $F^{u}_{\mathrm{lV}}(1) : F^{u}(1) : 200 = y_u : y : 1$. That is, 200 Hz is taken as the F(1) value corresponding to y = 1.

In step S44, two conditions are checked: the articulation position y obtained in step S42 or S43 is smaller than 2, and $F^{u}(1)$ is 300 Hz or more. If both hold, the process proceeds to step S45; otherwise, it proceeds to step S46.

In step S45, the articulation position y is set to y = 2, and the process proceeds to step S46.

In step S46, it is determined whether the articulation position y obtained in step S42 or S43 is greater than 3. If it is, the process proceeds to step S47; otherwise, it proceeds to step S48.

In step S47, the articulation position y is set to y = 3, and the process proceeds to step S48.
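
Steps S41 to S47 compute y for a word-internal /u/ and then clamp it. The following Python sketch is illustrative only. It reads each proportion as linear interpolation, which is an interpretation. The function names and sample frequencies are hypothetical. The constants 200 Hz, 300 Hz, $y_u = 2$ and $y_o = 4$ come from the text.

```python
def ratio_interp(a: float, b: float, c: float, p: float, r: float) -> float:
    """Solve a : b : c = p : q : r for q, read as a linear map with a -> p and c -> r."""
    return p + (r - p) * (b - a) / (c - a)


Y_U, Y_O = 2.0, 4.0  # y of the single vowels /u/ (2, 2) and /o/ (1, 4)


def y_for_word_u(f1_word: float, f1_u_single: float, f1_o_single: float) -> float:
    """Steps S41-S47: y coordinate of a word-internal /u/ from its F1 (Hz)."""
    if f1_o_single > f1_u_single and f1_word > f1_u_single:            # S41
        y = ratio_interp(f1_u_single, f1_word, f1_o_single, Y_U, Y_O)  # S42
    else:
        y = ratio_interp(f1_u_single, f1_word, 200.0, Y_U, 1.0)        # S43
    if y < 2.0 and f1_word >= 300.0:                                   # S44 -> S45
        y = 2.0
    if y > 3.0:                                                        # S46 -> S47
        y = 3.0
    return y


if __name__ == "__main__":
    # Hypothetical single-vowel F1 values: /u/ = 320 Hz, /o/ = 480 Hz.
    for f1 in (260.0, 310.0, 400.0, 460.0):
        print(f1, y_for_word_u(f1, 320.0, 480.0))
```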

In step S48, it is determined whether $F^{u}(2)$ is greater than $F^{u}_{\mathrm{lV}}(2)$. If it is, the process proceeds to step S49; otherwise, it proceeds to step S50.

In step S49, x is calculated from the proportion $F_{jx} : F^{u}(2) : F^{u}_{\mathrm{lV}}(2) = x_j : x : x_u$.

Here, $x_u = 2$ because the single vowel /u/ has articulation position (x, y) = (2, 2). Also, $x_j = 5$, from the articulation position x = jx = 5 described above.

In step S50, x is calculated from the proportion $F^{u}_{\mathrm{lV}}(2) : F^{u}(2) : F^{o}_{\mathrm{lV}}(2) = x_u : x : x_o$.

Here, $x_o = 1$ because the single vowel /o/ has articulation position (x, y) = (1, 4).

In step S51, it is determined whether the articulation position x obtained in step S49 or S50 is greater than 5. If it is, the process proceeds to step S52; otherwise, it proceeds to step S53.

In step S52, the articulation position x is set to x = 5, and the process proceeds to step S53.

In step S53, two conditions are checked: the articulation position x obtained in step S49 or S50 equals 5, and $F^{u}(2)$ is smaller than $0.9F_{jx}$. If both hold, the process proceeds to step S54; otherwise, the articulation position calculation routine for the vowel /u/ in the word ends.

In step S54, the articulation position x is set to x = 4, and the articulation position calculation routine for the vowel /u/ in the word ends.
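
Steps S48 to S54 normalize F(2) either toward /i/ (using $F_{jx}$) or toward /o/, then apply the two corrections. A Python sketch under the same linear-interpolation reading of the proportions follows. That reading is an interpretation, and the names and sample values are hypothetical. The S53 test compares x with 5 exactly, mirroring the text.

```python
def ratio_interp(a: float, b: float, c: float, p: float, r: float) -> float:
    """Solve a : b : c = p : q : r for q, read as a linear map with a -> p and c -> r."""
    return p + (r - p) * (b - a) / (c - a)


X_U, X_O, X_J = 2.0, 1.0, 5.0  # x of single /u/, single /o/, and x = jx


def x_for_word_u(f2_word: float, f2_u_single: float,
                 f2_o_single: float, f_jx: float) -> float:
    """Steps S48-S54: x coordinate of a word-internal /u/ from its F2 (Hz)."""
    if f2_word > f2_u_single:                                        # S48 -> S49
        x = ratio_interp(f_jx, f2_word, f2_u_single, X_J, X_U)
    else:                                                            # S50
        x = ratio_interp(f2_u_single, f2_word, f2_o_single, X_U, X_O)
    if x > 5.0:                                                      # S51 -> S52
        x = 5.0
    if x == 5.0 and f2_word < 0.9 * f_jx:                            # S53 -> S54
        x = 4.0
    return x


if __name__ == "__main__":
    # Hypothetical values: F2(/u/) = 1200, F2(/o/) = 800, F_jx = 2300 Hz.
    for f2 in (1000.0, 1750.0, 2500.0):
        print(f2, x_for_word_u(f2, 1200.0, 800.0, 2300.0))
```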

(D) Articulation position calculation routine for the vowel /e/

The routine for the vowel /e/ depends on which way its articulation position is shifted: toward the single vowel /a/ or toward the single vowel /e/'s neighbor /i/. The deciding threshold is "BNDIE2", the F(1) threshold that marks the boundary between y = 3 and y = 4.

If F(1) of the vowel /e/ in a word is greater than BNDIE2, its articulation position is judged to be shifted toward that of the single vowel /a/. In that case, the method described in (A) is used:

1. The table position (I, J) of the vowel /e/ in the word is calculated. F(1) of the word vowel is normalized against F(1) of the single vowels /a/ and /e/, and F(2) is normalized against F(2) of the same two single vowels.
2. TABLE(I, J) is obtained from the conversion table using this table position.
3. The articulation position (x, y) of the vowel /e/ in the word is calculated by equation (1).

Conversely, if F(1) of the vowel /e/ in the word is smaller than "BNDIE2", its articulation position is judged to be shifted toward that of the single vowel /i/. In that case, the articulation position is calculated with the algorithm of FIG. 7 described in (B), namely the case where the vowel /i/ in a word is shifted toward the single vowel /e/. There is one difference: the step that sets y = 1 and the step that calculates x for y = 1 are omitted.

FIG. 9 is a flowchart of the routine, called from the flowchart of FIG. 5, that calculates the articulation position of the vowel /e/ in a word. This routine is described below with reference to FIG. 9.

In step S61, it is determined whether $F^{e}(1)$ is greater than the threshold "BNDIE2". If it is, the process proceeds to step S62; otherwise, it proceeds to step S64.

In step S62, I is calculated from $F^{a}_{\mathrm{lV}}(1) : F^{e}(1) : F^{e}_{\mathrm{lV}}(1) = I_a : I : I_e$. J is calculated from $F^{a}_{\mathrm{lV}}(2) : F^{e}(2) : F^{e}_{\mathrm{lV}}(2) = J_a : J : J_e$.

In step S63, TABLE(I, J) is obtained from the conversion table using the table position (I, J) of the vowel /e/ in the word, which was calculated in step S62. The articulation position (x, y) is then calculated from TABLE(I, J) by equation (1), and the articulation position calculation routine for the vowel /e/ in the word ends.
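
The normalization in step S62 (and the analogous step S72 for /o/) maps F(1) to I and F(2) to J. Each mapping runs between the single vowel /a/ and the single target vowel. Below is a minimal Python sketch of that mapping only. The conversion table and equation (1) are not reproduced in this section, so the sketch stops at (I, J). The proportions are read as linear interpolation, which is an interpretation. The table positions and formant values in the example are hypothetical.

```python
def ratio_interp(a: float, b: float, c: float, p: float, r: float) -> float:
    """Solve a : b : c = p : q : r for q, read as a linear map with a -> p and c -> r."""
    return p + (r - p) * (b - a) / (c - a)


def table_position(f1_word: float, f2_word: float,
                   f_a_single: tuple, f_v_single: tuple,
                   ij_a: tuple, ij_v: tuple) -> tuple:
    """Table position (I, J) of a word vowel (steps S62 / S72).

    f_a_single, f_v_single: (F1, F2) in Hz of single /a/ and of the single target vowel.
    ij_a, ij_v: table positions (I_V, J_V) of those single vowels.
    """
    i = ratio_interp(f_a_single[0], f1_word, f_v_single[0], ij_a[0], ij_v[0])
    j = ratio_interp(f_a_single[1], f2_word, f_v_single[1], ij_a[1], ij_v[1])
    return i, j


if __name__ == "__main__":
    # Hypothetical: /a/ = (800, 1300) Hz at (1, 1); /e/ = (450, 1900) Hz at (5, 5).
    print(table_position(625.0, 1600.0, (800.0, 1300.0), (450.0, 1900.0), (1, 1), (5, 5)))
```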

In step S64, it is determined whether $F^{e}(1)$ is smaller than "BNDIE1". If it is, the process proceeds to step S65; otherwise, it proceeds to step S66.

In step S65, the articulation position y is set to y = 2, and the process proceeds to step S69.

In step S66, it is determined whether $F^{e}(1)$ is smaller than "BNDIE2". If it is, the process proceeds to step S67; otherwise, it proceeds to step S68.

In step S67, the articulation position y is set to y = 3, and the process proceeds to step S69.

In step S68, the articulation position y is set to y = 4, and the process proceeds to step S69.

In step S69, x is calculated from the proportion

$$\{2F^{i}_{\mathrm{lV}}(2) + F^{i}_{\mathrm{lV}}(3)\} : \{2F^{e}(2) + F^{e}(3)\} : \{2F^{e}_{\mathrm{lV}}(2) + F^{e}_{\mathrm{lV}}(3)\} = x_i : x : x_e,$$

and the articulation position calculation routine for the vowel /e/ in the word ends.
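
Steps S64 to S69 combine the BNDIE1/BNDIE2 comparison for y with the 2:1-weighted F(2)/F(3) normalization for x. Unlike S35/S36, S69 applies no clamp. The following Python sketch is illustrative only. It reads the proportion as linear interpolation, which is an interpretation. The function name and sample values are hypothetical.

```python
def ratio_interp(a: float, b: float, c: float, p: float, r: float) -> float:
    """Solve a : b : c = p : q : r for q, read as a linear map with a -> p and c -> r."""
    return p + (r - p) * (b - a) / (c - a)


X_I, X_E = 6.0, 5.0  # x of the single vowels /i/ and /e/


def articulation_word_e_toward_i(f1: float, f2: float, f3: float,
                                 i_single: tuple, e_single: tuple) -> tuple:
    """Steps S64-S69. i_single, e_single are (F1, F2, F3) in Hz of the single vowels."""
    bndie1 = (5.0 * i_single[0] + e_single[0]) / 6.0
    bndie2 = (i_single[0] + 2.0 * e_single[0]) / 3.0
    if f1 < bndie1:        # S64 -> S65
        y = 2
    elif f1 < bndie2:      # S66 -> S67
        y = 3
    else:                  # S68
        y = 4
    # S69: 2:1 weighting of F2 and F3, normalized between single /i/ and /e/
    x = ratio_interp(2.0 * i_single[1] + i_single[2], 2.0 * f2 + f3,
                     2.0 * e_single[1] + e_single[2], X_I, X_E)
    return x, y


if __name__ == "__main__":
    i_single = (300.0, 2200.0, 3000.0)  # hypothetical single /i/
    e_single = (480.0, 1900.0, 2600.0)  # hypothetical single /e/
    print(articulation_word_e_toward_i(320.0, 2050.0, 2800.0, i_single, e_single))
```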

(E) Articulation position calculation routine for the vowel /o/

The routine for the vowel /o/ depends on which way its articulation position is shifted: toward the single vowel /a/ or toward the single vowel /u/. The deciding threshold is "BNDOU2", the F(1) threshold that marks the boundary between y = 3 and y = 4.

If F(1) of the vowel /o/ in a word is greater than BNDOU2, its articulation position is judged to be shifted toward that of the single vowel /a/. In that case, the method described in (A) is used:

1. The table position (I, J) of the vowel /o/ in the word is calculated. F(1) of the word vowel is normalized against F(1) of the single vowels /a/ and /o/, and F(2) is normalized against F(2) of the same two single vowels.
2. TABLE(I, J) is obtained from the conversion table using this table position.
3. The articulation position (x, y) of the vowel /o/ in the word is calculated by equation (1).

Conversely, if F(1) of the vowel /o/ in the word is smaller than "BNDOU2", its articulation position is judged to be shifted toward that of the single vowel /u/. In that case, the articulation position is calculated with the algorithm of FIG. 8 described in (C), namely the case where the vowel /u/ in a word is shifted toward the single vowel /o/. There is one difference: the articulation position y is set from the thresholds "BNDOU1" and "BNDOU2".

FIG. 10 is a flowchart of the routine, called from the flowchart of FIG. 5, that calculates the articulation position of the vowel /o/ in a word. This routine is described below with reference to FIG. 10.

In step S71, it is determined whether $F^{o}(1)$ is greater than the threshold "BNDOU2". If it is, the process proceeds to step S72; otherwise, it proceeds to step S74.

Here, "BNDOU2" is set as follows:

$$\mathrm{BNDOU2} = \frac{F^{u}_{\mathrm{lV}}(1) + 2F^{o}_{\mathrm{lV}}(1)}{3}$$

In step S72, I is calculated from $F^{a}_{\mathrm{lV}}(1) : F^{o}(1) : F^{o}_{\mathrm{lV}}(1) = I_a : I : I_o$. J is calculated from $F^{a}_{\mathrm{lV}}(2) : F^{o}(2) : F^{o}_{\mathrm{lV}}(2) = J_a : J : J_o$.

In step S73, TABLE(I, J) is obtained from the conversion table using the table position (I, J) of the vowel /o/ in the word, which was calculated in step S72. The articulation position (x, y) is then calculated from TABLE(I, J) by equation (1), and the articulation position calculation routine for the vowel /o/ in the word ends.

In step S74, it is determined whether $F^{o}(1)$ is smaller than "BNDOU1". If it is, the process proceeds to step S75; otherwise, it proceeds to step S76.

Here, "BNDOU1" is set as follows:

$$\mathrm{BNDOU1} = \frac{5F^{u}_{\mathrm{lV}}(1) + F^{o}_{\mathrm{lV}}(1)}{6}$$

In step S75, the articulation position y is set to y = 2, and the process proceeds to step S79.

In step S76, it is determined whether $F^{o}(1)$ is smaller than "BNDOU2". If it is, the process proceeds to step S77; otherwise, it proceeds to step S78.

In step S77, the articulation position y is set to y = 3, and the process proceeds to step S79.

In step S78, the articulation position y is set to y = 4, and the process proceeds to step S79.

In step S79, x is calculated from the proportion $F^{u}_{\mathrm{lV}}(2) : F^{o}(2) : F^{o}_{\mathrm{lV}}(2) = x_u : x : x_o$, and the articulation position calculation routine for the vowel /o/ in the word ends.
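
Steps S74 to S79 set y from BNDOU1/BNDOU2 and x from F(2) normalized between the single vowels /u/ and /o/. A Python sketch follows. It reads the proportion as linear interpolation, which is an interpretation. The function name and sample values are hypothetical. In S79 the sketch uses F(2) of the word-internal /o/, the vowel being analysed.

```python
def ratio_interp(a: float, b: float, c: float, p: float, r: float) -> float:
    """Solve a : b : c = p : q : r for q, read as a linear map with a -> p and c -> r."""
    return p + (r - p) * (b - a) / (c - a)


X_U, X_O = 2.0, 1.0  # x of the single vowels /u/ (2, 2) and /o/ (1, 4)


def articulation_word_o_toward_u(f1: float, f2: float,
                                 u_single: tuple, o_single: tuple) -> tuple:
    """Steps S74-S79. u_single, o_single are (F1, F2) in Hz of the single vowels."""
    bndou1 = (5.0 * u_single[0] + o_single[0]) / 6.0
    bndou2 = (u_single[0] + 2.0 * o_single[0]) / 3.0
    if f1 < bndou1:        # S74 -> S75
        y = 2
    elif f1 < bndou2:      # S76 -> S77
        y = 3
    else:                  # S78
        y = 4
    x = ratio_interp(u_single[1], f2, o_single[1], X_U, X_O)   # S79
    return x, y


if __name__ == "__main__":
    # Hypothetical single-vowel values (Hz): /u/ = (320, 1200), /o/ = (480, 800).
    print(articulation_word_o_toward_u(400.0, 1000.0, (320.0, 1200.0), (480.0, 800.0)))
```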

The conversion formula storage unit 6 shown in FIG. 1 holds the following data:

- the articulation position calculation algorithms for vowels in words described above;
- the formant frequencies of each single vowel;
- the articulation position (x, y) of each single vowel;
- the table position ($I_V$, $J_V$) of each single vowel;
- the conversion table;
- the thresholds, and so on.

The articulation position extraction unit 5 reads this data from the conversion formula storage unit 6 as needed when it calculates the articulation position of a vowel in a word.

In this embodiment, the first articulation position calculating means corresponds to:

- steps S11 to S15 in FIG. 6;
- steps S61 to S63 in FIG. 9;
- steps S71 to S73 in FIG. 10.

The second articulation position calculating means corresponds to:

- steps S21 to S36 in FIG. 7;
- steps S41 to S54 in FIG. 8;
- steps S64 to S69 in FIG. 9;
- steps S74 to S79 in FIG. 10.

As described above, this embodiment calculates the articulation position of a vowel in a word of the input speech from the formant frequencies of that vowel. The calculation follows a predetermined algorithm and uses the known formant frequencies of the single vowels. This embodiment can therefore extract the articulation position, a speech feature that does not depend on the speaker or the language, from the formant frequencies that characterize phonemes.

The conversion table in this embodiment is, of course, not limited to the one shown in Table 1.

Next, the second embodiment is described in detail.

Second embodiment

This embodiment uses the neural network mentioned above. By learning, the network automatically creates a rule that generates the articulation position of a vowel in a word from the frequency components of that vowel. The articulation position of the vowel is then generated according to this rule.

An outline of the neural network follows. A neural network imitates the structure of the human brain. It is formed by intricately interconnecting a plurality of units that correspond to the brain's neurons.

Each unit consists of three parts: one that receives input signals from other units, one that converts those signals according to a fixed rule, and one that outputs the result. As described in detail later, the units form a hierarchical network with an input layer, an intermediate layer, and an output layer. Each connection to another unit carries a coupling coefficient that represents the strength of that connection.

The coupling coefficients represent the strength of the connections between units, so changing their values changes the structure of the network. Learning works as follows:

1. Take two events with a known relationship. Data belonging to one event are input, one after another, to the input layer of the hierarchical network.
2. Each input produces output data at the output layer.
3. The coupling coefficients are changed to reduce the difference between this output data and the corresponding data belonging to the other event (the target values, or teacher data).

In other words, the network's structure is changed so that data belonging to one of two related events, when input, produce the corresponding data belonging to the other event as output.
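
This section does not say which rule changes the coupling coefficients. Purely as an assumed illustration, the following sketch uses gradient descent on the squared error of a single sigmoid unit. This is one common way to reduce the difference between the output and the teacher data. It is not presented as the patent's learning procedure, and the function names are hypothetical.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_step(weights, bias, inputs, target, lr=1.0):
    """One gradient-descent step on E = (out - target)^2 / 2 for one sigmoid unit.

    Returns the updated weights, the updated bias, and the output before the update.
    """
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    out = sigmoid(z)
    # dE/dz = (out - target) * sigmoid'(z), with sigmoid'(z) = out * (1 - out)
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias, out


if __name__ == "__main__":
    w, b = [0.0, 0.0], 0.0
    for _ in range(200):
        w, b, out = train_step(w, b, [1.0, 0.5], target=1.0)
    print(out)  # has moved from 0.5 toward the teacher value 1.0
```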

The neural network used in this embodiment has the structure shown in FIG. 11. It has three layers, listed from the bottom of the figure:

- the input layer 21, with 16 units 24, 24, ...;
- the intermediate layer 22, with 10 units 25, 25, ...;
- the output layer 23, with 14 units: seven units 26a, 26b, ..., 26g and seven units 27a, 27b, ..., 27g.

Each of the 16 units 24 of the input layer 21 receives the output signal of one channel of the 16-channel BPF group.

The output layer 23 is split into two sets:

- Units 26a, ..., 26g output one of the coordinate values 1 to 7 of the articulation position x shown in FIG. 2. For example, units 26a to 26g correspond in order to x = 1 to 7.
- Units 27a, ..., 27g output one of the coordinate values 1 to 7 of the articulation position y shown in FIG. 2. For example, units 27a to 27g correspond in order to y = 1 to 7.

Each unit 24, 24, ... of the input layer 21 is connected to all units 25, ..., 25 of the intermediate layer 22. Each unit 25, 25, ... of the intermediate layer 22 is connected to all units 26a, ..., 26g, 27a, ..., 27g of the output layer 23. Units within the same layer, however, are not connected to each other.

The neural network with the above structure is stored in the conversion formula storage unit 6 together with its coupling coefficients.

The neural network with the above structure operates as follows.

The frequency components of the vowels in the input speech are input to the units 24, 24, ... of the input layer 21. Specifically, in this embodiment the output values of the 16-channel BPF group are input channel by channel to the corresponding units 24, ..., 24 of the input layer 21. The center frequencies of this BPF group are the 16 values obtained by dividing the range from 300 Hz to 3400 Hz at equal intervals on the mel scale.
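The filter-bank layout described above can be sketched as follows. The specification does not say which mel formula it uses or whether the 16 center frequencies include the 300 Hz and 3400 Hz end points, so this sketch assumes the common formula mel(f) = 2595·log10(1 + f/700) and 16 points spaced evenly in mel between the two end points, inclusive. The function names are illustrative only.

```python
import math


def hz_to_mel(f_hz):
    """Mel value of a frequency (assumed formula: 2595 * log10(1 + f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def bpf_center_frequencies(f_low=300.0, f_high=3400.0, n_channels=16):
    """Center frequencies (Hz) equally spaced on the mel scale,
    including both end points (one reading of the specification)."""
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_channels - 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_channels)]


centers = bpf_center_frequencies()
for ch, fc in enumerate(centers, start=1):
    print(f"channel {ch:2d}: {fc:7.1f} Hz")
```

Because the spacing is uniform in mel rather than in Hz, the gaps between neighboring center frequencies widen toward the high end of the band.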

Each unit 24, 24, ... converts the BPF output value it receives with a sigmoid function and passes the result to every unit 25, 25, ... of the intermediate layer 22. Each unit 25 of the intermediate layer 22 receives the sum of the output values of the units 24, 24, ... of the input layer 21, each multiplied by the corresponding coupling coefficient. In the same way, each unit 25 converts this input with the sigmoid function and outputs the result to every unit 26a, ..., 26g, 27a, ..., 27g of the output layer 23. Each unit 26a, ..., 26g, 27a, ..., 27g of the output layer 23 receives the sum of the output values of the units 25, 25, ... of the intermediate layer 22, each multiplied by the corresponding coupling coefficient, and converts this input with the sigmoid function to produce its output.
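A minimal sketch of this forward pass for the 16-10-14 network, in plain Python. The sigmoid is the f(x) = 1/(1 + exp(-x + a)) defined in this embodiment, and the absence of separate bias terms follows the description. The random initial coupling coefficients in [-0.5, 0.5], the example input vector, and all function names are assumptions for illustration.

```python
import math
import random

N_IN, N_HID, N_OUT = 16, 10, 14  # units 24, units 25, units 26a-26g + 27a-27g


def sigmoid(x, a=0.0):
    """f(x) = 1 / (1 + exp(-x + a)), evaluated in a numerically stable way."""
    z = x - a
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def init_weights(n_from, n_to, rng):
    """Full connections between adjacent layers only; row k holds the
    coupling coefficients feeding unit k of the next layer."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_from)] for _ in range(n_to)]


def forward(bpf_outputs, w_ih, w_ho, a=0.0):
    """Propagate 16 BPF channel outputs through layers 21 -> 22 -> 23."""
    inp = [sigmoid(v, a) for v in bpf_outputs]  # input layer 21
    hid = [sigmoid(sum(w * o for w, o in zip(row, inp)), a) for row in w_ih]  # layer 22
    out = [sigmoid(sum(w * h for w, h in zip(row, hid)), a) for row in w_ho]  # layer 23
    return out


rng = random.Random(0)
w_ih = init_weights(N_IN, N_HID, rng)
w_ho = init_weights(N_HID, N_OUT, rng)
out = forward([0.1 * k for k in range(N_IN)], w_ih, w_ho)
print([round(v, 3) for v in out])
```

Storing each layer's coefficients as a list of rows mirrors the "weighted sum into each receiving unit" wording. Because there are no intra-layer connections, no other weights are needed.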

Here, the sigmoid function f(x) is given by the following equation:

$$f(x) = \frac{1}{1 + \exp(-x + a)}$$

where a is a constant.

From the output values of the units 26a, ..., 26g, 27a, ..., 27g of the output layer 23 obtained as described above, the articulation position (x, y) is determined as follows. Among the seven units 26a, ..., 26g of the output layer 23 that correspond to the articulation position x, the unit outputting the largest value is found (for example, unit 26b), and its x coordinate value (for example, x = 2) is taken as the articulation position x. Likewise, among the seven units 27a, ..., 27g that correspond to the articulation position y, the unit outputting the largest value is found (for example, unit 27g), and its y coordinate value (for example, y = 7) is taken as the articulation position y. In this way, the articulation position (x, y), for example (2, 7), is determined.
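The argmax decoding just described fits in a few lines. The sketch assumes the 14 output values arrive as one list ordered 26a to 26g followed by 27a to 27g. That ordering and the function name `decode_articulation` are illustrative conventions, not taken from the patent.

```python
def decode_articulation(outputs):
    """Map the 14 output-unit values to an articulation position (x, y).

    outputs[0:7]  -> units 26a..26g, x coordinates 1..7
    outputs[7:14] -> units 27a..27g, y coordinates 1..7
    Ties go to the lower coordinate.
    """
    if len(outputs) != 14:
        raise ValueError("expected 14 output values")
    x_units, y_units = outputs[:7], outputs[7:]
    x = max(range(7), key=lambda i: x_units[i]) + 1
    y = max(range(7), key=lambda i: y_units[i]) + 1
    return x, y


# Example from the text: 26b is the largest x unit and 27g the largest y unit.
outs = [0.1, 0.9, 0.2, 0.1, 0.0, 0.1, 0.3,
        0.0, 0.1, 0.2, 0.1, 0.0, 0.3, 0.8]
print(decode_articulation(outs))  # (2, 7)
```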

The neural network is trained as follows, using the error back-propagation algorithm as the learning algorithm. First, output values from the BPF group (for example, values that should produce the target value (2, 7)) are input to the units 24, 24, ..., 24 of the input layer 21. Next, among the units 26a, 26b, ..., 26g of the output layer 23 that output the coordinate value of the articulation position x, a teacher signal of "1" is given only to the unit corresponding to the target value of x (for example, unit 26b for x = 2), and "0" is given to the other units (for example, units 26a, 26c, ..., 26g). Similarly, among the units 27a, 27b, ..., 27g that output the coordinate value of the articulation position y, "1" is given only to the unit corresponding to the target value of y (for example, unit 27g for y = 7), and "0" is given to the other units (for example, units 27a, 27b, ..., 27f). The error back-propagation algorithm then determines how much each coupling coefficient should change with respect to the target value (for example, (2, 7)), and new coupling coefficients between the units are set. As this operation is repeated several times, the coupling coefficients between the input layer 21 and the intermediate layer 22, and between the intermediate layer 22 and the output layer 23, eventually settle so that two conditions hold. Among the units 26a, 26b, ..., 26g, only the unit corresponding to the target x value (for example, unit 26b for x = 2) outputs "1" and the others output "0". Among the units 27a, 27b, ..., 27g, only the unit corresponding to the target y value (for example, unit 27g for y = 7) outputs "1" and the others output "0".
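The training procedure above can be illustrated with a compact, bias-free back-propagation sketch for the 16-10-14 network. Everything concrete here is an assumption: the squared-error criterion, the learning rate of 0.5, the sigmoid constant a = 0, and the per-sample (online) update order. The three toy spectra stand in for real BPF outputs of vowels and do not come from the specification. The sketch only demonstrates that the error falls as the coupling coefficients are pushed toward the one-hot teacher signals.

```python
import math
import random


def sigmoid(z):
    """f(x) from the text with the constant a set to 0 (an assumption)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def one_hot_target(x, y):
    """Teacher signal: 1 for the unit of coordinate x (26a..26g) and of y (27a..27g)."""
    t = [0.0] * 14
    t[x - 1] = 1.0
    t[7 + y - 1] = 1.0
    return t


def train_step(sample, target, w_ih, w_ho, lr=0.5):
    """One error back-propagation update; returns the squared error before it."""
    inp = [sigmoid(v) for v in sample]
    hid = [sigmoid(sum(w * o for w, o in zip(row, inp))) for row in w_ih]
    out = [sigmoid(sum(w * h for w, h in zip(row, hid))) for row in w_ho]
    # Output-layer deltas: (t - o) * f'(net), with f'(net) = o * (1 - o) for a sigmoid.
    d_out = [(t - o) * o * (1.0 - o) for t, o in zip(target, out)]
    # Hidden-layer deltas, propagated back through the not-yet-updated w_ho.
    d_hid = [h * (1.0 - h) * sum(d_out[k] * w_ho[k][j] for k in range(len(out)))
             for j, h in enumerate(hid)]
    for k, row in enumerate(w_ho):  # coupling coefficients 22 -> 23
        for j in range(len(row)):
            row[j] += lr * d_out[k] * hid[j]
    for j, row in enumerate(w_ih):  # coupling coefficients 21 -> 22
        for i in range(len(row)):
            row[i] += lr * d_hid[j] * inp[i]
    return 0.5 * sum((t - o) ** 2 for t, o in zip(target, out))


def toy_spectrum(peaks):
    """Hypothetical 16-channel BPF pattern: high in the peak channels, low elsewhere."""
    return [3.0 if c in peaks else -3.0 for c in range(16)]


def train(data, epochs=500, seed=0):
    rng = random.Random(seed)
    w_ih = [[rng.uniform(-0.5, 0.5) for _ in range(16)] for _ in range(10)]
    w_ho = [[rng.uniform(-0.5, 0.5) for _ in range(10)] for _ in range(14)]
    history = []
    for _ in range(epochs):
        history.append(sum(train_step(s, one_hot_target(*xy), w_ih, w_ho)
                           for s, xy in data))
    return w_ih, w_ho, history


data = [(toy_spectrum({2, 9}), (2, 7)),
        (toy_spectrum({5, 12}), (6, 2)),
        (toy_spectrum({3, 6}), (4, 4))]
w_ih, w_ho, history = train(data)
print(f"epoch 1 error {history[0]:.3f} -> epoch {len(history)} error {history[-1]:.3f}")
```

The input-layer sigmoid has no trainable coefficients, so only the 21-to-22 and 22-to-23 coupling coefficients are updated, matching the two sets of coefficients named in the text.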

This learning is carried out for a variety of speakers and languages. The values of the coupling coefficients eventually converge, and the network comes to output the correct articulation position (x, y) for the frequency components of the vowels in most words. In other words, a neural network that has been sufficiently trained in this way has automatically built a rule that generates the articulation position directly from the output values of the BPF group (the frequency components of a vowel in a word). The trained neural network can therefore output the correct articulation position (x, y) for all input values.

As described above, this embodiment can generate the articulation position of a vowel in a word from the output values (that is, the frequency components) of the BPF group for that vowel. This embodiment can therefore extract the articulation position, a speech feature that does not depend on the speaker or the language, directly from the frequency components of the speech.

In the second embodiment, one unit that receives the pitch value of the speech waveform may be added to the input layer 21, bringing the number of units in the input layer 21 to 17. In this case, the pitch value can be obtained with a conventional technique such as the autocorrelation method or the cepstrum method. The mean and standard deviation of male pitch are 125 Hz and 20.5 Hz, respectively, and the mean and standard deviation of female pitch are each about twice the male values. In addition, female frequency components are generally shifted toward higher frequencies than male frequency components because of physiological differences between men and women. Inputting the pitch value to the input layer 21 should therefore let the neural network normalize male and female frequency components, so that the articulation position of a vowel in a word can be calculated more accurately.
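As an illustration of the autocorrelation method mentioned above, the sketch below picks the lag with the largest autocorrelation within an assumed 50 Hz to 500 Hz search range, then appends the estimate to the 16 BPF outputs to form a 17-value input. The search limits, the 8 kHz sampling rate, the synthetic 125 Hz test tone, and the helper `with_pitch` with its scaling by the 125 Hz male mean pitch are all hypothetical choices. The patent does not say how the pitch value is scaled before it is fed to the network.

```python
import math


def autocorr_pitch(frame, fs, f_min=50.0, f_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame.

    The lag with the largest autocorrelation in [fs/f_max, fs/f_min] samples
    is taken as the pitch period; f_min and f_max are assumed search limits.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]  # remove any DC offset first
    lag_min = max(1, int(fs / f_max))
    lag_max = min(int(fs / f_min), n - 1)
    best_lag, best_r = lag_min, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag


def with_pitch(bpf_outputs, f0_hz, scale=1.0 / 125.0):
    """Hypothetical 17-unit input vector: 16 BPF outputs plus a scaled pitch value."""
    return list(bpf_outputs) + [f0_hz * scale]


fs = 8000
frame = [math.sin(2 * math.pi * 125.0 * k / fs) for k in range(512)]  # 64 ms test tone
f0 = autocorr_pitch(frame, fs)
print(f"estimated pitch: {f0:.1f} Hz")
print(len(with_pitch([0.0] * 16, f0)))  # 17
```

For the synthetic tone, the estimate should come out close to 125 Hz. A practical pitch tracker would also need a voicing decision and protection against octave errors, which this sketch omits.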

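In the second embodiment, the output values of the BPF group are input to the units 24, 24, ... of the input layer 21. The present invention is not limited to this, however. The formant frequencies used in the first embodiment may be input instead, so that the articulation position is obtained from the formant frequencies.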
… Input the output value from the BPF group. However, the present invention is not limited to this, and the formant frequency used in the first embodiment may be input, and the tone position may be obtained from the formant frequency.

In the second embodiment, a sigmoid function is used as the conversion function applied to the signals input to the input layer 21 or the intermediate layer 22. The present invention is not limited to this, however, and a threshold function may be used instead.

In the second embodiment, the numbers of units in the input layer 21, the intermediate layer 22, and the output layer 23 of the neural network may, of course, be changed as appropriate. They can be adapted to the number of channels of the BPF group that supplies the frequency components to the network and to the numbers of x and y coordinate values used to represent the articulation position.

<Effects of the Invention> As is apparent from the above, the speech feature extraction device of the present invention includes a vowel/consonant section determination unit, a conversion table, and an articulation position extraction unit. The first articulation position calculation means of the articulation position extraction unit handles a target vowel of known utterance content that the vowel/consonant section determination unit has judged to lie in a vowel section. It uses the conversion table to find the position of that vowel on the table, and then calculates the articulation position of the target vowel from the element value located at that position. The second articulation position calculation means handles a target vowel of known utterance content that lies in a vowel section but whose articulation position is not calculated by the first articulation position calculation means. Based on the frequency components of a plurality of single vowels whose utterance content and articulation position are known, it calculates the articulation position of that target vowel from the vowel's frequency components. As a result, the articulation position, a speech feature that does not depend on the speaker or the language, can be generated from the frequency components of vowels.

Therefore, the present invention makes it possible to extract features intrinsic to speech production accurately with simple processing.

[Brief description of the drawings]

FIG. 1 is a block diagram of an embodiment of a speech recognition device according to the present invention; FIG. 2 is a diagram showing the articulation positions of various vowels; FIG. 3 is a diagram showing the relationship between the first and second formant frequencies of Japanese vowels; FIG. 4 is a diagram showing the relationship between the first and second formant frequencies of various vowels of one speaker; FIG. 5 is a flowchart of the operation for calculating the articulation position of one vowel in a word; FIGS. 6, 7, 8, 9, and 10 are flowcharts of the routines in FIG. 5 that calculate the articulation positions of the vowels /a/, /i/, /u/, /e/, and /o/ in a word, respectively; and FIG. 11 is an explanatory diagram of the structure of the neural network.

DESCRIPTION OF SYMBOLS: 1 ... microphone, 2 ... amplifier, 3 ... acoustic analysis unit, 4 ... vowel/consonant section determination unit, 5 ... articulation position extraction unit, 6 ... conversion formula storage unit, 7 ... consonant pattern conversion unit, 8 ... consonant pattern storage unit, 9 ... pattern matching unit, 10 ... standard pattern storage unit, 11 ... result display unit, 21 ... input layer, 22 ... intermediate layer, 23 ... output layer, 24 ... input layer unit, 25 ... intermediate layer unit, 26a to 26g ... units that output the articulation position x, 27a to 27g ... units that output the articulation position y.

Claims

(57) [Claims]

A speech feature extraction device that frequency-analyzes input speech and extracts speech features from the obtained frequency components, comprising: a vowel/consonant section determination unit that determines the vowel sections and consonant sections of the input speech; a conversion table for converting at least two frequency components of a vowel into an articulation position, the table holding element values derived from a plurality of single vowels whose utterance content is known; and an articulation position extraction unit. The articulation position extraction unit comprises first articulation position calculation means and second articulation position calculation means. The first articulation position calculation means takes a target vowel of known utterance content that the vowel/consonant section determination unit has judged to lie in a vowel section, obtains the position of that target vowel on the conversion table, and calculates its articulation position using the element value at the obtained position on the conversion table. The second articulation position calculation means takes a target vowel of known utterance content that the vowel/consonant section determination unit has judged to lie in a vowel section but whose articulation position is not calculated by the first articulation position calculation means. Based on the frequency components of a plurality of single vowels whose utterance content and articulation position are known, it calculates the articulation position of that target vowel from the target vowel's frequency components according to a predetermined algorithm. The device thereby extracts, from the frequency components of the speech, an articulation position that is a speech feature independent of the speaker or the language.