JPS5849996A

JPS5849996A - Average phonemic pattern preparation system

Info

Publication number: JPS5849996A
Application number: JP56147775A
Authority: JP
Inventors: 三船　義照; 英一坪香; 樺沢　哲; 裕一谷口
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1981-09-21
Filing date: 1981-09-21
Publication date: 1983-03-24
Also published as: JPS6335996B2

Abstract

(57) [Summary] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] The present invention creates an average phoneme pattern sequence by averaging the phoneme type and length over each phoneme interval that corresponds to a phoneme of the written phoneme pattern sequence. This makes it possible to average the overall length while preserving the local features common to multiple phoneme pattern sequences: consonant deletion, insertion, and substitution (including coarticulation), vowel substitution (including coarticulation), durations, and so on. Applied, for example, to building the word phoneme dictionary of a speech recognition device, the aim is to create automatically a word phoneme dictionary that absorbs word-specific phonological variation, and thereby to improve the word recognition rate.

An example of creating the word phoneme dictionary of a conventional speech recognition device based on phoneme recognition is described with reference to FIGS. 1 and 2.

FIG. 1(a) shows a word phoneme dictionary in a simple romanized phoneme notation that ignores differences in phoneme length within a word. FIG. 1(b) shows a word phoneme dictionary in a romanized phoneme notation that accounts for those differences by taking the consonant-to-vowel length ratio as 1:3 and giving silence (・) a separate length (printed as "12" in the source). In actual words, however, the uttered phoneme sequences differ in phoneme length from word to word, as shown in FIG. 2(a) and (b). In addition, each word shows its own characteristic phenomena: consonant deletion (in FIG. 2(a), the word-initial voiced consonant (B) is dropped), consonant insertion and substitution (in FIG. 2(a), the word-medial voiced consonant (Z) is replaced by a voiceless consonant [illegible]; in FIG. 2(b), the word-medial semivowel (Y) is replaced by the voiced consonant (G)), and vowel substitution (in FIG. 2(b), the vowel (O) is replaced by the vowel [illegible]). Consequently, when pattern matching is performed against a conventional word dictionary in phoneme notation, the matching score falls because of the word-specific differences in phoneme length and the phoneme deformations, even with pattern matching that normalizes the time axis (DP matching, for example). This causes word recognition errors and lowers the recognition rate of speech recognition devices based on phoneme recognition.

In view of the above, the present invention automatically creates, by averaging the uttered phoneme patterns of a word from multiple speakers, a word phoneme dictionary that represents the word-specific differences in phoneme length and the phoneme deformations, and thereby improves the recognition rate of speech recognition devices and the like based on phoneme recognition.

An example of the overall configuration of the present invention is shown in FIG. 3 and described below.

In FIG. 3, reference numeral 1 denotes the phoneme recognition section (音韻認識部), which consists of a phoneme recognition unit (音素認識部) 3, a phoneme sequence merging section 4, and a phoneme standard pattern storage section 5. Reference numeral 2 denotes the phoneme averaging section, which consists of a phoneme sequence averaging section 6 and a word phoneme dictionary 7.

In the phoneme recognition section 1, the input speech is assumed to be expressed as a sequence of feature vectors, as in equation (1):

$X_1 X_2 \cdots X_N$ ... (1)

Each $X_i$, $i = 1, \dots, N$, is an m-dimensional vector written as $X_i = (x_{i1}, \dots, x_{im})$. The feature vectors can be thought of, for example, as the time-sampled outputs $x_1(t), \dots, x_j(t), \dots, x_m(t)$ of an m-channel bandpass filter bank. The stretch of speech represented by one feature vector is sometimes called a frame. The subscripts $1, 2, \dots, N$ in equation (1) are parameters representing time.

In equation (1), each feature vector can be regarded as corresponding to a specific phoneme of the input speech.

For example, when the word /nara/ is input, the sequence of equation (1) can be associated with phonemes as follows:

$X_1, \dots, X_{i_1}$ ↔ /n/
$X_{i_1+1}, \dots, X_{i_2}$ ↔ /a/
$X_{i_2+1}, \dots, X_{i_3}$ ↔ /r/
$X_{i_3+1}, \dots, X_N$ ↔ /a/

Here $1 < i_1 < i_2 < i_3 < N$.

In what follows, associating feature vectors with phonemes is called phoneme recognition in the narrow sense; where no confusion arises, it is simply called phoneme recognition. Determining the phoneme intervals from a speech pattern is called segmentation. In the example above, the frames are segmented as

$1, \dots, i_1 \mid i_1+1, \dots, i_2 \mid i_2+1, \dots, i_3 \mid i_3+1, \dots, N$

where $\mid$ is the symbol for a boundary between phoneme intervals. Converting an input speech pattern into a phoneme sequence by means of narrow-sense phoneme recognition and segmentation is called phoneme recognition in the broad sense. That is, the speech pattern expressed by equation (1) is fed into the phoneme recognition unit 3, which calculates its similarity to every phoneme standard pattern stored in the phoneme standard pattern storage section 5, takes the most similar phoneme as the recognition result for each feature vector, and, if necessary, also computes second and third candidates. The phoneme sequence merging section 4 receives the output of the phoneme recognition unit 3, performs processing such as collapsing a run of identical phonemes into a single phoneme, and outputs a phoneme sequence.
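The merging step can be illustrated with a minimal Python sketch. It assumes the phoneme recognition unit 3 emits one label per frame; the function name `merge_runs` and the sample labels are hypothetical, and the source says only that runs of the same phoneme are combined "and so on", so any further processing in section 4 is not modelled here.

```python
from itertools import groupby


def merge_runs(frame_labels):
    """Collapse consecutive identical frame labels into (phoneme, n_frames) pairs."""
    return [(label, sum(1 for _ in run)) for label, run in groupby(frame_labels)]


# Hypothetical frame-by-frame recognition result for the word /nara/.
frames = ["n", "n", "a", "a", "a", "r", "a", "a"]
merged = merge_runs(frames)
print(merged)
```

Keeping the frame count next to each merged phoneme preserves the duration information that the averaging stage uses later.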

For convenience of explanation, the processing of the phoneme recognition section 1 is now described in detail. Let the patterns stored in the phoneme standard pattern storage section 5 be $Y_1, Y_2, \dots, Y_M$. Each pattern is an m-dimensional vector written as $Y_j = (y_{j1}, \dots, y_{jm})$, $j = 1, \dots, M$.

Suppose now that the speech pattern of equation (1) is generated by an m-channel bandpass filter bank and that the standard patterns are likewise created from bandpass filter outputs. The similarity between the i-th feature vector $X_i$ and the phoneme standard pattern $Y_j$ can then be expressed, for example, by the Euclidean distance $S(i, j)$ between them. In the following, the speech pattern and the phoneme standard patterns are taken to be bandpass filter outputs, and similarity is evaluated by the Euclidean distance.

The recognized phoneme of the i-th feature vector is the phoneme $j_1$ that attains

$\min_{1 \le j \le M} S(i, j)$

If necessary, the second-candidate phoneme is the phoneme $j_2$ with the next-smallest distance, and the third and later candidates are defined in the same way.

This processing is carried out for all the feature vectors $X_1, \dots, X_N$, and the result is the output of the phoneme recognition section 1.
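The frame-by-frame candidate ranking described above can be sketched in Python as follows. This is an illustration under assumptions, not the patent's implementation: the distance is taken to be the ordinary (unsquared) Euclidean distance, and the two-channel standard patterns, the function names, and the sample frames are all hypothetical.

```python
import math


def euclidean(x, y):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def rank_candidates(frame, standards, n_best=3):
    """Return the labels of the n_best standard patterns closest to `frame`."""
    scored = sorted(standards.items(), key=lambda kv: euclidean(frame, kv[1]))
    return [label for label, _ in scored[:n_best]]


def recognize(frames, standards, n_best=3):
    """Rank candidate phonemes for every frame of the input pattern."""
    return [rank_candidates(f, standards, n_best) for f in frames]


# Hypothetical 2-channel standard patterns and two input frames.
standards = {"a": (1.0, 0.0), "i": (0.0, 1.0), "n": (0.5, 0.5)}
frames = [(0.9, 0.1), (0.1, 0.8)]
result = recognize(frames, standards, n_best=2)
print(result)
```

The first entry of each inner list is the recognized phoneme; the remaining entries are the second and later candidates.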

Phonemes are classified into vowels (/a/, /i/, ..., /o/), long vowels (/ai/, /iu/, ...), semivowels (/y/, /w/), voiced consonants (/m/, /n/, /g/, /z/, ...), voiceless consonants, the moraic nasal, the geminate consonant, silence, and so on. Among these, vowels have comparatively long utterance intervals and are stable, being little affected by the neighboring phonemes. Consonants, by contrast, are highly unstable patterns that are strongly influenced by the preceding and following vowels (in Japanese, except at the beginning of a word, consonants appear surrounded by vowels).

Next, the phoneme averaging section is described.

In the phoneme averaging section 2, the phoneme sequence averaging section 6 calculates the average of multiple input phoneme pattern sequences that correspond to one word.

The word phoneme dictionary 7 stores, for each word, the average phoneme pattern sequence calculated by the phoneme sequence averaging section 6.

The operation of the phoneme sequence averaging section 6 is described below.

As shown in FIG. 4, the phoneme sequence averaging section 6 consists of a written phoneme pattern storage section 8, an input phoneme sequence phoneme-interval separation section 9, an input phoneme sequence storage section 10, a phoneme interval averaging section 11, and an average phoneme sequence storage section 12.

The written phoneme pattern storage section 8 stores, word by word, the written phoneme pattern sequence of each word for which an average phoneme pattern sequence is to be obtained. For example, using the romanized notation of FIG. 1(a), if the words are "牧場" (bokujō, "ranch") and "海水浴" (kaisuiyoku, "sea bathing"), it stores

/BO・KuZiYO/ /KAiSuiYO・Ku/

where / is the word delimiter mark.

The input phoneme sequence phoneme-interval separation section 9 first divides the contents of the written phoneme pattern storage section 8 and of the input phoneme sequence storage section 10 into fixed characteristic phoneme intervals (for example CVC or VCV intervals, where V is a vowel interval and C a consonant interval) and associates the characteristic phoneme intervals of the two. It then associates the phoneme intervals (C intervals, V intervals, and so on) within the associated characteristic phoneme intervals. Finally, within the associated phoneme intervals, it associates each single phoneme of the written phoneme pattern with the possibly multiple phonemes of the input phoneme pattern, thereby separating the input phoneme pattern sequence into phoneme intervals that each correspond to one phoneme of the written phoneme pattern sequence.

The phoneme interval averaging section 11 takes the phoneme pattern sequences held in the input phoneme sequence storage section 10 and the average phoneme sequence storage section 12, whose phoneme intervals have been separated, averages the phoneme type (the type identified by the phoneme recognition section) and the phoneme duration (the number of frames of the phoneme) for each phoneme interval, and writes the result back to the average phoneme sequence storage section 12.
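The per-interval averaging can be sketched in Python as below. The text does not state how an "average" of phoneme types is formed, so this sketch assumes a majority vote; it also assumes each aligned interval has already been reduced to a single (label, frame count) pair and averages all utterances in one batch rather than incrementally. The function name and sample data are hypothetical.

```python
from collections import Counter


def average_intervals(utterances):
    """Average aligned utterances interval by interval.

    utterances: list of utterances, each a list of (label, n_frames) pairs
    with one pair per phoneme of the written sequence.
    """
    n = len(utterances[0])
    if any(len(u) != n for u in utterances):
        raise ValueError("all utterances must have the same number of intervals")
    averaged = []
    for k in range(n):
        labels = [u[k][0] for u in utterances]
        durations = [u[k][1] for u in utterances]
        # Assumption: the "average" type is the most frequent label.
        label = Counter(labels).most_common(1)[0][0]
        averaged.append((label, round(sum(durations) / len(durations))))
    return averaged


# Three hypothetical speakers, two aligned intervals each.
utterances = [
    [("b", 2), ("o", 6)],
    [("b", 3), ("o", 7)],
    [("p", 4), ("o", 8)],
]
averaged = average_intervals(utterances)
print(averaged)
```

A similarity-weighted choice of label would be an equally plausible reading of "average type"; the majority vote is used here only because it is the simplest option.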

For the first input phoneme pattern sequence, no averaging is performed; as soon as its phoneme intervals have been separated, it is transferred to the average phoneme sequence storage section 12.

The operation of the input phoneme sequence phoneme-interval separation section 9 is now described in detail.

When CVC or VCV intervals are used as the characteristic phoneme intervals, the input phoneme sequence phoneme-interval separation section 9 first separates both the written phoneme pattern sequence and the input phoneme pattern sequence into consonant intervals and vowel intervals, as shown in Table 1.

For the written phoneme pattern sequence, given the nature of Japanese syllables, and allowing for words that begin with a vowel and treating silence (・) as a consonant, a consonant interval can in general be written, as shown in Table 1(a), as

$C_i = CR_i\,CR_{i+1}$, $1 \le i \le N$
$CR_j \in (C \cup \varnothing \cup \text{・})$, $1 \le j \le N+1$

Allowing for diphthongs, a vowel interval can in general be written, as shown in Table 1(b), as

$V_i = VR_i\,VR_{i+1}$, $1 \le i \le N$
$VR_j \in (V \cup \varnothing)$, $1 \le j \le N+1$

For the input phoneme pattern sequence, because phonemes are deformed and phoneme durations are not constant, a consonant interval can in general be written, as shown in Table 1(b), as

$CP_i = CPR_i\,CPR_{i+1} \cdots CPR_n$, $1 \le i \le M$
$CPR_j \in (C \cup \varnothing \cup \text{・})$, $1 \le j \le M+n$

Likewise, a vowel interval can in general be written, as shown in Table 1(b), as

$VP_i = VPR_i\,VPR_{i+1} \cdots VPR_m$, $1 \le i \le M$
$VPR_j \in (V \cup \varnothing)$, $1 \le j \le M+n$
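As a rough illustration of the consonant/vowel separation, the Python sketch below splits a romanized written sequence into alternating consonant and vowel intervals. It assumes that the five letters a, i, u, e, o (in either case) are the vowels and that every other symbol, including the silence mark ・, belongs to a consonant interval; the name `split_cv` is hypothetical, and letter case is not interpreted.

```python
from itertools import groupby

# Assumption: these letters are vowels; everything else, including the
# silence mark "・", is treated as part of a consonant interval.
VOWELS = set("aiueoAIUEO")


def split_cv(sequence):
    """Split a romanized phoneme string into alternating C and V intervals."""
    return [
        ("V" if is_vowel else "C", "".join(run))
        for is_vowel, run in groupby(sequence, key=lambda ch: ch in VOWELS)
    ]


intervals = split_cv("BO・KuZiYO")
print(intervals)
```

In this example silence and the following consonant fall into one consonant interval of two elements, which matches the two-element form $CR_i\,CR_{i+1}$ given for written consonant intervals.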

In the input phoneme pattern sequence, as explained for the phoneme recognition section 1, vowels have comparatively long durations and are stable, while consonants have short durations and are unstable. Consonants nevertheless carry a great deal of information as points of phonological change.

Accordingly, when phoneme intervals are associated by means of characteristic phoneme intervals such as CVC and VCV, accurate association becomes possible even in the presence of consonant deletion, insertion, and substitution and of vowel substitution.

The association of characteristic phoneme intervals by CVC is explained below using Tables 2 and 3.

Suppose that the terminal consonant intervals of the CVC intervals already associated in the written phoneme pattern sequence and the input phoneme pattern sequence are $C_k$ and $CP_l$ respectively, as shown in Table 2(a). The next step associates either the $C_k V_k C_{k+1}$ interval (the $R_k$ interval) or the $C_k V_k C_{k+1} V_{k+1} C_{k+2}$ interval (the $R_{k+1}$ interval) of the written phoneme pattern sequence with either the $CP_l VP_l CP_{l+1}$ interval (the $I_l$ interval) or the $CP_l VP_l CP_{l+1} VP_{l+1} CP_{l+2}$ interval (the $I_{l+1}$ interval) of the input phoneme pattern sequence. If the consonant interval $C_{k+1}$ has been dropped, $R_{k+1}$ corresponds to $I_l$; if a consonant interval $CP_{l+1}$ has been inserted, $R_k$ corresponds to $I_{l+1}$.

A simple way to make this association, shown in Table 2(b), exploits the fact that consonants carry much information as points of phonological change. The similarities of the consonant interval pairs $(C_{k+1}, CP_{l+1})$, $(C_{k+1}, CP_{l+2})$, and $(C_{k+2}, CP_{l+1})$ are calculated, and the pair with the maximum similarity decides the association:

$(C_{k+1}, CP_{l+1})$: $R_k$ corresponds to $I_l$
$(C_{k+1}, CP_{l+2})$: $R_k$ corresponds to $I_{l+1}$
$(C_{k+2}, CP_{l+1})$: $R_{k+1}$ corresponds to $I_l$

To make the association directly, as shown in Table 3, the similarities of $(R_k, I_l)$, $(R_k, I_{l+1})$, and $(R_{k+1}, I_l)$ are calculated, and the combination with the maximum similarity is taken as the corresponding intervals.

Once a CVC interval has been associated, the associated portion is removed from the chain: for the $R_k$, $I_l$ pair, $C_k V_k$ and $CP_l VP_l$ are removed; for the $R_k$, $I_{l+1}$ pair, $C_k V_k$ and $CP_l VP_l CP_{l+1} VP_{l+1}$ are removed; and for the $R_{k+1}$, $I_l$ pair, $C_k V_k C_{k+1} V_{k+1}$ and $CP_l VP_l$ are removed. The same association is then repeated for the subsequent intervals.
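The consonant-based CVC association and the removal step can be sketched in Python as a loop that advances through the two lists of consonant intervals. This is a simplified illustration, not the patented procedure: only consonant intervals are modelled, the first entries of both lists are assumed to be already matched, ties go to the straight match, and the similarity function `exact` stands in for the phoneme similarities of FIG. 5, which are not reproduced in the text. All names and sample labels are hypothetical.

```python
def align_consonants(written, observed, sim):
    """Pair consonant intervals of the written and observed sequences.

    written, observed: lists of consonant-interval labels whose first
    entries are assumed to be already matched.
    Returns the list of matched (k, l) index pairs.
    """
    k, l = 0, 0
    pairs = [(0, 0)]
    while k + 1 < len(written) and l + 1 < len(observed):
        # Straight match: R_k corresponds to I_l.
        candidates = [((k + 1, l + 1), sim(written[k + 1], observed[l + 1]))]
        if l + 2 < len(observed):
            # A consonant was inserted in the input: R_k corresponds to I_{l+1}.
            candidates.append(((k + 1, l + 2), sim(written[k + 1], observed[l + 2])))
        if k + 2 < len(written):
            # A written consonant was dropped: R_{k+1} corresponds to I_l.
            candidates.append(((k + 2, l + 1), sim(written[k + 2], observed[l + 1])))
        # The best pair becomes the new terminal consonant pair, which is
        # equivalent to removing the matched portion from the chain.
        (k, l), _ = max(candidates, key=lambda c: c[1])
        pairs.append((k, l))
    return pairs


def exact(a, b):
    """Placeholder similarity: 1.0 for identical labels, else 0.0."""
    return 1.0 if a == b else 0.0


dropped = align_consonants(["B", "K", "Z", "Y"], ["B", "K", "Y"], exact)
inserted = align_consonants(["B", "K", "Y"], ["B", "K", "T", "Y"], exact)
print(dropped, inserted)
```

In the first call the written consonant Z has no counterpart in the input, so index 2 is skipped on the written side; in the second, the extra input consonant T is skipped on the input side.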

The association of characteristic phoneme intervals by VCV is explained below using Tables 4 and 5.

Suppose that the terminal vowel intervals of the VCV intervals already associated in the written phoneme pattern sequence and the input phoneme pattern sequence are $V_k$ and $VP_l$ respectively, as shown in Table 4(a). The next step associates either the $V_k C_{k+1} V_{k+1}$ interval (the $R'_k$ interval) or the $V_k C_{k+1} V_{k+1} C_{k+2} V_{k+2}$ interval (the $R'_{k+1}$ interval) of the written phoneme pattern sequence with either the $VP_l CP_{l+1} VP_{l+1}$ interval (the $I'_l$ interval) or the $VP_l CP_{l+1} VP_{l+1} CP_{l+2} VP_{l+2}$ interval (the $I'_{l+1}$ interval) of the input phoneme pattern sequence. If the consonant interval $C_{k+1}$ has been dropped, $R'_{k+1}$ corresponds to $I'_l$; if a consonant interval $CP_{l+1}$ has been inserted, $R'_k$ corresponds to $I'_{l+1}$.

A simple way to make this association, shown in Table 4(b), exploits the fact that vowels are stable. The similarities of the vowel interval pairs $(V_k, VP_l)$, $(V_k, VP_l VP_{l+1})$, and $(V_k V_{k+1}, VP_l)$ are calculated, and the pair with the maximum similarity decides the association:

$(V_k, VP_l)$: $R'_k$ corresponds to $I'_l$
$(V_k, VP_l VP_{l+1})$: $R'_k$ corresponds to $I'_{l+1}$
$(V_k V_{k+1}, VP_l)$: $R'_{k+1}$ corresponds to $I'_l$

To make the association directly, as shown in Table 5, the similarities of $(R'_k, I'_l)$, $(R'_k, I'_{l+1})$, and $(R'_{k+1}, I'_l)$ are calculated, and the combination with the maximum similarity is taken as the corresponding intervals.

Once a VCV interval has been associated, the associated portion is removed from the chain: for the $R'_k$, $I'_l$ pair, $V_k C_{k+1}$ and $VP_l CP_{l+1}$ are removed; for the $R'_k$, $I'_{l+1}$ pair, $V_k C_{k+1}$ and $VP_l CP_{l+1} VP_{l+1} CP_{l+2}$ are removed; and for the $R'_{k+1}$, $I'_l$ pair, $V_k C_{k+1} V_{k+1} C_{k+2}$ and $VP_l CP_{l+1}$ are removed. The same association is then repeated for the subsequent intervals.

For simple and accurate association of characteristic phoneme intervals, CVC association and VCV association can be combined. Let the CVC association described above be decided by the similarity of consonant intervals only, and the VCV association by the similarity of vowel intervals only. CVC association is applied first. When severe deletion or insertion of consonant intervals makes association by that method impossible, VCV association is applied, and once association by that method is complete, CVC association is resumed.

Once the CVC or VCV characteristic phoneme intervals of the written phoneme pattern sequence and the input phoneme pattern sequence have been associated, deletions and insertions of consonant intervals within a characteristic phoneme interval can be detected immediately. Consonant intervals can therefore be associated with consonant intervals, and vowel intervals with vowel intervals, within each characteristic phoneme interval. Then, from the definitions of the consonant and vowel intervals of the written phoneme pattern sequence and of the input phoneme pattern sequence, one phoneme of the written phoneme pattern sequence can be associated with multiple phonemes of the input phoneme pattern sequence on the basis of the inter-phoneme similarities shown in FIG. 5, following the processing shown in Table 6(a) and (b).

As described above, the present invention first associates the characteristic phoneme intervals (CVC intervals, VCV intervals, and so on) of multiple input phoneme pattern sequences with those of the written phoneme pattern sequence, then associates the phoneme intervals (C intervals, V intervals), and finally associates each single phoneme of the written phoneme pattern sequence with multiple phonemes of the input phoneme pattern sequence, thereby dividing the input phoneme pattern sequence. By averaging the phoneme type and phoneme duration over each of the resulting phoneme intervals, it determines the average of the multiple input phoneme pattern sequences, so that, for example, a word phoneme dictionary representing word-specific phonological variation can be created automatically. Applying this to the word phoneme dictionary of a speech recognition device based on phoneme recognition makes it possible to improve the recognition rate.

[Brief Description of the Drawings]

FIG. 1 shows word phoneme dictionaries in the conventional romanized phoneme notation: FIG. 1(a) shows the case where one phoneme is represented by one character, and FIG. 1(b) the case where the difference in phoneme length between consonants and vowels is represented by the number of characters. FIG. 2 shows phoneme sequences obtained by phoneme recognition of words: FIG. 2(a) shows the uttered phoneme sequences of seven speakers for "牧場" ("ranch"), and FIG. 2(b) those of seven speakers for "海水浴" ("sea bathing"). FIG. 3 shows a configuration example of the average phoneme pattern sequence creation system of the present invention. FIG. 4 shows a detailed configuration example of the phoneme sequence averaging section in the configuration example of FIG. 3. FIG. 5 shows the similarities between phonemes.

1: phoneme recognition section; 2: phoneme averaging section; 3: phoneme recognition unit; 4: merging section; 5: phoneme standard pattern storage section; 6: phoneme sequence averaging section; 7: word phoneme dictionary; 8: written phoneme pattern storage section; 9: input phoneme sequence phoneme-interval separation section; 10: input phoneme sequence storage section; 11: phoneme interval averaging section; 12: average phoneme sequence storage section.

Patent applicant: Matsushita Electric Industrial Co., Ltd.

Claims

[Claims]

An average phoneme pattern creation system comprising: phoneme recognition means for recognizing phonemes (vowels, voiced consonants, voiceless consonants, silence, etc.) from an input speech pattern sequence; and phoneme sequence averaging means for determining the average pattern of a plurality of phoneme pattern sequences, wherein the phoneme sequence averaging means, in order to detect in each phoneme pattern sequence the phoneme interval corresponding to each phoneme of the written phoneme notation (romanization, etc.), divides the phoneme pattern sequence and the written phoneme pattern sequence into fixed characteristic phoneme intervals (CVC intervals (C: consonant, V: vowel), VCV intervals, etc.), associates the characteristic phoneme intervals with each other, and then associates the phonemes within the characteristic phoneme intervals, and creates an average phoneme pattern sequence of the plurality of phoneme pattern sequences by averaging the phoneme type and phoneme length for each phoneme interval corresponding to a phoneme of the written phoneme notation.