JPS6075891A

JPS6075891A - Phoneme segmentation

Info

Publication number: JPS6075891A
Application number: JP58183697A
Authority: JP
Inventors: 秋場　国夫; 入間野　孝雄; 金指　久則
Original assignee: Computer Basic Technology Research Association Corp
Current assignee: Computer Basic Technology Research Association Corp
Priority date: 1983-10-01
Filing date: 1983-10-01
Publication date: 1985-04-30
Also published as: JPH022156B2

Abstract

(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] (Field of Industrial Application) The present invention relates to a phoneme segmentation method for performing phoneme recognition on input speech and using the results to recognize syllables, words, sentences, and the like.

(Structure of the Conventional Example and Its Problems) FIG. 1 schematically shows the functions of an apparatus that implements a conventional segmentation method. This conventional example is described below with reference to FIG. 1. In FIG. 1, 1 is a parameter extraction unit for segmentation, 2 is a reading unit for the dictionary item Dn in the word dictionary, 5 is a segmentation rule unit holding the rules for each phoneme, 6 is a phoneme standard pattern unit that stores the phoneme standard patterns, and 7 is a distance calculation unit that calculates, for each frame, the distance from the standard patterns. A phoneme standard pattern represents the mean value of each parameter for each phoneme and is created in advance through preliminary experiments or the like. Let the N parameters in a given frame be $c_1, c_2, \ldots, c_N$, and let the mean values of those parameters for a phoneme X be $\mu_{X1}, \mu_{X2}, \ldots, \mu_{XN}$. The distance $d_X(c_1, c_2, \ldots, c_N)$ between that frame and the standard pattern of phoneme X is then calculated by equation (1).
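The per-frame distance calculation described above can be sketched as follows. The source text does not reproduce the distance equation itself, so the squared Euclidean form used here is an assumption, as are the function name `distance_to_standard_pattern` and all sample values.

```python
def distance_to_standard_pattern(frame, mean):
    """Distance between one frame's parameters c_1..c_N and a phoneme's mean
    parameters mu_X1..mu_XN. The squared Euclidean form is an assumption."""
    if len(frame) != len(mean):
        raise ValueError("frame and mean must have the same number of parameters")
    return sum((c - mu) ** 2 for c, mu in zip(frame, mean))


# Hypothetical standard patterns (mean values only) for two phonemes.
standard_patterns = {"a": [1.0, 2.0, 0.5], "i": [0.0, 3.0, 1.5]}
frame = [0.9, 2.2, 0.4]

# Distance from the frame to every phoneme's standard pattern.
distances = {x: distance_to_standard_pattern(frame, mu)
             for x, mu in standard_patterns.items()}
nearest = min(distances, key=distances.get)
```

Note that only the mean values enter the calculation; nothing in this sketch describes how widely each parameter varies for a given phoneme.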

Let Dn be the target word to be matched against the input speech. The dictionary item Dn in the word dictionary is read by the dictionary item reading unit 2, and the phoneme boundary candidate position determination unit 4 performs segmentation at positions $P_{n1}$ through $P_{n,m-1}$ according to the phoneme sequence of Dn. Phoneme segmentation follows the rules of the segmentation rule unit 5, using as parameters quantities such as the change in $d_X(c_1, c_2, \ldots, c_N)$ obtained from equation (1) above. In this conventional example, however, the standard pattern holds only the mean values of the parameters, so the degree of variation in the parameter values is not reflected in the distance calculation of equation (1), and segmentation errors occur for phonemes with large variation.
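The drawback can be illustrated with a single-parameter example. This is a minimal sketch with invented numbers: the phoneme labels, means, and standard deviations are hypothetical, and a one-dimensional Gaussian density is used only as a contrast to the mean-only distance.

```python
import math


def gaussian_pdf(c, mean, std):
    """One-dimensional Gaussian density of value c for the given mean and std."""
    z = (c - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical phonemes as (mean, standard deviation):
# "s" varies widely, "t" is tightly clustered.
phonemes = {"s": (0.0, 3.0), "t": (2.0, 0.5)}
c = 3.5  # observed parameter value in one frame

# A mean-only distance ignores the spread and prefers the closer mean.
by_distance = min(phonemes, key=lambda x: abs(c - phonemes[x][0]))

# A density accounts for the spread and prefers the widely varying phoneme.
by_density = max(phonemes, key=lambda x: gaussian_pdf(c, *phonemes[x]))
```

Here the frame lies closer to the mean of "t", yet it is three standard deviations away from it, while it is little more than one standard deviation from the mean of "s". The two criteria therefore disagree.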

(Object of the Invention) The present invention eliminates the drawback of the conventional example described above, and its object is to reduce speech recognition errors caused by segmentation errors.

(Structure of the Invention) To achieve the above object, the present invention matches the parameter time series obtained from the input speech against phoneme standard patterns and a word dictionary and, in segmenting the phonemes that constitute each dictionary item, uses the probability density value with which each phoneme produces the observed parameter values, thereby reducing segmentation errors.

(Description of the Embodiment) An embodiment of the present invention is described below with reference to the drawings. In FIG. 2, units 1 to 5 are the same as in the conventional example of FIG. 1. 8 is a phoneme standard pattern unit that stores phoneme standard patterns consisting of the mean values and covariance values for each phoneme, and 9 is a probability density calculation unit that operates for each phoneme.
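A standard pattern of this kind, a mean vector plus covariance values per phoneme, could be estimated from labeled training frames roughly as follows. The source says only that standard patterns are prepared in advance through preliminary experiments, so the estimation procedure, the unbiased (count minus one) divisor, the function name `estimate_standard_pattern`, and the sample frames are all assumptions.

```python
def estimate_standard_pattern(frames):
    """Estimate the mean vector and covariance matrix of one phoneme from its
    training frames (each frame is a list of n parameter values)."""
    count = len(frames)
    if count < 2:
        raise ValueError("need at least two frames to estimate a covariance")
    n = len(frames[0])
    mean = [sum(f[i] for f in frames) / count for i in range(n)]
    # Sample covariance with the unbiased divisor (count - 1); an assumption.
    cov = [[sum((f[i] - mean[i]) * (f[j] - mean[j]) for f in frames) / (count - 1)
            for j in range(n)]
           for i in range(n)]
    return mean, cov


# Hypothetical training frames for one phoneme, two parameters per frame.
training_frames = [[1.0, 2.0], [2.0, 4.0], [3.0, 3.0]]
mean, cov = estimate_standard_pattern(training_frames)
```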

In FIG. 2, the parameter extraction unit 1 analyzes the input speech frame by frame and extracts parameters, and the probability density calculation unit 9 calculates the probability density with which those parameters are generated from each phoneme. The dictionary item reading unit 2 then reads the recognition target word Dn from the word dictionary unit 3, and boundaries are detected according to the segmentation rules 5 for each constituent phoneme, using the probability density as a parameter.
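The boundary detection step can be sketched as follows. The source does not state the per-phoneme segmentation rules, so the rule used here (advance to the next phoneme at the first frame where its log-density exceeds that of the current phoneme) is a hypothetical stand-in, and the function name `detect_boundaries`, the phoneme sequence, and the per-frame values are invented for illustration.

```python
def detect_boundaries(frame_log_densities, phoneme_sequence):
    """Return the frame index at which each successive phoneme of the dictionary
    sequence starts. Hypothetical rule: move on to the next phoneme at the first
    frame where its log-density exceeds that of the current phoneme."""
    boundaries = []
    current = 0  # index of the current phoneme within phoneme_sequence
    for t, scores in enumerate(frame_log_densities):
        if current + 1 < len(phoneme_sequence):
            here = phoneme_sequence[current]
            nxt = phoneme_sequence[current + 1]
            if scores[nxt] > scores[here]:
                boundaries.append(t)
                current += 1
    return boundaries


# Hypothetical per-frame log-densities for the phonemes of one dictionary item.
frames = [
    {"a": -1.0, "k": -6.0, "i": -7.0},
    {"a": -1.2, "k": -5.0, "i": -7.5},
    {"a": -4.0, "k": -1.5, "i": -6.0},
    {"a": -6.0, "k": -1.0, "i": -3.0},
    {"a": -7.0, "k": -4.0, "i": -1.1},
    {"a": -7.5, "k": -6.0, "i": -0.9},
]
boundaries = detect_boundaries(frames, ["a", "k", "i"])
```

Because the dictionary phoneme sequence fixes the order of phonemes, the sketch only ever compares the current phoneme with the next one in that sequence, rather than with every phoneme in the inventory.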

The calculation in the probability density calculation unit 9 is performed according to equation (2) below. Let the n parameters in a given frame be represented by the vector $C$. Let the mean values of those parameters for a phoneme X, stored in the phoneme standard pattern unit 8, be $\mu_X$, and let the covariance values be $\Sigma_X$. The probability density $\phi_X(C)$ of the vector $C$ for phoneme X is then expressed as

$$\phi_X(C) = \frac{1}{(2\pi)^{n/2}\,|\Sigma_X|^{1/2}} \exp\left(-\frac{1}{2}\,(C-\mu_X)^{\mathsf{T}}\,\Sigma_X^{-1}\,(C-\mu_X)\right) \qquad (2)$$

where the distribution is assumed to be Gaussian.
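Under the stated Gaussian assumption, the density of a frame vector for one phoneme can be computed as in the following pure-Python sketch. It works in the log domain and uses a Cholesky factorization for numerical stability; both are implementation choices not taken from the source, and the function names `cholesky`, `log_density`, and `density` are hypothetical.

```python
import math


def cholesky(a):
    """Lower-triangular factor L with L * L^T == a, for a symmetric
    positive-definite matrix given as a list of rows."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= 0.0:
                    raise ValueError("covariance matrix is not positive definite")
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return L


def log_density(c, mean, cov):
    """Log of the Gaussian density of vector c for mean vector `mean` and
    covariance matrix `cov`."""
    n = len(c)
    L = cholesky(cov)
    diff = [ci - mi for ci, mi in zip(c, mean)]
    # Forward substitution: solve L y = diff, so y.y is the Mahalanobis term.
    y = []
    for i in range(n):
        y.append((diff[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    mahalanobis = sum(v * v for v in y)
    log_det = 2.0 * sum(math.log(L[i][i]) for i in range(n))
    return -0.5 * (n * math.log(2.0 * math.pi) + log_det + mahalanobis)


def density(c, mean, cov):
    """Gaussian density of vector c, obtained by exponentiating the log-density."""
    return math.exp(log_density(c, mean, cov))
```

For example, `density([0.0, 0.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])` evaluates the two-dimensional standard Gaussian at its mean, which is 1/(2π). In practice the log-density is usually preferable for comparing phonemes, since raw densities of high-dimensional frames can underflow.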

In this embodiment, using the change in probability density for each phoneme of the dictionary phoneme sequence allows boundary detection that takes the degree of parameter variation into account, which has the advantage of improving segmentation accuracy.

(Effects of the Invention) With the configuration described above, the present invention provides the following effect. By using, in the segmentation of the phonemes constituting each dictionary item, the probability density value with which each phoneme produces the observed parameter values, segmentation accuracy can be improved for phonemes with large variation.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the functions of an apparatus that implements a conventional phoneme segmentation method, and FIG. 2 is a functional block diagram of an apparatus that implements the phoneme segmentation method according to an embodiment of the present invention.

1: parameter extraction unit, 2: dictionary item reading unit, 3: word dictionary unit, 4: phoneme boundary candidate determination unit, 5: segmentation rule unit, 8: phoneme standard pattern unit, 9: probability density calculation unit.

Claims

[Claims]

A phoneme segmentation method characterized by: extracting parameters of input speech; comparing the parameters with phoneme standard patterns to calculate the probability density with which the parameters are generated from each phoneme; matching each dictionary item of a word dictionary, in which the words to be recognized are expressed as symbol strings in units of phonemes, against the input speech; and detecting phoneme boundaries according to the dictionary phoneme sequence constituting each dictionary item, using the value of the probability density as a parameter.