JPS6073593A

JPS6073593A - Phoneme dictionary preparation system

Info

Publication number: JPS6073593A
Application number: JP58181359A
Authority: JP
Inventors: 裕二木島; 奈良　泰弘; 小林　敦仁; 晋太木村
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1983-09-29
Filing date: 1983-09-29
Publication date: 1985-04-25

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field of the invention]

The present invention relates to a phoneme recognition device, and in particular to a phoneme dictionary creation method that, when a phoneme dictionary is created, automatically assigns phoneme symbols to feature vectors and checks those assignments, so that, for example, the feature vectors of stationary phonemes can be extracted easily.

[Prior art and problems]

When speech recognition is performed, known input speech is sampled in advance, for example at millisecond intervals, and its frequency spectrum is computed to create feature vectors. A dictionary is then created by attaching labels indicating phonemes to these feature vectors.

When creating a phoneme dictionary that registers phonemes such as vowels and consonants, the difficulty lies in identifying which phoneme the feature vector at each time belongs to. Conventionally, a person inspected the feature vectors visually, identified the phoneme segments, and labeled the input phonemes by hand, which was very laborious.

If attention is restricted to stationary phonemes such as vowels and S, in which the same frequency spectrum continues for a certain length of time, the feature vectors could be labeled automatically. In that case, however, the boundary regions between phonemes are excluded, only the regions of extremely high stationarity are extracted automatically, and identification is limited to those regions, so the range from which feature vectors can be extracted is limited.

In either case, the relationship between such conventional feature vector labeling and phoneme recognition is unclear, there is no guarantee that the labeling is effective for phoneme recognition, and it has not been possible to judge how effective a dictionary created in this way actually is.

[Purpose of the invention]

The object of the present invention is to remedy the above drawbacks by providing a phoneme dictionary creation method in which phoneme identification is first performed by a simple technique, speech is then input and the identification is checked by a technique similar to phoneme recognition so that identification errors are removed. In this way, for example, stationary phonemes are identified automatically to create a phoneme dictionary whose effectiveness during phoneme recognition is guaranteed.

[Structure of the invention]

To achieve this object, the phoneme dictionary creation method of the present invention comprises: feature vector time series conversion means for converting input speech data into a time series of feature vectors; speech section extraction means for extracting speech sections from the input speech data; phoneme symbol assignment means for associating a phoneme symbol string representing the utterance content with the speech data and assigning phoneme symbols to the feature vectors; a phoneme dictionary for storing the feature vectors to which phoneme symbols have been assigned; and check means for, when a feature vector of input speech data is transmitted, extracting a similar feature vector from the phoneme dictionary and detecting whether or not the phoneme symbols match.

First, phoneme symbols are assigned to the feature vectors at each time of given input speech data, and these are stored in the phoneme dictionary. Next, for the feature vector at each time obtained from other known input speech data, a similar feature vector with a phoneme symbol is extracted from the phoneme dictionary, and if the phonemes do not match, that feature vector data with its phoneme symbol is removed from the phoneme dictionary.

[Embodiments of the invention]

The configuration of an embodiment of the present invention will be described with reference to FIG. 1 and FIG. 2.

FIG. 1 is a configuration diagram of an embodiment of the present invention, and FIG. 2 is a diagram explaining its operation.

In the figures, 1 is a speech input section, 2 is an analysis section, 3 is a speech section extraction section, 4 is a feature vector time series conversion section, 5 is a phoneme symbol string input section, 6 is a speech division section, 7 is a phoneme symbol assignment section, 8 is a phoneme dictionary, 9 is a check section, and 10 is a deletion section.

The speech input section 1 receives a speech input signal whose phonemes are known.

The analysis section 2 performs various analyses on this speech input signal: for example, it samples the signal every few milliseconds and determines its power, sends the sampled data to the feature vector time series conversion section 4, and outputs a sampling time signal to the speech division section 6.

The speech section extraction section 3 extracts speech sections on the basis of the power information output from the analysis section 2.
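The patent does not say how speech sections are derived from the power information. As a minimal sketch, assuming a simple fixed threshold on per-frame power and a minimum run length (`threshold` and `min_frames` are hypothetical parameters, not taken from the patent), the extraction could look like this:

```python
def extract_speech_sections(power, threshold, min_frames=3):
    """Return (start, end) frame index pairs where power stays at or above threshold.

    `end` is exclusive. Runs shorter than `min_frames` are discarded.
    The thresholding rule itself is an assumption made for illustration.
    """
    sections = []
    start = None
    for i, p in enumerate(power):
        if p >= threshold and start is None:
            start = i                      # a candidate speech run begins
        elif p < threshold and start is not None:
            if i - start >= min_frames:    # keep only sufficiently long runs
                sections.append((start, i))
            start = None
    # Close a run that is still open at the end of the data.
    if start is not None and len(power) - start >= min_frames:
        sections.append((start, len(power)))
    return sections


frame_power = [0.0, 0.1, 2.0, 3.0, 2.5, 0.2, 0.1, 4.0, 0.1]
sections = extract_speech_sections(frame_power, threshold=1.0)
print(sections)  # [(2, 5)]
```

The isolated high-power frame at index 7 is dropped because it is shorter than `min_frames`; a real system would likely tune both parameters to the recording conditions.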

The feature vector time series conversion section 4 applies, for example, a Fourier transform to the sampled data transmitted from the analysis section 2, extracts its frequency components, and thereby obtains a time series of feature vectors.
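The conversion from sampled data to a time series of spectral feature vectors can be illustrated with a short sketch. It assumes fixed-length frames, a direct discrete Fourier transform, and the magnitude spectrum as the feature vector; the patent names the Fourier transform only as an example, and `frame_len` and `hop` are hypothetical parameters.

```python
import cmath
import math


def frame_spectrum(frame):
    """Magnitude spectrum of one frame via a direct DFT (bins 0..N/2)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        spectrum.append(abs(acc))
    return spectrum


def feature_vector_series(samples, frame_len, hop):
    """Split samples into frames and return one magnitude spectrum per frame."""
    return [
        frame_spectrum(samples[start:start + frame_len])
        for start in range(0, len(samples) - frame_len + 1, hop)
    ]


# A pure tone with 2 cycles per 8-sample frame concentrates its energy in bin 2.
tone = [math.sin(2 * math.pi * 2 * t / 8) for t in range(16)]
series = feature_vector_series(tone, frame_len=8, hop=8)
print(len(series), series[0].index(max(series[0])))
```

A direct DFT is quadratic in the frame length and is used here only to keep the sketch dependency-free; an FFT would be the usual choice in practice.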

The phoneme symbol string input section 5 is used to enter the phoneme names of the speech input to the speech input section 1; these phoneme names are entered, for example, from a keyboard.

The speech division section 6 divides the speech section extracted by the speech section extraction section 3 into phoneme units on the basis of the phoneme symbols input from the phoneme symbol string input section 5, and assigns a phoneme symbol to each unit.

The phoneme symbol assignment section 7 assigns the phoneme symbols of the divided phonemes, transmitted from the speech division section 6, to the time series of feature vectors transmitted from the feature vector time series conversion section 4.

The phoneme dictionary 8 stores the feature vectors with phoneme symbols sent from the phoneme symbol assignment section 7.

The check section 9 checks whether the phoneme symbols assigned to the feature vectors stored in the phoneme dictionary 8 are accurate. After feature vectors with phoneme symbols have been stored in the phoneme dictionary 8, the check section 9 takes a newly transmitted feature vector of a known phoneme, extracts from the phoneme dictionary 8 a feature vector that is highly similar to it, and compares their phoneme symbols. If they do not match, it outputs a mismatch signal so that the feature vector stored in the phoneme dictionary 8 is removed.

The deletion section 10 deletes the feature vector in question from the phoneme dictionary 8 in response to the mismatch signal output from the check section 9, performing control such as, for example, removing its phoneme symbol.

Next, the operation of the present invention will be described with reference to FIG. 1 and FIG. 2.

(1) Creating the phoneme dictionary

First, the phoneme dictionary is created. For this, switches S1 and S2 are both set to the phoneme symbol assignment section 7 side. A speech signal of known phonemes is then input to the speech input section 1, analyzed by the analysis section 2, and converted into a time series of feature vectors by the feature vector time series conversion section 4. Meanwhile, the power information obtained by the analysis section 2 is sent to the speech section extraction section 3, which extracts the speech section. A phoneme symbol string is input separately from the phoneme symbol string input section 5, so the speech division section 6 divides this speech section into phoneme units and assigns phoneme symbols to them. The feature vectors created by the feature vector time series conversion section 4 are sent to the phoneme symbol assignment section 7, which also receives the phoneme-unit division data and the corresponding phoneme symbols from the speech division section 6. As a result, each feature vector in the time series is given a phoneme symbol and stored in the phoneme dictionary 8.

In this way, the phoneme dictionary is created.
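The labeling step can be sketched as follows, assuming the division step yields phoneme units as (start frame, end frame, symbol) triples and that the dictionary is a flat list of (symbol, vector) entries. Both representations are assumptions for illustration; the patent does not specify data formats.

```python
def label_feature_vectors(features, segments):
    """Attach a phoneme symbol to every feature vector that falls inside a segment.

    features: list of feature vectors, one per frame.
    segments: list of (start_frame, end_frame, symbol), end exclusive
              (hypothetical output format of the speech division step).
    Returns dictionary entries as (symbol, vector) pairs.
    """
    entries = []
    for start, end, symbol in segments:
        for vec in features[start:end]:
            entries.append((symbol, tuple(vec)))
    return entries


features = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]
segments = [(0, 2, "a"), (2, 4, "s")]
phoneme_dictionary = label_feature_vectors(features, segments)
print(phoneme_dictionary)
```

Frames outside every segment are simply left unlabeled, so they never enter the dictionary.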

(2) Checking the phoneme dictionary

To check the phoneme dictionary 8 created in (1) above, switches S1 and S2 are each set to the check section 9 side. A speech signal of known phonemes is then input to the speech input section 1, and its phoneme symbol string is input to the phoneme symbol string input section 5. The time series of feature vectors output from the feature vector time series conversion section 4 is transmitted to the check section 9, which extracts from the phoneme dictionary 8 the feature vector most similar to each of them. The check section 9 also receives the phoneme-unit division data and phoneme symbols output from the speech division section 6, so it checks whether the phoneme symbol of the feature vector extracted from the phoneme dictionary 8 matches the phoneme symbol transmitted from the speech division section 6. When the phoneme symbols do not match, the check section 9 outputs a mismatch signal, and the deletion section 10 accordingly deletes that feature vector from the phoneme dictionary 8.

For example, when the feature vectors f1, f2, ..., fm of the phoneme S shown in FIG. 2 are matched against the phoneme dictionary 8 and the feature vectors △S1, △C2, ..., △Sm are extracted, the feature vector △C2, whose phoneme symbol is C, is deleted from the phoneme dictionary 8. By repeating this check with a number of known phonemes, inaccurate feature vectors are deleted from the phoneme dictionary 8 and only accurate ones remain, so the dictionary can be used to identify phonemes correctly.
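The check-and-delete procedure amounts to a nearest-neighbor lookup followed by a label comparison. The sketch below assumes squared Euclidean distance as the similarity measure and assumes that an entry, once marked for deletion, is excluded from later lookups; the patent specifies neither, and the function names are hypothetical.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def check_and_prune(dictionary, labeled_inputs):
    """Drop dictionary entries that are the nearest match to an input of a different phoneme.

    dictionary: list of (symbol, vector) entries.
    labeled_inputs: list of (symbol, vector) taken from other known speech.
    Returns a new list with the mismatching entries removed.
    """
    to_delete = set()
    for true_symbol, vec in labeled_inputs:
        candidates = [i for i in range(len(dictionary)) if i not in to_delete]
        if not candidates:
            break
        # Most similar remaining dictionary entry for this input vector.
        nearest = min(candidates,
                      key=lambda i: squared_distance(dictionary[i][1], vec))
        if dictionary[nearest][0] != true_symbol:
            to_delete.add(nearest)         # corresponds to the mismatch signal
    return [e for i, e in enumerate(dictionary) if i not in to_delete]


dictionary = [("s", (0.9, 0.1)), ("c", (0.7, 0.3)), ("a", (0.1, 0.9))]
known_s = [("s", (0.88, 0.12)), ("s", (0.72, 0.28))]
pruned = check_and_prune(dictionary, known_s)
print(pruned)  # [('s', (0.9, 0.1)), ('a', (0.1, 0.9))]
```

Here the second input, known to be S, lies closest to the entry labeled C, so that entry is removed, mirroring the example in the text. Repeating the call with further known speech would prune the dictionary progressively.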

[Effect of the invention]

According to the present invention, not only can phonemes be identified automatically and easily, but errors arising from inaccurate identification are removed by the check performed after the phoneme dictionary has been created. Because only accurate feature vectors are stored, an effective phoneme dictionary that gives very high accuracy at recognition time can be obtained. Moreover, phoneme identification can be carried out right up to the performance limit of whatever speech division method is used.

[Brief explanation of drawings]

FIG. 1 is a configuration diagram of an embodiment of the present invention, and FIG. 2 is a diagram explaining its operation. In the figures, 1 is a speech input section, 2 is an analysis section, 3 is a speech section extraction section, 4 is a feature vector time series conversion section, 5 is a phoneme symbol string input section, 6 is a speech division section, 7 is a phoneme symbol assignment section, 8 is a phoneme dictionary, 9 is a check section, and 10 is a deletion section.

Claims

[Claims]

A phoneme dictionary creation method comprising: feature vector time series conversion means for converting input speech data into a time series of feature vectors; speech section extraction means for extracting speech sections from the input speech data; phoneme symbol assignment means for associating a phoneme symbol string representing the utterance content with the speech data and assigning phoneme symbols to the feature vectors; a phoneme dictionary for storing the feature vectors to which phoneme symbols have been assigned; and check means for, when a feature vector of input speech data is transmitted, extracting a similar feature vector from the phoneme dictionary and detecting whether or not the phoneme symbols match; characterized in that phoneme symbols are first assigned to the feature vectors at each time of given input speech data and these are stored in the phoneme dictionary, and next, for the feature vector at each time obtained from other known input speech data, a similar feature vector with a phoneme symbol is extracted from the phoneme dictionary and, if the phonemes do not match, that feature vector data with its phoneme symbol is removed from the phoneme dictionary.