JPS62116999A

JPS62116999A - Syllable unit voice recognition equipment

Info

Publication number: JPS62116999A
Application number: JP60256564A
Authority: JP
Inventors: 宮岡　伸一郎; 舩橋　誠寿
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1985-11-18
Filing date: 1985-11-18
Publication date: 1987-05-28

Abstract

(57) [Summary] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention]

[Field of Application of the Invention]

The present invention relates to a speech recognition device, and in particular to a monosyllable-unit speech recognition device suited to large-vocabulary speech recognition, to which the word-unit speech recognition devices most commonly used in the past have been difficult to apply.

[Background of the invention]

In conventional monosyllable-unit recognition methods, as in "Large-vocabulary word speech recognition using monosyllable-unit input from unspecified speakers" (Transactions of the IEICE, Vol. J65-D, No. 12), the vowel, which is relatively stable within a monosyllable, is detected first. To cope with deformation caused by coarticulation, consonant-vowel or consonant-vowel-consonant chains are then generally recognized by methods such as DP matching.

However, this method requires segmentation into consonant intervals and vowel intervals, which is difficult. In addition, because matching is based on a single distance measure, features that are essential but small are lost, and the recognition rate does not improve.

[Purpose of the invention]

The object of the present invention is to provide a monosyllable-unit speech recognition device with a high recognition rate that avoids the difficult problem of segmenting vowel intervals and consonant intervals and that can make fine-grained use of weak but essential features without losing them.

[Summary of the invention]

Input speech is converted by acoustic analysis into a sequence of spectra (vectors of roughly 10 dimensions). The spectral sequence is quantized by comparison with standard patterns and converted into a string of symbols called primitives. Up to this point, the method is the same as that used in conventional recognition schemes. In the present invention, the primitive sequence is not recognized by the conventional approach of segmentation and matching; instead, recognition is based on a generative grammar. Each monosyllable (/a/, /ka/, and so on) is expressed as a sequence of primitives. Rules that generate primitive sequences are therefore defined in advance for each monosyllable. A monosyllable can then be recognized by analyzing which production rules generate the input primitive sequence.
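The quantization step described above can be illustrated with a minimal Python sketch. The primitive symbols (`q`, `k`, `a`), the three-dimensional reference vectors, and the use of Euclidean distance are all assumptions made for illustration; the specification states only that spectra are compared with standard patterns.

```python
import math

# Hypothetical standard patterns: primitive symbol -> reference spectrum.
# Real patterns would have about 10 dimensions; 3 are used here for brevity.
STANDARD_PATTERNS = {
    "q": [0.0, 0.0, 0.0],  # silence-like frame (assumed)
    "k": [0.9, 0.1, 0.4],  # consonant-like frame (assumed)
    "a": [0.2, 0.8, 0.6],  # vowel-like frame (assumed)
}

def quantize(spectra, patterns=STANDARD_PATTERNS):
    """Map each spectral vector to the symbol of its nearest standard pattern."""
    symbols = []
    for frame in spectra:
        # Euclidean distance is an assumed choice of comparison measure.
        best = min(patterns, key=lambda s: math.dist(frame, patterns[s]))
        symbols.append(best)
    return "".join(symbols)

frames = [[0.05, 0.0, 0.1], [0.8, 0.2, 0.5], [0.25, 0.7, 0.6], [0.2, 0.9, 0.5]]
print(quantize(frames))
```

Each frame is replaced by a single symbol, so the output is a primitive string that a grammar-based parser can consume.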

[Embodiments of the invention]

Embodiments of the present invention are described below with reference to FIGS. 1 to 3.

The overall procedure for monosyllable recognition is shown in FIG. 1. Input speech 6 is processed by the acoustic analysis unit 1 and converted into a sequence of spectra. The quantization unit 2 compares the spectral sequence with the standard primitive patterns 3 and converts it into a sequence of primitives. The parser 4 analyzes the primitive sequence using the production rules 5 and outputs the result as a monosyllable sequence 7.

The analysis performed by the parser 4 is explained with reference to FIG. 2.

The sequence of primitives input to the parser is a symbol string such as that shown in FIG. 2(a). FIG. 2(a) shows an example of a CV syllable such as /ka/, consisting of a silent part that occurs between it and the preceding syllable, a consonant part, and a vowel part. Of these, the silent part and the vowel part stretch or shrink in time depending on the speaking conditions, so they are represented as having arbitrary length. The production rules for this monosyllable are shown in FIG. 2(b). Here, S is the start symbol.

Uppercase letters such as A and B are nonterminal symbols (rewritable variables), and lowercase letters such as a and b are terminal symbols (primitives). If the production rules of FIG. 2(b) are applied to the symbol string of FIG. 2(a), replacing terminal symbols with nonterminal symbols step by step, the start symbol S is eventually obtained. In that case the symbol string conforms to the grammar defined by the production rules and is recognized as the monosyllable corresponding to those rules.
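The reduction of a primitive string to the start symbol can be sketched as follows. The actual rules of FIG. 2(b) are not recoverable from this text, so the right-linear rules below for /ka/ are hypothetical, as are the primitive symbols `q` (silence), `k` (consonant), and `a` (vowel). The string is processed from right to left, so that terminals are replaced by nonterminals until only S remains.

```python
# Hypothetical right-linear rules for /ka/:
# (left-hand side, terminal, right-hand nonterminal or None).
KA_RULES = [
    ("S", "q", "S"),   # S -> q S : silence of arbitrary length
    ("S", "k", "A"),   # S -> k A : consonant part (one frame in this sketch)
    ("A", "a", "A"),   # A -> a A : vowel of arbitrary length
    ("A", "a", None),  # A -> a   : final vowel frame
]

def derives(rules, symbols, start="S"):
    """Return True if the start symbol can generate the whole symbol string."""
    if not symbols:
        return False
    # Nonterminals that can generate the suffix seen so far (built right to left).
    current = {lhs for lhs, t, rhs in rules if t == symbols[-1] and rhs is None}
    for sym in reversed(symbols[:-1]):
        current = {lhs for lhs, t, rhs in rules if t == sym and rhs in current}
    return start in current

print(derives(KA_RULES, "qqkaaa"))
print(derives(KA_RULES, "qqaa"))
```

Because the silence and vowel rules are recursive, the same rule set accepts any number of `q` and `a` frames, which corresponds to the arbitrary-length parts of FIG. 2(a).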

Because the symbol string is in practice input continuously, the point at which parsing (analysis by the production rules) of one monosyllable ends is taken as the point at which parsing of the next monosyllable begins. It is therefore unnecessary to segment monosyllable intervals, or vowel and consonant intervals, in advance.
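The idea that the end of one parse marks the start of the next can be sketched as below. Regular expressions are used here as a compact stand-in for per-syllable production rules, and choosing the longest match at each position is an assumed policy; the syllable set and primitive symbols are hypothetical.

```python
import re

# Hypothetical per-syllable patterns standing in for production rules:
# optional silence, then consonant frames, then vowel frames.
SYLLABLE_PATTERNS = {
    "ka": re.compile(r"q*k+a+"),
    "ki": re.compile(r"q*k+i+"),
    "sa": re.compile(r"q*s+a+"),
}

def parse_stream(stream, patterns=SYLLABLE_PATTERNS):
    """Split a continuous primitive string into syllables without prior segmentation."""
    pos, syllables = 0, []
    while pos < len(stream):
        # Try every syllable pattern at the current position; keep the longest match.
        best_name, best_end = None, pos
        for name, pattern in patterns.items():
            m = pattern.match(stream, pos)
            if m and m.end() > best_end:
                best_name, best_end = name, m.end()
        if best_name is None:
            raise ValueError(f"no syllable rule matches at position {pos}")
        syllables.append(best_name)
        pos = best_end  # the end of this parse is the start of the next one
    return syllables

print(parse_stream("qqkaaaqssaakkii"))
```

No syllable or phoneme boundaries are supplied to `parse_stream`; the boundaries fall out of where each parse finishes.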

Production rules are prepared for each syllable, that is, as many sets as there are monosyllables. Variations in the symbol string caused by coarticulation are also embedded in the production rules. As for the parsing method, it is desirable to use an error-correcting parsing method in order to cope with symbol deletions and insertions caused by noise and speaking conditions.
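The specification does not say which error-correcting parsing method is intended. As one possible illustration, the sketch below computes the minimum number of symbol insertions, deletions, and substitutions needed for a string to be generated by a set of right-linear rules. The rules, the `ACCEPT` marker, and the unit edit costs are all assumptions.

```python
INF = float("inf")
ACCEPT = "$"  # hypothetical marker meaning "derivation finished"

# Hypothetical rules for /ka/: (left-hand side, terminal, next state).
KA_RULES = [("S", "q", "S"), ("S", "k", "A"), ("A", "a", "A"), ("A", "a", ACCEPT)]

def _close(dist, rules):
    """Allow dropped symbols: apply a rule without reading input, at cost 1."""
    changed = True
    while changed:
        changed = False
        for lhs, _, rhs in rules:
            if dist.get(lhs, INF) + 1 < dist.get(rhs, INF):
                dist[rhs] = dist[lhs] + 1
                changed = True
    return dist

def correction_cost(rules, symbols, start="S"):
    """Minimum number of edits that makes `symbols` derivable from `start`."""
    dist = _close({start: 0}, rules)
    for sym in symbols:
        # Treat sym as inserted noise: skip it and stay in the same state.
        nxt = {state: d + 1 for state, d in dist.items()}
        for lhs, t, rhs in rules:
            if lhs in dist:
                cost = dist[lhs] + (0 if t == sym else 1)  # match or substitution
                if cost < nxt.get(rhs, INF):
                    nxt[rhs] = cost
        dist = _close(nxt, rules)
    return dist.get(ACCEPT, INF)

print(correction_cost(KA_RULES, "qqkaa"))
print(correction_cost(KA_RULES, "qqaa"))
print(correction_cost(KA_RULES, "qkxaa"))
```

With one rule set per syllable, the syllable whose rules give the lowest cost would be a natural recognition result; this selection policy is likewise an assumption.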

FIG. 3 shows an embodiment of a word speech recognition device according to the present invention. Devices 4, 5, 7, and 9 perform acoustic analysis, quantization, parsing, and matching, respectively. Memories 6, 8, and 10 store the standard primitive patterns, the production rules, and the word dictionary, respectively. Input speech 11 is first converted into a spectral sequence by device 4, and the spectral sequence is converted into a primitive sequence by device 5. In device 7, the primitive sequence is parsed and recognized as monosyllables.

The monosyllable sequence is matched against the word dictionary in device 9 and, as a result, recognized as a word. Reference numeral 3 denotes a bus for communication between the devices; synchronization between the devices and bus control are performed by the controller 1. The recognition result is used by the host computer 2.
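The final matching stage can be pictured with a very small sketch. The dictionary contents and the exact-match policy are assumptions; the specification says only that the monosyllable sequence is matched against a word dictionary.

```python
# Hypothetical word dictionary: monosyllable sequence -> word.
WORD_DICTIONARY = {
    ("sa", "ka", "na"): "sakana",
    ("sa", "ku", "ra"): "sakura",
    ("ka", "ta", "na"): "katana",
}

def match_word(syllables, dictionary=WORD_DICTIONARY):
    """Return the dictionary word for a recognized syllable sequence, or None."""
    return dictionary.get(tuple(syllables))

print(match_word(["sa", "ku", "ra"]))
```

Because the dictionary is keyed by syllable sequences, adding a word means adding one entry; no new acoustic template is needed for it.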

[Effect of the invention]

According to the present invention, recognition is performed in units of monosyllables, so it goes without saying that even when the vocabulary grows, the effort of creating templates and the effort required for matching do not increase. In addition, compared with conventional monosyllable-unit recognition methods, the invention has the following effects: (1) the difficult problem of segmenting syllable intervals, or vowel intervals and consonant intervals, can be avoided; and (2) weak but essential features that would be lost in distance-minimizing matching can be used in fine detail. An improvement in the recognition rate is therefore expected.

[Brief explanation of drawings]

FIG. 1 is a procedure diagram of monosyllable-unit speech recognition, FIG. 2(a) is an example of a primitive sequence, FIG. 2(b) shows the production rules corresponding to (a), and FIG. 3 shows an embodiment of the word speech recognition device.

Claims

[Claims]

1. A monosyllable-unit speech recognition device characterized by comprising: means for acoustically analyzing input speech; means for quantizing the vector sequence obtained as a result of the acoustic analysis; and means for recognizing monosyllables by analyzing, using production rules, the symbol string obtained as a result of the quantization.