JPS59168498A

JPS59168498A - Continuous voice recognition system

Info

Publication number: JPS59168498A
Application number: JP58042168A
Authority: JP
Inventors: 市川　熹; 黒須　正明; 義典北原; 淺川　吉章
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1983-03-16
Filing date: 1983-03-16
Publication date: 1984-09-22
Also published as: JPH0552513B2

Abstract

(57) [Abstract] This publication contains application data that predates electronic filing, so no abstract data is recorded.

Description

[Detailed description of the invention]

[Field of application of the invention]

The present invention relates to a continuous speech recognition method, and more particularly to a continuous speech recognition method that can recognize continuous speech after only a small number of standard patterns have been registered.

[Prior art]

Conventionally, speech recognition devices that recognize speech uttered naturally and continuously at the character level have registered and used, as standard patterns, utterances that were each spoken separately in isolation.

However, because of phenomena such as coarticulation, speech patterns within continuous utterances differ from those of isolated utterances, so standard patterns taken from isolated utterances are not satisfactory in quality.

Moreover, recognizing speech of arbitrary content requires recognizing phonemes or syllables. For this purpose it is desirable to register standard patterns in units that are short yet take into account the deformation caused by coarticulation and the like, for example VCV phoneme chains (vowel-consonant-vowel).

However, there are at least 700 types of such VCV phoneme-chain units (hereinafter simply "VCV units") in Japanese even by a conservative count, so registering them places a very heavy burden on the user.

[Purpose of the invention]

The present invention has been made in view of the above circumstances. Its purpose is to solve the above-mentioned problems of conventional continuous speech recognition methods and to provide a continuous speech recognition method that can recognize continuous speech after only a small number of standard patterns have been registered.

[Summary of the invention]

The key point of the present invention is that, in a continuous speech recognition method that performs recognition by matching against pre-registered standard patterns, only a limited number of standard patterns that appear frequently in practical use are registered, while for patterns that appear infrequently, approximate patterns are created and registered by combining partial patterns of the frequently appearing standard patterns registered above.

If recognition using such an approximate pattern gives a poor result, the approximate pattern can be replaced, on the basis of the information supplied at error-correction time, with the corresponding partial pattern taken from the input pattern, so that it is progressively remade into a good pattern.

The principle of the present invention is explained in detail below.

Figure 1 is based on data from a survey by the National Institute for Japanese Language and Linguistics of vocabulary usage frequency in 90 contemporary magazines. The VCV units appearing in that data are arranged in order of usage frequency, and their cumulative frequency is plotted. Here V (vowel) comprises the five vowels /a/, /i/, /u/, /e/ and /o/, together with the moraic nasal /N/, the geminate consonant /Q/, and the silent intervals before the start and after the end of a word, all of which are treated as vowels. The total number of VCV unit types then reaches approximately 780.
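As a rough illustration of how a cumulative-frequency curve like that of Figure 1 could be computed, the following Python sketch sorts units by count and accumulates their share of all occurrences. The function name `cumulative_coverage` and the toy counts are assumptions made for illustration; they are not data from the survey.

```python
from itertools import accumulate


def cumulative_coverage(counts):
    """Return the cumulative share of all occurrences covered by the
    top 1, 2, 3, ... units when units are sorted by descending count."""
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    return [running / total for running in accumulate(ordered)]


# Hypothetical toy counts (NOT survey data), keyed by VCV unit.
toy_counts = {"aku": 50, "eki": 30, "ito": 15, "eku": 5}
coverage = cumulative_coverage(toy_counts)
print(coverage)
```

Reading such a list at the index of the last registered unit gives the share of occurrences that the registered set covers.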

According to Figure 1, the 200 most frequently used of these approximately 780 VCV unit types cover about 89% of all occurrences. These top 200 VCV units are therefore registered as standard patterns, and for the remaining VCV units approximate patterns are created inside the device by combining partial patterns of the registered standard patterns, and are then registered and used. For example, the infrequently used VCV pattern /eku/ is synthesized from the /ku/ part of the frequently used /aku/ and the /ek/ part of the equally frequently used /eki/.
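The /eku/ example can be sketched in Python as follows. This is only an illustration under stated assumptions: a pattern is modeled as a list of per-frame feature values (here replaced by letters for readability), the split index inside the consonant is taken as already known, and the name `synthesize_vcv` is hypothetical.

```python
def synthesize_vcv(vc_source, vc_split, cv_source, cv_split):
    """Join the leading VC part of one registered pattern with the
    trailing CV part of another to form an approximate VCV pattern."""
    vc_part = vc_source[:vc_split]  # frames up to the consonant boundary
    cv_part = cv_source[cv_split:]  # frames from the consonant boundary on
    return vc_part + cv_part


# Symbolic stand-ins for feature frames (assumption: one letter per frame).
eki = ["e", "e", "k", "k", "i", "i"]
aku = ["a", "a", "k", "k", "u", "u"]

# /ek/ from /eki/ plus /ku/ from /aku/ gives an approximate /eku/.
eku = synthesize_vcv(eki, 3, aku, 3)
print(eku)
```

In a real device the frames would be feature vectors from the analysis stage rather than letters, and the joint might need smoothing; neither detail is specified in the text.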

[Embodiments of the invention]

Embodiments of the present invention are described in detail below with reference to the drawings.

Figure 2 is a block diagram of a speech recognition device showing one embodiment of the present invention. In the figure, 2 is a speech analysis section, 3 is a buffer memory, 4 is a standard pattern memory, 5 is a pattern matching section (hereinafter simply "matching section"), 6 is an input display section, 7 is a main memory, 8 is a control section, and 9 is a flexible disk memory. Each of these components is itself conventionally known. The main memory 7 stores the above-mentioned frequently appearing standard patterns (hereinafter "high-usage-frequency VCV information").

The operation of the device of this embodiment is described below.

First, the utterances of the high-usage-frequency VCV information to be input are registered as follows. Under the control of the control section 8, and in accordance with the high-usage-frequency VCV information in the main memory 7, the user is instructed through the guidance part of the input display section 6 to utter each VCV to be input.

When the user inputs the designated VCV speech 1 in accordance with this instruction, the analysis section 2 extracts its features, and the result is passed through the input buffer memory 3 into the main memory 7. The range of the input pattern needed as a standard pattern is cut out as directed by the control section 8 and registered in the standard pattern memory 4.

Once all the high-usage-frequency patterns have been input and registered by this procedure, the control section 8 creates the low-usage-frequency patterns by splitting and recombining the already registered high-usage-frequency patterns, and registers them in the standard pattern memory 4.

The splitting relies on the fact that consonants generally have lower power than vowels: by detecting the position of the temporal dip in power, a VCV pattern can easily be divided into VC and CV. Each standard pattern created as described above can, if necessary, be stored in the flexible disk memory 9 and reused at a later date.
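The power-dip split can be sketched in Python as below. This is a minimal illustration, assuming the per-frame power is available as a list of numbers and that a single interior minimum marks the consonant; the names `find_power_dip` and `split_vcv` and the sample values are hypothetical, and a practical detector would probably smooth the power contour first.

```python
def find_power_dip(power):
    """Return the index of the lowest-power frame, excluding the first
    and last frames so that the dip lies inside the pattern."""
    if len(power) < 3:
        raise ValueError("need at least three frames to locate a dip")
    interior = range(1, len(power) - 1)
    return min(interior, key=lambda i: power[i])


def split_vcv(frames, power):
    """Split a VCV pattern into (VC part, CV part) at the power dip."""
    dip = find_power_dip(power)
    return frames[:dip], frames[dip:]


# Hypothetical power contour for /aku/: high for vowels, low for /k/.
power = [9.0, 8.5, 3.0, 1.2, 4.0, 8.8, 9.1]
frames = ["a", "a", "k", "k", "u", "u", "u"]
vc, cv = split_vcv(frames, power)
print(vc, cv)
```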

Recognition is performed as follows. Features of the input 1 are extracted in the analysis section 2 and stored temporarily in the buffer memory 3, after which the matching section 5 matches them against each of the standard patterns. To match continuous input speech against the standard patterns while detecting the interval corresponding to each standard pattern, a device such as the one the present applicant previously proposed in Utility Model Application No. Sho 55-66296, "Speech Recognition Device", can be used effectively.
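The matching device cited here is not described in this document. Purely for orientation, the Python sketch below shows dynamic time warping (DTW), a generic template-matching technique that tolerates differences in speaking rate. Its use here is an assumption and not necessarily the method of the cited application; it also compares two whole sequences and does not perform the interval detection needed for continuous speech. The name `dtw_distance` and the scalar features are hypothetical.

```python
def dtw_distance(a, b):
    """Accumulated absolute difference between two 1-D feature
    sequences along the best monotonic time alignment."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j]: best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # frame of a repeated against b[j-1]
                cost[i][j - 1],      # frame of b repeated against a[i-1]
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


template = [1.0, 2.0, 3.0]
stretched = [1.0, 2.0, 2.0, 3.0]  # same shape, spoken more slowly
print(dtw_distance(template, stretched))
```

A recognizer built this way would pick the standard pattern with the smallest distance; real systems use multidimensional feature vectors rather than scalars.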

The matching result is taken into the control section 8.

From the matching result, the control section 8 recognizes the phoneme sequence that makes up the input speech and displays it as a character string on the input display section 6. The user checks the displayed result and, if there is an error, enters correction information through the input display section 6.

When a correction is entered, the control section 8 cuts out the corresponding portion of the input buffer memory 3 and substitutes it for the registered pattern.
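The replacement step can be pictured with the following Python sketch. The data layout (a dictionary keyed by unit name with an `approximate` flag), the function name `apply_correction`, and the frame indices are all assumptions for illustration; the document does not specify how the memories are organized.

```python
def apply_correction(pattern_memory, input_frames, start, end, unit):
    """Replace the stored pattern for `unit` with the segment
    input_frames[start:end] taken from the buffered input."""
    segment = list(input_frames[start:end])
    pattern_memory[unit] = {"frames": segment, "approximate": False}
    return pattern_memory


# Hypothetical state: /eku/ is currently an approximate (spliced) pattern.
memory = {"eku": {"frames": ["e", "k", "k", "u"], "approximate": True}}
buffered_input = ["s", "e", "e", "k", "u", "u", "s"]

# The user's correction identifies frames 1..5 as the true /eku/.
apply_correction(memory, buffered_input, 1, 6, "eku")
print(memory["eku"])
```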

It has been confirmed experimentally that the recognition rate (the proportion recognized correctly) is 95% or more when properly registered standard patterns are used, whereas it is about 85% when approximate patterns made by splitting and combining are used. Consequently, the VCV patterns that are misrecognized and must therefore be additionally registered as new standard patterns amount to only about 90 types.
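One plausible reading of how the figure of about 90 types arises is sketched below in Python. It assumes, as a simplification not stated in the text, that the 85% rate applies uniformly across the unit types that are not registered directly.

```python
total_units = 780        # approximate number of VCV unit types (from the text)
registered_units = 200   # units registered directly by the user
approx_accuracy = 0.85   # recognition rate with approximate patterns

# Units represented only by approximate (spliced) patterns.
approximated_units = total_units - registered_units

# Assumption: failures are spread evenly over the approximated unit types.
expected_failures = approximated_units * (1 - approx_accuracy)
print(round(expected_failures))
```

Under that assumption the estimate comes to roughly 87 types, which is consistent with "about 90".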

Furthermore, for patterns that do cause errors, the standard pattern is corrected with the error-correction information while the device is in use. Combined with the method the present applicant proposed in Japanese Patent Application No. Sho 57-197511, "Speech Recognition Device", good standard patterns can be registered relatively easily by successive approximation during use.

Needless to say, even with only the 200 standard patterns of the initial registration, the recognition rate weighted by frequency of appearance is 0.89 × 0.95 + 0.11 × 0.85 ≈ 0.94, that is, about 94%.
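The weighted recognition rate can be checked with a few lines of Python. The variable names are chosen here for illustration; the numbers are those given in the text.

```python
coverage_top200 = 0.89   # share of occurrences covered by the top 200 units
acc_registered = 0.95    # recognition rate with registered patterns
acc_approximate = 0.85   # recognition rate with approximate patterns

# Occurrence-weighted average of the two recognition rates.
overall = (coverage_top200 * acc_registered
           + (1 - coverage_top200) * acc_approximate)
print(f"{overall:.3f}")
```

The exact value is 0.939, which rounds to the 0.94 quoted in the description.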

If the initially registered standard patterns do not contain a partial pattern needed to create another pattern, single CV syllables alone can be registered separately, some VCV patterns can be added, or, for some VC parts, another pattern whose consonant has the same manner of articulation can be substituted. A pattern generated from such a substitute can also later be replaced with the input pattern.

In the above embodiment the moraic nasal and the geminate consonant were included among the vowels, but it goes without saying that variations are possible, such as treating them separately and handling VNCV and VQCV patterns.

[Effect of the invention]

As described above, according to the present invention, in a continuous speech recognition method that performs recognition by matching against pre-registered standard patterns, only a limited number of standard patterns that appear frequently in practical use are registered, and for patterns that appear infrequently, approximate patterns are created and registered by combining partial patterns of the frequently appearing patterns registered above. This has the notable effect of realizing a continuous speech recognition method that reduces the burden on the user when registering standard patterns.

[Brief explanation of the drawing]

Figure 1 is a graph showing the cumulative frequency of appearance of VCV units in continuous speech, and Figure 2 is a block diagram of a speech recognition device showing one embodiment of the present invention.

1: input, 2: speech analysis section, 3: buffer memory, 4: standard pattern memory, 5: matching section, 6: input display section, 7: main memory, 8: control section.

Claims

[Claims] A continuous speech recognition method having at least a speech analysis section, a standard pattern storage section, a pattern matching section, and an input display section, and performing speech recognition by matching against standard patterns registered in advance in the standard pattern storage section, characterized in that the standard patterns stored in the standard pattern storage section are limited to patterns that appear frequently in practical use, and that, for other patterns that appear less frequently, approximate patterns are created and registered by combining the registered patterns or parts thereof.