JPS6211898A

JPS6211898A - Voice recognition equipment

Info

Publication number: JPS6211898A
Application number: JP60150179A
Authority: JP
Inventors: 義典北原; 薮内　繁; 大島　義光; 正博阿部; 武市　宣之; 遠藤　裕英
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1985-07-10
Filing date: 1985-07-10
Publication date: 1987-01-20

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

[Detailed Description of the Invention] [Field of Application of the Invention] The present invention relates to a method that recognizes speech uttered in units of words or phrases and selects the correct sequence from among multiple candidate sequences, using a dictionary that stores the consecutive-syllable patterns (syllable pairs, triples, and so on) that can occur in a language. It concerns improving the performance of that method.

[Background of the invention]

As described in the Information Processing Society of Japan National Convention IH-6 (March 1983) and the Proceedings of the Acoustical Society of Japan 2-7-4 (October 1982), the conventional method uses a dictionary storing the consecutive-syllable patterns (syllable pairs, triples, and so on) that can occur in Japanese to select the correct sequence from the multiple candidate sequences obtained by speech recognition. However, it is difficult to collect every consecutive-syllable pattern that can occur in Japanese, and performance on newly coined words was not considered.

[Purpose of the invention]

An object of the present invention is to provide a syllable recognition device that achieves high recognition performance even for newly coined words.

[Summary of the invention]

To achieve the above object, in the present invention, when the recognition result for the input speech is incorrect, the correct sequence is entered. The entered sequence is decomposed into consecutive-syllable patterns (syllable pairs, triples, and so on), each pattern is compared against the consecutive-syllable patterns already stored in the dictionary, and any pattern not present in the dictionary is additionally registered in it.

[Embodiments of the invention]

An embodiment of the present invention is described below with reference to the drawings. FIG. 1 is a block diagram of an example speech recognition device for explaining the invention in detail. The input word or phrase speech is matched against the phoneme/syllable reference patterns 3 in the acoustic recognition unit 1, which forms and outputs a lattice of multiple candidates for each syllable.

The acoustic recognition unit 1 can be implemented with a known technique (see, for example, Yajima et al., "Recognition Method for a Japanese Speech Input Device," IEICE National Convention 1602 (March 1985)).

FIG. 2 is an example of the candidate lattice produced when the acoustic recognition unit 1 recognizes the phrase "カンガエテイル" (kangaeteiru).

The candidate lattice consists of candidate syllables and their acoustic similarities. A larger acoustic similarity value indicates that the candidate is acoustically closer to the input syllable. For each syllable position, the candidates are listed in descending order of acoustic similarity. This candidate lattice is sent to the candidate sequence selection unit 2.

There are 6,400 candidate phrase sequences that can be generated by combining syllables from this lattice. FIG. 3(a) shows the top 10, ranked by the sum of acoustic similarities over the sequence.
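The enumeration and ranking step can be illustrated with a minimal Python sketch. The lattice contents, the similarity values, and the function names `count_sequences` and `top_sequences` are assumptions made for illustration; the actual lattice of FIG. 2 is not reproduced in this text.

```python
import heapq
from itertools import product
from math import prod

# Hypothetical lattice (not the one in FIG. 2): one list per syllable
# position, each holding (candidate syllable, acoustic similarity) pairs
# in descending order of similarity.
lattice = [
    [("カ", 90), ("タ", 80)],
    [("ン", 95)],
    [("ガ", 85), ("ダ", 70)],
]

def count_sequences(lattice):
    """Number of candidate sequences: product of candidates per position."""
    return prod(len(slot) for slot in lattice)

def top_sequences(lattice, k=10):
    """Return the k sequences with the largest sum of acoustic similarities."""
    scored = (
        ("".join(syllable for syllable, _ in path),
         sum(score for _, score in path))
        for path in product(*lattice)  # every combination, one pick per position
    )
    return heapq.nlargest(k, scored, key=lambda item: item[1])

best = top_sequences(lattice, k=2)
```

Exhaustive enumeration is workable at the scale described here (6,400 sequences); a larger lattice would likely call for a best-first search instead.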

At this stage, the correct sequence "カンガエテイル" ranks sixth. The candidate sequence selection unit 2 narrows down these multiple sequences using the consecutive-syllable dictionary 4. As shown in FIG. 3(b), the dictionary 4 stores syllable pairs that can occur in Japanese, such as "カン" and "エテ". These may be stored separately by position, such as phrase-initial, phrase-medial, and phrase-final. The patterns are not limited to pairs; sequences of n consecutive syllables (n ≥ 2) may be used. Patterns that cannot occur in Japanese, such as "テピ" and "ガペ", are not stored in the dictionary 4. First, the candidate sequences of FIG. 3(a) are decomposed into consecutive syllables, one sequence at a time.

For example, "カンガエテピル" is decomposed into "カン", "ンガ", "ガエ", "エテ", "テピ", and "ピル". Next, these six syllable pairs are compared against the entries of the consecutive-syllable dictionary 4. If all six are present in the dictionary 4, the candidate sequence is stored, together with its acoustic similarity sum, in the candidate sequence output memory 2′. If even one of the six is absent from the dictionary, the candidate sequence is not stored in the candidate sequence output memory 2′. Through this operation, the three sequences of FIG. 3(c) are stored as recognition candidates, together with their acoustic similarity sums, in the candidate sequence output memory 2′, and the contents of the memory 2′ are displayed on the display device 5.
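The decomposition and dictionary check described above can be sketched as follows. This is an illustrative sketch, not the device's implementation: it assumes each kana character stands for one syllable (which matches the examples in this description), and the dictionary contents, the scores, and the names `syllable_pairs` and `select_candidates` are hypothetical.

```python
import unicodedata

def syllable_pairs(sequence):
    """Split a kana string into overlapping two-syllable patterns.

    Assumes one kana character per syllable; NFC normalization keeps
    voiced kana such as ガ as a single character.
    """
    s = unicodedata.normalize("NFC", sequence)
    return [s[i:i + 2] for i in range(len(s) - 1)]

def select_candidates(scored_sequences, dictionary):
    """Keep only the (sequence, score) pairs whose every syllable pair
    appears in the dictionary; one missing pair rejects the sequence."""
    return [
        (seq, score)
        for seq, score in scored_sequences
        if all(pair in dictionary for pair in syllable_pairs(seq))
    ]

# Hypothetical dictionary contents and similarity sums.
dictionary = {"カン", "ンガ", "ガエ", "エテ", "テイ", "イル"}
candidates = [("カンガエテピル", 520), ("カンガエテイル", 510)]
selected = select_candidates(candidates, dictionary)
```

Here "カンガエテピル" is rejected because "テピ" and "ピル" are absent from the dictionary, even though its similarity sum is higher.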

When the input speech contains a newly coined word, the consecutive-syllable patterns stored in the dictionary 4 may be insufficient, and the output of the candidate sequence selection unit 2 may not include the correct sequence. FIG. 4(a) shows an example of candidate sequences created from the candidate lattice output by the acoustic recognition unit 1 when it recognizes "ワードプロセッサハ" ("word processor" followed by the particle ハ). FIG. 4(b) shows an example of the consecutive-syllable dictionary 4. With this dictionary, the output of the candidate sequence selection unit 2 is as shown in FIG. 4(c), which does not contain the correct sequence "ワードプロセッサハ". The cause is that the syllable pair "ドプ" is absent from the dictionary 4; unless "ドプ" is registered, "ワードプロセッサハ" will never be recognized.

To solve this, when the output sequences of the candidate sequence selection unit 2 are displayed on the display device 5 and the correct sequence is not among them, the correct sequence is entered using a means such as the keyboard 7. The consecutive-syllable learning and registration unit 6 decomposes the entered correct sequence into consecutive syllables. In the example above, it is decomposed into eight syllable pairs: "ワー", "ード", "ドプ", "プロ", "ロセ", "セッ", "ッサ", and "サハ".

Next, the eight input syllable pairs are loaded one at a time into the input consecutive-syllable buffer memory 6′, the entries of the dictionary 4 are loaded one at a time into the dictionary consecutive-syllable buffer memory 6″, and the contents of the two buffer memories are compared. If they do not match, the next entry of the dictionary 4 is loaded into the buffer memory 6″ and compared with the buffer memory 6′. If they match, the next input syllable pair is loaded into the buffer memory 6′, the entries of the dictionary 4 are again loaded one at a time into the buffer memory 6″, and the two buffer memories are compared.

This procedure is repeated, and any input syllable pair that matched none of the entries in the consecutive-syllable dictionary 4 is additionally registered in the dictionary 4 as a new entry. In the example above, "ドプ" is registered as a new entry.
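The learning and registration procedure can be sketched in Python as follows. The linear scan stands in for the one-entry-at-a-time comparison between buffer memories 6′ and 6″; the dictionary contents and the names `syllable_pairs` and `register_missing_pairs` are assumptions for illustration, and each kana character is again assumed to be one syllable.

```python
def syllable_pairs(sequence):
    """Split a kana string into overlapping two-syllable patterns
    (assumes one kana character per syllable)."""
    return [sequence[i:i + 2] for i in range(len(sequence) - 1)]

def register_missing_pairs(correct_sequence, dictionary):
    """Append to `dictionary` (a list) every syllable pair of
    `correct_sequence` that matches no existing entry.

    Returns the list of newly registered pairs.
    """
    added = []
    for pair in syllable_pairs(correct_sequence):   # plays the role of buffer 6'
        found = False
        for entry in dictionary:                    # plays the role of buffer 6''
            if entry == pair:
                found = True                        # match: move to next input pair
                break
        if not found:                               # matched no entry: register it
            dictionary.append(pair)
            added.append(pair)
    return added

# Hypothetical dictionary that lacks only "ドプ".
dictionary = ["ワー", "ード", "プロ", "ロセ", "セッ", "ッサ", "サハ"]
added = register_missing_pairs("ワードプロセッサハ", dictionary)
```

A software implementation would more likely hold the dictionary in a hash set so that each lookup avoids the full scan; the list is used here only to mirror the buffer-by-buffer comparison in the description.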

[Effect of the invention]

As described above, according to the present invention, only the sequences that can exist as Japanese are selected from the acoustic recognition candidate character strings using the consecutive-syllable dictionary. By learning new dictionary entries, the device can output the correct candidate sequence even for new and coined words.

[Brief description of the drawings]

FIG. 1 is an overall configuration diagram of an embodiment of the present invention. FIG. 2 is an example of the candidate lattice output by the acoustic recognition unit when the phrase "カンガエテイル" is input. FIG. 3(a) shows the top 10 sequences, by total acoustic similarity, among the syllable combinations obtained from the candidate lattice when the phrase "カンガエテイル" is input; FIG. 3(b) is an example of the consecutive-syllable dictionary; FIG. 3(c) shows the candidate sequences output by the candidate sequence selection unit and stored in memory. FIGS. 4(a), (b), and (c) correspond to FIGS. 3(a), (b), and (c), respectively, for the case where the phrase "ワードプロセッサハ" is input.

1 ... acoustic recognition unit, 2 ... candidate sequence selection unit, 2′ ... candidate sequence output memory, 3 ... phoneme/syllable reference patterns, 4 ... consecutive-syllable dictionary, 5 ... display device, 6 ... consecutive-syllable learning and registration unit, 6′ ... input consecutive-syllable buffer memory, 6″ ... dictionary consecutive-syllable buffer memory, 7 ... keyboard.

Claims

[Claims]

In a speech recognition device that recognizes speech uttered in units of words or phrases in units of phonemes or syllables, selects the correct sequence from multiple recognition candidate sequences using a dictionary storing consecutive-syllable patterns, such as syllable pairs and triples, that can occur in the language, and converts it into character codes or equivalent codes:
A speech recognition device characterized in that, when the input speech is not correctly converted into character codes, the correct sequence is entered, the entered sequence is decomposed into consecutive-syllable patterns such as syllable pairs and triples, those patterns are compared with the consecutive-syllable patterns already stored in the dictionary, and, if a consecutive-syllable pattern does not exist in the dictionary, that pattern is additionally registered in the dictionary.