JPH08248980A

JPH08248980A - Voice recognition device

Info

Publication number: JPH08248980A
Application number: JP7070436A
Authority: JP
Inventors: Mitsuhisa Kamei; 光久亀井; Akihiko Hayakawa; 明彦早川
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1995-03-06
Filing date: 1995-03-06
Publication date: 1996-09-27

Abstract

PURPOSE: To make it easy to change the contents of a dictionary without recompiling the syntax rules. CONSTITUTION: The device comprises a syntax rule storage means 7 that stores, in compiled form, the syntax rules defining the grammar of acceptable speech; a dictionary storage means 5 that is configured independently of the means 7 and records, for each acceptable word, at least its phoneme string notation, output notation, and word classification; a phoneme segment recognition means 2 that recognizes input speech in units of phoneme segments; a grammar matching means 3 that refers to the recorded contents of the means 5 and 7 and checks whether a sequence of phoneme segments is grammatically acceptable; an input means 12 through which word editing instructions for the means 5 are entered from outside; and a dictionary management means 11 that changes the recorded contents of the means 5 according to the instructions from the means 12. The contents of the dictionary are thus edited by the means 11 without touching the syntax rules.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device that recognizes uttered speech by referring to linguistic constraints such as a dictionary and syntax rules, and in particular to a speech recognition device in which the contents of the dictionary can easily be changed by an external instruction without rewriting the syntax rules.

[0002]

[Prior Art] Speech recognition devices, which mechanically recognize speech uttered by a user by referring to linguistic constraints such as a grammar, are being applied in various fields: for example, entering documents into a word processor by speech uttered in units of phrases, phrase sequences, sentences, or sentence sequences, or entering search expressions into a database system by speech uttered in units such as command strings. Speech recognition is realized, broadly, by combining two processes: phoneme segment recognition and language recognition. Phoneme segment recognition generates, from the speech waveform signal, a symbol string corresponding to a sequence of phoneme segments such as phonemes, syllables, or demisyllables. Language recognition comprises syntactic analysis, which analyzes how the symbol string output by the recognition process is structured as language; processing that judges how statistically likely the symbol string is to occur; semantic analysis, which analyzes what the symbol string means; and so on.

[0003] For highly accurate continuous speech recognition, syntactic analysis that analyzes sentence structure, such as the dependency relations between words, is indispensable in language recognition. In such syntactic analysis, particularly when recognizing speech with a vocabulary above a certain size, such as several hundred words, an enormous amount of computation is required because phoneme segment recognition generates many symbol strings, multiple candidates must be handled during parsing, and the number of rules to be consulted, such as dictionary entries, also grows. To perform speech recognition in real time, the syntactic analysis must therefore run at high speed.

[0004] For fast syntactic analysis, it is effective to compile the grammar, that is, the rules used for analysis, in advance. Compilation puts the rules into a form that the parser (syntactic analyzer) can consult easily and lets the parser efficiently consult only the rules needed at each point in the analysis, so that parsing runs at high speed. Japanese Unexamined Patent Publication No. Hei 2-113297 proposes, as a method of recognizing speech by combining phoneme segment recognition and language recognition, an invention in which syntactic analysis is performed by an LR parser using an LR table compiled from the syntax rules. The LR table consists of two tables compiled in advance from the syntax rules, called the action table and the Go-to table. The LR parser reads the sentence to be analyzed sequentially from its beginning and proceeds with the analysis while updating a stack that holds the analysis state, as directed by the LR table. (See Hirosato Nomura, "Basic Technology of Natural Language Processing" (自然言語処理の基礎技術), published by the Institute of Electronics, Information and Communication Engineers.)

[0005]

[Problems to Be Solved by the Invention] In a speech recognition device that compiles its grammar, such as syntax rules, in advance, as typified by devices using an LR parser, registering a new word when the need arises is not easy and the registration takes a long time. A new word cannot be registered directly in the compiled form of the grammar, so it must be registered in the grammar as it was before compilation, after which the grammar must be compiled again. The more complex the grammar, the longer this compilation takes, so speech recognition cannot resume immediately after a new word is registered, and the efficiency of speech recognition drops considerably.

[0006] If, on the other hand, syntactic analysis is performed without compiling the grammar, new words can be registered quickly, but the analysis itself takes a long time. The purpose of using syntactic analysis in the first place is to achieve speech recognition under a complex grammar quickly, so a slow analysis defeats the original aim of real-time speech recognition. Conventionally, therefore, no speech recognition device has been realized that both performs speech recognition quickly under a complex grammar requiring syntactic analysis and allows quick word editing, such as registering new words and deleting words already registered.

[0007] The present invention has been made in view of the above circumstances, and its object is to provide a speech recognition device that performs syntactic analysis quickly while also allowing quick editing of the dictionary in which words are recorded. In particular, the speech recognition device of claim 2 aims to achieve quick dictionary editing in a speech recognition device that uses an LR parser. Also in particular, the speech recognition device of claim 3 aims to achieve quick dictionary editing in a speech recognition device in which candidates for the leading phoneme segment are recorded in the dictionary in advance for each word classification, such as part of speech or word type, and phoneme segment recognition is performed on the basis of those candidates.

[0008]

[Means for Solving the Problems] To achieve the above object, the speech recognition device of claim 1 comprises: syntax rule storage means that stores, in compiled form, the syntax rules defining the grammar of acceptable speech; dictionary storage means that is configured independently of the syntax rule storage means and records, for each acceptable word, at least its phoneme string notation, output notation, and word classification; phoneme segment recognition means that recognizes input speech in units of phoneme segments; grammar matching means that refers to the recorded contents of the dictionary storage means and the syntax rule storage means and checks whether a sequence of phoneme segments is grammatically acceptable; input means through which word editing instructions for the dictionary storage means are entered from outside; and dictionary management means that changes the recorded contents of the dictionary storage means according to the instructions from the input means.

[0009] The speech recognition device of claim 2 is the device of claim 1 in which the grammar matching means determines the word corresponding to a recognized phoneme segment string by referring to the recorded contents of the dictionary storage means, performs grammatical matching on that word with an LR parser by referring to the recorded contents of the syntax rule storage means, predicts the classification of the following word from the classification of that word, and uses the predicted classification to determine the following word.

[0010] The speech recognition device of claim 3 is the device of claim 1 in which candidates for the leading phoneme segment are recorded in the dictionary storage means in advance for each word classification. The grammar matching means determines the word corresponding to a recognized phoneme segment string by referring to the recorded contents of the dictionary storage means, performs grammatical matching on that word with an LR parser by referring to the recorded contents of the syntax rule storage means, predicts the classification of the following word from the classification of that word, and predicts the following phoneme segment from the recorded contents of the dictionary storage means on the basis of the word classification. The phoneme segment recognition means recognizes the input speech by finding the predicted leading phoneme segment.
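The prediction step of claim 3 can be illustrated with a minimal Python sketch. It assumes syllables as the phoneme segments; the entries, the function names `leading_segments_by_class` and `predict_next_segments`, and the data layout are hypothetical and are not taken from the patent.

```python
from collections import defaultdict

# Hypothetical dictionary entries: (syllable sequence, output notation, word class).
ENTRIES = [
    (("a", "sa", "hi"), "朝日", "noun"),
    (("ya", "ma"), "山", "noun"),
    (("ga",), "が", "particle"),
    (("ni",), "に", "particle"),
]


def leading_segments_by_class(entries):
    """Map each word class to the set of segments that can start a word of that class."""
    heads = defaultdict(set)
    for syllables, _output, word_class in entries:
        heads[word_class].add(syllables[0])
    return dict(heads)


def predict_next_segments(predicted_classes, heads):
    """Union of the leading segments over the word classes predicted for the next word."""
    result = set()
    for word_class in predicted_classes:
        result |= heads.get(word_class, set())
    return result


# Precomputed once per dictionary edit; no grammar compilation is involved.
HEADS = leading_segments_by_class(ENTRIES)
```

In this sketch, if the parser predicts that only a particle can follow, the recognizer would need to look only for the segments returned by `predict_next_segments(["particle"], HEADS)`.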

[0011] Here, a phoneme segment means an acoustic unit such as a phoneme, a phonological unit, a syllable, or a demisyllable, and a word classification means the part of speech of a word, the type of a word, or the like. For example, when a search expression is entered by speech in a database system, the word types that make up the search expression, such as command names, extensions, field names, and titles, are word classifications. The dictionary storage means and the syntax rule storage means need not be separate storage devices in hardware; a single hardware storage device may be logically partitioned. The essential point is that they are configured so that the dictionary management means can access them as mutually independent objects.

[0012] Speech is usually input to the speech recognition device from outside by capturing the user's voice with a microphone, but it can also be input from a remote location over a communication channel such as a telephone line or a radio link. For phoneme segment recognition, matching that uses HMMs (hidden Markov models) as the phoneme models is widely used because it is relatively accurate and fast, but other matching methods such as DP matching and neural networks can also be used. The dictionary storage means can be realized simply by recording the words in tabular form, but if the dictionary is built with a trie section in which phoneme strings are arranged in a tree structure, dictionary lookup becomes efficient and fast, and the dictionary management means can also register and delete words more quickly.

[0013] The syntax rules can be expressed as templates, finite-state automata, and so on, but a context-free grammar is preferable because it is easier to write and more expressive. As the representation of the compiled syntax rules stored in the syntax rule storage means, an LR table is advantageous because it can be searched quickly; an LR table that takes one look-ahead symbol is particularly advantageous because it makes matching against the syntax rules easy. Various matching methods can be adopted in the grammar matching means, depending on the form into which the syntax rules are compiled, but a shift-reduce parser such as an LR parser is easy to implement, and an extended LR parser that takes one look-ahead symbol is superior in both matching power and ease of implementation.

[0014]

[Operation] In the speech recognition device of claim 1, speech input from outside is recognized in units of phoneme segments by the phoneme segment recognition means, and the grammar matching means recognizes the speech by checking whether these phoneme segments are grammatically acceptable, referring to the dictionary recorded in the dictionary storage means and to the compiled syntax rules recorded in the syntax rule storage means. When necessary, an editing instruction is given from outside through the input means, and the dictionary management means edits the contents of the dictionary accordingly, for example registering a new word in the dictionary storage means or deleting a word already registered.

[0015] In this editing, because the dictionary is structurally simpler than the syntax rules and is recorded in storage means independent of them, the time-consuming compilation that syntax rules require is unnecessary (or, if the dictionary is compiled, the compilation is fast), and no processing at all is needed for the syntax rules. Speech recognition can therefore resume immediately after the dictionary is edited. During speech recognition work, when a word arises that the user newly wants recognized, or a word no longer needs to be recognized, the user can change the contents of the dictionary immediately by an external instruction, and speech recognition can be carried out efficiently.

[0016] Furthermore, in the speech recognition device of claim 2, while the quick dictionary editing described above is achieved, the grammar matching means performs efficient grammar matching in which the LR parser predicts word classifications such as parts of speech, so that speech recognition is efficient overall. In the speech recognition device of claim 3, the grammar matching means uses the LR parser to predict word classifications such as word types and then predicts the following phoneme segment from the leading phoneme segment candidates recorded in advance in the dictionary storage means for each word classification and from information on the sequence of phoneme segments within words. The phoneme segment recognition means then finds the predicted phoneme segment in the input speech data and thereby recognizes the phoneme segments of the following word. In this case too, because the dictionary storage means is configured independently of the syntax rule storage means, the quick dictionary editing described above is achieved.

[0017]

[Embodiments] A speech recognition device according to a first embodiment of the present invention is described with reference to the drawings. This embodiment is a speech recognition device that achieves quick dictionary editing while grammar-matching the syllable sequence recognized from the input speech with an LR parser that predicts the parts of speech of words; the Japanese text obtained as the recognition result is input to a word processor.

[0018] As shown in FIG. 1, the speech recognition device comprises: a microphone 1 for inputting speech from outside; phoneme segment recognition means 2 that recognizes the input speech syllable by syllable and generates candidate syllable sequences (syllable strings); grammar matching means 3 that checks whether the generated syllable strings are grammatically acceptable by referring to the dictionary and the syntax rules; dictionary storage means 5 that stores a dictionary 4; syntax rule storage means 7 that stores the syntax rules in compiled form (LR table) 6; candidate selection means 9 that, when there are multiple grammar-matched candidates, narrows them down by referring to a connection evaluation table 8; a word processor 10 to which the text obtained as the recognition result is input; dictionary management means 11 that edits the contents of the dictionary 4 recorded in the dictionary storage means 5; and an input/output terminal 12 equipped with input/output devices such as a display, a keyboard, and a mouse for the word processor 10 and the dictionary management means 11.

[0019] The phoneme segment recognition means 2 comprises a syllable HMM model 13 that stores an HMM for each syllable, signal processing means 14 that performs frame-by-frame frequency analysis of the electrical signal of the speech data from the microphone 1, feature extraction means 15 that extracts features from the frequency-analyzed speech data, and syllable matching means 16 that matches the extracted features against the HMM model 13 and generates candidates recognized syllable by syllable. For example, syllable strings like those shown in FIG. 2 are generated as recognition candidates by the phoneme segment recognition means 2 and output to the grammar matching means 3. In the figure, the symbol $ denotes silence, which marks the end of a syllable string, and the number attached to each syllable is its syllable matching score.

[0020] In this embodiment, the dictionary 4, the LR table 6, the connection evaluation table 8, and the HMM model 13 are each stored in a separate area of a storage device 17; in particular, the dictionary 4 can be accessed by the dictionary management means 11 independently of the LR table 6 and the other data.

[0021] The grammar matching means 3 holds the syllable strings generated one after another by the phoneme segment recognition means 2 as a candidate list like the one shown in FIG. 3 and performs grammar matching on the syllable strings in this list while referring to the dictionary 4 and the LR table 6. For each candidate, the candidate list stores the following items: "syllable string being looked up," the syllable string still under dictionary lookup and not yet determined as a word; "determined output notation," the output notation of the words already determined in the tree-structured syllable strings shown in FIG. 2; "look-ahead symbol," the word classification (the part of speech in this embodiment) needed for parsing, held as the look-ahead symbol; "LR parsing state," the analysis state of the LR parser, held as a stack; and "score," the score calculated so far.
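One entry of such a candidate list might be modeled as below. This is a minimal Python sketch, not the layout of FIG. 3: the field names and the `extend` helper are hypothetical, and adding the syllable score to the running score is an assumption, since the text does not state how scores are combined.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Candidate:
    pending_syllables: Tuple[str, ...] = ()   # syllable string still being looked up
    confirmed_output: str = ""                # output notation of words determined so far
    lookahead: Optional[str] = None           # part of speech held as the look-ahead symbol
    lr_stack: List[int] = field(default_factory=lambda: [0])  # LR parsing state (stack)
    score: float = 0.0                        # score calculated so far

    def extend(self, syllable: str, syllable_score: float) -> "Candidate":
        """Return a new candidate with one more syllable appended.

        Assumption: scores are combined by addition (as with log-likelihoods).
        """
        return Candidate(
            pending_syllables=self.pending_syllables + (syllable,),
            confirmed_output=self.confirmed_output,
            lookahead=self.lookahead,
            lr_stack=list(self.lr_stack),      # copy so candidates do not share a stack
            score=self.score + syllable_score,
        )
```

Returning a fresh object from `extend` keeps alternative candidates independent of one another, which matters when several syllable hypotheses branch from the same prefix.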

[0022] The dictionary 4 recorded in the dictionary storage means 5 contains the vocabulary accepted as input to the word processor 10. As partly shown in FIG. 4, it consists of a tree-structured section (trie section) that represents phoneme strings and a table section that describes the output notation and word classification (part of speech) of each word as word information. In the tree-structured section, a circled syllable indicates that the syllable string up to that syllable can be accepted as a word, and the number attached to a circled position is the address of the corresponding entry in the table section. An address in the table section here means a reference, such as a pointer or label, for looking up the table; it does not necessarily coincide with an address in the storage device 17.
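A dictionary with a trie section and a table section could be sketched in Python as follows. This is an illustration under stated assumptions, not the structure of FIG. 4: the class name `TrieDictionary`, the node layout, and the sequential integer addresses are all hypothetical.

```python
class TrieDictionary:
    """Trie over syllables; accepting nodes hold addresses into a word table."""

    def __init__(self):
        self.root = {"children": {}, "addresses": []}
        self.table = {}            # address -> (output notation, word class)
        self._next_address = 1     # hypothetical address scheme: sequential integers

    def add(self, syllables, output, word_class):
        """Register a word; only the path for its syllables is touched."""
        node = self.root
        for s in syllables:
            node = node["children"].setdefault(s, {"children": {}, "addresses": []})
        address = self._next_address
        self._next_address += 1
        self.table[address] = (output, word_class)
        node["addresses"].append(address)
        return address

    def lookup(self, syllables):
        """Return the word information reachable by exactly this syllable string."""
        node = self.root
        for s in syllables:
            node = node["children"].get(s)
            if node is None:
                return []
        return [self.table[a] for a in node["addresses"]]

    def remove(self, syllables, output):
        """Delete the word with this reading and output notation; prune empty branches."""
        path = [self.root]
        for s in syllables:
            nxt = path[-1]["children"].get(s)
            if nxt is None:
                return False
            path.append(nxt)
        node = path[-1]
        for a in node["addresses"]:
            if self.table[a][0] == output:
                node["addresses"].remove(a)
                del self.table[a]
                break
        else:
            return False
        # Walk back toward the root, dropping nodes that no longer lead anywhere.
        for depth in range(len(syllables), 0, -1):
            child = path[depth]
            if child["children"] or child["addresses"]:
                break
            del path[depth - 1]["children"][syllables[depth - 1]]
        return True
```

Both `add` and `remove` touch only the nodes along one syllable path, which is why edits of this kind need no global rebuild.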

[0023] The syntax rule storage means 7 records the syntactic LR table 6, which consists of an action table and a Go-to table as partly shown in FIG. 5. The LR table 6 is compiled from syntax rules written in a context-free grammar, such as the rules below, that approximates the syntax of the text input to the word processor 10 (here, the natural language Japanese).

[0024] For example, the syntax rules to be compiled include the following:

Rule 0: sentence → phrase (a sentence consists of a phrase)
Rule 1: phrase → phrase + phrase (a phrase consists of a concatenation of phrases)
Rule 2: phrase → noun phrase (a phrase consists of a noun phrase)
Rule 3: noun phrase → noun + particle (a noun phrase consists of a noun followed by a particle)
Rule 4: noun phrase → adjective + noun phrase (a noun phrase consists of an adjective followed by a noun phrase)

[0025] In the action table shown in FIG. 5, the leftmost column gives the parsing state and the top row gives the part of speech that is input. When the parser is in a given state and the part of speech of the next word is input as the look-ahead symbol, the cell of the action table to consult is determined, and the instruction in that cell selects one of four actions: accept, fail, shift, or reduce. Accept indicates that the syllable string input so far has been successfully matched as a sentence; fail indicates that the syllable string input so far was not recognized as a sentence. In FIG. 5, fail is represented by an empty cell.

[0026] Shift indicates that matching is progressing and pushes a new state onto the stack; for example, shift 5 pushes state 5. Reduce indicates that one rule can be applied: the parser pops as many states from the stack as there are symbols on the right-hand side of the applicable syntax rule (for example, rule 2 for reduce 2), then consults the Go-to table using the symbol on the left-hand side of the rule and the state now on top of the stack, and pushes the resulting next state. Like the action table, the Go-to table has states as the headings of its leftmost column and parts of speech as the headings of its top row; it is consulted after a reduce from the action table has been executed.
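The shift, reduce, accept, and fail actions can be traced with a small table-driven parser. The Python sketch below uses a hypothetical three-rule toy grammar with hand-built tables; these are not the tables of FIG. 5, and the names `ACTION`, `GOTO`, `parse`, and `expected_classes` are illustrative only.

```python
# Hypothetical toy grammar (not the grammar or tables of FIG. 5):
#   rule 0: S  -> NP
#   rule 1: NP -> noun particle
#   rule 2: NP -> adjective NP
RULES = {1: ("NP", 2), 2: ("NP", 2)}   # rule number -> (left-hand side, right-hand-side length)

# Action table: (state, look-ahead part of speech) -> action. Missing cells mean "fail".
ACTION = {
    (0, "noun"): ("shift", 2), (0, "adjective"): ("shift", 3),
    (1, "$"): ("accept", None),
    (2, "particle"): ("shift", 4),
    (3, "noun"): ("shift", 2), (3, "adjective"): ("shift", 3),
    (4, "$"): ("reduce", 1),
    (5, "$"): ("reduce", 2),
}
# Go-to table: (state on top of the stack after popping, left-hand side) -> next state.
GOTO = {(0, "NP"): 1, (3, "NP"): 5}


def parse(parts_of_speech):
    """Return True if the part-of-speech sequence is accepted by the toy grammar."""
    stack = [0]
    tokens = list(parts_of_speech) + ["$"]   # "$" marks the end of input
    i = 0
    while True:
        entry = ACTION.get((stack[-1], tokens[i]))
        if entry is None:                    # empty cell: fail
            return False
        kind, arg = entry
        if kind == "accept":
            return True
        if kind == "shift":                  # push the new state and consume the token
            stack.append(arg)
            i += 1
        else:                                # reduce: pop |rhs| states, then consult Go-to
            lhs, rhs_len = RULES[arg]
            del stack[-rhs_len:]
            stack.append(GOTO[(stack[-1], lhs)])


def expected_classes(state):
    """Parts of speech with a non-empty action cell in this state."""
    return sorted(sym for (s, sym) in ACTION if s == state and sym != "$")
```

`expected_classes` shows how the same table can predict which word classes may come next: the non-empty cells of the current state's row are exactly the parts of speech the parser can accept.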

[0027] When the grammar matching means 3 outputs multiple candidates as the speech recognition result, the candidate selection means 9 narrows them down by referring to the connection evaluation table 8. As the criterion for narrowing, the connection evaluation table 8 stores in advance connection scores for pairs of words, as partly shown in FIG. 6. For example, the score for "登る" (climb) following "朝日" (morning sun) is -0.01 and the score for "光る" (shine) following "朝日" is -0.15; the likelihood that two words are connected is thus ranked by score. The candidate selection means 9 therefore selects, from the character strings such as sentences output as candidates by the grammar matching means 3, on the basis of the maximum likelihood of word connection, and outputs the selected candidate to the word processor 10.
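Selection by connection score might look like the following Python sketch. The two scores are the ones quoted in the text; summing pair scores over a candidate, the default `UNSEEN_PAIR_SCORE` for pairs missing from the table, and the function names are assumptions made for illustration. The sketch ranks candidates by connection scores alone.

```python
# Connection scores for ordered word pairs (the two values quoted in the text).
CONNECTION_SCORES = {
    ("朝日", "登る"): -0.01,
    ("朝日", "光る"): -0.15,
}
UNSEEN_PAIR_SCORE = -1.0   # hypothetical default for pairs not in the table


def connection_score(words):
    """Sum of the connection scores of adjacent word pairs (assumed additive)."""
    return sum(
        CONNECTION_SCORES.get(pair, UNSEEN_PAIR_SCORE)
        for pair in zip(words, words[1:])
    )


def select_candidate(candidates):
    """Pick the candidate word sequence with the highest total connection score."""
    return max(candidates, key=connection_score)
```

With these values, `select_candidate([["朝日", "光る"], ["朝日", "登る"]])` prefers the sequence ending in "登る", because -0.01 is greater than -0.15.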

[0028] The dictionary management means 11 edits the contents of the dictionary 4 stored in the dictionary storage means 5. Following editing instructions entered by the user from the keyboard or other device of the input/output terminal 12, it searches the word information described in the dictionary 4 and adds a new word or deletes a word already described. The dictionary management means accesses only the dictionary storage means 5 (that is, the dictionary 4) and performs its editing without affecting the other contents of the storage device 17, such as the LR table 6.
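The separation between the editable dictionary and the compiled table can be sketched in Python as below. The classes `Storage` and `DictionaryManager`, the instruction format, and the one-entry-per-reading simplification (homophones are ignored) are all hypothetical choices for illustration.

```python
from types import MappingProxyType


class Storage:
    """One storage device holding two logically separate areas."""

    def __init__(self, lr_table, words):
        # Compiled syntax rules: exposed read-only, never edited at run time.
        self.lr_table = MappingProxyType(dict(lr_table))
        # Dictionary: reading (tuple of syllables) -> (output notation, word class).
        self.dictionary = dict(words)


class DictionaryManager:
    """Applies editing instructions to the dictionary area only."""

    def __init__(self, storage):
        self._dictionary = storage.dictionary   # the only area this class can reach

    def execute(self, instruction):
        op = instruction["op"]
        reading = instruction["reading"]
        if op == "add":
            self._dictionary[reading] = (instruction["output"], instruction["word_class"])
            return True
        if op == "delete":
            return self._dictionary.pop(reading, None) is not None
        raise ValueError(f"unknown operation: {op}")
```

Because the manager holds a reference to the dictionary area alone, an edit cannot alter the compiled table, and recognition can continue with the same table immediately after the edit.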

【００２９】本実施例の音声認識装置を音声認識処理の
動作を説明しつつ更に詳しく説明する。まず、利用者は
ワードプロセッサ１０に入力したい文等の文字列を音声
によって発声すると、この音声はマイクロフォン１で電
気信号に変換されて信号処理手段１４に入力される。信
号処理手段１４は入力された音声信号を処理し易いデジ
タル信号に変換した後、音声信号のパワーと継続時間に
よって音声区間の始端と終端を検出して音声信号を切り
出し、切り出された区間に対して１０ｍｓ等といった単
位時間（フレーム）毎に周波数解析を行う。The voice recognition apparatus of this embodiment is described in more detail below by following the operation of the voice recognition processing. First, when the user utters a character string, such as a sentence, to be input to the word processor 10, the voice is converted into an electric signal by the microphone 1 and input to the signal processing means 14. The signal processing means 14 converts the input voice signal into a digital signal that is easy to process, detects the start and end of the voice section from the power and duration of the voice signal, cuts out that section, and performs frequency analysis on the cut-out section for each unit time (frame), for example every 10 ms.
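
Detecting a voice section from power and duration can be sketched roughly as below. The text gives only the 10 ms frame length as an example; the power threshold, the minimum run length, the use of non-overlapping frames, and all function names are assumptions made for illustration.

```python
def frame_power(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def detect_voice_section(samples, sample_rate, frame_ms=10,
                         power_threshold=0.01, min_frames=3):
    """Return (start_frame, end_frame) for the first run of frames whose power
    stays at or above the threshold for at least min_frames frames.
    end_frame is exclusive. Returns None when no such run exists."""
    frame_len = sample_rate * frame_ms // 1000
    powers = frame_power(samples, frame_len)
    start = None
    for i, power in enumerate(powers + [0.0]):  # sentinel closes a trailing run
        if power >= power_threshold:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_frames:
                return start, i
            start = None  # run too short to count as voice
    return None
```

The frames inside the returned section would then be passed on to frequency analysis one frame at a time.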

【００３０】そして、音節照合を行い易くするために、
周波数解析された音声データの特徴量を特徴量抽出手段
１５が抽出してベクトル表現する。音節照合は音節照合
手段１６が上記の特徴量を音節ＨＭＭモデル１３に照合
することにより行われ、基準値以上のスコアで受理でき
た音節を図２に示したような木構造で出力する。なお、
音節ＨＭＭモデル１３はモデル作成用の音声データによ
り予め作成されて、記憶装置１７に格納されている。To facilitate syllable matching, the feature extraction means 15 extracts the features of the frequency-analyzed voice data and expresses them as a vector. Syllable matching is performed by the syllable matching means 16, which matches these features against the syllable HMM model 13 and outputs the syllables accepted with a score at or above a reference value in a tree structure such as that shown in FIG. 2. The syllable HMM model 13 is created in advance from voice data for model creation and is stored in the storage device 17.

【００３１】この木構造に作られた音節列の候補は文法
照合手段３によって辞書引き及び構文解析がなされ、文
法照合の結果として得られた文字列の候補が候補選定手
段９へ出力される。そして、候補選定手段９によって選
定された最適な候補がワードプロセッサ１０へ出力さ
れ、入出力ターミナル１２のディスプレイに表示され
る。すなわち、連接評価表８に記された単語間の繋がり
のスコアと音節認識時のスコアを基に、最も確からしい
候補が選定されて、当該候補の確定した単語列がワード
プロセッサ１０に入力される。The syllable string candidates arranged in this tree structure undergo dictionary lookup and syntax analysis by the grammar matching means 3, and the character string candidates obtained as the result of grammar matching are output to the candidate selecting means 9. The optimum candidate selected by the candidate selecting means 9 is then output to the word processor 10 and displayed on the display of the input/output terminal 12. That is, the most probable candidate is selected on the basis of the word-to-word connection scores recorded in the concatenation evaluation table 8 and the scores obtained during syllable recognition, and the confirmed word string of that candidate is input to the word processor 10.

【００３２】上記した文法照合手段３による文法照合の
処理を図７に示すフローチャートに沿って更に詳しく説
明する。まず、辞書引き及び構文解析のための作業領域
としての候補リスト（図３）を初期化し（ステップＳ
１）、音節照合手段１６から出力された木構造の音節列
について以下の処理を枝の終端まで行ったかを判断する
（ステップＳ２）。この結果、終端まで達している場合
には処理を終了する一方、未だ途中である場合には木構
造音節列を枝を辿って順次候補リストに読み込む（ステ
ップＳ３）。なお、この場合には処理開始時点であるの
で、木構造音節列の最初の音節を全て読み込む。The grammar matching performed by the grammar matching means 3 is described in more detail with reference to the flowchart shown in FIG. 7. First, the candidate list (FIG. 3), which serves as the work area for dictionary lookup and syntax analysis, is initialized (step S1), and it is determined whether the following processing has been carried out to the ends of the branches of the tree-structured syllable string output from the syllable matching means 16 (step S2). If the ends have been reached, the processing terminates; if not, the tree-structured syllable string is read into the candidate list step by step by following its branches (step S3). At this point, since processing has only just started, all of the first syllables of the tree-structured syllable string are read.

【００３３】次いで、木構造音節列を参照して、候補リ
ストに読み込んだ候補音節に複数の音節が後続するか
（すなわち、木構造が枝分かれするか）を判断し（ステ
ップＳ４）、複数の音節が後続する場合には候補音節を
当該個数だけ複写して候補リストに格納し、後続する音
節を候補リストの「辞書引き中の音節列」の欄の末尾に
格納する（ステップＳ５）。なお、処理の開始時点で最
初の音節が複数ある場合には、その数だけ候補リストの
行を用意して、最初の音節をそれぞれの行の「辞書引き
中の音節列」の欄に格納する。Next, with reference to the tree-structured syllable string, it is determined whether a candidate syllable read into the candidate list is followed by a plurality of syllables, that is, whether the tree branches (step S4). If a plurality of syllables follow, the candidate syllable is copied that many times and stored in the candidate list, and each following syllable is appended to the end of the "syllable string in dictionary lookup" column of the candidate list (step S5). If there are a plurality of first syllables at the start of processing, that many rows of the candidate list are prepared, and each first syllable is stored in the "syllable string in dictionary lookup" column of its own row.

【００３４】次いで、候補リストに格納された各候補
（各行）に対して、単語を確定するために辞書引き処理
（図８を参照して後述する）を行う（ステップＳ６）。
辞書引き処理の結果、単語が確定した候補に対してのみ
構文解析処理（図９を参照して後述する）を行い（ステ
ップＳ７、Ｓ８）、単語が確定していないときには候補
リストに格納された次の候補について辞書引き処理を繰
り返し行う（ステップＳ９）。すなわち、辞書引き処理
は候補リストに格納された全ての候補について行われ、
この内の単語が確定した全ての候補には構文解析が行わ
れる。Next, dictionary lookup processing (described later with reference to FIG. 8) is performed on each candidate (each row) stored in the candidate list in order to confirm words (step S6). Syntax analysis processing (described later with reference to FIG. 9) is performed only on candidates for which the dictionary lookup has confirmed a word (steps S7 and S8); when no word has been confirmed, the dictionary lookup processing is repeated for the next candidate stored in the candidate list (step S9). That is, dictionary lookup is performed on every candidate stored in the candidate list, and syntax analysis is performed on every one of those candidates for which a word has been confirmed.

【００３５】上記の処理を行った結果、候補リストに格
納された候補の数が多くなりすぎている場合には、スコ
アの低い候補は認識結果として信頼性が低いものである
ので、以後の処理量を削減するために、スコアの低い候
補を候補リストから消去して候補数の絞り込みを行う
（ステップＳ１０）。この後、再び音節認識の出力結果
である木構造の音節列を参照して、候補リストの「辞書
引き中の音節列」の欄に更に後続する音節を延ばし（ス
テップＳ３）、上記の処理を木構造音節列を最後まで辿
り終えるか或いは候補リストが空になるまで繰り返し行
う。If the above processing leaves too many candidates in the candidate list, the candidates with low scores are unreliable as recognition results, so they are deleted from the candidate list to narrow down the number of candidates and reduce the amount of subsequent processing (step S10). After that, the tree-structured syllable string output by syllable recognition is consulted again, the "syllable string in dictionary lookup" column of the candidate list is extended with the next following syllable (step S3), and the above processing is repeated until the tree-structured syllable string has been followed to its end or the candidate list becomes empty.
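
The narrowing in step S10 amounts to keeping only the best-scoring rows of the candidate list. The sketch below assumes that each candidate is a dictionary with a `score` key and that the limit is a fixed count; the text does not say how "too many" is decided, so both are hypothetical.

```python
import heapq


def prune_candidates(candidates, max_candidates):
    """Keep at most max_candidates rows, preferring the highest scores."""
    if len(candidates) <= max_candidates:
        return list(candidates)
    return heapq.nlargest(max_candidates, candidates, key=lambda c: c["score"])


rows = [
    {"number": 1, "score": -0.2},
    {"number": 2, "score": -3.5},
    {"number": 3, "score": -0.9},
]
kept = prune_candidates(rows, 2)
```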

【００３６】なお、上記の処理が終了した結果として、
出力される候補が１つもない場合には音声認識に失敗し
たことを意味する。この場合には、認識に失敗したこと
を意味するなんらかのメッセージを利用者に対して入出
力ターミナル１２のディスプレイに表示したり或いは音
声合成機能を持たせて音声通知するのが好ましい。If no candidate at all is output when the above processing ends, voice recognition has failed. In that case, it is preferable to inform the user of the failure, either by displaying a message to that effect on the display of the input/output terminal 12 or, if a voice synthesis function is provided, by voice notification.

【００３７】上記した辞書引き処理は辞書記憶手段５に
格納された辞書４を参照して図８に示すフローチャート
に従って行われる。まず、図４に示した辞書４の木構造
部分を辿り（ステップＳ２１）、候補リストの「辞書引
き中の音節列」の欄に格納した音節列に該当する音節列
が存在するかどうか判定する（ステップＳ２２）。この
結果、木構造部分を辿れなかった場合には辞書４にない
音節列を持つ候補であるので、その候補を候補リストか
ら消去して辞書引き処理を終了する（ステップＳ２
３）。一方、木構造部分を辿れた場合には、辿った最後
の音節に単語受理のラベル（○印）が付いているかを判
断する（ステップＳ２４）。The dictionary lookup processing described above is performed according to the flowchart shown in FIG. 8 with reference to the dictionary 4 stored in the dictionary storage means 5. First, the tree structure part of the dictionary 4 shown in FIG. 4 is followed (step S21), and it is determined whether it contains a syllable string corresponding to the one stored in the "syllable string in dictionary lookup" column of the candidate list (step S22). If the tree structure part cannot be followed, the candidate has a syllable string that is not in the dictionary 4, so the candidate is deleted from the candidate list and the dictionary lookup processing ends (step S23). If the tree structure part can be followed, it is determined whether the last syllable reached carries a word acceptance label (circle mark) (step S24).

【００３８】この結果、単語受理ラベルが付いていない
場合には、継続して後続する音節を延ばして辞書引き処
理を行う必要があるので候補はそのまま保存して、次の
候補の処理に移るために辞書引き処理を終了する。一
方、単語受理ラベルが付いている場合には、候補リスト
の「辞書引き中の音節列」の欄に格納した対象としてい
る音節列に単語が確定したこととなるので、次のような
単語確定処理を行って辞書引き処理を終了する。If there is no word acceptance label, dictionary lookup has to continue with further following syllables, so the candidate is kept as it is and the dictionary lookup processing ends in order to move on to the next candidate. If there is a word acceptance label, a word has been confirmed for the syllable string in question, stored in the "syllable string in dictionary lookup" column of the candidate list, so the following word confirmation processing is performed and the dictionary lookup processing ends.

【００３９】すなわち、辞書４において単語受理ラベル
に付記されている表部分への参照アドレスを辿り、辞書
４の表部分を読み出して（ステップＳ２６）、出力表記
を候補リストの確定出力表記の欄の末尾に書き加え（ス
テップＳ２７）、その単語の品詞を候補リストの先読み
記号の欄に書き出す（ステップＳ２８）。That is, the reference address to the table part that is attached to the word acceptance label in the dictionary 4 is followed, the table part of the dictionary 4 is read (step S26), the output notation is appended to the end of the "confirmed output notation" column of the candidate list (step S27), and the part of speech of the word is written in the "look-ahead symbol" column of the candidate list (step S28).

【００４０】なお、上記の単語確定処理において候補が
複数に分かれる場合がある。複数の単語が同時に確定し
た場合と、或る単語が確定すると同時に別の単語の辞書
引きが継続する場合である。その両者が同時に起こるこ
ともある。前者は、同音でも品詞や出力表記が異なる単
語が存在している場合であり、この場合には単語受理ラ
ベルに複数の参照アドレスが付記されていることで検出
できる。後者は、長い単語の一部として別の単語が含ま
れている場合に起こる。例えば、図４の辞書４に対して
「あさ」という音節列が候補として存在すると、「あ
さ」で辞書引きが確定する候補と「あさひ」まで枝を伸
ばす候補がある場合である。このような競合が起こった
場合には、候補リストの候補を複数に複写してそれぞれ
を別の候補として扱い（ステップＳ２９）、上記の単語
確定処理（ステップＳ２６〜Ｓ２８）をそれぞれの候補
について行う。In the word confirmation processing described above, a candidate may split into several. This happens when a plurality of words are confirmed at the same time, and when one word is confirmed while dictionary lookup for another word continues; both may also occur together. The former is the case where words with the same sound but different parts of speech or output notations exist, and it can be detected from the fact that a plurality of reference addresses are attached to the word acceptance label. The latter occurs when one word is contained in a longer word. For example, if the syllable string "あさ" (asa) is present as a candidate against the dictionary 4 of FIG. 4, there is one candidate for which dictionary lookup is confirmed at "あさ" and another that extends the branch to "あさひ" (asahi). When such a conflict occurs, the candidate in the candidate list is copied into several, each copy is treated as a separate candidate (step S29), and the word confirmation processing (steps S26 to S28) is performed for each of them.
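
The dictionary lookup of steps S21 to S29 can be pictured as a walk over a trie. The sketch below is an illustration under assumptions, not the data layout of FIG. 4: the `Node` class, the sample entries (including their output notations and the `noun` label), and the return convention are all hypothetical. A non-empty `entries` list plays the role of the word acceptance label with its reference addresses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # syllable -> Node
    entries: list = field(default_factory=list)   # (output notation, part of speech)


def build_dictionary(words):
    """Build the tree part; each word is (syllables, notation, part_of_speech)."""
    root = Node()
    for syllables, notation, pos in words:
        node = root
        for syllable in syllables:
            node = node.children.setdefault(syllable, Node())
        node.entries.append((notation, pos))
    return root


def lookup(root, syllables):
    """Follow the tree along the syllables.

    Returns None when the path does not exist (the candidate is dropped).
    Otherwise returns (confirmed, may_extend): the words confirmed at this
    point, and whether a longer word may still follow."""
    node = root
    for syllable in syllables:
        node = node.children.get(syllable)
        if node is None:
            return None
    return list(node.entries), bool(node.children)


# Assumed sample entries for illustration only.
DICTIONARY = build_dictionary([
    (["あ", "さ"], "朝", "noun"),
    (["あ", "さ"], "麻", "noun"),
    (["あ", "さ", "ひ"], "朝日", "noun"),
])
```

With these entries, looking up "あさ" yields two confirmed words and also reports that the branch can be extended, so the candidate would be copied three ways: one copy per confirmed word and one that keeps extending toward "あさひ".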

【００４１】また、上記した構文解析処理は構文規則記
憶手段７に格納されたＬＲテーブル６を参照して図９に
示すフローチャートに従って行われる。構文解析処理で
は、単語が確定した候補に対して、辞書引き処理で得ら
れた先読み記号（品詞）とその候補の状態のスタックと
に基づいて拡張ＬＲパーザ構文解析を行う。まず、スタ
ックのトップの状態と先読み記号とに基づいてＬＲテー
ブル６のアクションテーブルを引く（ステップＳ３
１）。ここで、拡張ＬＲパーザ構文解析では１つの候補
に対して複数の解釈が可能になるので、その場合にはア
クションテーブルを引いた結果として複数のエントリが
あることになる。したがって、複数のエントリがあるか
を判断して（ステップＳ３２）、複数ある場合には、そ
のエントリの数だけ候補を複写して（ステップＳ３
３）、それぞれの候補に対してそれぞれのエントリに従
って解析を進める。The syntax analysis processing described above is performed according to the flowchart shown in FIG. 9 with reference to the LR table 6 stored in the syntax rule storage means 7. For a candidate in which a word has been confirmed, extended LR parsing is performed on the basis of the look-ahead symbol (part of speech) obtained by dictionary lookup and the state stack of that candidate. First, the action table of the LR table 6 is looked up using the state at the top of the stack and the look-ahead symbol (step S31). Because extended LR parsing allows a plurality of interpretations of a single candidate, the action table lookup may in that case return a plurality of entries. It is therefore determined whether there are a plurality of entries (step S32); if there are, the candidate is copied once per entry (step S33), and analysis proceeds for each copy according to its own entry.

【００４２】構文解析はアクションテーブルを引い
た結果に基づくリデュース、シフト、失敗、受理の動作
で行われ、これらいずれの動作であるかの判断結果に応
じた処理がなされる（ステップＳ３４〜Ｓ３６）。ま
ず、アクションテーブルを引いた結果がリデュースであ
れば、その構文規則情報を基にリデュースして状態スタ
ックを書き換え（ステップＳ３７）、Ｇｏ−ｔｏテーブ
ルを引いて、次の状態をスタックに積み（ステップＳ３
８）、再度アクションテーブルを引いて（ステップＳ３
１）、その結果に従った処理を繰り返し行う。Syntax analysis proceeds through the reduce, shift, failure, and accept operations determined by the action table lookup, and processing is carried out according to which of these operations applies (steps S34 to S36). First, if the result of the action table lookup is reduce, the state stack is rewritten by reducing according to the syntax rule information (step S37), the goto table is looked up and the next state is pushed onto the stack (step S38), and the action table is looked up again (step S31), with the processing repeated according to that result.

【００４３】また、アクションテーブルを引いた結果が
シフトであれば、次の状態をスタックに積んで、その候
補に対する構文解析の処理を終了する（ステップＳ３
９）。また、アクションテーブルを引いた結果が空欄で
あれば構文解析に失敗したことを示すので、その候補を
候補リストから消去して、その候補に対する構文規則照
合の作業を終了する（ステップＳ４０）。また、アクシ
ョンテーブルを引いた結果が受理であれば、その候補を
候補リストから候補選定手段９へ出力するための認識成
功のリストに移して、その候補に対する構文解析の処理
を終了する（ステップＳ４１）。If the result of the action table lookup is shift, the next state is pushed onto the stack and syntax analysis for that candidate ends (step S39). If the result of the action table lookup is blank, syntax analysis has failed, so the candidate is deleted from the candidate list and syntax rule matching for that candidate ends (step S40). If the result of the action table lookup is accept, the candidate is moved from the candidate list to the list of successful recognitions for output to the candidate selecting means 9, and syntax analysis for that candidate ends (step S41).
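
The four outcomes of the action table lookup (reduce, shift, blank, accept) can be illustrated with a toy table. Everything concrete here is an assumption: the grammar `S -> N V`, the state numbers, the end marker `$`, and the function names are invented for the example and are not the contents of the LR table 6. The sketch also handles only cells with a single entry; the copying of candidates for cells with several entries (steps S32 and S33) is left out.

```python
# Toy LR table for the assumed grammar  S -> N V  (N, V are parts of speech).
ACTION = {
    (0, "N"): ("shift", 2),
    (2, "V"): ("shift", 3),
    (3, "$"): ("reduce", ("S", 2)),  # reduce by S -> N V (right side has 2 symbols)
    (1, "$"): ("accept", None),
}
GOTO = {(0, "S"): 1}


def parse_step(stack, lookahead):
    """Advance one candidate by one look-ahead symbol.

    Mutates the state stack and returns 'shifted', 'accepted' or 'failed'."""
    while True:
        action = ACTION.get((stack[-1], lookahead))
        if action is None:                  # blank cell: analysis fails
            return "failed"
        kind, arg = action
        if kind == "shift":                 # push next state, wait for next word
            stack.append(arg)
            return "shifted"
        if kind == "accept":
            return "accepted"
        lhs, rhs_len = arg                  # reduce: pop, consult goto, look up again
        del stack[len(stack) - rhs_len:]
        stack.append(GOTO[(stack[-1], lhs)])


def recognize(symbols):
    """Feed a sequence of look-ahead symbols followed by the end marker."""
    stack = [0]
    result = "failed"
    for symbol in list(symbols) + ["$"]:
        result = parse_step(stack, symbol)
        if result != "shifted":
            break
    return result
```

Calling `recognize(["N", "V"])` shifts twice, reduces on the end marker, and then reaches the accept entry, whereas a sequence that hits a blank cell is rejected at once.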

【００４４】ここで、上記のような音声認識処理を行っ
ている最中に、或いは、音声認識処理を行っていない時
に、辞書記憶手段５に格納してある辞書４の内容を編集
する場合には図１０或いは図１１に示すフローチャート
に従った処理がなされる。まず、辞書４に含まれていな
い新たな単語を利用者が登録したい場合には、利用者が
入出力ターミナル１２のキーボード等から単語登録の指
示を辞書管理手段１１に入力して、辞書管理手段１１で
単語登録の処理を行う。When the contents of the dictionary 4 stored in the dictionary storage means 5 are to be edited, whether during the voice recognition processing described above or while no voice recognition processing is running, processing is performed according to the flowchart shown in FIG. 10 or FIG. 11. First, when the user wants to register a new word that is not in the dictionary 4, the user enters a word registration instruction to the dictionary management means 11 from the keyboard or the like of the input/output terminal 12, and the dictionary management means 11 performs the word registration processing.

【００４５】この単語登録の処理では図１０にフローチ
ャートで示すように、まず、利用者が入出力ターミナル
１２のキーボード等から入力した新たに登録したい単語
の音節列、出力表記及び品詞を辞書管理手段１１が受け
取り（ステップＳ５１）、入力された音節列に基づいて
辞書４を検索して、同じ読みの単語等といった同様な単
語が既に辞書４に登録されているか検索する（ステップ
Ｓ５２）。この結果、既に登録されている場合には（ス
テップＳ５３）、同一の単語が既に辞書４に存在するこ
とを入出力ターミナル１２のディスプレイ等に表示して
（ステップＳ５４）単語登録の処理を終了する。In this word registration processing, as shown in the flowchart of FIG. 10, the dictionary management means 11 first receives the syllable string, output notation, and part of speech of the word to be newly registered, which the user has entered from the keyboard or the like of the input/output terminal 12 (step S51). It then searches the dictionary 4 on the basis of the entered syllable string to see whether a similar word, such as a word with the same reading, is already registered in the dictionary 4 (step S52). If the word is already registered (step S53), the fact that the same word already exists in the dictionary 4 is shown on the display or the like of the input/output terminal 12 (step S54), and the word registration processing ends.

【００４６】一方、全く同一ではないが、同じ音韻列で
別の品詞の単語や同じ音節列で別の出力表記の単語（同
音異義、同音異表記）等の類似の単語が辞書４に既に存
在している場合には（ステップＳ５３）、その類似単語
の内容（出力表記、品詞等）を入出力ターミナル１２の
ディスプレイに表示し（ステップＳ５５）、登録処理を
行うか否か利用者からの確認を取る（ステップＳ５
６）。また、一方、同一又は類似の単語が未だ辞書４に
登録されていない場合にも（ステップＳ５３）、登録処
理を行うか否か利用者からの確認を取って登録処理の必
要性を確認する（ステップＳ５６）。If, on the other hand, the dictionary 4 already contains a word that is not identical but similar, such as a word with the same phoneme string and a different part of speech, or a word with the same syllable string and a different output notation (a homonym or a homophone written differently) (step S53), the contents of the similar word (output notation, part of speech, and so on) are shown on the display of the input/output terminal 12 (step S55), and the user is asked to confirm whether to proceed with registration (step S56). Likewise, when no identical or similar word is yet registered in the dictionary 4 (step S53), the user is asked whether to proceed with registration, to confirm that registration is needed (step S56).

【００４７】確認の結果（ステップＳ５７）、利用者に
拒否された場合には単語登録を終了する一方、利用者に
了承された場合には、新たに登録する単語の音節列の一
部からなる単語が既に辞書４に登録されているかを判断
する（ステップＳ５８）。この結果、新たに登録する単
語の音節列が既に辞書４に登録されていない場合には、
新たに登録する単語の音節列を成すように、辞書４の木
構造部分の既にある音節列に音節を追加して枝を延ば
し、末尾の音節に単語受理ラベルを記入する（ステップ
Ｓ５９）。As a result of the confirmation (step S57), if the user declines, the word registration ends. If the user approves, it is determined whether a word consisting of part of the syllable string of the word to be newly registered is already registered in the dictionary 4 (step S58). If the syllable string of the new word is not yet registered in the dictionary 4, syllables are added to an existing syllable string in the tree structure part of the dictionary 4, extending the branch so that it forms the syllable string of the new word, and a word acceptance label is written on the final syllable (step S59).

【００４８】次いで、辞書４の表部分の領域（アドレ
ス）を確保して、登録する単語の出力表記及び品詞を記
録し（ステップＳ６１）、この表部分へのアドレスを木
構造部分の単語受理ラベルに付記する（ステップＳ６
１）。上記の処理によって辞書４に新たな単語が登録さ
れ、直ちに辞書引き処理（ステップＳ６）の利用に供せ
られる。なお、複数の単語を登録する場合には上記の処
理を繰り返し行う。Next, an area (address) is reserved in the table part of the dictionary 4, the output notation and part of speech of the word being registered are recorded there (step S61), and the address of this table entry is attached to the word acceptance label in the tree structure part (step S61). This processing registers the new word in the dictionary 4, and the word is immediately available to the dictionary lookup processing (step S6). To register a plurality of words, the above processing is repeated.
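
The registration flow of FIG. 10 can be sketched over a simple nested-dictionary trie. The node layout, the `confirm` callback standing in for the user's answer at the terminal, and the returned status strings are assumptions for illustration. Note that nothing in the function touches a parsing table, which mirrors the point that registration needs no recompilation of the syntax rules.

```python
def new_node():
    return {"children": {}, "entries": []}  # entries: (notation, part of speech)


def register_word(root, syllables, notation, pos, confirm=lambda similar: True):
    """Register a word; confirm(similar) receives existing homophones and
    returns True when the user agrees to proceed."""
    # Search for words already stored under the same syllable string.
    node = root
    for syllable in syllables:
        node = node["children"].get(syllable)
        if node is None:
            break
    similar = list(node["entries"]) if node is not None else []
    if (notation, pos) in similar:
        return "already registered"          # identical word: report and stop
    if not confirm(similar):
        return "cancelled"                   # user declined
    # Extend the branch as needed and record the new entry at its end.
    node = root
    for syllable in syllables:
        node = node["children"].setdefault(syllable, new_node())
    node["entries"].append((notation, pos))
    return "registered"
```

A word registered this way is found by the very next lookup over the same trie, since the lookup reads the structure that registration has just extended.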

【００４９】また、辞書４に既に登録されている単語を
削除する場合には図１１に示すフローチャートに従った
処理が辞書管理手段１１によってなされる。まず、利用
者が入出力ターミナル１２のキーボード等から入力した
単語削除の指示と削除する単語の情報（出力表示、音節
列、品詞等の単語を特定できる情報）を受け取る（ステ
ップＳ７１）。そして、入力された情報に基づいて辞書
４を検索し（ステップＳ７２）、該当する単語が辞書４
に登録されているかを調べる（ステップＳ７３）。When a word already registered in the dictionary 4 is to be deleted, the dictionary management means 11 performs processing according to the flowchart shown in FIG. 11. First, it receives the word deletion instruction and the information on the word to be deleted (information that can identify the word, such as its output notation, syllable string, and part of speech) that the user has entered from the keyboard or the like of the input/output terminal 12 (step S71). It then searches the dictionary 4 on the basis of the entered information (step S72) and checks whether the word is registered in the dictionary 4 (step S73).

【００５０】この結果、該当する単語が存在しない場合
には、その旨を入出力ターミナル１２のディスプレイに
表示して単語削除処理を終了する（ステップＳ７４）。
一方、該当する単語が存在する場合には、辞書４に格納
されているその単語の音節列、品詞及び出力表記を入出
力ターミナル１２のディスプレイに表示し（ステップＳ
７５）、削除処理を実行してよいか利用者の確認を取る
（ステップＳ７６）。なお、該当する単語が複数存在す
る場合には、その単語を全て表示して削除する単語を利
用者に指定させ、指定された単語についてのみ削除処理
を行う。If no matching word exists, a message to that effect is shown on the display of the input/output terminal 12 and the word deletion processing ends (step S74). If a matching word exists, its syllable string, part of speech, and output notation as stored in the dictionary 4 are shown on the display of the input/output terminal 12 (step S75), and the user is asked to confirm that deletion may proceed (step S76). If there are a plurality of matching words, all of them are displayed, the user specifies which to delete, and only the specified words are deleted.

【００５１】そして、単語の削除が利用者から了承され
なかった場合には削除処理を終了する一方、了承された
場合には次のようにして辞書４から該当する単語を削除
する処理を行う。すなわち、まず、辞書４の木構造部分
の該当する単語受理ラベルから削除しようとする単語の
表部分へのアドレスを消去する（ステップＳ７７）。次
いで、削除しない単語の内で削除する単語と同じ音節列
を持つ同音の単語が辞書４に登録されているかを判断す
る（ステップＳ７８）。この結果、このような単語が存
在しない場合には削除する単語に該当する単語受理のラ
ベルを削除した後に（ステップＳ７９）辞書４の表部分
から該当する単語の欄を削除する一方（ステップＳ８
０）、このような単語が存在する場合には単語受理のラ
ベルの削除を行うことなく辞書４の表部分から該当する
単語の欄を削除する（ステップＳ８０）。If the user does not approve the deletion, the deletion processing ends. If the user approves, the word is deleted from the dictionary 4 as follows. First, the address of the table entry of the word to be deleted is erased from the corresponding word acceptance label in the tree structure part of the dictionary 4 (step S77). Next, it is determined whether the dictionary 4 contains any word that is not being deleted but has the same syllable string as the word being deleted, that is, a homophone (step S78). If there is no such word, the word acceptance label for the word being deleted is removed (step S79) and then the entry for the word is deleted from the table part of the dictionary 4 (step S80). If there is such a word, the entry for the word is deleted from the table part of the dictionary 4 without removing the word acceptance label (step S80).
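
The deletion flow of FIG. 11 can be sketched in the same spirit. The node layout with an explicit `accepting` flag, the `confirm` callback, and the status strings are assumptions; in this simplified model the table entry and its address are a single list item, so erasing the address and deleting the table entry collapse into one removal.

```python
def make_node():
    return {"children": {}, "accepting": False, "entries": []}


def add_word(root, syllables, notation, pos):
    """Helper that builds the sample dictionary used below."""
    node = root
    for syllable in syllables:
        node = node["children"].setdefault(syllable, make_node())
    node["accepting"] = True
    node["entries"].append((notation, pos))


def delete_word(root, syllables, notation, pos, confirm=lambda entry: True):
    """Delete one word; homophones that remain keep the acceptance label."""
    node = root
    for syllable in syllables:               # search the dictionary
        node = node["children"].get(syllable)
        if node is None:
            return "not found"
    entry = (notation, pos)
    if entry not in node["entries"]:
        return "not found"
    if not confirm(entry):                   # user declined
        return "cancelled"
    node["entries"].remove(entry)            # drop the address and table entry
    node["accepting"] = bool(node["entries"])  # label stays only if a homophone remains
    return "deleted"
```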

【００５２】以上説明したように、本実施例は音声入力
された音声信号を音節毎に認識して、その結果を辞書４
を参照しながら構文解析を行うことで認識し、その結果
をワードプロセッサ１０に入力することができる。そし
て、この構文解析の効果を上げるように予めコンパイル
しておいた構文規則（ＬＲテーブルの形式で表現した構
文ＬＲテーブル６）を利用することで、一般に処理ステ
ップ数を要する構文解析を短時間で行うことができる。
更に、辞書４が構文ＬＲテーブル６とは独立して構成さ
れていることから、辞書管理手段１１に指示することで
構文ＬＲテーブル６を変更することなしに辞書４の編集
が可能であり、構文規則をコンパイルし直すといった長
時間を要する処理を行うことなしに辞書４の編集を行う
ことができる。したがって、構文解析が必要な複雑な文
の音声を認識する音声認識装置において、音声認識処理
の最中でも必要に応じて直ちに単語の登録や削除等の辞
書編集が可能となる。As described above, this embodiment recognizes an input voice signal syllable by syllable, recognizes the result by syntax analysis with reference to the dictionary 4, and can input the outcome to the word processor 10. By using syntax rules compiled in advance to make this syntax analysis efficient (the syntax LR table 6, which expresses them in LR table form), syntax analysis, which generally takes many processing steps, can be carried out in a short time. Furthermore, because the dictionary 4 is constructed independently of the syntax LR table 6, the dictionary 4 can be edited by instructing the dictionary management means 11 without changing the syntax LR table 6, and hence without time-consuming processing such as recompiling the syntax rules. Therefore, in a voice recognition apparatus that recognizes the voice of complex sentences requiring syntax analysis, dictionary edits such as registering or deleting words can be made immediately whenever needed, even during voice recognition processing.

【００５３】次に、本発明の第２の実施例に係る音声認
識装置を図面を参照して説明する。なお、前述した第１
の実施例と同一の機能手段には同一符号を付して重複す
る説明は省略する。本実施例は、自動的に迅速なる辞書
の編集処理を実現しつつ、入力音声の音素の並びを単語
の種類（単語の分類）毎に予測して、迅速且つ確実な音
素片認識を実現する音声認識装置であり、認識結果とし
て得られた日本語文字列をデータベース検索のための命
令文としてデータベース検索システムに入力するもので
ある。Next, a voice recognition apparatus according to a second embodiment of the present invention is described with reference to the drawings. Functional means identical to those of the first embodiment are given the same reference numerals, and duplicate description is omitted. This embodiment is a voice recognition apparatus that, while realizing quick, automatic dictionary editing, predicts the sequence of phonemes in the input voice for each word type (word classification) to achieve fast and reliable phoneme-piece recognition; the Japanese character string obtained as the recognition result is input to a database search system as a command statement for database search.

【００５４】本実施例の音声認識装置は図１２に示すよ
うな構成を有しており、入力された音声を音素単位に認
識することに対応して、音素片認識手段２には音素照合
手段２６と音素モデル２３とが備えられている。また、
文法照合手段３は、照合制御手段３１と、辞書照合手段
３２と、構文規則照合手段３３とを備えており、音素片
認識手段２が認識した音素を受け取って、辞書記憶手段
５と構文規則記憶手段７との内容を参照しながら文法的
に次に後続し得る音素を予測して、予測した音素を次に
認識できるか音素片認識手段２に指示をする形式で文法
照合を行う。The voice recognition apparatus of this embodiment has the configuration shown in FIG. 12. Because input voice is recognized in units of phonemes, the phoneme-piece recognition means 2 is provided with a phoneme matching means 26 and a phoneme model 23. The grammar matching means 3 comprises a matching control means 31, a dictionary matching means 32, and a syntax rule matching means 33. It receives the phonemes recognized by the phoneme-piece recognition means 2, predicts the phonemes that can grammatically follow next by referring to the contents of the dictionary storage means 5 and the syntax rule storage means 7, and performs grammar matching by instructing the phoneme-piece recognition means 2 to check whether the predicted phonemes can be recognized next.

【００５５】辞書記憶手段５には、データベース検索シ
ステム４１への命令文に含まれる語彙の情報を、図１３
に一部を示すような、単語の音素列を示す木構造部分
と、単語の出力表示及び種類を格納した表部分と、単語
の種類毎に対応する音素列の先頭となり得る先頭音素を
格納した品詞別先頭音素表として格納されている。本実
施例は自然言語ではなくデータベース４２の検索のため
のコマンド列を受理するものであることから、第１の実
施例とは異なって、辞書５や構文規則７においての単語
の分類は言語学的に呼ばれる品詞ではなく単語の種類で
ある。例えば、引数を取るコマンドや引数を取らないコ
マンドの名称、データベース上のデータの項目名や題名
等の種類が単語の分類として用いられている。The dictionary storage means 5 stores the vocabulary used in command statements for the database search system 41 in the form partly shown in FIG. 13: a tree structure part representing the phoneme strings of words, a table part holding the output notation and type of each word, and a head phoneme table by part of speech that holds, for each word type, the phonemes that can begin the corresponding phoneme strings. Because this embodiment accepts command strings for searching the database 42 rather than natural language, the word classification used in the dictionary 5 and the syntax rules 7 is, unlike in the first embodiment, the type of word rather than the part of speech in the linguistic sense. For example, types such as the names of commands that take arguments, the names of commands that take no arguments, and the item names and titles of data in the database are used as word classifications.

【００５６】構文規則記憶手段７には、データベース検
索システム４１へのコマンド列（すなわち命令文）の構
文規則が、第１の実施例と同様に、文脈自由文法によっ
て記述して構文ＬＲテーブルにコンパイルして格納され
ている。データベース検索システム４１はコマンド列に
従ってデータベース４２を検索するシステムであり、入
出力ターミナル１２からの入力で動作するとともに、候
補選定手段９の出力する音声認識結果（コマンド列）か
らも動作する。なお、本実施例では、辞書記憶手段
５、構文規則記憶手段７及び音素モデル２３はそれぞれ
独立した記憶装置で構成されており、辞書記憶手段５は
構文規則記憶手段７とは独立して辞書管理手段１１によ
りアクセスできるようになっている。As in the first embodiment, the syntax rule storage means 7 stores the syntax rules for command strings (that is, command statements) addressed to the database search system 41, written as a context-free grammar and compiled into a syntax LR table. The database search system 41 searches the database 42 according to a command string; it operates on input from the input/output terminal 12 and also on the voice recognition result (command string) output by the candidate selecting means 9. In this embodiment, the dictionary storage means 5, the syntax rule storage means 7, and the phoneme model 23 are each held in an independent storage device, and the dictionary storage means 5 can be accessed by the dictionary management means 11 independently of the syntax rule storage means 7.

【００５７】本実施例の音声認識装置を音声認識処理の
動作を説明しつつ更に詳しく説明する。利用者がデータ
ベース検索システム４１に指示するコマンド列を音声に
よって発声すると、この音声はマイクロフォン１によっ
て電気信号に変換されて信号処理手段１４に入力され
る。信号処理手段１４が入力された音声信号をデジタル
信号に変換した後にフレーム毎に周波数解析し、特徴量
抽出手段１５が周波数解析された音声データから特徴量
を抽出する。The voice recognition apparatus of this embodiment is described in more detail below by following the operation of the voice recognition processing. When the user utters a command string for the database search system 41, the voice is converted into an electric signal by the microphone 1 and input to the signal processing means 14. The signal processing means 14 converts the input voice signal into a digital signal and performs frequency analysis on it frame by frame, and the feature extraction means 15 extracts features from the frequency-analyzed voice data.

【００５８】音素照合手段２６が、後述するように文法
照合手段３において予測された音素に対して、音素モデ
ル２３の内の必要な音素のモデルを起動して、特徴量を
抽出した音声信号と照合して基準値以上のスコアで受理
できた音素のリストを文法照合手段３の照合制御手段３
１へ出力する。そして、文法照合手段３が、音素片認識
手段２が認識した音素列を辞書記憶手段５に格納された
辞書や構文規則記憶手段７に格納されたＬＲテーブルに
照合する。ここで、本実施例では、まずＬＲパーザ構文
解析によって次に受理し得る音素（群）を予測し、これ
ら予測した音素群を音素照合手段２６に渡して、その
結果として音素照合手段２６が受理した音素を文法照合
手段３が受け取る方式で処理が進められる。For the phonemes predicted by the grammar matching means 3 as described later, the phoneme matching means 26 activates the required phoneme models within the phoneme model 23, matches them against the voice signal from which features have been extracted, and outputs a list of the phonemes accepted with a score at or above a reference value to the matching control means 31 of the grammar matching means 3. The grammar matching means 3 then matches the phoneme string recognized by the phoneme-piece recognition means 2 against the dictionary stored in the dictionary storage means 5 and the LR table stored in the syntax rule storage means 7. In this embodiment, processing proceeds as follows: LR parsing first predicts the phoneme or phonemes that can be accepted next, the predicted phonemes are passed to the phoneme matching means 26, and the grammar matching means 3 receives back the phonemes that the phoneme matching means 26 has accepted.

【００５９】照合制御手段３１は図１４にその一例を示
すような候補リストを作業領域として有している。候補
リストは、候補のラベルの「番号」、既に確定した単語
（コマンド）の出力表記を記録する「確定出力表記」、
未だ単語が確定していない音素列を記録する「辞書引き
中の音素列」、ＬＲパーザ構文解析の状態をスタックと
して記録する「ＬＲ構文解析状態」、確定した単語の種
類をＬＲパーザ構文解析の先読み記号として記録する
「先読み記号」、予測された次に照合すべき音素を記録
する「予測音素」、候補の累積スコアを記録する「スコ
ア」の各欄で構成されている。The matching control means 31 has, as its work area, a candidate list of which FIG. 14 shows an example. The candidate list consists of the following columns: "number", the label of the candidate; "confirmed output notation", which records the output notation of the words (commands) already confirmed; "phoneme string in dictionary lookup", which records the phoneme string for which no word has yet been confirmed; "LR parsing state", which records the state of LR parsing as a stack; "look-ahead symbol", which records the type of the confirmed word as the look-ahead symbol for LR parsing; "predicted phoneme", which records the predicted phoneme to be matched next; and "score", which records the cumulative score of the candidate.

【００６０】上記した文法照合手段３による照合処理を
図１５〜図１７に示すフローチャートに沿って更に詳し
く説明する。まず、候補リストを初期化し、図１３に示
した辞書の品詞別先頭音素表を参照して、コマンド列の
先頭に来る可能性のある単語の種類から先頭にする音素
を候補リストの「予測音素」の欄に記入する（ステップ
Ｓ１０１）。例えば、コマンド列の先頭に来る可能性の
ある単語の種類が項目名であれば、品詞別先頭音素表か
ら、／ｄ／、／ｋ／、／ｊ／、／ｍ／をそれぞれ「予測
音素」の欄に記入した候補が候補リストに作られる。The matching performed by the grammar matching means 3 is described in more detail with reference to the flowcharts shown in FIGS. 15 to 17. First, the candidate list is initialized and, with reference to the head phoneme table by part of speech of the dictionary shown in FIG. 13, the phonemes that can begin words of the types that may come at the head of a command string are entered in the "predicted phoneme" column of the candidate list (step S101). For example, if the type of word that may come at the head of a command string is an item name, candidates with /d/, /k/, /j/, and /m/ entered in their "predicted phoneme" columns are created in the candidate list from the head phoneme table by part of speech.
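
Step S101 can be sketched as building one candidate row per possible head phoneme. The four phonemes for item names are taken from the example in the text; the key `item_name`, the field names of a row, and the initial LR state `0` are assumptions chosen to mirror the columns of the candidate list.

```python
# Head phoneme table by word type. Only the item-name phonemes come from
# the text; the key name itself is an assumed identifier.
HEAD_PHONEMES = {"item_name": ["d", "k", "j", "m"]}


def initial_candidates(leading_word_types, head_phonemes=HEAD_PHONEMES):
    """Create one candidate row per phoneme that may start the command string."""
    rows = []
    for word_type in leading_word_types:
        for phoneme in head_phonemes.get(word_type, []):
            rows.append({
                "number": len(rows) + 1,
                "confirmed_output": "",
                "phonemes_in_lookup": [],
                "lr_stack": [0],           # assumed initial parser state
                "lookahead": None,
                "predicted_phoneme": phoneme,
                "score": 0.0,
            })
    return rows


candidates = initial_candidates(["item_name"])
```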

[0061] The following processing is then repeated until the candidate list becomes empty (step S102). First, the phoneme matching unit 26 is requested to start matching the predicted phoneme of each candidate (step S103). The phoneme matching unit 26 determines whether each predicted phoneme can be matched at the head of the input speech, and the grammar collating means 3 receives the resulting lists of phonemes that matched successfully and phonemes that failed to match (step S104). Next, candidates whose "predicted phoneme" column holds a phoneme that failed to match are deleted from the candidate list (step S105). For candidates whose "predicted phoneme" column holds a phoneme that matched successfully, the process of predicting the phonemes that can come next, described later with reference to FIG. 16, is performed (step S106).

[0062] The above processing is repeated until the candidate list becomes empty. If the recognition candidate list then contains no candidates, recognition has failed. If it contains candidates, the candidate selecting means 9 selects the candidate with the highest score as the recognition result (step S107), and the command string obtained as the recognition result is input to the database search system 41 (step S108).
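
The loop of steps S102 to S107 can be sketched as below. This is a simplified illustration, not the device's implementation: the phoneme matching unit and the prediction step are passed in as hypothetical callables `match` and `expand`, all candidates advance in lockstep, and the toy vocabulary `WORDS`, the one-point-per-phoneme scoring, and the helper names are assumptions.

```python
from typing import Callable, List, Optional, Set


def recognize(candidates: List[dict],
              match: Callable[[Set[str]], Set[str]],
              expand: Callable[[dict], List[dict]]) -> Optional[dict]:
    """Prune failed predictions, expand the survivors, then pick the best result."""
    accepted: List[dict] = []                                   # recognition candidate list
    while candidates:                                           # S102
        requested = {c["predicted"] for c in candidates}        # S103
        succeeded = match(requested)                            # S104
        survivors = [c for c in candidates
                     if c["predicted"] in succeeded]            # S105
        next_round: List[dict] = []
        for cand in survivors:                                  # S106
            for new in expand(cand):
                (accepted if new.get("accepted") else next_round).append(new)
        candidates = next_round
    if not accepted:
        return None                                             # recognition failed
    return max(accepted, key=lambda c: c["score"])              # S107


# Toy stand-ins for the matcher and the predictor (hypothetical).
WORDS = {"ka": "KA", "ko": "KO"}


def make_matcher(utterance: str) -> Callable[[Set[str]], Set[str]]:
    remaining = list(utterance)

    def match(requested: Set[str]) -> Set[str]:
        if not remaining:
            return set()
        return {remaining.pop(0)} & requested
    return match


def expand(cand: dict) -> List[dict]:
    prefix = cand["prefix"] + cand["predicted"]
    score = cand["score"] + 1.0
    if prefix in WORDS:
        return [{"accepted": True, "output": WORDS[prefix], "score": score}]
    nexts = sorted({w[len(prefix)] for w in WORDS if w.startswith(prefix)})
    return [{"predicted": p, "prefix": prefix, "score": score} for p in nexts]


start = [{"predicted": "k", "prefix": "", "score": 0.0},
         {"predicted": "m", "prefix": "", "score": 0.0}]
result = recognize(start, make_matcher("ka"), expand)
failed = recognize(start, make_matcher("x"), expand)
```

With the toy data, the candidate predicting /m/ is dropped in the first round, and the candidate predicting /k/ is expanded until a word is accepted.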

[0063] The process of predicting the next phoneme (step S106) will be described with reference to the flowchart shown in FIG. 16. First, because the previously predicted phoneme has been matched successfully and is now confirmed, it is moved from the "predicted phoneme" column of the candidate list to the end of the "phoneme string in dictionary lookup" column (step S111). Here, as in the first embodiment, a candidate may branch into a case in which a word has been determined and a case in which it has not. Likewise, even when a word has been determined, more than one word may be determined. The candidate is copied as many times as there are such cases, so that each case becomes a separate candidate (steps S112 and S113). The copied candidates that have not yet been processed are then taken out one at a time and processed as follows (step S114).

[0064] For an unprocessed candidate in which the word has not yet been determined (step S115), the tree-structure part of the dictionary is consulted and the phonemes that can follow the current phoneme string are listed as the next predicted phonemes (step S116). If the word has been determined, the phoneme prediction process including syntax analysis, described later with reference to FIG. 17, is performed, and the phonemes that may come at the head of the next word are listed as predicted phonemes (step S117).

[0065] The candidate is then copied in the candidate list as many times as there are listed predicted phonemes (step S118), and one predicted phoneme is entered in the "predicted phoneme" column of each copy (step S119). This processing continues until no unprocessed candidates remain in the candidate list (step S120). Based on these predicted phonemes, the phoneme matching unit 26 matches the phoneme that follows in the phoneme string, or the head phoneme of the word that follows in the word string (step S104).
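
Steps S116, S118, and S119 can be sketched with the tree-structure part of the dictionary modelled as nested mappings. The layout of `TRIE`, the `"#"` key standing in for a word acceptance label, the sample entries, and the function names are all assumptions made for illustration.

```python
import copy
from typing import Dict, List

# Hypothetical trie: nested dicts keyed by phoneme; "#" marks a word acceptance
# label whose value is the address of the word in the table part of the dictionary.
TRIE: Dict = {"k": {"a": {"i": {"#": 0}, "#": 1}}, "m": {"e": {"#": 2}}}


def next_phonemes(trie: Dict, prefix: List[str]) -> List[str]:
    """List the phonemes that can follow `prefix` in the trie (step S116)."""
    node = trie
    for ph in prefix:
        node = node.get(ph)
        if node is None:
            return []                      # prefix is not in the dictionary
    return [ph for ph in node if ph != "#"]


def fan_out(candidate: dict, predicted: List[str]) -> List[dict]:
    """Copy the candidate once per predicted phoneme (steps S118 and S119)."""
    copies = []
    for ph in predicted:
        new = copy.deepcopy(candidate)     # each copy must evolve independently
        new["predicted"] = ph
        copies.append(new)
    return copies


followers = next_phonemes(TRIE, ["k", "a"])
copies = fan_out({"pending": ["k", "a"], "predicted": None}, followers)
```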

[0066] The phoneme prediction process including syntax analysis (step S117) will be described with reference to the flowchart shown in FIG. 17. First, because the word has been determined (step S115), the data for the word is read from the table part of the dictionary by referring to the address attached to the word acceptance label in the tree-structure part of the dictionary (step S131). The output notation of the retrieved word is appended to the end of the "fixed output notation" column of the candidate list (step S132), and the kind of the retrieved word is entered in the "look-ahead symbol" column of the candidate list (step S133). The phoneme string in the "phoneme string in dictionary lookup" column of the candidate list is then cleared (step S134).
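
The bookkeeping of steps S132 to S134 amounts to three updates on a candidate. The sketch below assumes candidates and dictionary rows are plain mappings; the key names, the function name, and the sample values are hypothetical.

```python
def confirm_word(candidate: dict, word: dict) -> dict:
    """Record a determined word on the candidate (steps S132 to S134)."""
    candidate["fixed_output"].append(word["output"])    # S132: append the output notation
    candidate["lookahead"] = word["kind"]               # S133: kind of word becomes the look-ahead symbol
    candidate["pending_phonemes"].clear()               # S134: lookup restarts for the next word
    return candidate


cand = {"fixed_output": ["A"], "lookahead": None, "pending_phonemes": ["k", "a"]}
confirm_word(cand, {"output": "B", "kind": "item_name"})
```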

[0067] Next, using the kind of word entered in the "look-ahead symbol" column (which corresponds to the part of speech in the first embodiment), syntax analysis is performed in the same way as in the first embodiment, as shown in the flowchart of FIG. 9 (step S135). A candidate that this analysis accepts or rejects has a settled result and needs no further processing (step S136). Otherwise, the action table is consulted and the kinds of word that can come next from the current LR parsing state without causing a failure are listed (step S137). For the listed kinds of word, the table of head phonemes by part of speech in the dictionary is consulted and the predicted phonemes that come next are listed (step S138). In other words, the phonemes that may come at the head of a following word are derived from the kinds of word that can follow and are used as predicted phonemes, and the phoneme matching unit 26 performs matching based on them (step S104).
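
Steps S137 and S138 can be sketched as two table lookups. Everything concrete here is an assumption for illustration: the action table is modelled as a mapping from (state, kind of word) to an action string, a missing entry is taken to mean failure, `"$"` is used as an end-of-input marker, and the grammar and all phonemes other than /d/, /k/, /j/, /m/ are invented.

```python
from typing import Dict, List, Tuple

# Hypothetical LR action table: (state, kind of word) -> action; missing entries mean failure.
ACTION: Dict[Tuple[int, str], str] = {
    (0, "item_name"): "shift 1",
    (1, "operator"): "shift 2",
    (2, "value"): "shift 3",
    (3, "$"): "accept",
}

# Hypothetical head phoneme table: kind of word -> phonemes that can begin such a word.
HEAD_PHONEMES: Dict[str, List[str]] = {
    "item_name": ["d", "k", "j", "m"],
    "operator": ["i", "w"],
    "value": ["s"],
}


def acceptable_kinds(action: Dict[Tuple[int, str], str], state: int) -> List[str]:
    """Kinds of word that do not fail in the given LR state (step S137)."""
    return [kind for (s, kind) in action if s == state and kind != "$"]


def predict_head_phonemes(action: Dict[Tuple[int, str], str],
                          head_phonemes: Dict[str, List[str]],
                          state: int) -> List[str]:
    """Phonemes that can begin the next word (step S138)."""
    predicted: List[str] = []
    for kind in acceptable_kinds(action, state):
        for ph in head_phonemes.get(kind, []):
            if ph not in predicted:
                predicted.append(ph)
    return predicted


after_item_name = predict_head_phonemes(ACTION, HEAD_PHONEMES, 1)
```

Because only the head phoneme table is keyed by kind of word, adding a word to the dictionary changes the predictions without touching the compiled action table.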

[0068] Command strings are input by voice to the database search system 41 as described above. If, during this work or while work is suspended, new data is registered in the database 42 and item names or the like are newly added, the newly recognizable words must be added to the dictionary stored in the dictionary storage means 5. This word registration is performed automatically by the dictionary management means 11 in accordance with the flowchart shown in FIG. 18.

[0069] For example, when a new item name is added to the database 42, the database search system 41 sends the item name to be added and its kind to the dictionary management means 11. The dictionary management means 11 obtains the reading corresponding to the item name, for example by asking the user for it through the input/output terminal 12 (step S140). It then checks whether the same item name (word) is already registered in the dictionary (steps S141 and S142). If the same item name is already registered, the process ends without further action.

[0070] If the same item name is not registered, the new item name is registered in the dictionary in the same way as in the first embodiment. That is, a new address is obtained in the table part of the dictionary and the information on the item name is registered there (step S143), and, where necessary, a branch is extended in the tree-structure part of the dictionary and the address in the table part is attached to the word acceptance label (step S144). In addition, in this embodiment, the table of head phonemes by part of speech in the dictionary is checked, and if it does not contain the head phoneme of the registered item name, that phoneme is added (step S145). As a result, a new word can be registered in the dictionary stored in the dictionary storage means 5 without affecting the syntax rule storage means 7 in any way.
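
The registration steps S141 to S145 can be sketched as follows. The dictionary layout is an assumption: a list for the table part, nested mappings for the tree-structure part with `"#"` as the word acceptance label, and a mapping for the head phoneme table. Using the list length as the new address (which assumes rows are never removed), storing a list of addresses under the label, and the sample word are also illustrative choices.

```python
from typing import Dict, List


def register_word(dictionary: Dict, phonemes: List[str], output: str, kind: str) -> bool:
    """Add a word to the dictionary; return False if it is already registered."""
    table, trie, heads = dictionary["table"], dictionary["trie"], dictionary["head_phonemes"]
    if any(row["output"] == output for row in table):           # S141, S142
        return False
    address = len(table)                                        # S143: new address in the table part
    table.append({"phonemes": list(phonemes), "output": output, "kind": kind})
    node = trie
    for ph in phonemes:                                         # S144: extend branches as needed
        node = node.setdefault(ph, {})
    node.setdefault("#", []).append(address)                    # attach the address to the label
    first = phonemes[0]                                         # assumes a non-empty reading
    if first not in heads.setdefault(kind, []):                 # S145: update the head phoneme table
        heads[kind].append(first)
    return True


dictionary = {"table": [], "trie": {}, "head_phonemes": {"item_name": ["d", "k"]}}
added = register_word(dictionary, ["m", "e", "i"], "NAME", "item_name")
added_again = register_word(dictionary, ["m", "e", "i"], "NAME", "item_name")
```

Nothing in this sketch reads or writes a compiled grammar, which mirrors the point of the paragraph: registration touches only the dictionary.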

[0071] As described above, in this embodiment an input speech signal is recognized by performing syntax analysis while referring to the dictionary, and the result can be input to the database search system 41 as a command string. In particular, in addition to the features of the first embodiment, this embodiment can predict the phonemes to be matched even though the dictionary storage means 5 and the syntax rule storage means 7 are separate. Predicting phonemes reduces the number of phoneme models activated for matching, and therefore the amount of processing that matching requires. Furthermore, because the database search system 41 registers words automatically through the dictionary management means 11 whenever the contents of the database 42 are updated, the acceptable vocabulary grows automatically as the contents of the database 42 change.

[0072]

[Effects of the Invention] As described in detail above, in the voice recognition device of the present invention the dictionary is made independent of the syntax rules, which take time to compile, and a dictionary management means is provided that can access the dictionary without affecting the syntax rules. Consequently, changes to the dictionary, such as adding or deleting vocabulary, can be made immediately in response to external instructions, without inputting standard speech and without reconstructing the syntax rules as would otherwise be required, and efficient voice recognition can be realized.

[0073] Furthermore, according to the voice recognition device of claim 2, the quick dictionary editing described above is achieved while the grammar collating means uses the LR parser to predict the classification of words, such as the part of speech, and so performs efficient grammar matching, giving highly efficient voice recognition overall. In addition, according to the voice recognition device of claim 3, the LR parser is used to predict the subsequent phoneme piece, and the phoneme piece recognition means recognizes phoneme pieces from the input speech data based on the predicted phoneme piece. This achieves efficient phoneme piece matching and, overall, highly efficient voice recognition.

[Brief description of drawings]

FIG. 1 is a configuration diagram of a voice recognition device according to a first embodiment of the present invention.

FIG. 2 is a conceptual diagram showing an example of a phoneme string obtained by phoneme piece matching.

FIG. 3 is a conceptual diagram showing an example of a candidate list.

FIG. 4 is a conceptual diagram showing the contents of a dictionary.

FIG. 5 is a conceptual diagram showing the contents of an LR table obtained by compiling the syntax rules.

FIG. 6 is a conceptual diagram showing the contents of a connection evaluation table.

FIG. 7 is a flowchart showing the procedure of voice recognition processing according to the first embodiment of the present invention.

FIG. 8 is a flowchart showing the procedure of dictionary lookup processing according to the first embodiment of the present invention.

FIG. 9 is a flowchart showing the procedure of syntax analysis processing according to the first embodiment of the present invention.

FIG. 10 is a flowchart showing the procedure of word registration processing according to the first embodiment of the present invention.

FIG. 11 is a flowchart showing the procedure of word deletion processing according to the first embodiment of the present invention.

FIG. 12 is a configuration diagram of a voice recognition device according to a second embodiment of the present invention.

FIG. 13 is a conceptual diagram showing the contents of a dictionary according to the second embodiment of the present invention.

FIG. 14 is a conceptual diagram showing an example of a candidate list according to the second embodiment of the present invention.

FIG. 15 is a flowchart showing the procedure of voice recognition processing according to the second embodiment of the present invention.

FIG. 16 is a flowchart showing the procedure of phoneme prediction processing according to the second embodiment of the present invention.

FIG. 17 is a flowchart showing the procedure of syntax analysis processing according to the second embodiment of the present invention.

FIG. 18 is a flowchart showing the procedure of word registration processing according to the second embodiment of the present invention.

[Explanation of symbols]

2 ... phoneme piece recognition means, 3 ... grammar collating means,
5 ... dictionary storage means, 7 ... syntax rule storage means,
11 ... dictionary management means, 12 ... input/output terminal

Claims

1. A voice recognition device comprising: syntax rule storage means for storing, in compiled form, syntax rules that define the grammar of acceptable speech; dictionary storage means, configured independently of the syntax rule storage means, for recording for each acceptable word at least a phoneme string notation, an output notation, and a word classification; phoneme piece recognition means for recognizing input speech in units of phoneme pieces; grammar collating means for referring to the recorded contents of the dictionary storage means and the syntax rule storage means to check whether the sequence of phoneme pieces is grammatically acceptable; input means for inputting, from outside, word editing instructions for the dictionary storage means; and dictionary management means for changing the recorded contents of the dictionary storage means based on instructions from the input means.

2. The voice recognition device according to claim 1, wherein the grammar collating means refers to the recorded contents of the dictionary storage means for the recognized phoneme piece string to determine the corresponding word, refers to the recorded contents of the syntax rule storage means for that word to perform grammatical matching by LR syntax analysis, predicts the classification of the subsequent word based on the classification of that word, and determines the subsequent word using the predicted classification.

3. The voice recognition device according to claim 1, wherein the dictionary storage means stores in advance, for each word classification, candidates for the phoneme piece that comes at the head of a word; the grammar collating means refers to the recorded contents of the dictionary storage means for the recognized phoneme piece string to determine the corresponding word, refers to the recorded contents of the syntax rule storage means for that word to perform grammatical matching by LR syntax analysis, predicts the classification of the subsequent word based on the classification of that word, and predicts the subsequent phoneme piece from the recorded contents of the dictionary storage means based on the predicted classification; and the phoneme piece recognition means recognizes the input speech by finding the predicted phoneme piece at the head of the input speech.