JP2001013992A

JP2001013992A - Voice understanding device

Info

Publication number: JP2001013992A
Application number: JP11188480A
Authority: JP
Inventors: Atsushi Noguchi; 淳野口
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 1999-07-02
Filing date: 1999-07-02
Publication date: 2001-01-19

Abstract

PROBLEM TO BE SOLVED: To obtain a voice understanding device, which obtains a correct key word even though a necessary key word exists not only in one recognition result candidate but also exists in plural recognition result candidates. SOLUTION: A voice recognition section 101 outputs plural recognition result candidate sentences obtained as a recognition result and scores of each word included in each candidate sentence. A meaning extracting section 104 discriminates whether the key word exists in the plural recognition result candidate sentences outputted from the section 101 or not based on the stored contents of a conversation control section 106 and a meaning expression storage section 107 and outputs the key words existent in these sentences to a meaning output section 105. The section 104 selects a best score outputted by the section 101 for each key word when plural key words, which should not simultaneously exist in one uttering, exist.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to a speech recognition device that recognizes input speech, and more particularly to a speech understanding device that outputs the meaning intended by the user from the recognition result of the input speech.

[0002]

[Prior Art] As a speech recognition method for devices that recognize or understand continuously uttered speech, Japanese Patent Laid-Open No. 8-248988, for example, proposes a method that obtains, from a plurality of recognition result candidates produced by acoustic processing, a recognition result with a high grammatical or semantic probability, so that a high recognition rate and meaning understanding rate are achieved for the recognition process as a whole.

[0003] In this conventional speech recognition method, the acoustic processing unit first outputs the top several recognition result candidates. The language processing unit then assigns a grammatical evaluation value to each candidate. The overall evaluation value is a linear sum, with appropriate weights, of the acoustic evaluation value given by the acoustic processing unit and the grammatical evaluation value, and the candidate with the highest overall evaluation value is taken as the recognition result.
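
The conventional rescoring step described above can be sketched as follows. This is a minimal illustration, not the method of the cited publication: the `Candidate` type, the weight values, and the sample scores are all assumptions introduced here, and a larger value is assumed to be better.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    text: str
    acoustic: float     # acoustic evaluation value (acoustic processing unit)
    grammatical: float  # grammatical evaluation value (language processing unit)


def overall_score(c: Candidate, w_acoustic: float, w_grammatical: float) -> float:
    """Weighted linear sum of the two evaluation values."""
    return w_acoustic * c.acoustic + w_grammatical * c.grammatical


def pick_best(candidates: List[Candidate],
              w_acoustic: float = 0.5,
              w_grammatical: float = 0.5) -> Candidate:
    """Return the single candidate with the highest overall evaluation value."""
    return max(candidates,
               key=lambda c: overall_score(c, w_acoustic, w_grammatical))


# Hypothetical candidates and scores, for illustration only.
candidates = [
    Candidate("candidate 1", acoustic=0.80, grammatical=0.60),
    Candidate("candidate 2", acoustic=0.70, grammatical=0.90),
]
best = pick_best(candidates)
print(best.text)
```

Whatever the weights, exactly one candidate sentence survives, which is the property the following paragraphs identify as the problem.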

[0004]

[Problems to be Solved by the Invention] However, the conventional method described above has the following problem.

[0005] Because a single candidate is selected from the plurality of recognition result candidates, the keywords cannot be extracted properly when the required keywords are not all contained in one candidate but are instead spread across several candidates.

[0006] For example, suppose the user says "Please give me two tickets for seat A tomorrow", and the keywords desired as the output of the speech understanding device are "tomorrow", "A seat", and "2 tickets". Suppose further that the recognition result of the acoustic processing unit is as shown in FIG. 2. In FIG. 2, the numbers in parentheses are scores.

[0007] The keywords "tomorrow" and "A seat" are contained in the first-ranked recognition result candidate, but the keyword "2 tickets" is contained only in the second-ranked candidate. Because only one of the candidates is selected, the keyword "2 tickets" cannot be extracted. In other words, when the required keywords are not all contained in a single recognition result candidate but are spread across several candidates, they cannot all be extracted.
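
The limitation can be illustrated with a small sketch. FIG. 2 is not reproduced in this text, so the two candidates below are hypothetical stand-ins; only the keyword names are taken from the description.

```python
# Hypothetical N-best list standing in for FIG. 2 (its actual contents are
# not shown here). Each candidate is reduced to the keywords it contains.
n_best = [
    ["tomorrow", "A seat", "5 tickets"],  # rank 1: has "tomorrow" and "A seat"
    ["tomorrow", "B seat", "2 tickets"],  # rank 2: the only one with "2 tickets"
]

desired = {"tomorrow", "A seat", "2 tickets"}

# Conventional approach: commit to the single top-ranked candidate.
top_only = set(n_best[0])
missing = desired - top_only

# Looking across all candidates, every desired keyword appears somewhere.
across_all = set().union(*n_best)

print(sorted(missing))
print(desired <= across_all)
```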

[0008] The present invention has been made in view of the above problem, and its object is to provide a speech understanding device that can correctly output the required keywords even when they are spread across a plurality of recognition result candidate sentences.

[0009]

[Means for Solving the Problems] To achieve the above object, the present invention includes keyword selection means that selects, from the plurality of recognition result candidate sentences output by the speech recognition means, the keywords contained in each sentence on the basis of the score corresponding to each keyword.

[0010] With this configuration, the required keywords are more likely to be output correctly even when they are spread across a plurality of recognition result candidate sentences, which reduces the effort of correcting misrecognitions and re-entering speech.

[0011]

[Embodiments of the Invention] Embodiments of the present invention will now be described. In a preferred embodiment, the speech understanding device of the present invention includes speech recognition means (101) that recognizes speech and outputs, as recognition result candidates, a plurality of sentences together with a score corresponding to each keyword contained in those sentences, and keyword selection means (104) that selects appropriate keywords on the basis of the plurality of keywords output from the speech recognition means (101) and the scores corresponding to those keywords.

[0012] When a plurality of keywords that cannot coexist in one utterance are present, the keyword selection means selects one of them on the basis of the score that the speech recognition means (101) output for each keyword.

[0013] In a preferred embodiment, the present invention further includes dialogue management means (106) that stores a dialogue flow prepared in advance, passes to the keyword selection means information on what keywords the next speech input from the user will contain, and switches the keywords to be accepted for each dialogue state.

[0014] The present invention is a speech understanding device that recognizes input speech and outputs, from the recognition result, a plurality of keywords expressing the meaning intended by the user. The device comprises a recognition grammar storage unit that stores the grammar used for speech recognition, a recognition dictionary storage unit that stores the recognition dictionary used for speech recognition together with information on whether each word is a keyword, and a semantic expression storage unit that stores, for each dialogue state, the keywords the user may input. It further comprises:
(a) speech recognition means that performs speech recognition on the speech input by the user through speech input means, referring to the recognition grammar storage unit and the recognition dictionary storage unit, and outputs the plurality of recognition result candidate sentences obtained as the recognition result together with the score of each word contained in those candidate sentences;
(b) meaning extraction means that determines, on the basis of the contents of the semantic expression storage unit, whether a keyword is present in each of the plurality of recognition result candidate sentences output from the speech recognition means, and outputs the keywords that are present, selecting, when a plurality of keywords that cannot coexist in one utterance are present, the one whose score output by the speech recognition means is best; and
(c) meaning output means that processes the information output from the meaning extraction means to create and output a semantic expression.
The processing of means (a) to (c) may be realized by having a computer that constitutes the speech understanding device execute it.

[0015] The present invention may further comprise (d) dialogue management means that receives information on the output result of the meaning output means, stores a dialogue flow prepared in advance, and, according to the output of the meaning output means, passes to the meaning extraction means information on what keywords the next speech input will contain. In this case the meaning extraction means determines, on the basis of the output from the dialogue management means and the contents of the semantic expression storage unit, whether a keyword is present in each of the plurality of recognition result candidate sentences output from the speech recognition means, and outputs the keywords that are present. The processing of means (d) may also be realized by having a computer that constitutes the speech understanding device execute it.

[0016] That is, the present invention can be carried out by loading the program into a computer, either from a recording medium on which it is recorded or from a communication medium, and executing it.

[0017]

[Examples] Examples of the present invention will be described in detail with reference to the drawings.

[0018] FIG. 1 is a diagram showing the configuration of an embodiment of the present invention. Referring to FIG. 1, the speech understanding device according to this embodiment includes a speech recognition unit 101 that recognizes speech input by the user, a recognition grammar storage unit 102 that stores the grammar used for speech recognition, a recognition dictionary storage unit 103 that stores the recognition dictionary used for speech recognition together with information on whether each word is a keyword, a meaning extraction unit 104 that extracts the keywords contained in the recognition result candidates, a meaning output unit 105 that outputs the keywords obtained, a dialogue management unit 106 that manages the flow of the dialogue with the user, and a semantic expression storage unit 107 that stores, for each dialogue state, the keywords the user may input.

[0019] Based on the contents of the recognition grammar storage unit 102 and the recognition dictionary storage unit 103, the speech recognition unit 101 performs speech recognition on the speech that the user has input through speech input means (not shown) and that has been converted into a digital signal. It outputs to the meaning extraction unit 104 the plurality of recognition result candidate sentences (text information) obtained as the recognition result, together with the score of each word contained in each candidate sentence.

[0020] The score output by the speech recognition unit 101 is, for example, the acoustic score from the recognition process, a language score derived from the contents of the recognition grammar storage unit 102, or a score that takes both the acoustic score and the language score into account.

[0021] The recognition grammar storage unit 102 stores the grammar used for speech recognition, for example a CFG (context-free grammar) or a statistical language model such as a bigram or trigram model, and passes the stored contents to the speech recognition unit 101 as needed.

[0022] The recognition dictionary storage unit 103 stores the recognition dictionary used for speech recognition and passes the stored contents to the speech recognition unit 101 as needed.

[0023] Based on the contents of the dialogue management unit 106 and the semantic expression storage unit 107, the meaning extraction unit 104 determines whether a keyword is present in each of the plurality of recognition result candidate sentences output from the speech recognition unit 101, and outputs the detected keywords to the meaning output unit 105.

[0024] If a plurality of keywords that cannot coexist in one utterance are detected, the meaning extraction unit 104 selects the one whose score output by the speech recognition unit 101 is best.
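
The selection rule can be sketched as a small function. This is one possible reading, not the patent's implementation: it assumes that a larger score is better and that "cannot coexist" means "share the same attribute", and the attribute table and scores are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Scored = Tuple[str, float]  # (word, score reported by the recognizer)


def select_keywords(
    candidates: Iterable[List[Scored]],
    attribute_of: Dict[str, str],
) -> Dict[str, Scored]:
    """Keep, for each attribute, the best-scoring keyword seen in any candidate.

    Assumption: a larger score is better. Words that do not appear in
    `attribute_of` are not keywords and are ignored.
    """
    best: Dict[str, Scored] = {}
    for sentence in candidates:
        for word, score in sentence:
            attribute = attribute_of.get(word)
            if attribute is None:
                continue
            if attribute not in best or score > best[attribute][1]:
                best[attribute] = (word, score)
    return best


# Hypothetical attribute table and scores, for illustration only.
attribute_of = {"A seat": "seat_type", "B seat": "seat_type", "S seat": "seat_type"}
candidates = [
    [("A seat", 0.9), ("ticket", 0.8)],
    [("B seat", 0.4), ("ticket", 0.8)],
]
selected = select_keywords(candidates, attribute_of)
print(selected)
```

Keying the result by attribute rather than by word is a design choice made for this sketch; it guarantees that at most one keyword per attribute is returned.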

[0025] The meaning output unit 105 processes the information passed from the meaning extraction unit 104 to create and output a semantic expression. It also passes information on the output result to the dialogue management unit 106.

[0026] The semantic expression output from the meaning output unit 105 is, for example, a keyword sequence, or data in which the obtained keywords have been entered into a frame prepared in advance.
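
The two output forms mentioned above might look like this in code. The slot names (`date`, `seat_type`, `count`) and both function names are assumptions made for this sketch; the source does not specify the frame layout.

```python
from typing import Dict, List, Optional


def as_keyword_sequence(selected: Dict[str, str], order: List[str]) -> List[str]:
    """Form 1: a plain keyword sequence, in a fixed attribute order."""
    return [selected[attribute] for attribute in order if attribute in selected]


def as_frame(selected: Dict[str, str], slots: List[str]) -> Dict[str, Optional[str]]:
    """Form 2: a prepared frame whose slots are filled with the keywords.

    Slots for which no keyword was obtained stay None.
    """
    return {slot: selected.get(slot) for slot in slots}


# Hypothetical slot names and extracted keywords.
slots = ["date", "seat_type", "count"]
selected = {"date": "tomorrow", "seat_type": "A seat"}

sequence = as_keyword_sequence(selected, slots)
frame = as_frame(selected, slots)
print(sequence)
print(frame)
```

A frame makes missing information explicit (here the empty `count` slot), which a dialogue manager could use to decide what to ask next.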

[0027] The dialogue management unit 106 stores a dialogue flow prepared in advance and, according to the output of the meaning output unit 105, passes to the meaning extraction unit 104 information on what keywords the next speech input from the user will contain.

[0028] The semantic expression storage unit 107 stores, for each dialogue state, the keywords the user may input, and outputs the stored contents to the meaning extraction unit 104 as needed.

[0029] Next, an embodiment of the present invention will be described using concrete data. The dialogue management unit 106 is assumed to manage the user's dialogue according to the processing flow shown in FIG. 3.

[0030] The user first enters the seat type, the date, and the number of tickets in the state <reservation input> (201). Next, in the state <recognition result confirmation> (202), the user checks whether the recognition result is correct. If it is correct, the dialogue ends; if not, the user enters the information again in the state <reservation input> (201).
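
The two-state flow can be sketched as a small state machine. The `State` enum, the `END` member and its value, and the `confirmed` flag are names introduced for this sketch; only the state numbers 201 and 202 come from the description.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    RESERVATION_INPUT = 201                # <reservation input>
    RECOGNITION_RESULT_CONFIRMATION = 202  # <recognition result confirmation>
    END = 0                                # hypothetical terminal state


def next_state(state: State, confirmed: Optional[bool] = None) -> State:
    """Advance the dialogue by one step.

    `confirmed` matters only in the confirmation state: True ends the
    dialogue, anything else sends the user back to re-enter the reservation.
    """
    if state is State.RESERVATION_INPUT:
        return State.RECOGNITION_RESULT_CONFIRMATION
    if state is State.RECOGNITION_RESULT_CONFIRMATION:
        return State.END if confirmed else State.RESERVATION_INPUT
    return State.END


# One pass in which the user rejects the first recognition result.
s = State.RESERVATION_INPUT
s = next_state(s)                  # input done, go to confirmation
s = next_state(s, confirmed=False) # problem found, back to input
print(s.name)
```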

[0031] The semantic expression storage unit 107 is assumed to hold the keywords shown in FIG. 4.

[0032] In the state <reservation input>, the keywords are "today", "tomorrow", "the day after tomorrow", "A seat", and so on. In the state <recognition result confirmation>, the keywords are "yes", "okay", and so on.

[0033] In FIG. 4, "A seat", "B seat", and "S seat", for example, belong to the same attribute <seat type>, and keywords with the same attribute are assumed to occur only once in a single utterance.
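
One way to hold such a keyword table is a mapping from dialogue state to keyword to attribute. FIG. 4 is not reproduced in this text, so the table below is a hypothetical stand-in: only the <seat type> grouping is stated in the description, while the `date` and `answer` attributes and the state identifiers are assumptions.

```python
from typing import Dict, Optional

# Hypothetical stand-in for FIG. 4: state -> keyword -> attribute.
SEMANTIC_EXPRESSIONS: Dict[str, Dict[str, str]] = {
    "reservation_input": {
        "today": "date",
        "tomorrow": "date",
        "day after tomorrow": "date",
        "A seat": "seat_type",
        "B seat": "seat_type",
        "S seat": "seat_type",
    },
    "recognition_result_confirmation": {
        "yes": "answer",
        "okay": "answer",
    },
}


def attribute_for(state: str, word: str) -> Optional[str]:
    """Return the attribute of `word` if it is a keyword in `state`, else None."""
    return SEMANTIC_EXPRESSIONS.get(state, {}).get(word)


def conflicts(state: str, first: str, second: str) -> bool:
    """Two different keywords conflict when they share an attribute in `state`."""
    a = attribute_for(state, first)
    b = attribute_for(state, second)
    return first != second and a is not None and a == b


print(conflicts("reservation_input", "A seat", "B seat"))
print(conflicts("reservation_input", "A seat", "tomorrow"))
```

Because lookups are keyed by state first, a word such as "A seat" is simply not a keyword while the dialogue is in the confirmation state.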

[0034] Suppose the user says "Please give me two tickets for seat A tomorrow".

[0035] Suppose that the recognition result of the speech recognition unit 101 is then as shown in FIG. 2, and that this result is passed to the meaning extraction unit 104.

[0036] In FIG. 2, the numbers in parentheses are the scores output by the speech recognition unit 101.

[0037] Using the contents of the dialogue management unit 106 and the semantic expression storage unit 107, the meaning extraction unit 104 looks for the keywords present in these recognition result candidate sentences.

[0038] Since the state at this point is <reservation input>, the keywords found in the recognition result candidate sentences are "tomorrow", "A seat", "2 tickets", "B seat", and "5 tickets".

[0039] Because "A seat" and "B seat" share the same attribute, as do "5 tickets" and "2 tickets", the meaning extraction unit 104 selects from each pair the keyword with the better score output by the speech recognition unit 101. It deletes "B seat" and "5 tickets", outputs "tomorrow", "A seat", and "2 tickets" as the semantic expression, and sends this information to the dialogue management unit 106. Based on the information received, the dialogue management unit 106 sets the dialogue state to <recognition result confirmation>.

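The worked example can be traced end to end with a short sketch. FIG. 2 and FIG. 4 are not reproduced in this text, so the candidate sentences, the scores, the attribute names, and the `extract` function are illustrative assumptions, chosen so that the outcome matches the description; a higher score is assumed to be better.

```python
# Hypothetical keyword table for the <reservation input> state.
STATE_KEYWORDS = {
    "reservation_input": {
        "tomorrow": "date",
        "A seat": "seat_type", "B seat": "seat_type",
        "2 tickets": "count", "5 tickets": "count",
    },
}

# Hypothetical N-best list: (word, score) pairs per candidate sentence.
n_best = [
    [("tomorrow", 0.9), ("A seat", 0.8), ("5 tickets", 0.3)],  # rank 1
    [("tomorrow", 0.9), ("B seat", 0.5), ("2 tickets", 0.7)],  # rank 2
]


def extract(state, candidates):
    """Return (kept, dropped) keywords across all candidate sentences."""
    keywords = STATE_KEYWORDS[state]
    best = {}  # attribute -> (word, score) with the best score so far
    seen = []  # every distinct keyword detected, in order of appearance
    for sentence in candidates:
        for word, score in sentence:
            if word not in keywords:
                continue  # not a keyword in this dialogue state
            if word not in seen:
                seen.append(word)
            attribute = keywords[word]
            if attribute not in best or score > best[attribute][1]:
                best[attribute] = (word, score)
    kept = [word for word, _ in best.values()]
    dropped = [word for word in seen if word not in kept]
    return kept, dropped


kept, dropped = extract("reservation_input", n_best)
print(kept)
print(dropped)
```

With these assumed scores, "2 tickets" is recovered from the second-ranked candidate even though the first-ranked candidate supplies the other two keywords.
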
[0040]

[Effects of the Invention] As described above, the present invention provides the following effects.

[0041] The first effect is improved performance in understanding input speech.

[0042] The second effect is a reduction in the effort of correcting misrecognitions and re-entering speech.

[0043] The reason is that keywords are extracted from all of the recognition result candidate sentences, not only from the first-ranked candidate, which raises the likelihood that the correct keywords are selected.

[Brief description of the drawings]

FIG. 1 is a diagram showing the configuration of an embodiment of the present invention.

FIG. 2 is a diagram showing a specific example of recognition result candidates in an embodiment of the present invention.

FIG. 3 is a flowchart showing a specific example of a dialogue flow in an embodiment of the present invention.

FIG. 4 is a diagram showing a specific example of stored keyword contents in an embodiment of the present invention.

[Explanation of symbols]

101 speech recognition unit
102 recognition grammar storage unit
103 recognition dictionary storage unit
104 meaning extraction unit
105 meaning output unit
106 dialogue management unit
107 semantic expression storage unit

Claims

[Claims]

1. A speech understanding device that recognizes input speech and outputs, from the recognition result, a plurality of keywords representing the meaning intended by the user, comprising: speech recognition means that outputs a score corresponding to each keyword contained in the recognition candidate sentences; and keyword selection means that selects appropriate keywords on the basis of the plurality of keywords output from the speech recognition means and the scores corresponding to those keywords.

2. The speech understanding device according to claim 1, wherein, when a plurality of keywords that cannot be present simultaneously in one utterance are present, the keyword selection means selects one keyword on the basis of the score for each keyword output from the speech recognition means.

3. The speech understanding device according to claim 1, further comprising dialogue management means that switches the keywords to be accepted for each dialogue state.

4. A speech understanding device comprising: a recognition grammar storage unit that stores the grammar used for speech recognition; a recognition dictionary storage unit that stores the recognition dictionary used for speech recognition and information on whether each word is a keyword; a semantic expression storage unit that stores, for each dialogue state, the keywords that the user may input in association with attribute information; speech recognition means that performs speech recognition on speech input from speech input means, referring to the recognition grammar storage unit and the recognition dictionary storage unit, and outputs a plurality of recognition result candidate sentences obtained as the recognition result of the input speech together with the score of each word contained in the candidate sentences; meaning extraction means that determines, on the basis of the contents of the semantic expression storage unit, whether a keyword is present in each of the plurality of recognition result candidate sentences output from the speech recognition means, and outputs the detected keywords; and meaning output means that processes the information output from the meaning extraction means to create and output a semantic expression.

5. The speech understanding device according to claim 4, wherein, when a plurality of keywords that cannot be present simultaneously in one utterance are present in the recognition result candidate sentences, the meaning extraction means selects the one with the best score.

6. The speech understanding device according to claim 4, further comprising dialogue management means that receives information on the output result of the meaning output means, stores a dialogue flow prepared in advance, and, according to the output of the meaning output means, passes to the meaning extraction means information on what keywords the next speech input will contain, wherein the meaning extraction means determines, on the basis of the output from the dialogue management means and the contents of the semantic expression storage unit, whether a keyword is present in each of the plurality of recognition result candidate sentences output from the speech recognition means, and outputs the keywords that are present.

7. A recording medium on which is recorded a program for a speech understanding device that performs, by a computer, processing of recognizing input speech and outputting, from the recognition result, a plurality of keywords representing the meaning intended by the user, the device comprising a recognition grammar storage unit that stores the grammar used for speech recognition, a recognition dictionary storage unit that stores the recognition dictionary used for speech recognition and information on whether each word is a keyword, and a semantic expression storage unit that stores, for each dialogue state, the keywords that the user may input, the program causing the computer to execute each of the following processes (a) to (c): (a) speech recognition processing that performs speech recognition on the speech input by the user from speech input means, referring to the recognition grammar storage unit and the recognition dictionary storage unit, and outputs a plurality of recognition result candidate sentences obtained as the recognition result of the input speech together with the score of each word contained in the candidate sentences; (b) meaning extraction processing that determines, on the basis of the contents of the semantic expression storage unit, whether a keyword is present in each of the plurality of recognition result candidate sentences output from the speech recognition processing, outputs the detected keywords, and, when a plurality of keywords that cannot be present simultaneously in one utterance are present, selects the one with the best score output by the speech recognition processing for each keyword; and (c) meaning output processing that processes the information output from the meaning extraction processing to create and output a semantic expression.

8. A recording medium on which is recorded a program for a speech understanding device that performs, by a computer, processing of recognizing input speech and outputting, from the recognition result, a plurality of keywords representing the meaning intended by the user, the device comprising a recognition grammar storage unit that stores the grammar used for speech recognition, a recognition dictionary storage unit that stores the recognition dictionary used for speech recognition and information on whether each word is a keyword, and a semantic expression storage unit that stores, for each dialogue state, the keywords that the user may input, the program causing the computer to execute each of the following processes (a) to (d): (a) speech recognition processing that performs speech recognition on the speech input by the user from speech input means, referring to the recognition grammar storage unit and the recognition dictionary storage unit, and outputs a plurality of recognition result candidate sentences obtained as the recognition result of the input speech together with the score of each word contained in the candidate sentences; (b) meaning extraction processing that determines, on the basis of the contents of the semantic expression storage unit, whether a keyword is present in each of the plurality of recognition result candidate sentences output from the speech recognition processing, outputs the detected keywords, and, when a plurality of keywords that cannot be present simultaneously in one utterance are present, selects the one with the best score output by the speech recognition processing for each keyword; (c) meaning output processing that processes the information output from the meaning extraction processing to create and output a semantic expression; and (d) dialogue management processing that receives information on the output result of the meaning output processing, stores a dialogue flow prepared in advance, and, according to the output of the meaning output processing, passes to the meaning extraction processing information on what keywords the next speech input will contain, wherein the meaning extraction processing determines, on the basis of the output from the dialogue management processing and the contents of the semantic expression storage unit, whether a keyword is present in each of the plurality of recognition result candidate sentences output from the speech recognition processing, and outputs the detected keywords.