JPH06202688A - Speech recognition device

Info

Publication number: JPH06202688A
Application number: JP4360221A
Authority: JP
Inventors: Ichiro Ujiie; 一朗氏家
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 1992-12-28
Filing date: 1992-12-28
Publication date: 1994-07-22

Abstract

PURPOSE: To increase the freedom of utterance and to improve the speech recognition rate by providing case frame generating means that generates a new hypothesis about the utterance content of the speech on the basis of the analysis result of analyzing means. CONSTITUTION: A parser 2 analyzes the recognition result of a speech recognition part 1 in utterance units, such as sentence units, on the basis of a case frame, supplied from an interaction control part 3, that represents the meaning of a hypothesis about the utterance content of the speech. The parser 2 repeats this process for the plural concept frames supplied from the interaction control part 3 and, once the attribute values of the respective slots of the respective concept frames have been described, outputs the concept frame with the largest total score described in its attribute values to the interaction control part 3 as the analysis result of the recognition result of the speech recognition part 1. The interaction control part 3 sets a hypothesis about the utterance content of the speech input to the speech recognition part 1 on the basis of interaction control information stored in advance in a domain knowledge dictionary 4 and generates a concept frame representing the meaning of the hypothesis.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to a speech recognition device suitable for use in recognizing speech.

[0002]

[Prior Art] In a conventional speech recognition device, for example, word spotting is performed on the input speech, and the resulting word candidate string is parsed sentence by sentence using syntax information prepared in advance.

[0003] Based on the result of this parsing, erroneous word candidates are removed from the word candidate string so that a correct sentence (text) is obtained.

[0004]

[Problems to Be Solved by the Invention] In spoken language, however, ungrammatical sentences are often used (utterances seldom follow correct grammar), and filler words (for example, "eh", uttered between phrases to fill a pause) occur frequently. This makes parsing difficult and degrades the speech recognition rate.

[0005] One approach is to register all ungrammatical sentences, sentences with inserted filler words, and the like in the syntax information.

[0006] With this approach, however, the syntax information becomes enormous and parsing takes a long time.

[0007] Another approach is to restrict the speaker's utterance patterns, but this reduces the speaker's freedom of utterance and makes the device cumbersome for the speaker to use.

[0008] The present invention has been made in view of these circumstances, and aims to increase the freedom of utterance and to improve the speech recognition rate.

[0009]

[Means for Solving the Problems] The speech recognition device according to claim 1 comprises: a speech recognition unit 1 as recognition means for recognizing speech; a dialogue management unit 3 as generation means for forming a hypothesis about the utterance content of the speech and generating a case frame that represents the meaning of the hypothesis; and a parser 2 as analysis means for analyzing the recognition result of the speech recognition unit 1 based on the case frame generated by the dialogue management unit 3. The device is characterized in that the dialogue management unit 3 generates a new hypothesis about the utterance content of the speech based on the analysis result of the parser 2.

[0010] The speech recognition device according to claim 2 is characterized in that the case frame expresses the hypothesis generated by the dialogue management unit 3 as semantic relations centered on a predetermined keyword.

[0011] The speech recognition device according to claim 3 is characterized in that the case frame is expressed as pairs of an attribute name and an attribute value.

[0012]

[Operation] In the speech recognition device configured as above, the parser 2 analyzes the recognition result of the speech recognition unit 1 based on the case frame, generated by the dialogue management unit 3, that represents the meaning of the hypothesis about the utterance content of the speech. The dialogue management unit 3 then generates a new hypothesis about the utterance content based on the analysis result of the parser 2. Because the speech is thus analyzed independently of word order, the freedom of utterance can be increased. Furthermore, because meaningless words contained in the speech, such as filler words, are ignored, the speech recognition rate can be improved.

[0013]

[Embodiment] FIG. 1 is a block diagram showing the configuration of an embodiment of an AV system control device to which the speech recognition device of the present invention is applied. This AV system control device is configured so that the AV equipment system 7 can be operated by voice.

[0014] Specifically, the speech recognition unit 1 performs word spotting on the input speech and outputs to the parser 2 the spotting result for each word whose output the parser 2 requests. If the requested word could not be spotted in the input speech, the unit outputs information indicating that the word is not present. If the requested word could be spotted, the unit outputs the name of the spotted word, its score, and the start and end points of the interval of the speech in which the word was uttered (hereinafter, the detection interval).

[0015] Here, the score means the likelihood of the spotted word: the larger the value, the more plausible the spotted word.
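The request and response exchange between the parser 2 and the speech recognition unit 1 can be sketched as follows. This is only an illustration: the names `SpottingResult` and `spot_word`, and the toy `detections` table standing in for a real acoustic word spotter, are assumptions and do not appear in the patent.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class SpottingResult:
    """What the speech recognition unit returns for one requested word."""
    word: str
    score: float   # likelihood: a larger value means a more plausible word
    start: float   # start point of the detection interval
    end: float     # end point of the detection interval


def spot_word(word: str,
              detections: Dict[str, Tuple[float, float, float]]
              ) -> Optional[SpottingResult]:
    """Return the spotting result for `word`, or None if it was not spotted.

    `detections` is a hypothetical stand-in for the recognizer's state after
    word spotting has finished: word -> (score, start, end).
    """
    hit = detections.get(word)
    if hit is None:
        return None  # corresponds to "the word is not present"
    score, start, end = hit
    return SpottingResult(word, score, start, end)


# Example: only "record" was spotted in the (imaginary) input speech.
detections = {"record": (0.82, 1.4, 1.9)}
found = spot_word("record", detections)
missing = spot_word("play", detections)
```

Returning `None` for a word that was not spotted is one possible encoding of the "word does not exist" information; the patent does not specify the format.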

[0016] As shown in FIG. 2, for example, the parser 2 comprises: an utterance hypothesis pattern table 11, in which the case frames supplied from the dialogue management unit 3 and representing the meaning of hypotheses about the utterance content are registered; a word dictionary 12 (FIG. 3), in which information about words is registered; a word thesaurus 13 (FIG. 4), in which word concepts are described in a hierarchical structure; a priority queue 14, in which concept frames (FIG. 5), that is, case frames under analysis, are registered; and a result queue 15, in which concept frames whose analysis has finished are registered.

[0017] Here, a concept frame means a case frame, such as the one shown in FIG. 5, that expresses a hypothesis about the utterance content of the speech as semantic relations centered on a predetermined keyword such as a verb. In the case frame of FIG. 5, that is, the concept frame, the hypothesis is described around a verb as the keyword by the following: the action target on which the action denoted by the verb is performed, the means for performing the action, the method of performing the action, the action start point at which the action begins, and the action end point at which the action ends.

[0018] In other words, a concept frame expresses the hypothesis about the utterance content around an action (the action denoted by the verb): on what, by what means, in what manner, and from where to where the action is performed.

[0019] Furthermore, a concept frame is expressed by slots, each of which pairs an attribute name with an attribute value.

[0020] That is, the concept frame shown in FIG. 5 is expressed by pairs (slots) of an attribute name, namely "verb", "action target", "means", "method", "action start point", or "action end point", and an attribute value consisting of a name, a score, and a detection interval.

[0021] For example, in the concept frame of FIG. 5, the slot with the attribute name "verb" has an attribute value consisting of: the name of the verb described in the slot (for example, "録画する" (record) or "再生する" (play back)); the score output when the speech recognition unit 1 word-spots that name in the input speech; and the detection interval (start point and end point).

[0022] Furthermore, the attribute name of each slot is divided into a noun part and a particle or auxiliary-verb part, and an attribute value is described for each part.

[0023] That is, when "録画する" (record), for example, is described in the slot with the attribute name "verb", it is divided into "録画" (recording) as the noun part and "する" (do) as the auxiliary-verb part, and an attribute value is described for each.

[0024] In FIG. 5, an item enclosed in < > indicates that an actual value (or word) is described there.

[0025] In addition, the slot with the attribute name "id" holds the id of the concept frame as its attribute value, and the slot with the attribute name "score" holds, as its attribute value, the total of the scores described in the attribute value "score" of each of the attribute names "verb", "action target", "means", "method", "action start point", and "action end point".

[0026] Further, the slot with the attribute name "detection interval" holds, as its attribute value, all of the detection intervals described in the attribute value "detection interval" of each of the attribute names "verb", "action target", "means", "method", "action start point", and "action end point". The slot with the attribute name "interval length" holds, as its attribute value, the sum of the lengths (= end point - start point) of those detection intervals.
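The slot structure of a concept frame and its derived "score" and "interval length" values can be sketched as below. This is a minimal sketch under stated assumptions: the class and method names are illustrative, and the division of each slot into a noun part and a particle part is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Attribute names of the slots described in the text.
SLOT_NAMES = ("verb", "action target", "means", "method",
              "action start point", "action end point")


@dataclass
class Slot:
    name: Optional[str] = None                      # word held by the slot
    score: Optional[float] = None                   # spotting score
    interval: Optional[Tuple[float, float]] = None  # (start, end)


@dataclass
class ConceptFrame:
    frame_id: int
    slots: Dict[str, Slot] = field(
        default_factory=lambda: {n: Slot() for n in SLOT_NAMES})

    def total_score(self) -> float:
        """Sum of the scores of all slots that have a score."""
        return sum(s.score for s in self.slots.values()
                   if s.score is not None)

    def intervals(self) -> List[Tuple[float, float]]:
        """All detection intervals recorded in the slots."""
        return [s.interval for s in self.slots.values()
                if s.interval is not None]

    def interval_length(self) -> float:
        """Sum of (end - start) over all detection intervals."""
        return sum(end - start for start, end in self.intervals())


# Example with two scored slots (values are made up).
frame = ConceptFrame(frame_id=1)
frame.slots["verb"] = Slot("record", 0.5, (2.0, 3.0))
frame.slots["action target"] = Slot("video tape", 0.25, (0.0, 1.5))
```

Slots without a score are simply skipped when summing, which is an assumption; the text only states that the scores of the six slots are totaled.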

[0027] In the concept frame of FIG. 5, particles and auxiliary verbs are both shown as "particle".

[0028] Based on the case frames (concept frames) supplied from the dialogue management unit 3, which represent the meaning of hypotheses about the utterance content of the speech, the parser 2 (FIG. 2) analyzes the recognition result of the speech recognition unit 1 in predetermined utterance units, for example sentence by sentence.

[0029] That is, the parser 2 sequentially outputs to the speech recognition unit 1 the words described in the attribute value "name" of each slot of a concept frame supplied from the dialogue management unit 3, and requests spotting of each word in the input speech.

[0030] The parser 2 then receives from the speech recognition unit 1 the word-spotted word, its score, and its detection interval (start point and end point), and writes the score and the detection interval into the attribute values "score" and "detection interval", respectively, of the slot whose attribute value "name" is the spotted word.

[0031] The parser 2 repeats the above processing for the plurality of concept frames supplied from the dialogue management unit 3. When the attribute values of every slot of every concept frame have been described, the parser 2 outputs to the dialogue management unit 3, as the analysis result of the recognition result of the speech recognition unit 1, the concept frame with the largest attribute value in its "score" slot, that is, for example, the largest total of the scores described in the attribute value "score" of each of the attribute names "verb", "action target", "means", "method", "action start point", and "action end point".
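Selecting the concept frame with the largest total score can be sketched as follows. The frame representation (a mapping from slot name to score) and the function names are illustrative assumptions; counting a word that was not spotted as contributing nothing to the total is also an assumption.

```python
from typing import Dict, Optional

# One frame: slot name -> score, or None if the word was not spotted.
Frame = Dict[str, Optional[float]]


def total_score(frame: Frame) -> float:
    """Total of the slot scores, ignoring slots without a score."""
    return sum(score for score in frame.values() if score is not None)


def select_best(frames: Dict[int, Frame]) -> int:
    """Return the id of the frame with the largest total score."""
    return max(frames, key=lambda frame_id: total_score(frames[frame_id]))


# Two competing hypotheses with made-up scores.
frames = {
    1: {"verb": 0.5, "action target": 0.25, "means": None},
    2: {"verb": 0.75, "action target": 0.5, "means": 0.25},
}
best_id = select_best(frames)
```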

[0032] The dialogue management unit 3 (FIG. 1) refers to the dialogue management information stored in advance in the domain knowledge dictionary 4, manages the flow of the dialogue, and controls the AV equipment system 7 based on the analysis result of the parser 2 (the semantic representation of the concept frame output from the parser 2).

[0033] That is, based on the dialogue management information stored in advance in the domain knowledge dictionary 4, or on the semantic representation of the concept frame output from the parser 2, the dialogue management unit 3 forms a hypothesis about the utterance content of the speech to be input to the speech recognition unit 1 and generates a concept frame representing the meaning of that hypothesis.

[0034] Further, the dialogue management unit 3 refers to the dialogue management information stored in advance in the domain knowledge dictionary 4, generates a semantic representation of a reply to the semantic representation of the concept frame output from the parser 2, that is, a reply to the utterance content of the speech input to the speech recognition unit 1, and outputs it to the natural language generation unit 5.

[0035] The dialogue management unit 3 also outputs to the AV equipment system 7 a command (control instruction) that causes the AV equipment system 7 to perform the operation corresponding to the semantic representation of the concept frame output from the parser 2.

[0036] The domain knowledge dictionary 4 stores in advance dialogue management information for managing the dialogue (for example, patterns of replies to questions and patterns of the next utterance following the current one).

[0037] The natural language generation unit 5 generates text data based on the semantic representation of the concept frame output from the dialogue management unit 3 and outputs it to the text-to-speech synthesis unit 6. The text-to-speech synthesis unit 6 generates synthesized speech based on the text data output from the natural language generation unit 5 and outputs it from a built-in speaker (not shown). The AV equipment system 7 consists of at least one AV device and operates in response to commands output from the dialogue management unit 3.

[0038] In the AV system control device configured as above, the dialogue management unit 3 first forms a hypothesis about the utterance content of the speech to be input, describes the attribute values of the slots of a concept frame (FIG. 5) so as to represent the meaning of that hypothesis (that is, fills the slots), and outputs the concept frame to the parser 2. In the parser 2 (FIG. 2), the concept frame from the dialogue management unit 3 is stored in the utterance hypothesis pattern table 11.

[0039] Note that the dialogue management unit 3 forms every hypothesis that might be uttered and sequentially outputs every concept frame representing the meaning of those hypotheses. Accordingly, in the parser 2, all the concept frames output from the dialogue management unit 3 are sequentially stored in the utterance hypothesis pattern table 11.

[0040] Further, the dialogue management unit 3 does not necessarily fill all the slots of a concept frame (although it may); it fills only the slots needed to represent the meaning of the hypothesis it has formed.

[0041] When speech is first input to the speech recognition unit 1, that is, when an utterance begins, the dialogue management unit 3 sequentially outputs to the parser 2 concept frames in which only the attribute value of the slot with the attribute name "verb" is described and no other attribute values are described (hereinafter, blank concept frames).

[0042] Hereinafter, all the concept frames output from the dialogue management unit 3 to the parser 2, including blank concept frames, are referred to as the initial concept frame group.

[0043] When the initial concept frame group is output from the dialogue management unit 3, the parser 2 refers to the word dictionary 12 and fills those slots of the initial concept frames whose attribute values are not described.

[0044] Words are registered in the word dictionary 12 as shown in FIG. 3, for example.

[0045] That is, in the word dictionary 12, a word such as "laser disc", which can be the target of the action "play back" (and may therefore be described as the attribute value of the slot with the attribute name "action target"), may be uttered as "LD", "laser", "video disc", or "disc". It is therefore registered as the representative word for the words "LD", "laser", "video disc", and "disc".

[0046] All words registered in the word dictionary 12 are described at one of the leaves of the word thesaurus 13, in which word concepts are described in a hierarchical structure as shown in FIG. 4, for example. Accordingly, the fact that the word dictionary 12 classifies the word "laser disc", the representative word for "LD", "laser", "video disc", and "disc", as a playback-only video medium means that the word "laser disc" is described at a leaf under playback-only video medium in the word thesaurus 13 (FIG. 4).

[0047] Hence the words "LD", "laser", "video disc", and "disc", whose representative word is "laser disc", are also described at a leaf under playback-only video medium in the word thesaurus 13 (FIG. 4).
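The relation between spoken variants, their representative word, and the thesaurus classification can be sketched as two lookup tables. The table and function names are illustrative assumptions; only the "laser disc" entry and its variants come from the text.

```python
from typing import Dict, Optional

# Spoken variant -> representative word (entries taken from the text).
REPRESENTATIVE: Dict[str, str] = {
    "LD": "laser disc",
    "laser": "laser disc",
    "video disc": "laser disc",
    "disc": "laser disc",
}

# Representative word -> thesaurus classification (entry taken from the text).
CLASSIFICATION: Dict[str, str] = {
    "laser disc": "playback-only video medium",
}


def representative(word: str) -> str:
    """Map a spoken variant to its representative word (or itself)."""
    return REPRESENTATIVE.get(word, word)


def classification(word: str) -> Optional[str]:
    """Thesaurus classification of a word, via its representative word."""
    return CLASSIFICATION.get(representative(word))
```

With these tables, `classification("LD")` and `classification("laser disc")` resolve to the same thesaurus leaf, which mirrors the statement that the variants are described under the same classification as their representative word.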

[0048] Further, the word dictionary 12 (FIG. 3) registers semantic relations for words that can be the attribute value of the slot with the attribute name "verb", such as the word "record".

[0049] That is, for the word "record", for example, the semantic relation describes on what kind of action target, by what means, by what method, and from which action start point to which action end point the action "record" is performed.

[0050] Further, for a word that can be the attribute value of the slot with the attribute name "verb", such as "record", the dictionary also describes the words for which it is the representative word, as in the case of the word "laser disc" above, together with the level of the word thesaurus 13 (FIG. 4) to whose leaves it belongs.

[0051] For words that can be the attribute value of the slot with the attribute name "verb", such as "record", information on their conjugation (grammatical information) is also described.

[0052] Accordingly, when the initial concept frames are output from the dialogue management unit 3, the parser 2 refers to the word dictionary 12 (FIG. 3) and looks up the word described in the attribute value of the "verb" slot of each initial concept frame (FIG. 5). Then, based on the semantic relation that the word dictionary 12 describes for that word, the attribute values of those slots of the initial concept frame whose attribute values are not yet described are filled in.

[0053] That is, when the initial concept frame is a blank concept frame, in other words when only a word such as "record" is described in the attribute value of its "verb" slot, the parser 2 refers to the entry for the word "record" in the word dictionary 12 shown in FIG. 3 and fills the slots of the initial concept frame whose attribute values are not described, as shown in FIG. 6.
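Filling a blank concept frame from the verb's dictionary entry can be sketched as below. FIG. 3 and FIG. 6 are not reproduced in the text, so the placeholder categories in `SEMANTIC_RELATIONS` are hypothetical, as are the function and table names.

```python
from typing import Dict, Optional

# Hypothetical dictionary entry for the verb "record": slot name -> category
# placeholder. The actual contents of FIG. 3 are not given in the text.
SEMANTIC_RELATIONS: Dict[str, Dict[str, str]] = {
    "record": {
        "action target": "<video medium>",
        "means": "<recording device>",
    },
}


def fill_blank_frame(frame: Dict[str, Optional[str]]
                     ) -> Dict[str, Optional[str]]:
    """Return a copy of `frame` whose empty slots are filled from the
    semantic relation registered for the frame's verb."""
    relation = SEMANTIC_RELATIONS.get(frame.get("verb") or "", {})
    filled = dict(frame)
    for slot, placeholder in relation.items():
        if filled.get(slot) is None:   # only fill slots not yet described
            filled[slot] = placeholder
    return filled


# A blank concept frame: only the "verb" slot is described.
blank = {"verb": "record", "action target": None, "means": None}
filled = fill_blank_frame(blank)
```

Slots that the dialogue management unit has already filled are left untouched, matching the statement that only slots without attribute values are filled.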

[0054] Meanwhile, when speech is input to the speech recognition unit 1, word spotting is performed on it. When word spotting finishes, the speech recognition unit 1 outputs to the parser 2 the utterance duration of the input speech together with a control signal indicating that word spotting of the input speech has finished.

[0055] When the parser 2 (FIG. 2) receives the control signal from the speech recognition unit 1, it first outputs to the speech recognition unit 1 the word described in the attribute value of the "verb" slot of an initial concept frame and requests output of the spotting result for that word in the input speech.

[0056] When the speech recognition unit 1 receives a word from the parser 2 together with a request to output its spotting result, it outputs to the parser 2 the word name, the score, and the detection interval (start point and end point) in which the word was detected (spotted).

[0057] The parser 2 receives the word-spotted word, its score, and its detection interval (start point and end point) output from the speech recognition unit 1, and writes the score and the detection interval into the attribute values "score" and "detection interval", respectively, of the slot whose attribute value "name" is the spotted word (in this case, the slot with the attribute name "verb" (FIG. 5)).

[0058] Concept frames in which the attribute values "score" and "detection interval" of the "verb" slot have been described in this way are sequentially transferred to the priority queue 14 and stored there.

[0059] In the priority queue 14, the concept frames are stored sorted in ascending order of the attribute value of their "score" slot (FIG. 5), that is, the total of the scores described in the attribute value "score" of each of the attribute names "verb", "action target", "means", "method", "action start point", and "action end point".

[0060] When the above processing has finished for all the initial concept frames supplied from the dialogue management unit 3, the concept frames are read out sequentially, starting from the one stored at the head of the priority queue 14.

[0061] The parser 2 then refers to the word thesaurus 13 and writes a concrete word into the attribute value of the highest-priority slot among the slots of the concept frame whose attribute values are not described.

[0062] That is, as shown in FIG. 6, for example, suppose a concrete word is currently described only in the attribute value of the "verb" slot of the concept frame. Then, in the attribute value of the highest-priority slot among those with no concrete word, for example the slot with the attribute name "action target", the placeholder (video medium, video display device) is replaced by a word described at a leaf of the video medium and video display device levels of the word thesaurus 13 (FIG. 4) (for example, "laser disc", "video tape", "laser disc player", or "video tape device" (none of them shown)).
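Expanding a category placeholder into the concrete words at the leaves of the thesaurus can be sketched as a small tree walk. FIG. 4 is not reproduced in the text, so the tree below is a hypothetical fragment: the leaf words and the "playback-only video medium" level come from the text, while the "recordable video medium" level and all names in the code are assumptions.

```python
from typing import List, Union

# Hypothetical fragment of the word thesaurus: inner nodes are dicts,
# leaves are lists of concrete words.
THESAURUS = {
    "video medium": {
        "playback-only video medium": ["laser disc"],
        "recordable video medium": ["video tape"],   # assumed level name
    },
    "video display device": ["laser disc player", "video tape device"],
}


def leaf_words(node: Union[dict, list]) -> List[str]:
    """Collect every concrete word at the leaves below `node`."""
    if isinstance(node, list):
        return list(node)
    words: List[str] = []
    for child in node.values():
        words.extend(leaf_words(child))
    return words


def candidates(categories: List[str]) -> List[str]:
    """Concrete words that may replace a placeholder such as
    (video medium, video display device)."""
    words: List[str] = []
    for category in categories:
        words.extend(leaf_words(THESAURUS[category]))
    return words


action_target_words = candidates(["video medium", "video display device"])
```

Each candidate word would then be sent to the speech recognition unit with a request for its spotting result, as the following paragraphs describe.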

[0063] The parser 2 then outputs to the speech recognition unit 1 the word written in place of (video medium, video display device) in the attribute value of the "action target" slot and requests output of the spotting result for that word in the input speech.

[0064] When the speech recognition unit 1 receives a word from the parser 2 together with a request to output its spotting result, it outputs to the parser 2 the word name, the score, and the detection interval (start point and end point) in which the word was detected (spotted).

[0065] The parser 2 receives the word-spotted word, its score, and its detection interval (start point and end point) output from the speech recognition unit 1, and writes the score and the detection interval into the attribute values "score" and "detection interval", respectively, of the slot whose attribute value "name" is the spotted word (in this case, the slot with the attribute name "action target" (FIG. 5)).

[0066] Concept frames in which the attribute values "score" and "detection interval" of the "action target" slot have been described in this way are sequentially transferred to the priority queue 14 and stored there.

[0067] Thereafter, the same processing is repeated until the attribute values "score" and "detection interval" have been described for all of the slots with the attribute names "verb", "action target", "means", "method", "action start point", and "action end point" of the concept frame.

[0068] If the voice recognition unit 1 cannot spot a word output from the parser 2 in the input speech, that fact is described in the attribute values "score" and "detection section" of the corresponding slot of the concept frame.

[0069] A concept frame in which the attribute values "score" and "detection section" have been described for all of the slots with the attribute names "verb," "action target," "means," "method," "action start point," and "action end point" is transferred from the priority queue 14 to the result queue 15 and stored there.
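The slot-filling loop and the hand-off between the two queues can be sketched as follows. This is a simplified illustration under stated assumptions: frames are plain dictionaries, `spot` is a stand-in for the request to the voice recognition unit 1, the marker `NOT_SPOTTED` is a hypothetical way of recording that a word was not found (the text does not specify its form), and the per-slot re-sorting of the priority queue 14 is left out.

```python
NOT_SPOTTED = "not spotted"  # hypothetical marker; its actual form is not given in the text

SLOT_NAMES = ("verb", "action target", "means", "method",
              "action start point", "action end point")


def fill_frame(frame, spot):
    """Fill "score" and "detection section" for every slot present in the frame.

    frame: dict mapping attribute name -> {"name": word}
    spot:  callable word -> (score, (start, end)), or None if the word was not found
    """
    for attribute_name in SLOT_NAMES:
        slot = frame.get(attribute_name)
        if slot is None:
            continue  # assumption: a hypothesis need not use every slot
        result = spot(slot["name"])
        if result is None:
            slot["score"] = slot["detection section"] = NOT_SPOTTED
        else:
            slot["score"], slot["detection section"] = result


def is_complete(frame):
    """True once every slot carries both attribute values."""
    return all("score" in s and "detection section" in s for s in frame.values())


def transfer_completed(priority_queue, result_queue):
    """Move completed frames from the priority queue to the result queue."""
    completed = [f for f in priority_queue if is_complete(f)]
    priority_queue[:] = [f for f in priority_queue if not is_complete(f)]
    result_queue.extend(completed)
```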

[0070] In the result queue 15, as in the priority queue 14, the concept frames are stored sorted in ascending order of the attribute value of the slot with the attribute name "score" (FIG. 5), that is, the total of the scores described in the attribute value "score" of each of the slots with the attribute names "verb," "action target," "means," "method," "action start point," or "action end point."
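The ordering rule for the queue can be illustrated with a short sketch. It assumes a frame is a dictionary of slots and that a slot marked as not spotted simply contributes nothing to the total; the text does not say how such slots enter the sum, nor whether a smaller total indicates a better match, so the code only reproduces the stated ascending order. The function names are hypothetical.

```python
SLOT_NAMES = ("verb", "action target", "means", "method",
              "action start point", "action end point")


def total_score(frame):
    """Sum the "score" of every slot that has a numeric score."""
    return sum(
        frame[name]["score"]
        for name in SLOT_NAMES
        if name in frame and isinstance(frame[name].get("score"), (int, float))
    )


def store_sorted(queue, frame):
    """Record the total in the frame's "score" slot and keep the queue in ascending order."""
    frame["score"] = total_score(frame)
    queue.append(frame)
    queue.sort(key=lambda f: f["score"])  # ascending, as the text specifies
```

Because `list.sort` is stable, frames with equal totals keep the order in which they were stored.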

[0071] The above processing is repeated. When all of the concept frames stored in the priority queue 14 have been transferred to and stored in the result queue 15, the parser 2 requests the voice recognition unit 1 to output the spotting results for the particles or auxiliary verbs (hereinafter collectively referred to as the particle part) (FIG. 5) attached to each word described in the attribute value of the noun part (FIG. 5) of the slots with the attribute names "verb," "action target," "means," "method," "action start point," or "action end point" of each concept frame.

[0072] The voice recognition unit 1 then outputs to the parser 2 the spotting results (score and detection section) for the particle part (FIG. 5) attached to each word described in the attribute value of the noun part (FIG. 5) of the slots with the attribute names "verb," "action target," "means," "method," "action start point," or "action end point" of the concept frame. The parser 2 describes these spotting results in the attribute values "score" and "detection section" of the particle part (FIG. 5) of the corresponding slots of the concept frame.

[0073] After the above processing, the concept frames in the result queue 15 are stored sorted in ascending order of the attribute value of the slot with the attribute name "score" (FIG. 5), that is, the total of the scores described in all of the attribute values "score" of the noun parts and particle parts of the slots with the attribute names "verb," "action target," "means," "method," "action start point," or "action end point."

[0074] The concept frame stored at the head of the result queue 15 is then output to the dialogue management unit 3 as the result of analyzing the recognition result of the voice recognition unit 1 on the basis of the hypotheses of the utterance content generated by the dialogue management unit 3.

[0075] When the parser 2 outputs a concept frame as the analysis result, the dialogue management unit 3 refers to the dialogue management information stored in advance in the domain knowledge dictionary 4, generates a semantic expression of a reply to the semantic expression of the concept frame output from the parser 2, that is, a reply to the utterance content of the speech input to the voice recognition unit 1, and outputs it to the natural language generation unit 5 (FIG. 1).

[0076] The natural language generation unit 5 (FIG. 1) generates text data on the basis of the semantic expression of the concept frame output from the dialogue management unit 3 and outputs it to the text-to-speech synthesis unit 6. The text-to-speech synthesis unit 6 generates synthesized speech on the basis of the text data output from the natural language generation unit 5 and outputs it from a built-in speaker.

[0077] At the same time, the dialogue management unit 3 outputs to the AV equipment system 7 a command (control instruction) that causes the AV equipment system 7 to perform the operation corresponding to the semantic expression of the concept frame output from the parser 2.

[0078] The AV equipment system 7 performs the operation corresponding to the command output from the dialogue management unit 3.

[0079] That is, when the parser 2 outputs to the dialogue management unit 3 a concept frame whose semantic expression is, for example, "CD playback," the AV equipment system 7 starts playing the CD, and the text-to-speech synthesis unit 6 outputs a synthesized sound such as "CD playback has started" as a reply to "CD playback."
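The "CD playback" example amounts to a lookup from a recognized meaning to a device command and a spoken reply. The sketch below illustrates that idea only; the table `DIALOGUE_TABLE` is a hypothetical stand-in for the dialogue management information in the domain knowledge dictionary 4, and the command string `"CD_PLAY"` is invented for illustration.

```python
# Hypothetical table standing in for the dialogue management information;
# the command string is invented and does not come from the source.
DIALOGUE_TABLE = {
    "CD playback": {"command": "CD_PLAY", "reply": "CD playback has started"},
}


def respond(meaning):
    """Return (command for the AV equipment system, reply text), or None if the meaning is unknown."""
    entry = DIALOGUE_TABLE.get(meaning)
    if entry is None:
        return None
    return entry["command"], entry["reply"]
```

In this picture the command would go to the AV equipment system 7, while the reply text would pass through natural language generation and text-to-speech synthesis.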

[0080] Thereafter, the dialogue management unit 3 sets up hypotheses of the utterance content of the speech to be uttered next on the basis of the domain knowledge dictionary 4 together with the semantic expression of the concept frame output from the parser 2 ("CD playback" in the case described above).

[0081] That is, in this case, the dialogue management unit 3 sets up hypotheses of the utterance content of the speech uttered after "CD playback," such as "CD stop" or "CD fast forward."
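Hypothesis generation from the previous result can be sketched as a transition table keyed by the last recognized meaning. This is an illustrative assumption about how such knowledge might be organized; the table `NEXT_HYPOTHESES` and the function name are hypothetical, and only the "CD playback" entry comes from the text.

```python
# Hypothetical transition table: last recognized meaning -> likely next utterances.
NEXT_HYPOTHESES = {
    "CD playback": ["CD stop", "CD fast forward"],
}


def next_hypotheses(last_meaning):
    """Return the hypotheses to try for the next utterance (empty if none are known)."""
    return list(NEXT_HYPOTHESES.get(last_meaning, []))
```

Each returned hypothesis would then be turned into a concept frame and handed to the parser for the next utterance.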

[0082] The case where the voice recognition device of the present invention is applied to an AV system control device has been described above, but the present invention can be applied not only to AV system control devices but to any device that recognizes speech.

[0083] Although this embodiment did not describe the word spotting method used in the voice recognition unit 1, the voice recognition unit 1 can perform word spotting on the basis of any speech recognition algorithm, for example the DP matching method, the HMM method, or the speech recognition algorithms of the speech recognition devices disclosed in JP-A-60-249198, JP-A-60-249199, or JP-A-60-252396.

[0084] Further, in this embodiment, a concept frame is used that expresses the hypothesis of the utterance content of the speech as semantic relationships centered on a verb. However, the present invention is not limited to this, and a concept frame that expresses the hypothesis of the utterance content as semantic relationships centered on a predetermined keyword can be used instead.

[0085] In this embodiment, the attribute value of the slot with the attribute name "score" of the concept frame (FIG. 5) is the total of the scores described in the attribute value "score" of each of the slots with the attribute names "verb," "action target," "means," "method," "action start point," or "action end point." However, the present invention is not limited to this. For example, the value may instead be the sum of products of the attribute value "score" and the length of the attribute value "detection section" (= end point - start point) of each of the slots with the attribute names "verb," "action target," "means," "method," "action start point," or "action end point."
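Written as a formula, the alternative score is a sum of products over the slots. The symbols below are introduced here for illustration and do not appear in the source: $K$ is the set of slots of the concept frame, $s_k$ is the attribute value "score" of slot $k$, and $b_k$ and $e_k$ are the start point and end point of its detection section.

```latex
S_{\text{frame}} = \sum_{k \in K} s_k \,(e_k - b_k)
```

Compared with the plain total $\sum_{k \in K} s_k$, this weights each slot's score by how long the detected word lasted.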

[0086] Further, the attribute value of the slot with the attribute name "score" of the concept frame (FIG. 5) may be obtained by taking the sum of products of the attribute value "score" and the length of the attribute value "detection section" (= end point - start point) of each of the slots with the attribute names "verb," "action target," "means," "method," "action start point," or "action end point," and dividing that sum of products by the total of the lengths of the attribute values "detection section" (= end point - start point).
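The normalized variant is a length-weighted average of the slot scores. The following is a minimal sketch, assuming each slot is given as a `(score, start, end)` tuple; the function names and the choice to return `None` when the total length is zero are assumptions, since the text does not cover that case.

```python
def product_sum(slots):
    """Sum of score * detection-section length over all slots."""
    return sum(score * (end - start) for score, start, end in slots)


def length_normalized_score(slots):
    """Product sum divided by the total detection-section length."""
    total_length = sum(end - start for _, start, end in slots)
    if total_length == 0:
        return None  # assumption: left undefined when no section has any length
    return product_sum(slots) / total_length
```

For example, with slots `(2, 0, 10)` and `(5, 10, 30)` the product sum is 2·10 + 5·20 = 120, the total length is 30, and the normalized score is 120 / 30.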

【００８７】[0087]

[Effects of the Invention] As described above, according to the voice recognition device of the present invention, the analysis means analyzes the recognition result of the recognition means on the basis of a case frame that semantically expresses the hypothesis of the utterance content of the speech generated by the generation means. The generation means then generates a new hypothesis of the utterance content of the speech on the basis of the analysis result of the analysis means. Because the speech is therefore analyzed regardless of word order, the degree of freedom of utterance can be increased. Furthermore, because meaningless words contained in the speech, such as unnecessary words, are ignored, the speech recognition rate can be improved.

[Brief description of drawings]

FIG. 1 is a block diagram showing the configuration of an embodiment of an AV system control device to which the voice recognition device of the present invention is applied.

FIG. 2 is a more detailed block diagram of the parser 2 in the embodiment of FIG. 1.

FIG. 3 is a diagram showing a configuration example of the word dictionary 12.

FIG. 4 is a diagram showing a configuration example of the word thesaurus 13.

FIG. 5 is a diagram showing a concept frame.

FIG. 6 is a diagram showing a concept frame.

[Explanation of symbols]

1 Voice recognition unit
2 Parser
3 Dialogue management unit
4 Domain knowledge dictionary
5 Natural language generation unit
6 Text-to-speech synthesis unit
7 AV equipment system
11 Speech hypothesis pattern table
12 Word dictionary
13 Word thesaurus
14 Priority queue
15 Result queue

Claims

[Claims]

1. A voice recognition device comprising: recognition means for recognizing speech; generation means for generating a hypothesis of the utterance content of the speech and generating a case frame that semantically expresses the hypothesis; and analysis means for analyzing the recognition result of the recognition means on the basis of the case frame generated by the generation means, wherein the generation means generates a new hypothesis of the utterance content of the speech on the basis of the analysis result of the analysis means.

2. The voice recognition device according to claim 1, wherein the case frame expresses the hypothesis generated by the generation means as semantic relationships centered on a predetermined keyword.

3. The voice recognition device according to claim 1, wherein the case frame is represented by a set of attribute names and attribute values.