JP2003233391A

JP2003233391A - Language processor

Info

Publication number: JP2003233391A
Application number: JP2002034046A
Authority: JP
Inventors: Tsukasa Shimizu; 司清水
Original assignee: Toyota Central R&D Labs Inc
Current assignee: Toyota Central R&D Labs Inc
Priority date: 2002-02-12
Filing date: 2002-02-12
Publication date: 2003-08-22

Abstract

PROBLEM TO BE SOLVED: To realize robust, highly accurate, and fast language processing.

SOLUTION: In a word dictionary 1, the symbol $ is added to each word that can be a slot value (keyword) for a facility name, the symbol % to each word that can be a slot value for a type of industry, and the symbol @ to each word that can be a slot value for an address. The category symbol (@/$/%, ...) is attached only to the word notation (the left side of the word dictionary) and not to the reading of each registered word, because the speech recognition part performs recognition according to the reading (phoneme string) of each word, using an acoustic model, the word dictionary 1, and a statistical language model. When the speech recognition part performs speech recognition using such a word dictionary 1, a speech recognition result character string such as 'Well, it is a %restaurant in @Chikusa Ward.' is obtained. Hence the calculation cost of category identification and related processing in a meaning understanding part can be effectively reduced.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Technical field to which the invention belongs] The present invention relates to a language processing device having a voice recognition unit that recognizes a speaker's voice as a character string, and a meaning understanding unit that extracts candidate words (candidates for desired keywords) from the voice recognition character string output by the voice recognition unit, identifies the category of each candidate word, and specifies the keywords. The present invention is therefore applicable to, for example, an in-vehicle car navigation system, and can be applied to a voice dialogue device or the like that conducts a dialogue to fill in information corresponding to so-called "slots", such as the facility name or address of a destination.

[0002]

[Prior art] Language processing devices having a meaning understanding unit that specifies keywords from a voice recognition character string output by a voice recognition unit are generally known. Examples include the device described in Japanese Unexamined Patent Publication No. H7-262190, "Language Processing Device" (hereinafter "Document 1"), and the device described in Japanese Unexamined Patent Publication No. H8-263494, "Language Analysis Device" (hereinafter "Document 2").

[0003] Document 1 describes a device intended to perform semantic analysis on input sentences containing a variety of phrasings. In this conventional device, at a stage preceding the semantic analysis, the input sentence is morphologically analyzed and the semantic feature information of each word is determined by referring to a semantic feature knowledge storage means. Document 2 describes a device intended to perform linguistic analysis so as to obtain the structure of a spoken sentence from its character string. In this conventional device, morphological analysis is performed to obtain the morphological, syntactic, and semantic information of each word in the input sentence, in order to obtain the case structure (syntactic structure) of the input sentence.

[0004] In the field of voice recognition, two kinds of method are often used: methods that recognize only predetermined keywords in speech (so-called "word spotting"), and methods that use a language model in which the sequence of words or categories is defined (restricted) by a fixed grammar. In conventional devices such as those above that use voice recognition based on these techniques or grammars, the position at which a word of a given category appears in the recognized word string is fixed, so keyword extraction and category identification are possible at the same time as word recognition. However, such devices cannot robustly recognize the user's varied phrasings (including grammatically ill-formed sentences). For this reason, voice recognition based on a statistical language model is now the usual approach for handling such phrasings.

[0005]

[Problems to be solved by the invention] However, in a voice dialogue device that uses a statistical language model, it is not known in advance which words will appear, in which order, and at which positions in the word string obtained as the recognition result. Consequently, to determine which words in the recognized word string are keywords and to which categories they belong, it is necessary either to prepare something like a separate word category table and look words up in it, or to perform morphological analysis on the recognized word string. Both approaches require computationally expensive processing beyond voice recognition itself for keyword extraction and category identification. In other words, conventional meaning understanding, which identifies word categories by morphologically analyzing the recognized word string, incurs a large computational cost (time and memory capacity).

[0006] Thus, because conventional devices perform processing based on morphological analysis as the semantic analysis of free utterances, extracting the words that serve as keywords and identifying their categories (the semantic features or semantic information in the inventions cited above) requires a high computational cost. Moreover, the conventional techniques presuppose that the input word string forms a grammatically correct sentence, whereas the accuracy of current voice recognition technology is not high enough to recognize all spontaneous speech correctly. With the conventional techniques it is therefore difficult to guarantee that the expected semantic processing is reliably executed.

[0007] The present invention has been made to solve the above problems, and its object is to realize robust, highly accurate, and fast language processing.

[0008]

[Means for solving the problems, and operation and effects of the invention] The following means are effective for solving the above problems. The first means is a language processing device having a voice recognition unit that recognizes a speaker's voice as a character string, and a meaning understanding unit that extracts candidate words (candidates for desired keywords) from the voice recognition character string output by the voice recognition unit, identifies the category of each candidate word, and specifies the keywords, wherein the voice recognition unit is provided with character string generation means that generates the voice recognition character string based on a word dictionary in which a category symbol indicating the category of each registered word is added to the notation character string of each registered word that can become a candidate word, and the meaning understanding unit is provided with symbol search means that searches for the presence or the value of a category symbol in the voice recognition character string.

[0009] As a result, the downstream meaning understanding unit can easily extract the keywords from the recognized word string (that is, the voice recognition character string output by the voice recognition unit) and identify their categories at a low computational cost. In other words, because a symbol indicating the category is attached to each recognized word itself, category identification can be performed at the same time as keyword extraction, which reduces the computational cost of semantic processing.

[0010] In addition, with current voice recognition technology using a statistical language model, grammatically ill-formed sentences are often obtained as recognition results because of misrecognition and the like (the utterance itself may also be grammatically ill-formed). Current morphological analysis techniques cannot analyze such ill-formed sentences robustly and accurately. With the means of the present invention, however, meaning understanding is performed based on the "category symbols" attached to the recognized words, without language processing such as morphological analysis, so the desired language processing can be performed robustly and accurately even on grammatically ill-formed recognized word strings.

[0011] The second means of the present invention is that the first means further comprises dialogue control means that determines the type, structure, or meaning of a response screen or response sentence corresponding to the content of the voice recognition character string. The third means is that the second means further comprises response sentence generation means that generates the response sentence as a series of output character strings. The fourth means is that the second or third means further comprises voice output means that converts the response sentence into voice and outputs it.

[0012] As can be seen from the principle of operation described above, the present invention is characterized by the procedure for determining keywords in language processing and does not necessarily presuppose voice output. The present invention can therefore also be applied to language processing devices that do not presuppose real-time or interactive responses to the user, such as automatic translation devices and automatic minutes generation devices. The operation and effects of the present invention described above are not limited to Japanese language processing and are valid for the processing of any natural language. By the means of the present invention described above, the above problems can be solved effectively or rationally.

[0013]

[Embodiments of the invention] The present invention is described below based on a specific embodiment. The present invention is not, however, limited to the embodiment shown below.

[Embodiment] This embodiment is described using a destination setting task as an example. FIG. 1 is a system configuration diagram illustrating the logical configuration of the voice dialogue device 10 of this embodiment. In the destination setting task of this embodiment, it is assumed that three slots (receptacles for keywords) must be filled through dialogue with the user: the "facility name" of the destination, its "type of industry", and its "address".

[0014] FIG. 2 and FIG. 3 are schematic diagrams illustrating the formats of the word dictionary 1 and the statistical language model 2, respectively, used by the voice dialogue device 10. In the word dictionary 1 and the statistical language model 2, a $ symbol is attached before (or after) each word that can be a slot value (keyword) for the facility name, a % symbol is attached before (or after) each word that can be a slot value for the type of industry, and an @ symbol is attached to each word that can be a slot value for the address.

[0015] Voice recognition is performed according to the reading of each word (its phoneme string; the right side of the example entries of the word dictionary in FIG. 2), using an acoustic model, the word dictionary 1, and the statistical language model 2. In the word dictionary 1, the symbol indicating the category (category symbol) is therefore attached only to the word notation (the left side of the example entries) and not to the reading.
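The dictionary layout described in this paragraph can be sketched as follows. This is a minimal illustration under stated assumptions, not the format shown in FIG. 2: the entries, the romanized readings, and the names `WORD_DICTIONARY`, `notation_for_reading`, and `readings_are_symbol_free` are all hypothetical. It shows only that the category symbol lives on the notation side, so a lookup by reading returns a notation that already carries its category.

```python
# Hypothetical word dictionary: each entry is (notation, reading).
# The entries and romanized readings are illustrative assumptions,
# not the actual contents of FIG. 2.
WORD_DICTIONARY = [
    ("@Chikusa Ward", "chikusaku"),   # address slot candidate
    ("%restaurant", "resutoran"),     # industry slot candidate
    ("well", "eeto"),                 # filler word, no category symbol
    ("in", "no"),                     # particle, no category symbol
]

CATEGORY_SYMBOLS = ("$", "%", "@")


def notation_for_reading(reading):
    """Return the notation registered for a reading, or None if absent."""
    for notation, entry_reading in WORD_DICTIONARY:
        if entry_reading == reading:
            return notation
    return None


def readings_are_symbol_free():
    """Check that no reading contains a category symbol."""
    return all(
        symbol not in reading
        for _, reading in WORD_DICTIONARY
        for symbol in CATEGORY_SYMBOLS
    )
```

Because matching against the acoustic input uses only the reading, the symbol on the notation side would not affect recognition itself; it simply travels with the word into the output string.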

[0016] When the voice recognition unit 11 performs voice recognition using the word dictionary 1 (FIG. 2) and the statistical language model 2 (FIG. 3) described above, the following voice recognition result <voice recognition character string B> is obtained for the "user's reply" in <actual dialogue A> below.

<Actual dialogue A>
Voice dialogue device 10: Please tell me the address and type of business of the shop.
User's reply: Well, it is a restaurant in Chikusa Ward.

<Voice recognition character string B>
User's reply: Well, it is a %restaurant in @Chikusa Ward.

[0017] The language processing procedure subsequently carried out in the meaning understanding unit 12 when the above voice recognition result <voice recognition character string B> is obtained is illustrated below. FIG. 4 is a general flowchart illustrating the language processing procedure of the meaning understanding unit 12 of the voice dialogue device 10.

[0018] In this flowchart, the above <voice recognition character string B> is first input from the voice recognition unit 11. Next, in step 123, the first character of each word in the recognized word string (voice recognition character string B) is examined to extract the keyword candidate words. That is, if the first character of a word in the word string is a $ symbol, a % symbol, or an @ symbol, that word is extracted as a keyword candidate word. The other words are not used in subsequent processing and need not be retained. This processing therefore leaves the following keyword string (keyword candidate words).

[0019] <Extracted keyword string> (User's reply): @Chikusa Ward %restaurant
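The extraction performed in step 123 can be sketched in a few lines. This is an illustration only: it assumes the recognizer delivers its result as a list of word notations, and the function name `extract_candidates` and the English stand-in words are hypothetical.

```python
# Category symbols used in the embodiment: $ (facility name),
# % (type of industry), @ (address).
CATEGORY_SYMBOLS = ("$", "%", "@")


def extract_candidates(recognized_words):
    """Keep only the words whose first character is a category symbol.

    Words without a symbol are dropped, since later steps do not use them.
    """
    return [word for word in recognized_words
            if word.startswith(CATEGORY_SYMBOLS)]


# Assumed word-level form of <voice recognition character string B>.
recognized = ["well", "@Chikusa Ward", "in", "%restaurant", "is"]
candidates = extract_candidates(recognized)
print(candidates)  # ['@Chikusa Ward', '%restaurant']
```

The check is a single prefix test per word, with no table lookup or morphological analysis, which is where the cost saving claimed in the description would come from.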

[0020] Next, in step 126, the category of each candidate word is identified based on its category symbol (the first character: @, %, or $). For example, when the above <voice recognition character string B> is obtained, an extracted keyword is treated as a slot value for the facility name if its first character is a $ symbol, as a slot value for the type of industry if it is a % symbol, and as a slot value for the address if it is an @ symbol. As a result, the slots are filled as follows.

[0021] <Resolution result of each slot value>
Industry slot value = restaurant
Address slot value = Chikusa Ward
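The category identification of step 126 amounts to mapping the first character of each candidate word to a slot. The following is a minimal sketch under assumptions: the slot names, the dictionary `SLOT_BY_SYMBOL`, and the function `fill_slots` are hypothetical, and the candidate words are the English stand-ins for the extracted keyword string.

```python
# Mapping from category symbol to slot name (slot names are assumptions).
SLOT_BY_SYMBOL = {
    "$": "facility_name",
    "%": "industry",
    "@": "address",
}


def fill_slots(candidates):
    """Assign each candidate word to the slot named by its first character."""
    slots = {name: None for name in SLOT_BY_SYMBOL.values()}
    for word in candidates:
        symbol, value = word[:1], word[1:]
        slot = SLOT_BY_SYMBOL.get(symbol)
        if slot is not None:
            slots[slot] = value
    return slots


slots = fill_slots(["@Chikusa Ward", "%restaurant"])
print(slots)
# {'facility_name': None, 'industry': 'restaurant', 'address': 'Chikusa Ward'}
```

Since the mapping depends only on each word's own symbol, the result is the same whatever order the candidates arrive in, which matches the robustness to ill-formed word strings described earlier. The facility name slot stays empty here because the user's reply contained no $ word.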

[0022] Through processing such as the above, with the voice recognition character string as input, the word extraction unit extracts only the keywords to which category symbols are attached, and the word category identification unit identifies the category of each keyword according to its category symbol.

[0023] In the above embodiment, special characters (for example, @, %, $, and so on) are used as the category symbols, but any characters may be used depending on the type of language to be analyzed and other factors. Also, although each category symbol is limited to one character in the above embodiment, a category symbol may have any number of characters.

[0024] The categories themselves may also be organized hierarchically. For example, when categories are classified into three levels, the category symbols may likewise be given three levels. This makes it easy to divide complex categories (concepts) into fine-grained levels and handle them hierarchically. For example, when representing an address, it may be effective to apply "@1" as the category symbol for prefecture names, "@2" for city names, and "@3" for town and village names.
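Multi-character, hierarchical category symbols such as "@1", "@2", and "@3" can be handled with a prefix match that tries longer symbols first. The sketch below is one possible reading of this paragraph, not a procedure given in the source: the table `HIERARCHICAL_SYMBOLS`, the level names, the function `split_category`, and the example place names are all assumptions.

```python
# Hypothetical symbol table: symbol -> (category, level).
# "@1"/"@2"/"@3" follow the address example in the text; the level
# names are assumptions made for this sketch.
HIERARCHICAL_SYMBOLS = {
    "@1": ("address", "prefecture"),
    "@2": ("address", "city"),
    "@3": ("address", "town_or_village"),
    "$": ("facility_name", None),
    "%": ("industry", None),
}

# Try longer symbols first so a multi-character symbol is never
# shadowed by a shorter symbol that happens to be its prefix.
_SYMBOLS_LONGEST_FIRST = sorted(HIERARCHICAL_SYMBOLS, key=len, reverse=True)


def split_category(word):
    """Return ((category, level), bare_word), or (None, word) if untagged."""
    for symbol in _SYMBOLS_LONGEST_FIRST:
        if word.startswith(symbol):
            return HIERARCHICAL_SYMBOLS[symbol], word[len(symbol):]
    return None, word


print(split_category("@1Aichi"))   # (('address', 'prefecture'), 'Aichi')
print(split_category("@2Nagoya"))  # (('address', 'city'), 'Nagoya')
print(split_category("well"))      # (None, 'well')
```

With such a table, an address slot could be filled level by level (prefecture, then city, then town or village) while still using only a prefix test on each recognized word.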

[Brief description of drawings]

FIG. 1 is a system configuration diagram illustrating the logical configuration of a voice dialogue device 10 according to an embodiment of the present invention.

FIG. 2 is a schematic diagram illustrating the format of the word dictionary 1 used by the voice dialogue device 10.

FIG. 3 is a schematic diagram illustrating the format of the statistical language model 2 used by the voice dialogue device 10.

FIG. 4 is a general flowchart illustrating the language processing procedure of the meaning understanding unit 12 of the voice dialogue device 10.

[Explanation of symbols]

1 … Word dictionary
2 … Statistical language model
10 … Voice dialogue device (language processing device)
11 … Voice recognition unit
12 … Meaning understanding unit
123 … Word extraction unit
126 … Word category identification unit
13 … Dialogue control unit
14 … Voice output unit
$ … Category symbol (shop name or facility name)
% … Category symbol (type of industry or product name)
@ … Category symbol (place name or address)

Continuation of front page (51) Int.Cl.⁷ Identification code FI Theme code (reference) G10L 15/18 G10L 3/00 R 15/22 15/28

Claims

[Claims]

1. A language processing device having a voice recognition unit that recognizes a voice of a speaker as a character string, and a meaning understanding unit that extracts a candidate word, which is a candidate for a desired keyword, from a voice recognition character string that is output information of the voice recognition unit, identifies a category of the candidate word, and specifies the keyword, wherein the voice recognition unit has character string generation means for generating the voice recognition character string based on a word dictionary in which a category symbol indicating the category of each registered word is added to the notation character string of each registered word that can become the candidate word, and the meaning understanding unit has symbol search means for searching for the presence or the value of the category symbol in the voice recognition character string.

2. The language processing apparatus according to claim 1, further comprising a dialogue control unit that determines a type, a structure, or a meaning of a response screen or a response sentence corresponding to the content of the voice recognition character string.

3. A response sentence generating means for generating the response sentence as a series of output character strings is provided.
The language processing device according to.

4. The language processing apparatus according to claim 2, further comprising a voice output unit that converts the response sentence into voice and outputs the voice.