JP2003271596A

JP2003271596A - Language processor

Info

Publication number: JP2003271596A
Application number: JP2002071372A
Authority: JP
Inventors: Tomoko Okuma; 智子大熊; Kazuki Hirata; 和貴平田
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-03-15
Filing date: 2002-03-15
Publication date: 2003-09-26
Anticipated expiration: 2022-03-15
Also published as: JP3956730B2

Abstract

PROBLEM TO BE SOLVED: To provide a language processor that extracts appropriate words as important words even from dialogue data containing recognition errors or ambiguity.

SOLUTION: A text information storage means 2 stores, for example, past text information from a dialogue, and a text information acquisition means 1 acquires, for example, the current text information from that dialogue. A matched character string part detection means 3 detects a character string part that matches between the text information acquired by the text information acquisition means 1 and the text information stored in the text information storage means 2, and an important word extraction means 4 extracts important words from the character string part detected by the matched character string part detection means 3.

COPYRIGHT: (C)2003,JPO

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a language processing apparatus that extracts important words from text information, and in particular to a language processing apparatus that extracts appropriate words as important words even from, for example, dialogue data containing recognition errors or ambiguity.

[0002]

[Prior Art] In conventional methods used to extract important words that indicate the content or topic of dialogue data, words are extracted from the input dialogue data by keyword spotting with a word dictionary or by morphological analysis means.

[0003] In the "advertising device for electronic dialogue" described in Japanese Patent Laid-Open No. 11-3348, a word dictionary (advertisement dictionary) is prepared in advance; when a word registered in the dictionary appears in the dialogue data, that word is treated as an important word and information related to it (advertisement information) is displayed. In the "automatic information providing method" described in Japanese Patent Laid-Open No. 6-236410, rather than a simple word dictionary, a database recording words (topics) and the fields to which they belong is used to determine which field a dialogue belongs to and to detect that the field has changed.

[0004] In the "dialogue processing device" described in Japanese Patent Laid-Open No. 8-137874, in order to detect whether the topic has changed, a word list is created from the input dialogue data by a morphological analyzer equipped with a synonym dictionary, a word dictionary, and a contrast word dictionary. In the "topic processing device" described in Japanese Patent Laid-Open No. 10-69482, morphological analysis is performed on the dialogue data (utterance objects) and words of specific types are extracted as keywords.

[0005]

[Problem to Be Solved by the Invention] However, all of the conventional important word extraction methods described above first extract words from the dialogue data by keyword spotting with a word dictionary or by morphological analysis, and then hold only those words in a recording device in order to, for example, count their frequencies or measure distances between words in a vector space. With such conventional methods, once the initial word extraction fails, the correct result can never be recovered. For example, when the dialogue data contains noise, that is, misrecognized characters, or ambiguity, word extraction is very likely to fail.

[0006] As a concrete example, suppose the data is "このちほうのおおきなだいごみはさかなです。" and the portion "だい" (dai) is either an error or carries an ambiguity between "大ごみ" (large garbage) and "醍醐味" (daigomi, "real pleasure"). Even if the correct recognition result of the dialogue is {この (adnominal) / 地方 (noun, general) / の (particle, adnominalizing) 大きな (adnominal) / だい (error) / ゴミ (noun, general) / は (binding particle) / 魚 (noun, general) / です (auxiliary verb)}, once the character string has been converted into the mixed kanji-kana word list {この (adnominal) / 地方 (noun, general) / の (particle, adnominalizing) 大きな (adnominal) / 醍醐味 (noun) / は (binding particle) / 魚 (noun, general) / です (auxiliary verb)}, the word "ゴミ" (garbage), which is the word actually wanted, can no longer be extracted, and the unnecessary word "醍醐味" may be extracted as a keyword instead.

[0007] The present invention has been made in view of these circumstances, and its object is to provide a language processing apparatus and the like that can extract appropriate words as important words even when, for example, the text information contains errors or ambiguity. More specifically, even when dialogue data obtained by, for example, a speech recognition device contains erroneous or ambiguous character strings, the present invention retains those errors and ambiguities as they are and resolves them according to the preceding and following dialogue, so that important words representing the content of the dialogue are extracted appropriately.

[0008]

[Means for Solving the Problem] To achieve the above object, in the language processing apparatus according to the present invention, text information storage means stores text information, text information acquisition means acquires text information, matching character string portion detection means detects a character string portion that matches between the text information acquired by the text information acquisition means and the text information stored in the text information storage means, and important word extraction means extracts important words from the character string portion detected by the matching character string portion detection means.

[0009] Because important words are thus extracted from the character string portion that matches between the stored text information and the acquired text information, appropriate important words can be extracted even when, for example, the stored or the acquired text information contains errors or ambiguity. In other words, unless the stored and the acquired text information contain exactly the same error, the detected matching character string portion contains no error, so the rate at which words containing errors are extracted as important words is lower than with conventional methods, and appropriate important words can be extracted.

[0010] FIG. 3 shows, as a schematic configuration example of the language processing apparatus according to the present invention, a text information storage unit 31 having the function of the text information storage means, a text information acquisition unit 32 having the function of the text information acquisition means, a matching character string portion detection unit 33 having the function of the matching character string portion detection means, and an important word extraction unit 34 having the function of the important word extraction means. It also shows text information 41 supplied from the text information storage unit 31 to the matching character string portion detection unit 33.

[0011] Various kinds of information may be used as the stored and the acquired text information; for example, information containing sentences is used. The number of stored pieces and the number of acquired pieces of text information used to detect a matching character string portion may be, for example, one each; alternatively, either or both may be plural, in which case a character string portion that matches across all three or more pieces of text information is detected.

[0012] As a preferred mode, the stored and the acquired text information used to detect a matching character string portion are pieces of text information whose topics are the same or similar but whose sentences differ. Specifically, various text information can be used: for example, the text of past utterances and the text of the current utterance on the same or a similar topic in a dialogue or lecture, or the text of opinions or questionnaire responses from several people at the same place, such as the same exhibition room in an art museum.

[0013] The text information storage means can be configured using, for example, a memory. The text information acquisition means may acquire text information in various ways: not only by acquiring it already in text form, but also, for example, by converting information in audio form into text form.

[0014] The character string portion detected by the matching character string portion detection means is, for example, the portion of a character string common to the stored and the acquired text information. It need not be a string of two or more characters; for example, a single character may be detected as a character string. The number of character strings detected is not particularly limited. The characters making up a character string are not restricted to hiragana, katakana, and Roman letters; various others, such as symbols, may be included.

[0015] The matching character string portion detection means may detect the character string portion in various ways. In a preferred mode, for example, the text information is treated as a sequence of phonetic characters that has not yet been segmented into words and carries no assigned meaning, and runs of matching characters are detected as matching character string portions, scanning either from front to back or from back to front.
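
As a non-authoritative illustration of the front-to-back matching described in this paragraph, the following Python sketch treats both texts as unsegmented runs of phonetic characters and greedily collects the runs they share. The function name `common_substrings`, the `min_len` parameter, and the second utterance are assumptions introduced here for illustration; the source does not prescribe a particular algorithm.

```python
def common_substrings(stored: str, acquired: str, min_len: int = 2) -> list[str]:
    """Greedy front-to-back scan for character runs that occur in both strings."""
    found = []
    i = 0
    while i < len(acquired):
        # Longest run starting at acquired[i] that also occurs somewhere in `stored`.
        best = 0
        for j in range(len(stored)):
            k = 0
            while (i + k < len(acquired) and j + k < len(stored)
                   and acquired[i + k] == stored[j + k]):
                k += 1
            best = max(best, k)
        if best >= min_len:
            found.append(acquired[i:i + best])
            i += best      # continue scanning after the matched run
        else:
            i += 1         # no usable match here; advance one character
    return found


stored = "このちほうのおおきなだいごみはさかなです"  # utterance quoted in the text
acquired = "おおきなごみはもえますか"                # hypothetical later utterance
print(common_substrings(stored, acquired))  # ['おおきな', 'ごみは']
```

In this example the suspect run "だい" appears in only one of the two utterances, so it is absent from the matched parts, while "ごみ" survives inside "ごみは". Setting `min_len=1` would also admit single-character matches, which the text allows.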

[0016] The important words extracted by the important word extraction means may be various words, and they may be extracted in various ways; for example, a meaningful word is detected within the detected matching character string portion and extracted as an important word. The extracted important words are, for example, stored in storage means such as a memory or displayed on display means such as a screen.

[0017] In one configuration example of the language processing apparatus according to the present invention, the text information stored in the text information storage means is text information obtained by speech recognition of audio information. In another configuration example, in the text information acquisition means, audio information input means inputs audio information, and speech recognition means performs speech recognition on the audio information input by the audio information input means and converts it into text information.

[0018] Important words can therefore be extracted from, for example, audio information uttered by people. Since text information obtained by speech recognition of audio information is generally liable to contain recognition errors and the like, the present invention is particularly effective here. Various information may be used as the audio information. Generally known techniques can be used for the speech recognition processing. The audio information input means can be configured using, for example, a microphone.

[0019] In a preferred mode of the language processing apparatus according to the present invention, the text information stored in the text information storage means and the text information acquired by the text information acquisition means have mutually related content. Specifically, text information is used in which, for example, the stored and the acquired text information share common words, and such common words can be extracted as important words.

[0020] In one configuration example of the language processing apparatus according to the present invention, the audio information input means of the text information acquisition means inputs audio information currently being uttered in a dialogue conducted by two or more people, and the text information stored in the text information storage means is text information obtained by speech recognition of audio information uttered earlier in that dialogue. Important words can therefore be extracted from the words spoken by the speakers in, for example, a dialogue about the same or a similar topic.

[0021] In one configuration example of the language processing apparatus according to the present invention, the text information storage means deletes the stored text information from its stored contents when a predetermined period has elapsed. Important words can therefore be extracted for each predetermined period, such as each break in the dialogue.

[0022] Various periods may be used as the predetermined period; for example, a preset period such as 5 minutes or 10 minutes can be used. As specific examples of preferred modes, 10 minutes can be used as the predetermined period for a 10-minute dialogue; 5 minutes for a dialogue whose topic switches every 5 minutes; the time until the meeting ends for a dialogue in a meeting; and the period during which a painting is on display for dialogues about a given painting in an art museum.
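
The period-based deletion described above can be sketched as follows. This is only one possible reading, in which the whole store is cleared when the period is over and a new period then begins; the class name `TextStore`, its fields, and the choice to pass the current time in as `now` (in seconds) are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TextStore:
    """Holds past utterances and clears them once a fixed period has elapsed."""
    period_seconds: float
    started_at: float = 0.0
    texts: list[str] = field(default_factory=list)

    def add(self, text: str, now: float) -> None:
        self._expire(now)
        self.texts.append(text)

    def _expire(self, now: float) -> None:
        # When the predetermined period is over, drop everything and start a new period.
        if now - self.started_at >= self.period_seconds:
            self.texts.clear()
            self.started_at = now


store = TextStore(period_seconds=300.0)   # 5 minutes, one of the periods named in the text
store.add("first utterance", now=0.0)
store.add("second utterance", now=120.0)
store.add("third utterance", now=300.0)   # period elapsed: earlier texts are deleted
print(store.texts)  # ['third utterance']
```

Passing `now` explicitly keeps the sketch deterministic; a real system would presumably read a clock instead.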

[0023] In the language processing apparatus according to the present invention, topic change detection means detects a change of topic in the dialogue on the basis of the audio information input by the audio information input means of the text information acquisition means, or of the text information converted by the speech recognition means of the text information acquisition means, and the text information storage means deletes the stored text information from its stored contents when the topic change detection means detects a topic change. Important words can therefore be extracted for each period up to the next change of topic in the dialogue.

[0024] A change of topic in the dialogue can be detected, for example, by detecting a predetermined phrase that marks a turning point in the topic, such as "ところで" ("by the way") or "話は変わるが" ("to change the subject"), or by treating a silent interval (a time during which no speech is uttered) of at least a predetermined length as a turning point.
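
The two cues named in this paragraph, a marker phrase and a long silence, could be checked along the following lines. This is a minimal sketch: the function name `is_topic_change`, the `silence_before` argument, and the 10-second threshold are assumptions, since the source gives no concrete threshold.

```python
CUE_PHRASES = ("ところで", "話は変わるが")  # marker phrases quoted in the text
SILENCE_THRESHOLD = 10.0                    # seconds; an assumed value


def is_topic_change(text: str, silence_before: float) -> bool:
    """True when the utterance follows a long silence or contains a cue phrase."""
    if silence_before >= SILENCE_THRESHOLD:
        return True
    return any(cue in text for cue in CUE_PHRASES)


print(is_topic_change("ところでおひるはどうしますか", silence_before=0.5))  # True
print(is_topic_change("おおきなごみはさかなです", silence_before=0.5))      # False
print(is_topic_change("おおきなごみはさかなです", silence_before=12.0))     # True
```

If the recognized text is purely phonetic, the cue phrases would need to be written in the same script (for example all hiragana) for the substring test to fire.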

[0025] In one configuration example of the language processing apparatus according to the present invention, the text information acquisition means acquires text information currently being sent in a dialogue conducted by two or more people using text information over a network, and the text information stored in the text information storage means is text information sent earlier in that dialogue. Important words can therefore be extracted from text information sent in, for example, chat on the Internet.

[0026] In one configuration example of the language processing apparatus according to the present invention, in the important word extraction means, morphological analysis means performs morphological analysis on the character string portion detected by the matching character string portion detection means to obtain a word list with part-of-speech information, and predetermined part-of-speech word extraction means extracts words of a predetermined part of speech from that word list as important words.

[0027] Generally known techniques can be used for the morphological analysis. The word list with part-of-speech information is, for example, the detected character string portion converted into a list of words each tagged with its part of speech. Various parts of speech, such as nouns, may be used as the predetermined part of speech. Specifically, words of needed parts of speech, such as nouns, can be extracted while words of unneeded parts of speech, such as particles, are not.
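
The part-of-speech filtering step can be sketched as below. No morphological analyzer is invoked here; the hand-written `tagged` list merely stands in for an analyzer's output, and the tag names, the `WANTED_POS` set, and the function name `extract_by_pos` are assumptions for illustration.

```python
WANTED_POS = {"noun"}  # assumed choice; the text allows other parts of speech


def extract_by_pos(tagged_words: list[tuple[str, str]]) -> list[str]:
    """Keep words whose part-of-speech tag is in WANTED_POS, in original order."""
    return [word for word, pos in tagged_words if pos in WANTED_POS]


# Hand-written stand-in for a word list with part-of-speech information.
tagged = [("大きな", "adnominal"), ("ゴミ", "noun"), ("は", "particle")]
print(extract_by_pos(tagged))  # ['ゴミ']
```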

[0028] In one configuration example of the language processing apparatus according to the present invention, in the important word extraction means, important word candidate word storage means stores words that are candidates for extraction as important words, and matching word extraction means extracts, as important words, words in the character string portion detected by the matching character string portion detection means that match words stored in the important word candidate word storage means.

[0029] Various words may be stored in the important word candidate word storage means. The stored words serve as keywords for extracting important words: when the detected character string portion contains a word that matches such a keyword, that word is extracted as an important word. Any number of words may be stored in the important word candidate word storage means, which can be configured using, for example, a memory.
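
A minimal sketch of this candidate-word variant follows, assuming the candidates are held in a plain set and matched by substring search inside the already detected matching parts. The function name `spot_candidates`, the example candidate set, and the example matched parts are all hypothetical.

```python
def spot_candidates(matched_parts: list[str], candidates: set[str]) -> list[str]:
    """Return candidate words that occur inside any matched string part."""
    hits = []
    for part in matched_parts:
        for word in sorted(candidates):   # sorted only to make the order deterministic
            if word in part and word not in hits:
                hits.append(word)
    return hits


matched_parts = ["おおきな", "ごみは"]   # hypothetical output of the matching step
candidates = {"ごみ", "さかな"}          # hypothetical stored candidate words
print(spot_candidates(matched_parts, candidates))  # ['ごみ']
```

Unlike conventional keyword spotting on the raw utterance, the search here runs only over text that two utterances have in common.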

[0030] In the language processing apparatus according to the present invention, the important word extraction means extracts important words from the character string portion detected by the matching character string portion detection means while excluding words that satisfy a predetermined condition. Various conditions may be used. As a specific example, when the condition is "a one-character word", important words are extracted with one-character words excluded; that is, important words of two or more characters are extracted.
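
The one-character exclusion given as the example condition amounts to a simple length filter. In the sketch below, the function name `drop_short_words` and the `min_chars` parameter are assumptions.

```python
def drop_short_words(words: list[str], min_chars: int = 2) -> list[str]:
    """Exclude words shorter than min_chars characters."""
    return [w for w in words if len(w) >= min_chars]


print(drop_short_words(["ゴミ", "は", "魚"]))  # ['ゴミ']
```

Note that on kanji text this condition also drops one-character nouns such as "魚", so whether it is appropriate depends on the script in which the words are represented.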

[0031] In the language processing apparatus according to the present invention, importance assignment means assigns an importance to each important word extracted by the important word extraction means. The importance can be, for example, a numerical value that is larger for more important words. When several important words have been extracted, the importance can be used, for example, to order them or to select a subset of them.

[0032] In one configuration example of the language processing apparatus according to the present invention, in the importance assignment means, importance appearance frequency calculation means calculates the appearance frequency of each important word extracted by the important word extraction means, important word appearance frequency information storage means stores each extracted important word in association with information on its appearance frequency, and important word importance calculation means calculates the importance of each important word from its appearance frequency.

[0033] The important word appearance frequency information storage means can be configured using, for example, a memory. The appearance frequency of an important word can be, for example, the number of times that same important word has been extracted. A mode can also be used in which, for example, the importance of an important word is proportional to its appearance frequency.
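
The frequency-proportional importance mentioned here might be realized as in the following sketch, which counts how often each important word has been extracted and multiplies the count by a constant. The class name `ImportanceScorer` and the `weight` constant are assumptions; the source only states that importance may be proportional to appearance frequency.

```python
from collections import Counter


class ImportanceScorer:
    """Counts extractions per important word and derives a proportional importance."""

    def __init__(self, weight: float = 1.0) -> None:
        self.weight = weight      # assumed proportionality constant
        self.counts = Counter()   # important word -> number of times extracted

    def record(self, words: list[str]) -> None:
        self.counts.update(words)

    def importance(self, word: str) -> float:
        return self.weight * self.counts[word]

    def ranked(self) -> list[str]:
        # Words ordered from most to least frequently extracted.
        return [word for word, _ in self.counts.most_common()]


scorer = ImportanceScorer()
scorer.record(["ごみ", "さかな"])
scorer.record(["ごみ"])
print(scorer.importance("ごみ"))  # 2.0
print(scorer.ranked())            # ['ごみ', 'さかな']
```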

[0034] In one configuration example of the language processing apparatus according to the present invention, inter-word relatedness information storage means stores information on the degree of relatedness between words, and important word related word acquisition means acquires, on the basis of the contents of the inter-word relatedness information storage means, other words related to the important words extracted by the important word extraction means. The other words so acquired are used as further important words inferred from the extracted ones, for example when offering the participants in a dialogue important words to help them think of the next topic.

[0035] The inter-word relatedness information storage means can be configured using, for example, a memory. The information on relatedness between words is, for example, information on a plurality of related words together with information on their degree of relatedness. Various words may be acquired by the important word related word acquisition means; for example, the single word most related to a plurality of extracted important words can be acquired, or all words related to an extracted important word can be acquired.
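
Both acquisition modes mentioned above, all related words for one important word and the single most related word for several, can be sketched over a small lookup table. The table contents, the function names, and the use of summed scores to define "most related" are assumptions for illustration; the source does not specify how relatedness is stored or combined.

```python
from typing import Optional

# Hypothetical relatedness table: (word, word) -> degree of relatedness.
RELATEDNESS = {
    ("ごみ", "リサイクル"): 0.8,
    ("ごみ", "分別"): 0.6,
    ("魚", "釣り"): 0.7,
}


def related_words(keyword: str, table: dict = RELATEDNESS) -> list[str]:
    """All words linked to `keyword`, strongest relation first."""
    scored = []
    for (a, b), score in table.items():
        if a == keyword:
            scored.append((score, b))
        elif b == keyword:
            scored.append((score, a))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [word for _, word in scored]


def most_related(keywords: list[str], table: dict = RELATEDNESS) -> Optional[str]:
    """The single word with the highest summed relatedness to the given keywords."""
    totals = {}
    for kw in keywords:
        for (a, b), score in table.items():
            other = b if a == kw else a if b == kw else None
            if other is not None and other not in keywords:
                totals[other] = totals.get(other, 0.0) + score
    return max(totals, key=totals.get) if totals else None


print(related_words("ごみ"))        # ['リサイクル', '分別']
print(most_related(["ごみ", "魚"]))  # リサイクル
```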

[0036] The technical idea of the present invention described above can also be applied to, for example, a method, a program, or a storage medium storing such a program. For example, in the language processing method according to the present invention, text information is acquired, a character string portion that matches between the acquired text information and text information stored in storage means is detected, and important words are extracted from the detected character string portion. The storage means is configured using, for example, a memory.

[0037] The program according to the present invention causes a computer to implement a function of acquiring text information, a function of detecting a character string portion that matches between the acquired text information and text information stored in a memory, and a function of extracting important words from the detected character string portion. Various programs may be used as the program.

[0038] The storage medium according to the present invention stores a program to be executed by a computer so that it is readable by input means of the computer, and the program causes the computer to execute a process of acquiring text information, a process of detecting a character string portion that matches between the acquired text information and text information stored in a memory, and a process of extracting important words from the detected character string portion. Various storage media may be used, for example a floppy (registered trademark) disk or a CD-ROM (Compact Disk Read Only Memory).

[0039]

[Embodiments of the Invention] An embodiment of the present invention will be described with reference to the drawings. This example shows the present invention applied to a dialogue processing apparatus that extracts important words from dialogue data. FIG. 1 shows a configuration example of the dialogue processing apparatus of this example. The apparatus comprises a dialogue input unit 1 having a microphone device 11 and a dictation processing unit 12; a dialogue recording unit 2 having a dialogue recording device 13 and an input data storage unit 14; a dialogue comparison unit 3 having a character string comparison unit 15; and an important word extraction unit 4 having a word dictionary unit 16, a morphological analysis unit 17, and an important-word word extraction unit 18.

[0040] The dialogue input unit 1 recognizes a dialogue and converts its voice data into text data. The dialogue recording unit 2 stores, in the dialogue recording device 13, the text data for which comparison by the dialogue comparison unit 3 has finished. The dialogue comparison unit 3 compares the text data of past dialogue with the currently input text data and outputs the character strings that match. The important word extraction unit 4 performs morphological analysis on each matched character string to convert it into a word list with part-of-speech information, and outputs words of a specific part of speech from the word list as important words.

[0041] An example of the operation performed by the dialogue processing apparatus of this example is described below. In this example, important words are extracted from voice data that may contain erroneous character strings or ambiguity. In the dialogue input unit 1, the voice of the dialogue is input through the microphone device 11. The dictation processing unit 12 converts the voice data input from the microphone device 11 into text data expressed in phonetic symbols such as hiragana, katakana, or Roman letters, and outputs the text data to the dialogue comparison unit 3.

[0042] In the dialogue comparison unit 3, the character string comparison unit 15 compares the text data received from the dialogue input unit 1 with past dialogue record data 21 and outputs the matching character strings to the important word extraction unit 4. Here, the past dialogue record data 21 is the text data obtained from past voice data in the dialogue, and is supplied from the dialogue recording unit 2. The character string comparison unit 15 also outputs the text data received from the dialogue input unit 1 to the dialogue recording unit 2.

[0043] In the dialogue recording unit 2, the input data storage unit 14 records the text data that was received from the dialogue input unit 1 and for which the comparison by the dialogue comparison unit 3 has finished, appending it to the dialogue record data held in the dialogue recording device 13. As a specific example, suppose the input text data is the sentence "このちほうのおおきなだいごみはさかなです" (kono chihou no ookina daigomi wa sakana desu) and the dialogue record data before the addition is the sentence "ごみのふほうとうきがこのちほうでもしんこくです" (gomi no fuhou touki ga kono chihou demo shinkoku desu). The dialogue record data after the addition is then "ごみのふほうとうきがこのちほうでもしんこくです／このちほうのおおきなだいごみはさかなです".
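The append step in paragraph [0043] can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `append_to_log` is hypothetical, and the full-width slash separator is taken from the example text above.

```python
SEPARATOR = "／"  # full-width slash, as it appears in the example of [0043]


def append_to_log(log_text: str, new_text: str) -> str:
    """Return the dialogue record with new_text appended after a separator."""
    if not log_text:
        # Nothing recorded yet: the new utterance becomes the whole record.
        return new_text
    return log_text + SEPARATOR + new_text


log = "ごみのふほうとうきがこのちほうでもしんこくです"
log = append_to_log(log, "このちほうのおおきなだいごみはさかなです")
print(log)
```

Note that the new text is appended only after the comparison has finished, so an utterance is never compared against itself.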

[0044] The dialogue data recorded in the dialogue recording device 13 may, for example, be deleted after being held for a fixed period. Alternatively, a topic change may be detected using an existing technique such as that described in Japanese Patent Laid-Open No. 8-137874, and the dialogue data recorded in the dialogue recording device 13 may be deleted as soon as the topic changes.
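The two retention policies in paragraph [0044] can be sketched as below. The class `DialogueLog` and its method names are hypothetical, and topic-change detection itself is outside the sketch: the caller is assumed to supply a boolean produced by some separate detector.

```python
import time
from collections import deque


class DialogueLog:
    """Holds past utterances; entries expire by age or on a topic change."""

    def __init__(self, max_age_seconds: float):
        self.max_age_seconds = max_age_seconds
        self._entries = deque()  # (timestamp, text), oldest first

    def add(self, text, now=None):
        now = time.time() if now is None else now
        self._entries.append((now, text))

    def expire(self, now=None):
        """Drop entries older than max_age_seconds."""
        now = time.time() if now is None else now
        while self._entries and now - self._entries[0][0] > self.max_age_seconds:
            self._entries.popleft()

    def clear_on_topic_change(self, topic_changed: bool):
        """Discard everything when an external detector reports a topic change."""
        if topic_changed:
            self._entries.clear()

    def text(self):
        return "／".join(text for _, text in self._entries)


log = DialogueLog(max_age_seconds=60)
log.add("ごみのふほうとうきがこのちほうでもしんこくです", now=0)
log.add("このちほうのおおきなだいごみはさかなです", now=50)
log.expire(now=70)  # the first entry is now 70 s old and is dropped
print(log.text())
```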

[0045] In the important word extraction unit 4, the morphological analysis unit 17 first refers to the contents of the word dictionary unit 16, which stores a word dictionary describing part-of-speech information and the like, and performs morphological analysis on the character strings input from the dialogue comparison unit 3. It thereby generates a word list in which each character string is segmented into words and each word is given part-of-speech information, and outputs the list to the important-word word extraction unit 18. Next, the important-word word extraction unit 18 extracts words of a specific part of speech or specific words from the word list input from the morphological analysis unit 17, and outputs them as important words 22.

[0046] Next, with reference to FIG. 2, an example is given of the procedure by which the character string comparison unit 15 compares the text data input from the dialogue input unit 1 with the dialogue data recorded in the dialogue recording unit 2 and outputs the matching character strings. First, the text data input from the dialogue input unit 1 is read into a variable named String_current (step S1). In this example, the input text data is assumed to be the sentence "このちほうのおおきなだいごみはさかなです" (kono chihou no ookina daigomi wa sakana desu).

[0047] Next, the text data of past dialogue (past dialogue data) is input from the dialogue recording device 13 and read into String_log (step S2). In this example, the input dialogue data is assumed to be the sentence "ごみのふほうとうきがこのちほうでもしんこくです" (gomi no fuhou touki ga kono chihou demo shinkoku desu). Next, String_current is copied to a variable named String_org (step S3).

[0048] The subsequent processing (steps S5 to S10) is performed as loop 1 and is repeated until the length of the character string in String_org becomes 0 (step S4). Within it, the processing of steps S7, S8, and S10 is performed as loop 2 and is repeated until the length of the character string in String_current becomes 0 (step S5).

[0049] In loop 2, which runs inside loop 1, String_current is first compared with String_log to determine whether the whole of String_current matches a part of String_log (step S6). If it does not, the last character of String_current is deleted (step S10).

[0050] In this example, the character string "このちほうのおおきなだいごみはさかなです" is first compared with the character string "ごみのふほうとうきがこのちほうでもしんこくです". At this stage the former is not contained in the latter, so the last character of String_current is deleted and String_current becomes "このちほうのおおきなだいごみはさかなで". Repeating this one-character deletion eventually reduces String_current to "このちほう" (konochihou), which matches a part of String_log.

[0051] When the character string in String_current thus matches a part of String_log (step S6), the character string in String_current is output to the important word extraction unit 4 (step S7). Next, the portion that matches String_current is deleted from String_org, and the resulting String_org is copied to String_current (step S8). In this example, the portion "このちほう" is deleted from String_org, so that String_org becomes "のおおきなだいごみはさかなです".

[0052] When repeating loop 2 as described above brings the length of String_current to 0, loop 2 ends for the time being. Then, as part of loop 1, the first character of String_org is deleted (step S9), String_org is copied to String_current, and loop 2 is performed again. In this example, String_org becomes "のおおきなだいごみはさかなです" during the first pass of loop 2.

[0053] When repeating loop 1, including loop 2, as described above brings the length of String_org to 0, the processing ends. In this example, the processing by the character string comparison unit 15 detects five character strings, "このちほう" (konochihou), "の" (no), "き" (ki), "ごみ" (gomi), and "です" (desu), and outputs them to the important word extraction unit 4.
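The procedure of steps S1 to S10 can be sketched in Python as follows. This is one reading of paragraphs [0046] to [0053], not the patent's own code: the function name `matching_substrings` is hypothetical, and "deleting the matched portion from String_org" is implemented as dropping a prefix, since String_current is always a prefix of String_org at the moment of a match.

```python
def matching_substrings(input_text: str, log_text: str) -> list:
    """Return the substrings of input_text that also occur in log_text."""
    string_org = input_text            # S3: working copy of the input (S1)
    matches = []
    while string_org:                  # loop 1 (S4): until String_org is empty
        string_current = string_org
        while string_current:          # loop 2 (S5): until String_current is empty
            if string_current in log_text:         # S6: whole string found in log?
                matches.append(string_current)     # S7: output the match
                # S8: remove the matched prefix and restart from the remainder.
                string_org = string_org[len(string_current):]
                string_current = string_org
            else:
                string_current = string_current[:-1]   # S10: drop last character
        string_org = string_org[1:]    # S9: drop first character and retry
    return matches


current = "このちほうのおおきなだいごみはさかなです"
log = "ごみのふほうとうきがこのちほうでもしんこくです"
print(matching_substrings(current, log))
```

With the two example sentences this yields the five strings listed in [0053]. The scan is greedy: at each position it keeps the longest prefix found in the log, so a shorter match that starts at the same position is never reported separately.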

[0054] Next, an example is given of the procedure by which the important word extraction unit 4 extracts important words. First, the morphological analysis unit 17 refers to the word dictionary in the word dictionary unit 16 and converts the character strings input from the dialogue comparison unit 3 into a word list with part-of-speech information. In this example, the five character strings "このちほう", "の", "き", "ごみ", and "です" are assumed to be input for conversion. Converting them yields the following word list with part-of-speech information: {この (adnominal) / ちほう (noun-general)}, {の (adnominal particle)}, {き (noun-general)}, {ごみ (noun-general)}, {です (auxiliary verb)}.
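To make the conversion in [0054] concrete, here is a toy longest-match segmenter. It is only a stand-in for a real morphological analyzer: the dictionary contents are limited to the words of the example, the tag "未知語" (unknown word) is an assumption of this sketch, and `analyze` is a hypothetical name.

```python
# Toy word dictionary: surface form -> part of speech, taken from the example.
TOY_DICTIONARY = {
    "この": "連体詞",
    "ちほう": "名詞-一般",
    "の": "連体助詞",
    "き": "名詞-一般",
    "ごみ": "名詞-一般",
    "です": "助動詞",
}


def analyze(text, dictionary=TOY_DICTIONARY):
    """Segment text by longest dictionary match and tag each word."""
    words = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for end in range(len(text), pos, -1):
            surface = text[pos:end]
            if surface in dictionary:
                words.append((surface, dictionary[surface]))
                pos = end
                break
        else:
            # No dictionary entry starts here: emit one character as unknown.
            words.append((text[pos], "未知語"))
            pos += 1
    return words


for s in ["このちほう", "の", "き", "ごみ", "です"]:
    print(analyze(s))
```

A production system would normally use a full analyzer with a cost model rather than plain longest match, which can mis-segment ambiguous strings.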

[0055] Next, the important-word word extraction unit 18 extracts words of a specific part of speech from the word list obtained by the morphological analysis unit 17. In this example, nouns are the extraction target, so the three words "ちほう" (chihou), "き" (ki), and "ごみ" (gomi) are extracted as important words 22.
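The part-of-speech selection of [0055] amounts to a filter over the tagged word list. A minimal sketch follows, assuming the list is a sequence of (surface, part-of-speech) pairs and that noun tags begin with "名詞"; the function name `extract_by_pos` is hypothetical.

```python
def extract_by_pos(word_list, target_prefix="名詞"):
    """Keep the surface forms whose part-of-speech tag starts with target_prefix."""
    return [surface for surface, pos in word_list if pos.startswith(target_prefix)]


# Word list with part-of-speech information from paragraph [0054].
word_list = [
    ("この", "連体詞"),
    ("ちほう", "名詞-一般"),
    ("の", "連体助詞"),
    ("き", "名詞-一般"),
    ("ごみ", "名詞-一般"),
    ("です", "助動詞"),
]
important_words = extract_by_pos(word_list)
print(important_words)
```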

[0056] A configuration is also possible in which filtering is performed before the morphological analysis, for example deleting character strings of length 1 such as "の" above. In such a configuration, the one-character strings "の" and "き" are deleted from the five character strings described above, so the two words "ちほう" (chihou) and "ごみ" (gomi) are extracted as important words.

[0057] It is also possible, using an existing technique such as that described in Japanese Patent Laid-Open No. 8-137874, to assign an importance to each extracted important word based on its appearance frequency and to narrow the important words down to an arbitrary number according to the assigned importance. It is likewise possible to predict new important words by querying a dictionary that describes the degree of relevance between words and selecting words highly related to the extracted important words.

[0058] As described above, in the dialogue processing apparatus of this example, the dialogue input unit 1 receives voice data from a dialogue conducted by, for example, two or more persons and converts the voice data into text data; the dialogue comparison unit 3 compares the input text data (dialogue data) with the recorded past text data (dialogue data) and outputs the character strings of the matching portions; the dialogue recording unit 2 stores the input dialogue data; and the important word extraction unit 4 extracts specific words from the character strings received from the dialogue comparison unit 3.

[0059] In the dialogue processing apparatus of this example, the dialogue recording unit 2 may, for example, store the input dialogue data only for a certain fixed period, or store the input dialogue data until a topic change is detected.

[0060] In the dialogue processing apparatus of this example, the important word extraction unit 4 may, for example, convert the character strings received from the dialogue comparison unit 3 into a word list with part-of-speech information using the morphological analysis unit 17 and extract words of a specific part of speech using the important-word word extraction unit 18. Alternatively, the important word extraction unit 4 may store keyword words in a word dictionary holding device and use the important-word word extraction unit 18 to extract, from the character strings received from the dialogue comparison unit 3, the words that match the keywords stored in the word dictionary holding device.
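The keyword-dictionary variant in [0060] can be sketched as a lookup of stored keywords inside the matched strings. The keyword list below is invented for illustration, and `extract_keywords` is a hypothetical name; the paragraph does not specify how matching against the dictionary is performed, so plain substring search is an assumption.

```python
def extract_keywords(matched_strings, keywords):
    """Return stored keywords that occur in any matched string, without duplicates."""
    found = []
    for s in matched_strings:
        for keyword in keywords:
            if keyword in s and keyword not in found:
                found.append(keyword)
    return found


# Hypothetical contents of the word dictionary holding device.
keywords = ["ちほう", "ごみ", "さかな"]
matched = ["このちほう", "の", "き", "ごみ", "です"]
print(extract_keywords(matched, keywords))
```

Unlike the part-of-speech route, this variant needs no morphological analysis, but it can only return words that were registered in advance.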

[0061] In the dialogue processing apparatus of this example, the important word extraction unit 4 may, for example, use a filtering function to delete unnecessary character strings from the character strings received from the dialogue comparison unit 3 before extracting important words. As a specific example, character strings that meet preset conditions are removed from the targets of important word extraction. The conditions may be set to remove, for example, character strings consisting of a single character, character strings containing unknown symbols, and character strings containing symbols such as 「」 (corner brackets) or the @ used in e-mail addresses.
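The filtering conditions of [0061] can be expressed as a predicate applied before extraction. The sketch below covers the single-character rule and an explicit symbol set; the set `EXCLUDED_SYMBOLS` and the function names are assumptions, and the "unknown symbol" rule is left out because the text does not define it.

```python
# Symbols named in the text: corner brackets and the at sign (half- and full-width).
EXCLUDED_SYMBOLS = set("「」@＠")


def is_excluded(s: str) -> bool:
    """True if s should be removed from the important word extraction targets."""
    if len(s) <= 1:
        return True  # single-character (or empty) strings
    return any(ch in EXCLUDED_SYMBOLS for ch in s)


def filter_candidates(strings):
    return [s for s in strings if not is_excluded(s)]


candidates = ["このちほう", "の", "き", "ごみ", "です", "user@example", "「ごみ」"]
print(filter_candidates(candidates))
```

Applied to the five matched strings of the running example, this leaves "このちほう", "ごみ", and "です", from which the nouns "ちほう" and "ごみ" are extracted.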

[0062] In the dialogue processing apparatus of this example, the important word extraction unit 4 may, for example, assign an importance to each word using an importance assignment function. As the importance assignment processing, for example, a frequency calculation function calculates the frequency of each extracted word, a frequency information holding device holds the words and their frequency information, and an importance calculation function calculates the importance of each word based on the frequency information.
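The three roles in [0062] (frequency calculation, frequency storage, importance calculation) can be sketched with a counter. The class name `ImportanceScorer` is hypothetical, and defining importance as relative frequency is an assumption of this sketch; the text only says that importance is computed from frequency information.

```python
from collections import Counter


class ImportanceScorer:
    """Counts extracted words and scores them by relative frequency."""

    def __init__(self):
        self.frequencies = Counter()   # plays the role of the frequency store

    def observe(self, words):
        """Frequency calculation: add one occurrence per extracted word."""
        self.frequencies.update(words)

    def importance(self, word):
        """Importance calculation: share of all observations (assumed formula)."""
        total = sum(self.frequencies.values())
        return self.frequencies[word] / total if total else 0.0

    def top(self, n):
        """Narrow the important words down to the n most frequent."""
        return [word for word, _ in self.frequencies.most_common(n)]


scorer = ImportanceScorer()
scorer.observe(["ちほう", "き", "ごみ"])
scorer.observe(["ごみ"])
print(scorer.top(1), scorer.importance("ごみ"))
```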

[0063] In the dialogue processing apparatus of this example, the important word extraction unit 4 may, for example, hold in a related word dictionary holding device a related word dictionary describing the degree of relevance between words, and use an important word prediction function to predict new words based on the degree of relevance between words and output them as important words.
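The prediction step in [0063] can be sketched as a lookup in a relevance table. Everything concrete here is invented for illustration: the dictionary entries, the scores, the threshold of 0.5, and the function name `predict_related`.

```python
# Hypothetical related word dictionary: word -> {related word: relevance score}.
RELATED_WORDS = {
    "ごみ": {"りさいくる": 0.8, "かんきょう": 0.6, "さかな": 0.1},
}


def predict_related(important_words, related, threshold=0.5):
    """Return words whose relevance to an important word meets the threshold."""
    predicted = []
    for word in important_words:
        # Visit candidates from the most to the least related.
        candidates = sorted(related.get(word, {}).items(), key=lambda kv: -kv[1])
        for candidate, score in candidates:
            if (score >= threshold
                    and candidate not in important_words
                    and candidate not in predicted):
                predicted.append(candidate)
    return predicted


print(predict_related(["ちほう", "ごみ"], RELATED_WORDS))
```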

[0064] The dialogue processing apparatus of this example has been described with a configuration that converts voice data of a dialogue into text data and extracts important words. As another configuration example, text data from a chat or the like may be input and important words extracted from that text data. In this case, the dialogue input unit 1 includes, for example, character input terminals with which two or more persons can conduct a dialogue by character strings over a network. Various devices can be used as the character input terminal, for example a personal computer with a keyboard.

[0065] With the above configuration, the dialogue processing apparatus of this example can extract appropriate important words even from, for example, voice recognition data containing recognition errors or ambiguity, because it reduces the influence of such errors. This raises the reliability of the extracted important words.

[0066] In this example, the function of the dialogue recording unit 2 constitutes the text information storage means, the function of the dialogue input unit 1 constitutes the text information acquisition means, the function of the dialogue comparison unit 3 constitutes the matching character string portion detection means, and the function of the important word extraction unit 4 constitutes the important word extraction means. Further, the function of the microphone device 11 constitutes the voice information input means, and the function of the dictation processing unit 12 constitutes the voice recognition means.

[0067] In this example, the function of the morphological analysis unit 17 constitutes the morphological analysis means, and the function of the important-word word extraction unit 18 constitutes the predetermined part-of-speech word extraction means. The important word extraction unit 4 may also be provided with, for example, the function of important word candidate word storage means for storing words that are candidates for extraction as important words, and the function of matching word extraction means for extracting, as important words, words in the character string portion that match the stored words.

[0068] In this example, the function of topic change detection means for detecting a change of topic may be provided in, for example, the dialogue recording unit 2. The function of importance assignment means may be provided in, for example, the important word extraction unit 4; as this function, the functions of important word appearance frequency calculation means, important word appearance frequency information storage means, and important word importance calculation means may be provided. Further, the important word extraction unit 4 or the like may be provided with the function of inter-word relevance information storage means and the function of important-word related word acquisition means for acquiring other words related to the extracted important words based on the information on the relevance between words.

[0069] The configuration of the language processing apparatus and the like according to the present invention is not necessarily limited to that shown above, and various configurations may be used. Likewise, the field of application of the present invention is not necessarily limited to that shown above, and the present invention can be applied to various fields.

[0070] The various processes performed in the language processing apparatus and the like according to the present invention may be controlled, for example, by a processor executing a control program stored in a ROM (Read Only Memory) in hardware resources including the processor and a memory. Alternatively, each functional means for executing the processes may be configured as an independent hardware circuit. The present invention can also be understood as a computer-readable recording medium, such as a floppy (registered trademark) disk or a CD-ROM (Compact Disc Read Only Memory), storing the above control program, or as the program itself. The processing according to the present invention can be carried out by inputting the control program from the recording medium into a computer and having the processor execute it.

[0071]

[Effects of the Invention] As described above, the language processing apparatus according to the present invention stores, for example, past text information, acquires, for example, current text information, detects a character string portion that matches between the acquired text information and the stored text information, and extracts important words from the detected character string portion. Appropriate important words can therefore be extracted even when the stored text information or the acquired text information contains errors or ambiguity.

[Brief description of drawings]

FIG. 1 is a diagram showing a configuration example of a dialogue processing apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram showing an example of the procedure of processing performed by the character string comparison unit.

FIG. 3 is a diagram showing a schematic configuration example of a language processing apparatus according to the present invention.

[Explanation of symbols]

1: dialogue input unit, 2: dialogue recording unit, 3: dialogue comparison unit, 4: important word extraction unit, 11: microphone device, 12: dictation processing unit, 13: dialogue recording device, 14: input data storage unit, 15: character string comparison unit, 16: word dictionary unit, 17: morphological analysis unit, 18: important-word word extraction unit, 21: dialogue record data, 22: important words, 31: text information storage unit, 32: text information acquisition unit, 33: matching character string portion detection unit, 34: important word extraction unit, 41: text information

Continuation of front page. F-terms (reference): 5B091 AA15 BA02 CA02 CB12 CC01 CC15 5D015 KK02

Claims


1. A language processing apparatus comprising: text information storage means for storing text information; text information acquisition means for acquiring text information; matching character string portion detection means for detecting a character string portion that matches between the text information acquired by the text information acquisition means and the text information stored in the text information storage means; and important word extraction means for extracting an important word from the character string portion detected by the matching character string portion detection means.

2. The language processing apparatus according to claim 1, wherein the text information stored in the text information storage means is text information obtained by voice recognition of voice information.

3. The language processing apparatus according to claim 1 or 2, wherein the text information acquisition means comprises voice information input means for inputting voice information, and voice recognition means for recognizing the voice information input by the voice information input means and converting it into text information.

4. The language processing apparatus according to claim 1, wherein the text information stored in the text information storage means and the text information acquired by the text information acquisition means are text information whose contents are related to each other.

5. The language processing apparatus according to claim 3, wherein the voice information input means of the text information acquisition means inputs voice information currently uttered in a dialogue conducted by two or more persons, and the text information stored in the text information storage means is text information obtained by voice recognition of voice information uttered in the past in the dialogue.

6. The language processing apparatus according to claim 5, wherein the text information storage means deletes stored text information from its stored contents when a predetermined period has elapsed.

7. The language processing apparatus according to claim 5, further comprising topic change detection means for detecting a topic change in the dialogue based on the voice information input by the voice information input means of the text information acquisition means or the text information converted by the voice recognition means of the text information acquisition means, wherein the text information storage means deletes the stored text information from its stored contents when a topic change is detected by the topic change detection means.

8. The language processing apparatus according to claim 1, wherein the text information acquisition means acquires text information currently issued in a dialogue conducted using text information by two or more persons on a network, and the text information stored in the text information storage means is text information issued in the past in the dialogue.

9. The language processing apparatus according to claim 1, wherein the important word extraction means comprises: morphological analysis means for performing morphological analysis on the character string portion detected by the matching character string portion detection means to acquire a word list with part-of-speech information; and predetermined part-of-speech word extraction means for extracting, as an important word, a word of a predetermined part of speech from the word list acquired by the morphological analysis means.

10. The language processing apparatus according to claim 1, wherein the important word extraction means comprises: important word candidate word storage means for storing words that are candidates for extraction as important words; and matching word extraction means for extracting, as an important word, a word that matches a word stored in the important word candidate word storage means from the character string portion detected by the matching character string portion detection means.

11. The language processing apparatus according to any one of claims 1 to 10, wherein the important word extraction means extracts important words from the character string portion detected by the matching character string portion detection means while excluding words that satisfy a predetermined condition.

12. The language processing apparatus according to any one of claims 1 to 11, further comprising importance assignment means for assigning an importance to the important word extracted by the important word extraction means.

13. The language processing apparatus according to claim 12, wherein the importance assignment means comprises: important word appearance frequency calculation means for calculating an appearance frequency of the important word extracted by the important word extraction means; important word appearance frequency information storage means for storing the important word extracted by the important word extraction means and information on its appearance frequency in association with each other; and important word importance calculation means for calculating the importance of the important word based on its appearance frequency.

14. The language processing device according to any one of claims 1 to 13, further comprising: an inter-word relevance information storage means that stores information about relevance between words; and an important word related word acquiring means that acquires, based on the stored contents of the inter-word relevance information storage means, another word related to the important word extracted by the important word extraction means.

15. A language processing method comprising: acquiring text information; detecting a character string portion that matches between the acquired text information and text information stored in a storage means; and extracting an important word from the detected character string portion.

16. A program that realizes: a function of acquiring text information; a function of detecting a character string portion that matches between the acquired text information and text information stored in a memory; and a function of extracting an important word from the detected character string portion.

17. A storage medium storing a program to be executed by a computer such that the program is readable by an input means of the computer, wherein the program causes the computer to execute: a process of acquiring text information; a process of detecting a character string portion that matches between the acquired text information and text information stored in a memory; and a process of extracting an important word from the detected character string portion.
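
For illustration only, the steps recited in claim 15, together with the candidate-word matching of claim 10, the exclusion of claim 11, and the frequency-based importance of claim 13, might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: the names `matched_portions`, `extract_important_words`, and `ImportanceAssigner` are hypothetical, matched portions are found with the standard library's `difflib.SequenceMatcher`, candidate words are matched by plain substring search rather than morphological analysis, and importance is assumed to be a simple relative frequency.

```python
from collections import Counter
from difflib import SequenceMatcher


def matched_portions(current: str, stored: str, min_len: int = 2) -> list[str]:
    """Return substrings shared by the current and the stored text.

    Assumption: common blocks reported by difflib stand in for the
    "matched character string portion"; min_len drops trivial matches.
    """
    matcher = SequenceMatcher(None, current, stored, autojunk=False)
    return [
        current[block.a:block.a + block.size]
        for block in matcher.get_matching_blocks()
        if block.size >= min_len
    ]


def extract_important_words(portions, candidates, excluded=frozenset()):
    """Pick stored candidate words that occur inside a matched portion.

    Words listed in `excluded` model the "predetermined condition"
    of claim 11 (an assumption: a simple stop list).
    """
    found = []
    for portion in portions:
        for word in candidates:
            if word in portion and word not in excluded:
                found.append(word)
    return found


class ImportanceAssigner:
    """Store appearance counts per word and derive an importance level."""

    def __init__(self):
        self.frequency = Counter()  # word -> appearance count

    def update(self, words):
        self.frequency.update(words)

    def importance(self, word):
        # Assumption: importance is the word's share of all counted words.
        total = sum(self.frequency.values())
        return self.frequency[word] / total if total else 0.0


if __name__ == "__main__":
    stored = "I want to book a meeting room for Friday"
    current = "the meeting room is free on Friday"
    portions = matched_portions(current, stored)
    words = extract_important_words(
        portions, ["meeting room", "Friday", "room"], excluded={"room"}
    )
    assigner = ImportanceAssigner()
    assigner.update(words)
    print(portions, words, assigner.importance("Friday"))
```

In this sketch only text that recurs between the stored utterance and the current one is searched for candidate words, so a word that appears in just one of the two utterances is never extracted.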