JP2003058538A - Sentence analysis method and sentence analyzer capable of utilizing this method

Info

Publication number: JP2003058538A
Application number: JP2001249535A
Authority: JP
Inventors: Hiroki Tanioka; 広樹谷岡
Original assignee: JustSystems Corp
Current assignee: JustSystems Corp
Priority date: 2001-08-20
Filing date: 2001-08-20
Publication date: 2003-02-28
Anticipated expiration: 2021-08-20
Also published as: JP3691773B2

Abstract

PROBLEM TO BE SOLVED: To provide a technique that efficiently identifies the theme of a text. SOLUTION: In a text processing system 10, text input by a user is acquired by a text receiving unit 26 and set as a block of interest by a target setting unit 28. A character string analysis unit 34 extracts one or more words from the block of interest, and the set of extracted words is used to update the set of words extracted in the past, whereby a theme specifying unit 42 identifies the theme of the text.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a sentence analysis method and apparatus, and in particular to topic identification technology for dialogue.

[0002]

[Description of the Related Art] As society at large becomes increasingly information-oriented, entering text into an electronic terminal such as a personal computer (hereinafter "PC") has become part of everyday life for many people. Text entry was once used mainly for preparing business documents, academic papers, and the like with a word processor, but today, with the spread of the Internet, tools such as e-mail are indispensable for communication among all kinds of people. Recently, mobile phones have also come to include an e-mail function as standard. Consequently, computers are expected to process human-entered text more and more often.

[0003]

[Problem to Be Solved by the Invention] Much research has been devoted to the techniques a computer needs in order to treat text as a meaningful character string rather than as mere text data. One such line of research concerns identifying the topic of a text. If such techniques allow computers to understand the content of a text and the intent of its author more accurately, the ability of computers to act as intelligent agents should improve dramatically.

[0004] However, conventional approaches to topic identification often rely on syntactic or semantic analysis of the target text itself. To maintain a given level of accuracy with such approaches, every conceivable text pattern and dialogue pattern must be stored in a database, which inevitably makes the system complex. For example, Japanese Patent Laid-Open No. 63-106042 discloses a technique that looks for topic boundaries by focusing on the meanings of pronouns and prepositions; in that case, text patterns based on every pronoun and preposition must be registered as conditions, and the associated search processing can hardly be called efficient. Given that text processing functions are now widely built into simple devices other than PCs, such as mobile phones, expectations are high for a text analysis technique that is both general-purpose and efficient.

[0005] The present inventor made the present invention in light of the above, and its object is to provide a technique for efficiently identifying the theme of a text.

[0006]

[Means for Solving the Problem] One aspect of the present invention relates to a sentence analysis method. This method acquires text that is input successively over time; each time there is an input, sets the newly acquired block of text as a block of interest; decomposes the block of interest to extract one or more words or phrases; and identifies the theme of the text set as the block of interest based on a change, including a temporal element, between the extracted words and the words extracted in the past.

[0007] "Text that is input successively over time" refers to a character string that is currently being input continuously to the device as the target of analysis, such as text that a PC user keeps typing while creating a document, text that multiple users send to one another in conversational form over a network, or text generated as a result of speech recognition. A "newly acquired block of text" is a character string in which one or more words form a meaningful unit; for example, each paragraph of a longer text may be taken as a block, or the content of an utterance sent at one time in a dialogue such as a chat may be taken as a block.

[0008] The "change including a temporal element" reflects the observation that a change of topic depends not only on a change in a person's thinking but also on the passage of time. For example, unless the topic changes completely, people carry on a conversation while keeping what was said a little earlier in memory, and that memory can be said to fade as time passes. Put another way, people converse while retaining, to some degree, content from some way back in the conversation, not just what was said immediately before. Incorporating this point into the topic identification algorithm makes it possible to realize a technique that comes closer to human awareness.

[0009] The "theme of the text" refers to the gist of the text as the author is assumed to have had it in mind at the time the block of text was input, such as the topic of each utterance in a conversation, the subject of each paragraph in a paper, or the subject of each sentence in a document file.

[0010] Another aspect of the present invention is a sentence analysis device. This device includes a text receiving unit that accepts input of text; a target setting unit that, each time there is an input, sets the newly input block of text as a block of interest; a character string analysis unit that decomposes the block of interest to extract one or more words or phrases; and a theme specifying unit that identifies the theme of the text based on the extracted words. The theme specifying unit identifies the theme of the text set as the block of interest based on a change, including a temporal element, relative to the theme of text input in the past.

[0011] Here, "input in the past" mainly means "input the previous time". However, the "theme of the text input the previous time" may itself reflect the "theme of the text input the time before that", and the word "past" is used to cover this as well.

[0012] Yet another aspect of the present invention is a computer program. This program causes a computer to execute: a process of acquiring text that is input successively over time; a process of setting, each time there is an input, the newly acquired block of text as a block of interest; a process of decomposing the block of interest to extract one or more words or phrases; and a process of identifying the theme of the text set as the block of interest based on a change, including a temporal element, between the extracted words and the words extracted in the past.

[0013] Any combination of the above components, and any conversion of the components or expressions of the present invention among a method, a device, a system, a computer program, a recording medium storing a computer program, and the like, is also effective as an aspect of the present invention.

[0014]

[Embodiments of the Invention] In this embodiment, an input block of text is decomposed to extract multiple words, and these are used to determine the set of words that constitutes the theme of the text. Each word is assigned an importance as a weight, and that importance is lowered as time passes. The importance corresponds to how strong an impression the word makes in the mind of a person engaged in conversation, and it is designed to track the way awareness or memory fades over time. By updating this set of words and importances each time text is input, the dialogue can be processed while the latest "theme" is determined efficiently and in real time.

[0015] FIG. 1 is a functional block diagram showing the configuration of the text processing system in this embodiment. The text processing system 10 has an input/output unit 12 and a text analysis unit 14. The input/output unit 12 handles text input and output with the user, or text input and output via the Internet. The text analysis unit 14 analyzes the text input through the input/output unit 12 and outputs the result to the input/output unit 12. In hardware terms, the text processing system 10 can be realized by elements such as a computer's CPU, and in software terms by a program with data processing functions; the figure depicts functional blocks realized by their cooperation. These functional blocks can therefore be realized in various forms by combinations of hardware and software. The text processing system 10 can be realized not only as a PC, mobile phone, or PDA but also as any electronic device or home appliance having a text processing function.

[0016] The input/output unit 12 has a communication unit 20, a display unit 22, a dialogue processing unit 24, a language input processing unit 30, and an application processing unit 40. The language input processing unit 30 accepts text from the user through keyboard input or speech recognition input and sends the text to the dialogue processing unit 24 as text data. The communication unit 20 receives text from other users via the Internet. The dialogue processing unit 24 causes the display unit 22 to show, in conversational form, the text exchanged among multiple users, and sends the text input by the user of this device to other users via the communication unit 20. The application processing unit 40 is described later.

[0017] The text analysis unit 14 has a text receiving unit 26, a target setting unit 28, a likelihood determination unit 32, a character string analysis unit 34, and a theme specifying unit 42. The text receiving unit 26 accepts text input from the dialogue processing unit 24. This text is text data that is input to the dialogue processing unit 24 in conversational form. Each time there is an input, the target setting unit 28 sets the newly input block of text as the block of interest.

[0018] The character string analysis unit 34 includes a phrase extraction unit 36 and an importance setting unit 38. The phrase extraction unit 36 decomposes the block of interest and extracts one or more words or phrases. The extraction may be performed by a general morphological analysis method. For example, the words "yesterday" and "sunny" are extracted from the sentence "It was sunny yesterday, wasn't it?"

[0019] The importance setting unit 38 associates each extracted word with its importance in the text. This importance acts as a weight for the word. For example, an importance of "5" is assigned to each of the words "yesterday" and "sunny". The importance may be set based on at least one of a linguistic likelihood and a recognition likelihood. The linguistic likelihood is, for example, the linguistic plausibility of each word in the morphological analysis performed by the phrase extraction unit 36, and may be judged from dependency relations, co-occurrence between words, and the like. The linguistic likelihood may also be the linguistic plausibility of the conversion result in text conversion during speech recognition, or in kana-kanji conversion, performed by the language input processing unit 30. The recognition likelihood indicates, for example, the certainty of recognition in speech recognition by the language input processing unit 30, and may be affected by speech volume or noise. The likelihood determination unit 32 may determine the linguistic likelihood and the recognition likelihood.
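
As a rough illustration of how an importance could be derived from the two likelihoods, the sketch below scales a maximum importance by their product. The function name, the 0-to-1 likelihood range, the ceiling of 5, and the product rule are all assumptions made for illustration; the description only states that the importance may be based on at least one of the likelihoods.

```python
MAX_IMPORTANCE = 5  # assumed ceiling, matching the "5" used in the examples


def importance_from_likelihoods(linguistic: float = 1.0, recognition: float = 1.0) -> int:
    """Map two likelihoods in [0, 1] to an integer importance of at least 1.

    Hypothetical rule: multiply the likelihoods and scale by MAX_IMPORTANCE.
    Either argument may be left at 1.0 when only one likelihood is available.
    """
    for value in (linguistic, recognition):
        if not 0.0 <= value <= 1.0:
            raise ValueError("likelihoods are assumed to lie in [0, 1]")
    return max(1, round(MAX_IMPORTANCE * linguistic * recognition))


print(importance_from_likelihoods())          # e.g. typed input, fully trusted
print(importance_from_likelihoods(0.9, 0.4))  # e.g. noisy speech recognition
```

Any monotonic mapping would serve equally well here; the product is used only because it lets either likelihood pull the weight down.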

[0020] The theme specifying unit 42 is a block that identifies the theme of the text based on the words extracted by the character string analysis unit 34, and includes a similarity determination unit 44, an importance update unit 46, a theme determination unit 48, and a theme holding unit 50. On the premise that a set of one or more words and their importances is reflected in the theme, the theme is identified based on a change, including a temporal element, relative to the theme of text input in the past. Each word reflected in the theme contributes to identifying the topic of each piece of text in the dialogue. For example, a set of words and importances such as {yesterday (5), sunny (5)} is treated as the theme of the sentence "It was sunny yesterday, wasn't it?"

[0021] The theme holding unit 50 stores the newly identified latest theme, which is referred to as the "previous theme" when a new theme is identified the next time text is input. The theme holding unit 50 may be configured as a memory that stores only the latest theme, or as a database that accumulates past themes. In such a database, each theme may be recorded as one record.

[0022] The similarity determination unit 44 determines the similarity between the set of words in the block of interest and the set of words included in the previous theme. This similarity is the conceptual closeness between the sets of words; in a dialogue, for example, it corresponds to how much the topic has changed. Accordingly, a high similarity indicates that the topic is continuing, and a low similarity indicates that the topic has changed significantly.

[0023] The number of elements in the intersection of the two sets may be used as the similarity. In this case, the similarity between two sets A and B is given by $|A \cap B| / |A \cup B| = |A \cap B| / (|A| + |B| - |A \cap B|)$. Alternatively, the Levenshtein distance may be used as the similarity. In this case, the similarity between two sets A and B is given by $\max\{|A|, |B|\} - |A \cap B|$. The distance between the mean vectors of the two sets may also be used as the similarity. Such a vector may be expressed as an n-dimensional spatial vector for a set having n words as elements.
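
The two set-based formulas above can be sketched in a few lines of Python. The function names are illustrative, and returning 0.0 when both sets are empty is an assumption, since the description does not cover that case.

```python
def jaccard_similarity(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|; returns 0.0 for two empty sets (an assumption)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def set_distance(a: set[str], b: set[str]) -> int:
    """max(|A|, |B|) - |A ∩ B|, the second formula quoted in the text."""
    return max(len(a), len(b)) - len(a & b)


block = {"yesterday", "sunny"}   # words in the block of interest
previous = {"today", "sunny"}    # words in the previous theme

print(jaccard_similarity(block, previous))  # one shared word out of three
print(set_distance(block, previous))
```

Note that the second formula behaves as a distance: a smaller value means the sets are closer. How such a value would be compared against the reference value is not stated in the description, so a caller would need to invert the comparison or convert the distance first.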

[0024] When the similarity falls below a predetermined reference value, the importance update unit 46 updates the theme by replacing the words included in the previous theme with the set of words in the block of interest. That is, when the theme of the text changes substantially, the entire set of words is replaced.

[0025] When the similarity is equal to or greater than the predetermined reference value, the importance update unit 46 identifies the latest theme by updating the set of words and importances included in the previous theme with the set of words and importances in the block of interest. Specifically, the words in the block of interest and the words in the previous theme are merged, and when a word is common to both, the higher of its two importances is used. For example, if the words and importances in the block of interest are {yesterday (5), sunny (5)} and the past words and importances are {today (3), sunny (3)}, the updated set after merging is {yesterday (5), sunny (5), today (3)}.
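
The merge rule can be illustrated with a small Python sketch that reproduces the example from this paragraph. Representing a theme as a dictionary from word to importance, and the name `merge_themes`, are assumptions made for illustration.

```python
def merge_themes(block: dict[str, int], previous: dict[str, int]) -> dict[str, int]:
    """Union of both word sets; a shared word keeps the higher importance."""
    merged = dict(previous)
    for word, importance in block.items():
        merged[word] = max(importance, merged.get(word, importance))
    return merged


block = {"yesterday": 5, "sunny": 5}   # block of interest
previous = {"today": 3, "sunny": 3}    # previous theme (already attenuated)

print(merge_themes(block, previous))
```

The result contains "yesterday" and "sunny" at 5 and "today" at 3, matching the merged set given in the text.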

[0026] When updating the words and importances, the importance update unit 46 attenuates the importance of each word according to the time elapsed up to the update. For example, if the newly identified theme is {yesterday (5), sunny (5)}, then at the next theme identification the importances are lowered, as in {yesterday (3), sunny (3)}, and this set is treated as the "set of words and importances included in the previous theme" described above. A word whose importance falls below a predetermined minimum value is excluded from the set. For example, if the set has become {yesterday (3), sunny (1)} and the condition "exclude importance 1 or less" has been set, "sunny (1)" is excluded. In other words, in an actual dialogue, the impression of "sunny" is judged to have almost disappeared from the speaker's mind. The degree of attenuation is arbitrary.
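
A minimal sketch of attenuation and exclusion follows. The fixed step of 2 per update is an assumption inferred from the 5 to 3 example, since the description leaves the degree of attenuation open; the exclusion rule follows the quoted condition "exclude importance 1 or less". Function names are illustrative.

```python
def attenuate(theme: dict[str, int], step: int = 2) -> dict[str, int]:
    """Lower every importance by `step` (assumed value; the text leaves it open)."""
    return {word: importance - step for word, importance in theme.items()}


def prune(theme: dict[str, int], minimum: int = 1) -> dict[str, int]:
    """Drop words whose importance is `minimum` or less."""
    return {word: importance for word, importance in theme.items() if importance > minimum}


print(attenuate({"yesterday": 5, "sunny": 5}))   # both words drop to 3
print(prune({"yesterday": 3, "sunny": 1}))       # "sunny" is excluded
```

Keeping the two steps separate makes it easy to swap in a different decay, for example one proportional to elapsed wall-clock time rather than to the number of updates.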

[0027] The theme determination unit 48 determines the set of words and importances updated by the importance update unit 46 as the theme for the text set as the block of interest, and records it in the theme holding unit 50.

[0028] The application processing unit 40 reflects the newly identified theme in the priority order of conversion candidates in kana-kanji conversion by the language input processing unit 30. The application processing unit 40 also reflects the newly identified theme in the priority order of recognition candidates in speech recognition processing by the language input processing unit 30.

[0029] The operation of the above configuration is described below. FIG. 2 is a flowchart showing the operation of the text processing system 10 in this embodiment. First, new text is input (S10), and the text is set as the block of interest (S12). Words are extracted from the block of interest (S14), and the importance of each is set (S16). The similarity between the set of words in the block of interest and the set of words in the previous theme is determined (S18). If the similarity is below the predetermined reference value (S20N), all the words in the set are replaced (S26). If the similarity is equal to or greater than the predetermined reference value (S20Y), the importances of the words in the previous theme are attenuated (S22), and the theme is then updated by merging the words of the two sets (S24).

[0030] The updated set of words and importances is determined as the theme for the text set as the block of interest and is recorded (S28). The new theme is used for dialogue processing and the like (S30). The processing of S10 to S30 is repeated until the dialogue ends (S32N).
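
The flow of steps S10 to S30 can be sketched end to end as follows. This is a minimal sketch, not the patented implementation: the Jaccard measure, the reference value of 0.2, the attenuation step of 2, and the exclusion of importance 1 or less are all assumed parameters, and word extraction (S14, S16) is assumed to have been done upstream so that each block arrives as a word-to-importance dictionary.

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """|A ∩ B| / |A ∪ B|, with 0.0 for two empty sets (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def update_theme(block: dict[str, int], previous: dict[str, int],
                 threshold: float = 0.2, step: int = 2, minimum: int = 1) -> dict[str, int]:
    """One pass of S18 to S26; all default parameters are assumed values."""
    similarity = jaccard(set(block), set(previous))          # S18
    if similarity < threshold:                               # S20N
        return dict(block)                                   # S26: replace all words
    faded = {w: v - step for w, v in previous.items()}       # S22: attenuate
    faded = {w: v for w, v in faded.items() if v > minimum}  # exclusion rule of [0026]
    merged = dict(faded)                                     # S24: merge, keep the higher value
    for word, importance in block.items():
        merged[word] = max(importance, merged.get(word, importance))
    return merged


# Blocks as they might arrive after S14/S16 (hypothetical sample dialogue).
utterances = [
    {"hello": 5},
    {"hello": 5},
    {"yesterday": 5, "sunny": 5},
    {"today": 5, "sunny": 5},
]

theme: dict[str, int] = {}
history: list[dict[str, int]] = []
for block in utterances:                 # S10, S12
    theme = update_theme(block, theme)
    history.append(theme)                # S28: record the latest theme
    print(theme)
```

With these settings the greeting theme is replaced outright when the weather talk begins, while the fourth utterance merges into the existing weather theme with "yesterday" faded to 3.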

[0031] FIG. 3 shows the correspondence among the block of interest, words, importance, and similarity in a dialogue. The figure is presented as a dialogue between speakers "A" and "B". Text 60 is the text input as an utterance by "A" and is set as the block of interest at the time of input. Text 64 is a word that is an element of the set identified as the theme, and numeral 66 is its importance. Numeral 62 is the similarity determined relative to the theme of the immediately preceding utterance.

[0032] From the block of interest of utterance (2), the word "hello" is extracted. Because this word is shared with the theme of utterance (1), the only element of the theme set after merging is still "hello". Its importance would normally be attenuated from "5" to "3" in the transition from utterance (1) to utterance (2), but because utterance (2) also contains the same word, it remains "5".

[0033] The theme of utterance (3) contains the word "hello", but its importance has been attenuated to "3". The importance of "hello" falls further to "1" in utterance (4), and the word is excluded in utterance (5). Utterance (5) is illustrated to show that the importance of a word falls even while both A and B are silent; it need not be displayed on an actual PC screen or the like.

[0034] In utterance (6), the similarity with utterance (5) was judged to be "0", so all the words in the theme have been replaced. Likewise, in utterance (14), all the words have been replaced because the similarity with utterance (13) fell below the predetermined value. Based on such theme changes, a higher-level theme may be determined as a superordinate concept of the individual themes. As shown in the figure, the higher-level theme of the dialogue enclosed by line 70 may be determined to be "greeting", that of the dialogue enclosed by line 72 to be "weather", and that of the dialogue enclosed by line 74 to be "television", and these higher-level themes may be applied to kana-kanji conversion and speech recognition processing.
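
One way to recover the enclosed dialogue segments is to start a new segment whenever the similarity drops below the reference value. The sketch below does only that grouping; the function name, the sample similarity values, and the reference value of 0.2 are assumptions, and naming each segment (for example "greeting" or "weather") is left out because the description does not say how the higher-level label is chosen.

```python
def segment_by_theme_change(similarities: list[float], threshold: float) -> list[list[int]]:
    """Group utterance indices; a similarity below `threshold` opens a new segment."""
    segments: list[list[int]] = []
    for index, similarity in enumerate(similarities):
        if not segments or similarity < threshold:
            segments.append([index])
        else:
            segments[-1].append(index)
    return segments


# Hypothetical per-utterance similarities relative to the preceding theme.
similarities = [0.0, 1.0, 0.5, 0.0, 0.4]
print(segment_by_theme_change(similarities, threshold=0.2))
```

Each resulting segment corresponds to one run of utterances that share a theme lineage, which is the unit a higher-level theme would be attached to.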

[0035] FIG. 4 shows the process of updating words and importances. Table 80 shows the correspondence between words and importances in the previous theme. Table 82 shows the correspondence after the importance of each word in the previous theme has been attenuated. For example, the importance of "rain" has been attenuated from "5" to "3". Table 84 shows the correspondence between words and importances in the block of interest. Table 86 shows the correspondence after the set of words and importances in the previous theme has been merged with the set in the block of interest. The words are sorted in descending order of importance. "Weather", "forecast", and "rain" are words common to both sets, and the higher value is adopted as the importance of each. For example, "weather" and "forecast" have higher importances in the block of interest and are therefore updated to "5" and "4", respectively, whereas "rain" has a higher importance in the previous theme and therefore remains at "3". "Dark" and "probability" have an importance of "1" and are excluded as having fallen below the predetermined minimum value. The set of words and importances remaining after this exclusion is determined as the latest theme and is shown in table 88. The minimum value used for exclusion and the number of words to be adopted as the latest theme are each arbitrary.

[0036] FIG. 5 shows the result of reflecting the theme in kana-kanji conversion. In (a), for example, when the theme of the text input immediately before consists of words such as "today, sunny, weather", the illustrated input example may be converted to "雨" (rain) by inference from this theme. In (b), for example, when the theme of the text input immediately before consists of words such as "today, gum, sweets", the illustrated input example may be converted to "飴" (candy) by inference from this theme. A similar conversion may be reflected in speech recognition.
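
The candidate reordering can be sketched as a simple scoring pass. Both 雨 (rain) and 飴 (candy) are read "ame", which is presumably the illustrated input. The `RELATED` association table and the scoring rule (sum of the theme importances of associated words) are hypothetical; the description does not say how a theme is matched against conversion candidates.

```python
# Hypothetical association table: conversion candidate -> words it relates to.
RELATED: dict[str, set[str]] = {
    "雨": {"weather", "sunny", "rain"},
    "飴": {"sweets", "gum", "candy"},
}


def rank_candidates(candidates: list[str], theme: dict[str, int]) -> list[str]:
    """Order candidates by the summed importance of their theme-related words."""
    def score(candidate: str) -> int:
        return sum(theme.get(word, 0) for word in RELATED.get(candidate, set()))
    # sorted() is stable, so candidates with equal scores keep their input order.
    return sorted(candidates, key=score, reverse=True)


weather_theme = {"today": 5, "sunny": 5, "weather": 5}
sweets_theme = {"today": 5, "gum": 5, "sweets": 5}

print(rank_candidates(["飴", "雨"], weather_theme))  # 雨 moves to the front
print(rank_candidates(["雨", "飴"], sweets_theme))   # 飴 moves to the front
```

The same scoring could be applied to speech recognition hypotheses, with the recognizer's own confidence used to break ties.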

[0037] (Second Embodiment) FIG. 6 shows a screen of a dialogue system. In this embodiment, the dialogue system receives an utterance from the user and automatically generates and displays a reply to it. The theme of the user's utterance is reflected when the dialogue system generates the reply. This system can process dialogue that follows the immediately preceding topic quickly and smoothly.

[0038] The present invention has been described above based on embodiments. These embodiments are illustrative, and those skilled in the art will understand that various modifications can be made to the combinations of their components and processes and that such modifications also fall within the scope of the present invention. Some modifications are given below.

[0039] When extracting words and phrases from the block of interest, the word/phrase extraction unit 36 may unify synonyms and controlled terms into a fixed word or phrase. It may also convert a word into its stem form, or additionally extract and add the superordinate concept of the word.
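The normalization options in this paragraph can be sketched as follows. The dictionaries `SYNONYMS` and `HYPERNYMS` and their entries are hypothetical stand-ins for whatever thesaurus an implementation would use, and stemming is omitted for brevity.

```python
from typing import Dict, List

# Hypothetical lookup tables; a real system would draw these from a thesaurus.
SYNONYMS: Dict[str, str] = {"PC": "computer", "personal computer": "computer"}
HYPERNYMS: Dict[str, str] = {"gum": "sweets", "candy": "sweets"}


def normalize_phrases(phrases: List[str]) -> List[str]:
    """Unify synonyms to one canonical form and add each phrase's superordinate concept."""
    result: List[str] = []
    for phrase in phrases:
        # Map synonyms and controlled terms onto a single canonical phrase.
        canonical = SYNONYMS.get(phrase, phrase)
        # Add the canonical phrase and, if known, its superordinate concept once each.
        for term in (canonical, HYPERNYMS.get(canonical)):
            if term is not None and term not in result:
                result.append(term)
    return result


normalized = normalize_phrases(["PC", "gum", "candy", "computer"])
```

Adding the superordinate concept lets two inputs that mention different sweets still share a theme word.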

[0040] The theme determination unit 48 may add the superordinate concepts of the words and phrases to the set of words and phrases updated by the importance update unit 46.

[Effect of the Invention] According to the present invention, the theme of a sentence can be specified relatively efficiently.

[Brief description of drawings]

[FIG. 1] A functional block diagram showing the configuration of the text processing system according to the present embodiment.

[FIG. 2] A flowchart showing the operation of the text processing system according to the present embodiment.

[FIG. 3] A diagram showing the correspondence among blocks of interest, words and phrases, degrees of importance, and degrees of similarity in a dialogue.

[FIG. 4] A diagram showing the process of updating words and phrases and their degrees of importance.

[FIG. 5] A diagram showing the result of reflecting the theme in kana-kanji conversion.

[FIG. 6] A diagram showing a screen of the dialogue system.

[Explanation of symbols]

26 sentence receiving unit, 28 target setting unit, 34 character string analysis unit, 40 application processing unit, 42 theme specifying unit.

Claims

[Claims]

1. A sentence analysis method characterized by acquiring sentences that are input one after another over time, setting a block of newly acquired sentences as a block of interest each time there is an input, extracting at least one word or phrase by decomposing the block of interest, and specifying the theme of the sentences set as the block of interest based on a change, including a temporal element, between the extracted words and phrases and words and phrases extracted in the past.

2. A sentence analysis device comprising: a sentence receiving unit that receives input of sentences; a target setting unit that sets a block of newly input sentences as a block of interest each time there is an input; a character string analysis unit that extracts at least one word or phrase by decomposing the block of interest; and a theme specifying unit that specifies the theme of the sentences based on the extracted words and phrases, wherein the theme specifying unit specifies the theme of the sentences set as the block of interest based on a change, including a temporal element, relative to the theme of previously input sentences.

3. The sentence analysis device according to claim 2, wherein the sentence receiving unit receives sentences input in the form of a dialogue, and the theme specifying unit specifies the theme based on words and phrases that contribute to topic identification for each sentence in the dialogue.

4. The sentence analysis device according to claim 2, wherein the theme specifying unit, on the assumption that the set of extracted words and phrases is reflected in the theme, specifies the latest theme by updating the past set of words and phrases using the set of words and phrases in the block of interest.

5. The sentence analysis device according to claim 2 or 3, wherein the character string analysis unit associates each extracted word or phrase with the degree of importance of that word or phrase in the sentence, and the theme specifying unit, on the assumption that the set of extracted words and phrases and their degrees of importance is reflected in the theme, specifies the latest theme by updating the past words and phrases and their degrees of importance using the words and phrases and their degrees of importance in the block of interest.

6. The sentence analysis device according to claim 5, wherein the theme specifying unit, when updating the words and phrases and their degrees of importance, attenuates the degree of importance of each word or phrase according to the time elapsed until the update, and excludes a word or phrase from the set when its degree of importance falls below a predetermined minimum value.

7. The sentence analysis device according to any one of claims 4 to 6, wherein the theme specifying unit determines the degree of similarity between the set of words and phrases in the block of interest and the past set of words and phrases, and, when the degree of similarity is lower than a predetermined reference value, performs the update by replacing the sets.

8. The sentence analysis device according to claim 2, further comprising an application processing unit that reflects the specified theme in the priority order of conversion candidates in kana-kanji conversion.

9. The sentence analysis device according to claim 2, further comprising an application processing unit that reflects the specified theme in a priority order of recognition candidates in a voice recognition process.

10. A computer program characterized by causing a computer to execute: a process of acquiring sentences that are input one after another over time; a process of setting a block of newly acquired sentences as a block of interest each time there is an input; a process of extracting at least one word or phrase by decomposing the block of interest; and a process of specifying the theme of the sentences set as the block of interest based on a change, including a temporal element, between the extracted words and phrases and words and phrases extracted in the past.
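The similarity-based update recited in claim 7 can be illustrated with a minimal sketch. It assumes Jaccard similarity as the similarity measure and a plain union as the update when the sets are similar enough; the claims name neither, and the names `jaccard`, `update_theme_set`, and the default `reference` value are hypothetical.

```python
from typing import Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity (assumed measure): shared phrases divided by all phrases."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def update_theme_set(past: Set[str], current: Set[str],
                     reference: float = 0.2) -> Set[str]:
    """Merge the new phrases into the past set, or replace it when the topic has changed."""
    if jaccard(past, current) < reference:
        # Similarity below the reference value: treat it as a topic change and replace.
        return set(current)
    # Otherwise update the past set with the phrases from the block of interest.
    return past | current


past = {"today", "sunny", "weather"}
merged = update_theme_set(past, {"weather", "rain"})    # similarity 0.25, sets merged
replaced = update_theme_set(past, {"gum", "sweets"})    # similarity 0.0, set replaced
```

Replacing the set outright on a low similarity keeps phrases from an abandoned topic from lingering in the theme.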