JP2011145844A

JP2011145844A - Predicate functional expression normalization method, device and program thereof

Info

Publication number: JP2011145844A
Application number: JP2010005464A
Authority: JP
Inventors: Tomoko Izumi; 朋子泉; Kenji Imamura; 賢治今村; Genichiro Kikui; 玄一郎菊井; Michifumi Sato; 理史佐藤
Original assignee: Nagoya University NUC; Nippon Telegraph and Telephone Corp
Current assignee: Nagoya University NUC; Nippon Telegraph and Telephone Corp
Priority date: 2010-01-14
Filing date: 2010-01-14
Publication date: 2011-07-28
Anticipated expiration: 2030-01-14
Also published as: JP5370680B2

Abstract

PROBLEM TO BE SOLVED: To convert the predicate, which is the core of sentence information in natural language processing, into its simplest form while changing its meaning as little as possible.

SOLUTION: A morphological analysis unit 2 analyzes an input sentence, and a predicate extraction unit 3 extracts the word information of each morpheme belonging to the predicate. A semantic label assignment unit 4 then assigns semantic labels to the word information of the function word string that makes up the functional expression. It does so using a functional expression semantic label dictionary 1 that combines at least semantic labels, each expressing the meaning of a functional expression that affects the meaning of the predicate, with a list of the functional expressions corresponding to each label. Based on those labels, a NULL deletion unit 5 and a redundancy rule application unit 6 delete function words that do not affect the meaning of the predicate as well as redundant function words, and a conjugation generation unit 7 generates the predicate of the input sentence from the word information of the remaining morphemes.

COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a technique for normalizing the predicate, which is the core of sentence information in natural language processing; that is, for converting a predicate into its simplest form while changing the meaning of the predicate itself as little as possible.

<Description of the task>
To obtain useful information from large document collections such as meeting minutes, questionnaires, and web text, it is essential to automatically extract and aggregate information of the form "who did what to what, and where?" from the target documents. This technique is called text mining. In text mining, unlike extracting specific names corresponding to "who", "where", and "what", it is difficult to extract and aggregate the "predicate" corresponding to "did what" (どうした). The reason is that a predicate is not expressed simply by a single word.

For example, in the sentence "パソコンが壊れちゃったよ。" ("My computer broke."), the predicate "壊れちゃったよ" corresponding to "did what" consists of the verb "壊れ（る）" ("break") combined with the auxiliary verbs "ちゃっ" and "た" and the sentence-final particle "よ". In this way, a predicate expressing "did what" is made up of "content words", such as verbs, nouns, adjectives, adjectival verbs, and adverbs, combined with "function words", such as particles and auxiliary verbs. Dividing the predicate of the sentence above into content words and function words gives:
壊れちゃったよ = 壊れ(る) [content word] + ちゃっ [function word] + た [function word] + よ [function word] (1)

One major reason that predicates differ in surface form (notation) is the presence of these function words. The "ちゃった" in the predicate "壊れちゃった" and the "てしまいました" in the predicate "壊れてしまいました" are each a sequence of function words (hereinafter, a character string consisting of one or more function words is called a "functional expression"), and such sequences vary greatly with writing style and register (honorific or colloquial). Consequently, a predicate with the same meaning as "壊れた" ("broke") is expressed in many ways, such as "壊れちゃった", "壊れてしまった", and "壊れてしまいました", which hinders the mining goal of grouping expressions with the same meaning.

However, these functional expressions are also important clues for distinguishing different meanings. For example, "壊れちゃった" expresses the same thing as "壊れてしまいました", but something different from "壊れてない" ("is not broken"). Even with the same verb "壊れる", the predicate has exactly the opposite meaning depending on whether the functional expression "ちゃった" or "てない" is attached. In terms of content words and functional expressions:
壊れ(る) [content word] + ちゃった [functional expression] ≠ 壊れ(る) [content word] + てない [functional expression] (2)

Thus, functional expressions are an obstacle when grouping predicates with the same meaning, but an important clue when distinguishing predicates with different meanings. In other words, raising the precision of text mining that extracts and aggregates predicates requires a predicate normalization that merges functional expressions with the same meaning but different surface forms, while retaining the functional expressions needed to distinguish the meanings of predicates.

<Prior Art 1>
Non-Patent Document 1 discloses a mining technique that extracts and aggregates predicates corresponding to "did what" from text. It proposes a technique for automatically extracting and aggregating three kinds of information, "situation", "action", and "subjective evaluation", from blogs. Extraction of the predicate corresponding to "did what" is incorporated as part of the "action" extraction.

However, the method of Non-Patent Document 1 morphologically analyzes the sentence and extracts only the standard (dictionary) form of the verb as the normal form of the predicate. As a result, predicates whose meanings differ because of their functional expressions, such as "壊れた" ("broke"), "壊れない" ("does not break"), and "壊れそう" ("looks about to break"), are all extracted as "壊れる" ("break"). This lowers the precision of text mining. FIG. 1 shows an example of predicate extraction results obtained with the method of Non-Patent Document 1. The symbol "/" marks the boundaries between function-word morphemes (words).

<Prior Art 2>
Another method of predicate normalization, one that takes functional expressions into account, is disclosed in Non-Patent Document 2. Non-Patent Document 2 paraphrases functional expressions by mapping the functional expression of a phrase (bunsetsu) to other functional expressions in a one-to-one or one-to-N correspondence. For example, when the predicate "壊れ/ちゃっ/た" is input, the one-to-one paraphrase "ちゃっ" → "てしまっ" is applied, yielding "壊れ/てしまっ/た". By outputting paraphrase candidates for functional expressions in this way, all outputs that share a paraphrase target can be grouped as having the same meaning.

However, because Non-Patent Document 2 paraphrases function words exactly, in a one-to-one or one-to-N correspondence, its granularity is too fine for the predicate normalization that mining requires, which must group predicates expressing the same event. Non-Patent Document 2 does perform one-to-N paraphrasing, but only to add elements that preserve grammatical connectivity after paraphrasing; it is not paraphrasing aimed at reducing a predicate to what it essentially says. FIG. 2 shows an example of output paraphrased by the method of Non-Patent Document 2.

In the case of FIG. 2, all of the input predicates express the event that something broke ("壊れた"), but because the method of Non-Patent Document 2 faithfully paraphrases even function words that are unnecessary for text mining, the predicates do not collapse into one. This is because the paraphrases retain the sentence-final particle "よ" as well as both "てしまっ [completion]" and "た [completion]", which mark the past tense redundantly. The normalization needed for text mining and similar tasks is the simplest one, which keeps only the functional expressions that bear on whether the same event is being expressed. With the method of Non-Patent Document 2 the granularity is too fine, which lowers mining recall (that is, the rate at which expressions with the same meaning are grouped together).

[Non-Patent Document 1] Kurashima, Fujimura, and Okuda (倉島健・藤村考・奥田英範), "Experience Mining from Large-Scale Text" (「大規模テキストからの経験マイニング」), Proceedings of the 19th IEICE Data Engineering Workshop, DEWS2008 A1-4, 2008, pp. 301-310.
[Non-Patent Document 2] Matsuyoshi and Sato (松吉俊・佐藤理史), "Paraphrasing Japanese Functional Expressions with Controllable Style and Difficulty" (「文体と難易度を制御可能な日本語機能表現の言い換え」), Journal of Natural Language Processing (自然言語処理), Vol. 15, No. 2, 2008, pp. 75-99.

As described above, the conventional techniques have the following problems in extracting and aggregating predicates.

1. Extracting only the verb and ignoring functional expressions causes predicates with different meanings to be extracted as the same, which lowers the precision of text mining (Prior Art 1).

2. Paraphrasing predicates while taking every functional expression into account retains functional expressions that are unnecessary for text mining, so even predicates expressing the same event are not merged, which lowers recall (Prior Art 2).

To solve these two problems in predicate normalization for predicate extraction and aggregation, the present invention takes the following approach.

a. The meanings of functional expressions that directly affect the meaning of a predicate are characterized using three criteria, and functional expression semantic labels (hereinafter simply "semantic labels") are created on the basis of those criteria.

b. A list of function word strings is associated with each semantic label to form a functional expression semantic label dictionary, and among the function words making up the functional expression in a predicate, those that match a dictionary entry are assigned the corresponding semantic label.

c. Function words to which no semantic label was assigned are deleted.

d. If the function words that received semantic labels are redundant, function words are deleted according to predetermined redundancy rules.

With this approach, functional expressions can be normalized to their simplest form while those that affect the meaning of the predicate are retained. As a result, both the precision and the recall of predicate extraction and aggregation in text mining and similar applications can be improved.

According to the present invention, function words that do not affect the meaning of the predicate, as well as redundant function words, are deleted from the functional expression of a predicate on the basis of semantic labels, which simplifies the predicate. As a result, functional expressions, which cause predicates to differ in surface form, can be normalized to their simplest form, improving the recall of text mining tasks such as predicate aggregation.

In addition, because semantic labels are assigned to functional expressions on the basis of three criteria, predicates whose functional expressions change the meaning of the event expressed, such as "壊れた" ("broke") and "壊れてない" ("is not broken"), can be distinguished. As a result, the precision of aggregation that groups expressions with the same meaning, as in mining, can be improved.

Because the simplest functional expression is selected and the predicate is generated from it, the invention can be used widely, not only in mining but also in other natural language applications such as text summarization.

FIG. 1 is an explanatory diagram showing an example of predicate extraction results according to Prior Art 1.
FIG. 2 is an explanatory diagram showing an example of paraphrase results according to Prior Art 2.
FIG. 3 is a block diagram showing an example of an embodiment of the predicate functional expression normalization device of the present invention.
FIG. 4 is an explanatory diagram showing an example of the functional expression semantic label dictionary.
FIG. 5 is a flowchart of the processing in the predicate functional expression normalization device of FIG. 3.
FIG. 6 is an explanatory diagram showing an example of morphological analysis results.
FIG. 7 is an explanatory diagram showing an example of predicate extraction results.
FIG. 8 is an explanatory diagram showing an example of semantic label assignment results.
FIG. 9 is an explanatory diagram showing an example of NULL deletion results.
FIG. 10 is an explanatory diagram showing an example of redundant label deletion results.
FIG. 11 is an explanatory diagram showing another example of morphological analysis results.
FIG. 12 is an explanatory diagram showing another example of predicate extraction results.
FIG. 13 is an explanatory diagram showing another example of semantic label assignment results.
FIG. 14 is an explanatory diagram showing another example of NULL deletion results.
FIG. 15 is an explanatory diagram showing another example of redundant label deletion results.

The object of the present invention is to normalize predicates that differ in surface form but express the same event to the same surface form. To that end, normalization retains only those functional expressions that directly affect the meaning of the event expressed by the predicate. To decide which functional expressions affect the meaning of the predicate, the present invention consults a functional expression semantic label dictionary prepared in advance according to the criteria described below, assigns semantic labels only to functional expressions that affect the meaning of the predicate, and normalizes the whole predicate using those labels as clues. The present invention is described in detail below with reference to the illustrated embodiment.

FIG. 3 shows an example of an embodiment of the predicate functional expression normalization device of the present invention. In the figure, 1 is the functional expression semantic label dictionary, 2 is the morphological analysis unit, 3 is the predicate extraction unit, 4 is the semantic label assignment unit, 5 is the NULL deletion unit, 6 is the redundancy rule application unit, and 7 is the conjugation generation unit.

Note that the input to the device is normally a document containing multiple sentences, but because each unit processes one sentence at a time, the following description assumes that a single sentence has been input.

The functional expression semantic label dictionary 1 combines three things: semantic labels expressing the meanings of functional expressions that affect the meaning of a predicate; for each semantic label, a list of character strings giving the corresponding functional expressions in standard form; and for each semantic label, a redundancy rule specifying how function words are to be retained when the functional expression of a single predicate contains several function words with that same label. FIG. 4 shows an example.

The functional expression semantic label dictionary 1 is created as follows.

First, semantic labels are created according to criteria (A) to (C) below, and each semantic label is given a list of character strings containing the corresponding functional expressions in standard form. Creating the labels according to these criteria makes it possible to assign semantic labels only to functional expressions that directly affect the meaning of the predicate.
(A) Difference in tense: semantic labels that characterize the tense of the event expressed by the predicate; they distinguish, for example, "past" from "non-past".
(B) Difference in negation: semantic labels that distinguish whether the event expressed by the predicate is negated or affirmed; they distinguish, for example, "negative" from "affirmative".
(C) Difference in modality: semantic labels that distinguish whether the event expressed by the predicate includes the speaker's subjective attitude (that is, a modality expression); they distinguish, for example, "subjective" from "objective".

In this embodiment, 12 semantic labels satisfying (A) to (C) were assigned to the standard forms of functional expressions. For (A), the label "completion" (完了) was prepared as an entry; a predicate with this label is "past" and one without it is "non-past". For (B), the label "negation" (否定) was prepared; a predicate with this label is "negative" and one without it is "affirmative". For (C), the labels "question" (疑問), "invitation/volition" (勧誘・意志), "desire" (願望), "request" (依頼), "recommendation" (勧め), "necessity" (必要), "permission" (許可), "conjecture" (推量), and "possibility" (可能) were prepared; a predicate with any of these labels is "subjective" and one without them is "objective". Here "invitation/volition" is counted as two labels.
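The label inventory of this embodiment can be written down as a small data structure. The following is a minimal sketch, assuming a Python representation; the names `Criterion`, `SemanticLabel`, `criterion_of`, and `describe` are hypothetical and do not appear in the patent, and the English member names are glosses chosen here for illustration.

```python
from enum import Enum


class Criterion(Enum):
    TENSE = "A"      # (A) difference in tense
    NEGATION = "B"   # (B) difference in negation
    MODALITY = "C"   # (C) difference in modality


class SemanticLabel(Enum):
    COMPLETION = "完了"
    NEGATION = "否定"
    QUESTION = "疑問"
    INVITATION = "勧誘"   # "invitation/volition" is counted as two labels
    VOLITION = "意志"
    DESIRE = "願望"
    REQUEST = "依頼"
    RECOMMENDATION = "勧め"
    NECESSITY = "必要"
    PERMISSION = "許可"
    CONJECTURE = "推量"
    POSSIBILITY = "可能"


def criterion_of(label):
    """Return the criterion (A), (B), or (C) that a label belongs to."""
    if label is SemanticLabel.COMPLETION:
        return Criterion.TENSE
    if label is SemanticLabel.NEGATION:
        return Criterion.NEGATION
    return Criterion.MODALITY


def describe(labels):
    """Read off past/non-past, negative/affirmative, and subjective/objective
    from the presence or absence of labels (hypothetical helper)."""
    present = {criterion_of(label) for label in labels}
    return {
        "tense": "past" if Criterion.TENSE in present else "non-past",
        "polarity": "negative" if Criterion.NEGATION in present else "affirmative",
        "modality": "subjective" if Criterion.MODALITY in present else "objective",
    }
```

The sketch encodes the "label present versus label absent" reading of each criterion. It assumes `describe` is applied to the labels that remain after redundant function words have been removed.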

The number of semantic label types depends on the application. For example, to distinguish the functional expressions "かもしれない", "だろう", and "ようだ", which correspond to the label "conjecture", in more detail according to their degree of certainty, separate labels such as "conjecture 1", "conjecture 2", and "conjecture 3" may be created and assigned.

Second, a redundancy rule is recorded for each semantic label, specifying how function words are to be retained when the functional expression of a single predicate contains several function words with that label. The redundancy rules are based on the finding of the present invention that, when a predicate contains several function words expressing the same meaning, the meaning of the predicate can basically be preserved by keeping only one of them.

In this embodiment, each redundancy rule specifies the position of the function word to keep when several function words with the same semantic label occur in one predicate. The rule "First" keeps the function word that occurs first in the predicate among those with the same label; the rule "Last" keeps the one that occurs last. For the label "negation", however, the rule becomes "If(even), Delete All; If(odd), First": when the number of such function words is odd, the first one is kept, and when it is even, all of them are deleted. Where the "First" or "Last" rule has exceptions, an exception rule is added. In FIG. 4, the redundancy rule for the label "completion" is "Last" together with the exception rule "If(same surfs), First" (when the surface forms are identical, keep the first function word).
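The rules described above can be sketched in code. This is an illustrative sketch only: the function names, the `RULES` table, and the choice of `keep_last` as the default for labels not listed are assumptions made here, since the full rule table of FIG. 4 is not reproduced in the text.

```python
from collections import defaultdict


def keep_first(items):
    """'First': keep the earliest function word with this label."""
    return [items[0]]


def keep_last(items):
    """'Last': keep the latest function word with this label."""
    return [items[-1]]


def negation_parity(items):
    """'If(even), Delete All; If(odd), First'."""
    return [items[0]] if len(items) % 2 == 1 else []


def last_unless_same_surface(items):
    """'Last', with the exception 'If(same surfs), First'."""
    surfaces = {surface for _, surface in items}
    return [items[0]] if len(surfaces) == 1 else [items[-1]]


# Hypothetical rule table; only these two entries are stated in the text.
RULES = {"完了": last_unless_same_surface, "否定": negation_parity}


def apply_redundancy_rules(function_words, rules=RULES, default=keep_last):
    """function_words: list of (surface, label) pairs in predicate order,
    all of which already carry a semantic label."""
    groups = defaultdict(list)
    for index, (surface, label) in enumerate(function_words):
        groups[label].append((index, surface))
    kept = set()
    for label, items in groups.items():
        rule = rules.get(label, default)  # default is an assumption
        kept.update(index for index, _ in rule(items))
    return [fw for i, fw in enumerate(function_words) if i in kept]
```

For example, `apply_redundancy_rules([("ちゃっ", "完了"), ("た", "完了")])` keeps only `("た", "完了")`, and two function words labeled "否定" cancel each other and are both removed.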

The criteria "difference in tense", "difference in negation", and "difference in modality" appear in FIG. 4 only for convenience of explanation and need not be included in an actual functional expression semantic label dictionary 1. The redundancy rules may also be held in a separate table instead of being included in the functional expression semantic label dictionary 1.

The morphological analysis unit 2 performs well-known morphological analysis on an input sentence, which may be entered directly from a keyboard or the like (not shown), read from a storage medium, or received from another device via a communication medium. It outputs to the predicate extraction unit 3 the word information for each morpheme: surface form, reading, standard form, part of speech, conjugation type, conjugation form, and so on.

From the per-morpheme word information of the input sentence output by the morphological analysis unit 2, the predicate extraction unit 3 extracts the word information of the content word and of the functional expression that follows it as the predicate of the input sentence, and outputs it to the semantic label assignment unit 4.

Using the functional expression semantic label dictionary 1, the semantic label assignment unit 4 assigns (adds) semantic labels to the word information of the function word string that makes up the functional expression, within the per-morpheme word information of the predicate output by the predicate extraction unit 3, and outputs the result to the NULL deletion unit 5.

From the labeled per-morpheme word information of the predicate output by the semantic label assignment unit 4, the NULL deletion unit 5 deletes the word information of function words to which no semantic label was assigned, and outputs the result to the redundancy rule application unit 6.
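NULL deletion amounts to a filter over the labeled predicate. A minimal sketch follows, assuming each morpheme is a dictionary with the hypothetical keys `surface`, `kind`, and `label`; the labels shown for "ちゃっ" and "た" are assumed for illustration rather than taken from FIG. 8.

```python
def delete_null(predicate):
    """Drop function words that received no semantic label (label is None).
    Content words are always kept."""
    return [
        m for m in predicate
        if m["kind"] == "content" or m["label"] is not None
    ]


# Hand-written illustrative input, not actual device output.
labeled = [
    {"surface": "壊れ", "kind": "content", "label": None},
    {"surface": "ちゃっ", "kind": "function", "label": "完了"},
    {"surface": "た", "kind": "function", "label": "完了"},
    {"surface": "よ", "kind": "function", "label": None},  # no label: deleted
]

remaining = delete_null(labeled)
```

Here the sentence-final particle "よ" is removed because it carries no semantic label, while the content word is kept even though it has no label of its own.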

Using predetermined redundancy rules, here those recorded in the functional expression semantic label dictionary 1, the redundancy rule application unit 6 deletes, from the labeled per-morpheme word information of the predicate output by the NULL deletion unit 5, the word information of function words excluded by those rules, and outputs the result to the conjugation generation unit 7.

Here, the NULL deletion unit 5 and the redundancy rule application unit 6 together constitute a function word deletion unit, which deletes, from the labeled per-morpheme word information of the predicate of the input sentence, the word information of function words that do not affect the meaning of the predicate and of redundant function words.

From the per-morpheme word information of the predicate output by the redundancy rule application unit 6, from which function words that do not affect the meaning of the predicate and redundant function words have been deleted, the conjugation generation unit 7 generates and outputs the predicate of the input sentence.

FIG. 5 shows the processing flow of the predicate functional expression normalization device described above. The configuration and operation of each unit are described in detail below using a concrete example.

<Input>
The input to the functional expression normalization device is a sentence written in Japanese. In this example, "パソコンが壊れちゃったよ。" ("My computer broke.") is used as the input sentence.

<Morphological analysis>
The morphological analysis unit 2 performs well-known morphological analysis on the input sentence (s1). Morphological analysis splits the sentence into morphemes (words) and attaches word information to each one: surface form, reading, standard form, part of speech, conjugation type, conjugation form, and so on. Any known morphological analyzer may be used. FIG. 6 shows an example of the morphological analysis result in this example.
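The per-morpheme word information can be pictured as a simple record. The sketch below is an assumption about representation only: the class name `Morpheme`, the field names, and every field value are hand-written for illustration and are neither the content of FIG. 6 nor the output of any particular analyzer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Morpheme:
    surface: str        # surface form as written in the sentence
    reading: str        # reading
    standard_form: str  # standard (dictionary) form
    pos: str            # part of speech
    conj_type: str      # conjugation type ("" if not conjugated)
    conj_form: str      # conjugation form ("" if not conjugated)


# Hypothetical analysis of the example sentence; tags are illustrative.
SENTENCE = [
    Morpheme("パソコン", "パソコン", "パソコン", "名詞", "", ""),
    Morpheme("が", "ガ", "が", "助詞", "", ""),
    Morpheme("壊れ", "コワレ", "壊れる", "動詞-自立", "一段", "連用形"),
    Morpheme("ちゃっ", "チャッ", "ちゃう", "動詞-非自立", "五段", "連用タ接続"),
    Morpheme("た", "タ", "た", "助動詞", "特殊・タ", "基本形"),
    Morpheme("よ", "ヨ", "よ", "助詞", "", ""),
    Morpheme("。", "。", "。", "記号", "", ""),
]

standard_forms = [m.standard_form for m in SENTENCE]
```

Keeping the standard form next to the surface form matters for the later steps, which match dictionary entries against standard forms such as "ちゃう" rather than conjugated surfaces such as "ちゃっ".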

<Predicate extraction>
The predicate extraction unit 3 takes the morphological analysis result, that is, the per-morpheme word information of the input sentence, and extracts the predicate corresponding to "did what" (s2). A predicate consists of a content word combined with a functional expression. A content word is a morpheme whose part of speech is verb, adjective, adjectival verb, or adverb, or a noun followed by an auxiliary verb such as "だ". A functional expression is a character string containing one or more function words. A function word is a morpheme whose part of speech is particle or auxiliary verb; a non-independent verb, noun, adjective, adjectival verb, or adverb that cannot itself act as a content word, such as "ちゃう" (hereinafter, the non-independent categories); or a formal noun such as "こと".

The predicate extraction unit 3 extracts, from the per-morpheme word information of the input sentence, the word information of the content word and of the functional expression that follows it as the predicate of the input sentence. More specifically, it proceeds in three steps. First, it matches the standard forms in the per-morpheme word information against an arbitrary dictionary that lists the standard forms of functional expressions, here the functional expression semantic label dictionary 1, and extracts the word information of any matching morpheme sequence as a functional expression. Next, from the remaining word information, it extracts as functional expressions the morphemes whose part of speech is particle or auxiliary verb, a non-independent category, or formal noun. Finally, from what remains, it extracts as content words the morphemes whose part of speech is verb, adjective, adjectival verb, or adverb, or that are nouns followed by an auxiliary verb.

For the input sentence 「パソコンが壊れちゃったよ。」 ("The PC has broken.") described above, 「ちゃ（う）」 (chau) and 「た」 (ta), which match entries in the list of standard-form character strings of functional expressions in the functional expression semantic label dictionary 1, are first recognized as functional expressions. Next, 「よ」 (yo), whose part of speech is "particle", is recognized as a functional expression. Finally, 「壊れ（る）」 (kowareru, "break"), whose part of speech is "verb-independent", is recognized as a content word. As a result, 「壊れちゃったよ」 is extracted as the predicate. FIG. 7 shows an example of the predicate extraction result in this embodiment.
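The three-pass extraction described above can be sketched roughly as follows. This is only an illustrative sketch, not the implementation of the embodiment: the `Morpheme` class, the English part-of-speech tags, the contents of `DICT_BASE_FORMS`, and the choice of taking the last content word in the sentence are all assumptions, and dictionary matching is simplified to single morphemes.

```python
from dataclasses import dataclass

@dataclass
class Morpheme:
    surface: str  # surface form as written in the sentence
    base: str     # standard (dictionary) form
    pos: str      # part of speech (hypothetical English tags)

# Hypothetical stand-ins for the dictionary entries and POS categories.
DICT_BASE_FORMS = {"ちゃう", "た"}
FUNCTION_POS = {"particle", "auxiliary", "verb-dependent", "formal-noun"}
CONTENT_POS = {"verb-independent", "adjective", "adjectival-verb", "adverb"}

def classify(morphs):
    """Assign 'function', 'content', or None to each morpheme."""
    roles = []
    for i, m in enumerate(morphs):
        nxt = morphs[i + 1] if i + 1 < len(morphs) else None
        if m.base in DICT_BASE_FORMS:      # pass 1: dictionary match
            roles.append("function")
        elif m.pos in FUNCTION_POS:        # pass 2: part-of-speech match
            roles.append("function")
        elif m.pos in CONTENT_POS or (
            m.pos == "noun" and nxt is not None and nxt.pos == "auxiliary"
        ):                                 # pass 3: content word
            roles.append("content")
        else:
            roles.append(None)
    return roles

def extract_predicate(morphs):
    """Return the last content word plus the function words that follow it."""
    roles = classify(morphs)
    for start in range(len(morphs) - 1, -1, -1):
        if roles[start] == "content":
            end = start + 1
            while end < len(morphs) and roles[end] == "function":
                end += 1
            return morphs[start:end]
    return []

sentence = [
    Morpheme("パソコン", "パソコン", "noun"),
    Morpheme("が", "が", "particle"),
    Morpheme("壊れ", "壊れる", "verb-independent"),
    Morpheme("ちゃっ", "ちゃう", "verb-dependent"),
    Morpheme("た", "た", "auxiliary"),
    Morpheme("よ", "よ", "particle"),
    Morpheme("。", "。", "symbol"),
]
predicate = "".join(m.surface for m in extract_predicate(sentence))
print(predicate)  # 壊れちゃったよ
```

Under these assumptions, the noun 「パソコン」 and the particle 「が」 that precede the content word are left out of the predicate, because only function words following the content word are collected.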

<Semantic label assignment>
The semantic label assignment unit 4 receives the word information for each morpheme corresponding to the predicate of the input sentence and, using the functional expression semantic label dictionary 1, assigns semantic labels to the functional expressions that affect the meaning of the "event" expressed by the predicate (s3). That is, among the word information for each morpheme corresponding to the predicate, the unit assigns semantic labels to the word information of the function word string constituting the functional expression. More specifically, the standard forms in the word information of the function words are collated against the list of standard-form character strings of functional expressions in the functional expression semantic label dictionary 1 by the "longest match from the back" method, and the corresponding semantic labels are assigned.

This semantic label assignment is preferably performed with the same techniques as ordinary morphological analysis. Examples include determining whether two consecutive function words can be connected and assigning a connectable semantic label sequence, and using a probabilistic model that represents the likelihood of a semantic label sequence to assign the most likely semantic labels.

At this time, the semantic label assignment unit 4 assigns a label such as "NULL", meaning that the semantic label is empty, to any function word that has no corresponding entry in the functional expression semantic label dictionary 1, that is, to a functional expression that does not affect the meaning of the predicate.

When the predicate is 「壊れちゃったよ」 as described above, the longest match from the back first analyzes 「よ」; since it has no entry in the functional expression semantic label dictionary 1, the empty semantic label "NULL" is assigned. Next, 「た」 and 「ちゃう」 are analyzed, and each is given the semantic label "completion" (完了). The assignment ends immediately before the last morpheme reached when the morphemes of the predicate are processed from the back, that is, immediately before the content word 「壊れる」. FIG. 8 shows an example of the semantic label assignment result in this embodiment.
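The "longest match from the back" labelling can be sketched as below. This is a minimal sketch under stated assumptions: `LABEL_DICT` is a hypothetical stand-in for the functional expression semantic label dictionary 1, the English label names are translations chosen here for illustration, and the input is assumed to be the standard forms of the function words only (the content word already removed).

```python
# Hypothetical dictionary: tuple of standard forms -> semantic label.
LABEL_DICT = {
    ("ちゃう",): "completion",
    ("た",): "completion",
    ("らしい",): "conjecture",
    ("かも", "知れる", "ない"): "conjecture",
}
MAX_ENTRY_LEN = max(len(key) for key in LABEL_DICT)

def label_from_back(bases):
    """Label function words by scanning from the end, longest entry first."""
    labels = [None] * len(bases)
    end = len(bases)
    while end > 0:
        for n in range(min(MAX_ENTRY_LEN, end), 0, -1):
            key = tuple(bases[end - n:end])
            if key in LABEL_DICT:
                # Every morpheme of the matched entry gets the entry's label.
                for i in range(end - n, end):
                    labels[i] = LABEL_DICT[key]
                end -= n
                break
        else:
            # No entry ends at this position: assign the empty label.
            labels[end - 1] = "NULL"
            end -= 1
    return labels

print(label_from_back(["ちゃう", "た", "よ"]))
# ['completion', 'completion', 'NULL']
```

Trying longer entries before shorter ones is what lets a multi-morpheme expression such as 「かも/知れる/ない」 receive a single label instead of being split into separately labelled words.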

<Function word deletion based on semantic labels>
<Deletion of NULL>
Based on the semantic labels assigned to the predicate, the predicate is normalized to its simplest form. The present invention follows criteria (A) to (C) and takes the approach of "assigning semantic labels only to what is relevant to the meaning expressed by the predicate". Therefore, the NULL deletion unit 5 first receives the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment and, for each function word to which no semantic label was assigned (here, each function word given the empty semantic label "NULL"), deletes its entire entry (word information) on the grounds that it does not affect the meaning of the predicate (s4).

When the functional expression of the predicate is 「ちゃったよ」 as described above, 「よ」, which carries the empty semantic label "NULL", is deleted, and the functional expression is simplified to 「ちゃった」. FIG. 9 shows an example of the NULL deletion result in this embodiment.
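The NULL deletion step amounts to a simple filter. The sketch below assumes that each function word is represented as a hypothetical `(surface, label)` pair and that the empty label is the string "NULL"; the label name "completion" is an illustrative translation.

```python
def delete_null(labelled):
    """Drop function words whose semantic label is the empty label "NULL"."""
    return [(surface, label) for surface, label in labelled if label != "NULL"]

functional = [("ちゃっ", "completion"), ("た", "completion"), ("よ", "NULL")]
remaining = delete_null(functional)
print("".join(surface for surface, _ in remaining))  # ちゃった
```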

<Deletion of redundant labels>
Next, to convert the predicate to its simplest form, the redundancy rule application unit 6 receives the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment and NULL deletion and, using the redundancy rules recorded in the functional expression semantic label dictionary 1, deletes the entire entry (word information) of each redundant function word (s5).

As described above, the redundancy rules are based on the finding of the present invention that "when a plurality of function words expressing the same meaning exist in one predicate, the meaning expressed by the predicate can be preserved by keeping only one of them". When the semantic label is "negation" (否定), however, the rule changes to "keep if the number of occurrences is odd, delete if it is even".
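The parity-based handling of "negation" can be illustrated as follows. This is a sketch of one possible reading: the text states only "keep if odd, delete if even", so keeping the first negation when the count is odd is an assumption made here, as are the `(surface, label)` representation and the hypothetical inputs.

```python
def reduce_negation(exprs):
    """Keep one negation if their count is odd, none if it is even."""
    positions = [i for i, (_, label) in enumerate(exprs) if label == "negation"]
    # Assumption: when the count is odd, the first negation is the one kept.
    keep = positions[0] if len(positions) % 2 == 1 else None
    return [e for i, e in enumerate(exprs) if e[1] != "negation" or i == keep]

# Hypothetical label sequences, for illustration only.
double = [("ない", "negation"), ("ない", "negation"), ("た", "completion")]
triple = [("ない", "negation")] * 3 + [("た", "completion")]
print(reduce_negation(double))  # [('た', 'completion')]
print(reduce_negation(triple))  # [('ない', 'negation'), ('た', 'completion')]
```

The parity rule reflects that two negations cancel each other, so an even number of them leaves an affirmative predicate.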

When the predicate after semantic label assignment and NULL deletion is 「壊れちゃった」, the word information for each morpheme corresponding to that predicate is first checked to see whether the same semantic label occurs two or more times. If it does, deletion is performed in accordance with the redundancy rule.

The word information for each morpheme corresponding to the predicate 「壊れちゃった」 of this embodiment contains two occurrences of the semantic label "completion", so it is subject to the redundancy rule. The redundancy rule for "completion" is "if the surface forms are the same, keep the first (First); otherwise keep the last (Last)". Accordingly, the function word 「ちゃっ」 is deleted, and the function word 「た」, the simplest one with the semantic label "completion", remains. FIG. 10 shows an example of the redundant label deletion result in this embodiment.
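The First/Last rule for "completion" can be written as a small selection function. The function name and the list-of-surfaces input are hypothetical; the sketch assumes that "the surface forms are the same" means all surface forms carrying the label are identical.

```python
def pick_completion(surfaces):
    """Return the index of the completion-labelled function word to keep.

    If every surface form is identical, keep the first; otherwise the last.
    """
    all_same = len(set(surfaces)) == 1
    return 0 if all_same else len(surfaces) - 1

candidates = ["ちゃっ", "た"]
print(candidates[pick_completion(candidates)])  # た
```

For the predicate of this embodiment the two surface forms differ, so the last one, 「た」, is kept and 「ちゃっ」 is dropped.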

<Conjugation generation>
Finally, the conjugation generation unit 7 receives the word information for each morpheme corresponding to the predicate of the input sentence after the function words that do not affect the meaning of the predicate and the redundant function words have been deleted, and connects the morphemes to generate the predicate of the input sentence (s6).

In Japanese, connecting morphemes requires conjugating the morphemes (words), and the conjugated form to use is determined by the surface form and part of speech of the word that follows.

A conjugation generator based on a language model can be used for this morpheme connection process, which includes conjugating the words. This generator uses a model trained in advance on correct data to learn "which connection is likely", with the surface form, part of speech, and conjugation type of the preceding word and the surface form and part of speech of the following word as features. Based on this model, the optimal written form is generated when the surface forms, parts of speech, and conjugation types of a new pair of adjacent words are input.

For the very last morpheme there is no following word, so either a morpheme marking the end of the sentence (for example, a full stop) is added when generating the connection, or an additional rule such as "convert the last function word to its standard form" is added. Instead of using a conjugation generator based on a language model, the predicate can also be generated from conjugation conversion rules.

In this embodiment, the content word 「壊れ（る）」 must be connected to the functional expression 「た」. The following are input to the language-model-based conjugation generator described above as "notation; part of speech; conjugation type": 「壊れ；動詞−自立；一段」 (verb-independent; ichidan) for the content word 「壊れ（る）」, 「た；助動詞；特殊・タ」 (auxiliary verb; special-ta) for the functional expression 「た」, and 「。；記号−句点」 (symbol-full stop) for the morpheme 「。」 marking the end of the sentence. This generates the correctly connected predicate 「壊れた。」 ("broke.").
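As an illustration of the rule-based alternative mentioned above (conjugation conversion rules rather than a language model), the sketch below covers only what the examples of this description need. It is not the generator of the embodiment: the `(standard form, conjugation type)` input, the type names, and the single rule are assumptions. The rule relies on the fact that an ichidan verb drops its final 「る」 before 「た」 and 「ない」, and the last word is left in its standard form, as in the additional rule quoted above.

```python
def generate(morphs):
    """Join (standard form, conjugation type) pairs into a surface string."""
    out = []
    for i, (base, conj_type) in enumerate(morphs):
        next_base = morphs[i + 1][0] if i + 1 < len(morphs) else None
        if conj_type == "ichidan" and next_base in ("た", "ない"):
            out.append(base[:-1])  # drop the final る, e.g. 壊れる -> 壊れ
        else:
            out.append(base)       # otherwise keep the standard form
    return "".join(out)

print(generate([("壊れる", "ichidan"), ("た", "special-ta")]))  # 壊れた
```

A realistic rule set would need entries for every conjugation type and following word, which is presumably why the description also offers a learned generator.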

By the process described above, the simplest predicate 「壊れた」 ("broke") can be generated for the input 「パソコンが壊れちゃったよ」 without changing the meaning of the predicate, which expresses the event that actually occurred. With the method of the present invention, predicates such as 「壊れちゃったね」 and 「壊れてしまいました」, which could not be merged by the method of Non-Patent Document 2 described in <Prior Art 2>, are all normalized to the form 「壊れた」. As a result, predicate expressions that represent the equivalent events needed for mining can be grouped as "the same" regardless of differences in the surface form of the input sentence.

Next, the operation when the more complex sentence 「その時計が壊れたかも知れなかったらしいのだった」 (roughly, "It seems that the clock might have been broken") is input will be described. As in Embodiment 1, the morphological analysis unit 2 performs morphological analysis on the input sentence (s1), yielding the morphological analysis result shown in FIG. 11. The predicate extraction unit 3 then performs predicate extraction on this result (s2), yielding the predicate extraction result shown in FIG. 12.

In this predicate extraction, 「た」, 「かも/知れ（る）/ない」, 「た」, 「らしい」, and 「た」, which match entries in the functional expression semantic label dictionary 1, are first recognized as function words. Next, 「の」 and 「だ」, whose parts of speech are "non-independent category" and "auxiliary verb", are recognized as function words. Finally, 「壊れ（る）」, whose part of speech is "verb-independent", is recognized as a content word. As a result, 「壊れたかも知れなかったらしいのだった」 is extracted as the predicate.

<Semantic label assignment>
Next, as in Embodiment 1, the semantic label assignment unit 4 assigns semantic labels to the word information for each morpheme corresponding to the predicate of the input sentence, using the functional expression semantic label dictionary 1 (s3).

When the predicate is 「壊れたかも知れなかったらしいのだった」 as described above, the longest match from the back first analyzes 「た」 and assigns the semantic label "completion" (完了). Next, 「だ」 and 「の」 are analyzed; since they have no entries in the functional expression semantic label dictionary 1, the empty semantic label "NULL" is assigned. Next, 「らしい」, 「た」, 「かも/知れ（る）/ない」, and 「た」 are analyzed and given the semantic labels "conjecture" (推量), "completion", "conjecture", and "completion", respectively. The assignment ends immediately before the content word 「壊れる」, the last morpheme of the predicate reached when processing from the back. FIG. 13 shows an example of the semantic label assignment result in this embodiment.

<Function word deletion based on semantic labels>
<Deletion of NULL>
Next, as in Embodiment 1, the NULL deletion unit 5 deletes, from the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment, the entire entry (word information) of each function word given the empty semantic label "NULL" (s4).

When the functional expression of the predicate is 「たかも知れなかったらしいのだった」 as described above, 「の」 and 「だ」, which carry the empty semantic label "NULL", are deleted, and the functional expression is simplified to 「たかも知れなかったらしいた」. FIG. 14 shows an example of the NULL deletion result in this embodiment.

<Deletion of redundant labels>
Next, as in Embodiment 1, the redundancy rule application unit 6 uses the redundancy rules recorded in the functional expression semantic label dictionary 1 to delete, from the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment and NULL deletion, the entire entry (word information) of each redundant function word (s5).

The word information for each morpheme corresponding to the predicate 「壊れたかも知れなかったらしいた」 of this embodiment contains three occurrences of the semantic label "completion" and two of "conjecture", so the redundancy rules must be applied. The redundancy rule for "completion" is "if the surface forms are the same, keep the first (First); otherwise keep the last (Last)". The redundancy rule for "conjecture", on the other hand, is "keep the first (First)".

Redundant semantic labels are reduced accordingly. In this embodiment, every 「た」 expressing completion has the same surface form, so the first 「た」 is retained. Of 「かも知れなかっ」 and 「らしい」, which express conjecture, the first one, 「かも知れなかっ」, is retained in accordance with the redundancy rule. As a result, the function words 「た」, 「かも」, 「知れ」, and 「なかっ」 remain. FIG. 15 shows an example of the redundant label deletion result in this embodiment.
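When several labels each have their own rule, the rules can be kept in a table and applied per label. The following is a sketch under assumptions: the `RULES` table, its rule names, and the `(surface, label)` representation (with a multi-morpheme expression such as 「かも知れなかっ」 treated as one unit) are hypothetical, and labels without a rule are simply left untouched.

```python
# Hypothetical rule table: semantic label -> name of the rule to apply.
RULES = {
    "completion": "first_if_same_else_last",
    "conjecture": "first",
}

def apply_redundancy_rules(exprs):
    """Keep one functional expression per label according to RULES."""
    groups = {}
    for i, (_, label) in enumerate(exprs):
        groups.setdefault(label, []).append(i)
    keep = set()
    for label, idxs in groups.items():
        rule = RULES.get(label)
        if rule == "first":
            keep.add(idxs[0])
        elif rule == "first_if_same_else_last":
            same = len({exprs[i][0] for i in idxs}) == 1
            keep.add(idxs[0] if same else idxs[-1])
        else:
            keep.update(idxs)  # assumption: no rule means keep everything
    return [exprs[i] for i in sorted(keep)]

after_null_deletion = [
    ("た", "completion"),
    ("かも知れなかっ", "conjecture"),
    ("た", "completion"),
    ("らしい", "conjecture"),
    ("た", "completion"),
]
print(apply_redundancy_rules(after_null_deletion))
# [('た', 'completion'), ('かも知れなかっ', 'conjecture')]
```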

<Conjugation generation>
Finally, as in Embodiment 1, the conjugation generation unit 7 connects the morphemes in the word information for each morpheme corresponding to the predicate of the input sentence, after the function words that do not affect the meaning of the predicate and the redundant function words have been deleted, to generate the predicate of the input sentence (s6).

In this embodiment, the verb 「壊れる」 must be connected to the functional expressions 「た」, 「かも」, 「知れ」, and 「なかっ」. The following are input to the language-model-based conjugation generator described above as "notation; part of speech; conjugation type": 「壊れ；動詞−自立；一段」 (verb-independent; ichidan) for the content word 「壊れ（る）」, 「た；助動詞；特殊・タ」 (auxiliary verb; special-ta) for 「た」, 「かも；助詞−副助詞」 (particle-adverbial particle) for 「かも」, 「知れ；動詞−自立；一段」 (verb-independent; ichidan) for 「知れ（る）」, 「なかっ；助動詞；特殊・ナイ」 (auxiliary verb; special-nai) for 「ない」, and 「。；記号−句点」 (symbol-full stop) for the morpheme 「。」 marking the end of the sentence. This generates the correctly connected predicate 「壊れたかも知れない。」 ("may have broken.").

By the process described above, 「壊れたかも知れなかったらしいのだった」 is normalized to 「壊れたかも知れない」 ("may have broken"). It is reduced to the simplest form while remaining recognizably different in meaning from 「壊れちゃった」. As a result, unlike the method of Non-Patent Document 1 described in <Prior Art 1>, the normalization clearly distinguishes differences in the meaning expressed by predicates.

1: functional expression semantic label dictionary, 2: morphological analysis unit, 3: predicate extraction unit, 4: semantic label assignment unit, 5: NULL deletion unit, 6: redundancy rule application unit, 7: conjugation generation unit.

Claims

A predicate functional expression normalization method for converting the predicate of an input sentence into a simple form without changing the meaning of the predicate, the predicate consisting of a combination of a content word and a functional expression that follows the content word and is a character string containing one or more function words, the method comprising:
a step in which a morphological analysis unit performs morphological analysis on the input sentence and outputs word information for each morpheme;
a step in which a predicate extraction unit extracts, from the word information for each morpheme of the input sentence, the word information for each morpheme corresponding to a content word and the functional expression that follows it, as the predicate of the input sentence;
a step in which a semantic label assignment unit, using a functional expression semantic label dictionary that combines at least semantic labels representing the meanings of functional expressions that affect the meaning of a predicate with a list of the functional expressions corresponding to each semantic label, assigns semantic labels to the word information of the function word string constituting the functional expression, among the word information for each morpheme corresponding to the predicate of the input sentence;
a step in which a function word deletion unit deletes, from the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment, the word information of function words that do not affect the meaning of the predicate and the word information of redundant function words; and
a step in which a conjugation generation unit generates the predicate of the input sentence based on the word information for each morpheme corresponding to the predicate of the input sentence after the function words that do not affect the meaning of the predicate and the redundant function words have been deleted.

The predicate functional expression normalization method according to claim 1, wherein the function word deletion step comprises:
a step in which a NULL deletion unit deletes, from the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment, the word information of function words to which no semantic label has been assigned; and
a step in which a redundancy rule application unit, using redundancy rules that specify, for each semantic label, how function words are to be retained when the functional expression corresponding to one predicate contains a plurality of function words with the same semantic label, deletes the word information of the function words excluded by the redundancy rules from the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment.

The predicate functional expression normalization method according to claim 1 or 2, wherein the semantic labels representing the meanings of functional expressions that affect the meaning of the predicate are semantic labels that characterize the tense of the event expressed by the predicate, semantic labels for distinguishing whether the event expressed by the predicate is negated or affirmed, and semantic labels for distinguishing whether the subject's subjectivity is included in the event expressed by the predicate.

A predicate functional expression normalization device for converting the predicate of an input sentence into a simple form without changing the meaning of the predicate, the predicate consisting of a combination of a content word and a functional expression that follows the content word and is a character string containing one or more function words, the device comprising:
a functional expression semantic label dictionary that combines at least semantic labels representing the meanings of functional expressions that affect the meaning of a predicate with a list of the functional expressions corresponding to each semantic label;
a morphological analysis unit that performs morphological analysis on the input sentence and outputs word information for each morpheme;
a predicate extraction unit that extracts, from the word information for each morpheme of the input sentence, the word information for each morpheme corresponding to a content word and the functional expression that follows it, as the predicate of the input sentence;
a semantic label assignment unit that, using the functional expression semantic label dictionary, assigns semantic labels to the word information of the function word string constituting the functional expression, among the word information for each morpheme corresponding to the predicate of the input sentence;
a function word deletion unit that deletes, from the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment, the word information of function words that do not affect the meaning of the predicate and the word information of redundant function words; and
a conjugation generation unit that generates the predicate of the input sentence based on the word information for each morpheme corresponding to the predicate of the input sentence after the function words that do not affect the meaning of the predicate and the redundant function words have been deleted.

The predicate functional expression normalization device according to claim 4, wherein the function word deletion unit comprises:
a NULL deletion unit that deletes, from the word information for each morpheme corresponding to the predicate of the input sentence after semantic label assignment, the word information of function words to which no semantic label has been assigned; and
a redundancy rule application unit that, using redundancy rules that specify, for each semantic label, how function words are to be retained when the functional expression corresponding to one predicate contains a plurality of function words with the same semantic label, deletes the word information of the function words excluded by the redundancy rules from the word information for each morpheme corresponding to the predicate of the input sentence.

The predicate functional expression normalization device according to claim 4 or 5, wherein the semantic labels representing the meanings of functional expressions that affect the meaning of the predicate are semantic labels that characterize the tense of the event expressed by the predicate, semantic labels for distinguishing whether the event expressed by the predicate is negated or affirmed, and semantic labels for distinguishing whether the subject's subjectivity is included in the event expressed by the predicate.

A program for causing a computer to function as each means of the device according to any one of claims 4 to 6.