JP5225219B2 - Predicate term structure analysis method, apparatus and program thereof

Info

Publication number: JP5225219B2
Application number: JP2009155317A
Authority: JP
Inventors: 賢治今村; 邦子齋藤; 朋子泉
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2009-06-30
Filing date: 2009-06-30
Publication date: 2013-07-03
Anticipated expiration: 2029-06-30
Also published as: JP2011013776A

Description

The present invention relates to a technique for analyzing, from the viewpoint of grammatical case, the relationships between predicates and noun phrases contained in a document composed of multiple sentences.

<Description of the task>
Predicate term structure analysis (also known as predicate-argument structure analysis) performs semantic analysis of sentences written in natural language. Specifically, for a "predicate" in a sentence (the part that says "did what"), it identifies the parts of the text (terms) that fill the predicate's "cases": the ga case (ガ格, "who/what" did it), the wo case (ヲ格, "what" it was done to), the ni case (ニ格, "where/to what"), and so on.

For example, suppose there is a document composed of multiple sentences such as Document 1 shown in FIG. 1: 「昨日彼女はカレーを作った。そしてお昼にも食べた。」 ("Yesterday she made curry. And [she] ate [it] at noon too."). The first sentence of Document 1 contains the predicate "make" (作る, base form). Predicate term structure analysis identifies that the ga-case term of this predicate is "she" (彼女) and that the wo-case term is "curry" (カレー).

In Japanese sentences in particular, however, the word that should serve as a term is frequently omitted. (Because a term may consist of several words, it is hereinafter called a noun phrase, even when it is a single word.) Such an omitted term is called a zero pronoun. The second sentence of Document 1 is an example.

The second sentence of Document 1 has the predicate "eat" (食べる, base form). For this predicate, the ga-case term is "she" (彼女), the wo-case term is "curry" (カレー), and the ni-case term is "noon" (お昼). However, "noon" is the only one of these noun phrases that actually appears in the second sentence. To analyze the predicate term structure of such a sentence, the noun phrases "she" and "curry" must be supplied from the first sentence.

<Predicate term structure analysis in the prior art>
The prior art analyzes only sentences in which both a predicate and its terms appear in the same sentence. FIG. 2 shows an example of a conventional predicate term structure analysis apparatus, and FIG. 3 shows the flow of processing in that apparatus.

<Morphological analysis and dependency analysis>
First, for the document input to the control unit 7, the morphological analysis and dependency analysis unit 1 performs morphological analysis sentence by sentence, splitting each sentence into words and identifying the part of speech of each word to obtain a word sequence. Next, based on this morphological analysis result (the word sequence), the same unit divides each sentence into bunsetsu (文節, phrase units) to obtain a bunsetsu sequence, and then identifies the dependency structure among the bunsetsu (which bunsetsu depends on which) to obtain the bunsetsu dependency relations (the number of the bunsetsu that each bunsetsu depends on) (s1). A bunsetsu is a Japanese phrase consisting of one or more content words (nouns, verbs, adjectives, adverbs, etc.) and zero or more function words (particles, auxiliary verbs, etc.). The morphological analysis and dependency analysis unit 1 can be built from an existing morphological analyzer and an existing dependency parser.

If, for example, the input document (the text to be analyzed) is Document 1 described above, the result shown in FIG. 4 is obtained.

<Predicate identification>
Next, the predicate identification unit 2 identifies (extracts) all predicates in the sentence being processed (s2). Specifically, based on the part of speech of each word in the sentence, it extracts predicative word subsequences as predicates. For example, the following are treated as predicates: a verb; an adjective; a sahen noun (サ変名詞, verbal noun) immediately followed by the verb "suru" (する, "do"); and an adjectival noun (形容名詞) immediately followed by the auxiliary verb "da" (だ).

In the example of Document 1, the predicate of the first sentence is "make" (作る, base form) and the predicate of the second sentence is "eat" (食べる). The processing for the predicate "make" in the first sentence is described below.
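The part-of-speech rules above can be sketched in Python as follows. This is only an illustrative sketch: the `Word` record, the tag names (`verb`, `sahen_noun`, and so on), and the choice to fold a following する into the sahen-noun predicate rather than report it twice are assumptions, not part of the patent. A real system would take its words and tags from an existing morphological analyzer.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Word:
    surface: str  # form as it appears in the sentence
    base: str     # base (dictionary) form
    pos: str      # part-of-speech tag (hypothetical tag set)


def identify_predicates(words: List[Word]) -> List[str]:
    """Return the base forms of all predicates in one sentence."""
    predicates = []
    for i, w in enumerate(words):
        prev: Optional[Word] = words[i - 1] if i > 0 else None
        nxt: Optional[Word] = words[i + 1] if i + 1 < len(words) else None
        if w.pos == "sahen_noun" and nxt is not None and nxt.base == "する":
            # Sahen noun immediately followed by the verb "suru".
            predicates.append(w.base + "する")
        elif (w.pos == "adjectival_noun" and nxt is not None
              and nxt.pos == "aux" and nxt.base == "だ"):
            # Adjectival noun immediately followed by the auxiliary "da".
            predicates.append(w.base + "だ")
        elif w.pos in ("verb", "adjective"):
            # Assumption: a "suru" already attached to a sahen noun is not
            # reported again as a separate predicate.
            if w.base == "する" and prev is not None and prev.pos == "sahen_noun":
                continue
            predicates.append(w.base)
    return predicates


# Document 1, hand-tagged for illustration.
sentence1 = [
    Word("昨日", "昨日", "noun"), Word("彼女", "彼女", "pronoun"),
    Word("は", "は", "particle"), Word("カレー", "カレー", "noun"),
    Word("を", "を", "particle"), Word("作っ", "作る", "verb"),
    Word("た", "た", "aux"),
]
sentence2 = [
    Word("そして", "そして", "conjunction"), Word("お昼", "お昼", "noun"),
    Word("に", "に", "particle"), Word("も", "も", "particle"),
    Word("食べ", "食べる", "verb"), Word("た", "た", "aux"),
]

print(identify_predicates(sentence1))  # ['作る']
print(identify_predicates(sentence2))  # ['食べる']
```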

<Candidate noun phrase extraction>
Next, the candidate noun phrase extraction unit 3 extracts all noun phrases from the sentence being processed, excluding the predicate v currently being processed. Whether something is a noun phrase is normally decided from parts of speech. For example, if the part of speech of the last word in a bunsetsu's content word sequence (the content head word) is a noun, a pronoun, or a noun suffix, the content word sequence of that bunsetsu is regarded as a noun phrase. The special noun phrase NULL is then added to the extracted noun phrases, and together they form the candidate noun phrases (s3).

The special noun phrase NULL is used in the term identification described later to indicate that the predicate has no term for a given case.

The candidate noun phrases for the predicate "make" (作る) in the first sentence are four: "yesterday" (昨日), "she" (彼女), "curry" (カレー), and NULL.
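The extraction rule above can be sketched as follows. The `Bunsetsu` record, the tag names, and the hand-made segmentation of the first sentence are assumptions made for illustration. In practice the segmentation would come from the morphological and dependency analysis step.

```python
from dataclasses import dataclass
from typing import List, Tuple

NULL = "NULL"  # special noun phrase: "this case has no term"
NOMINAL_POS = {"noun", "pronoun", "noun_suffix"}  # hypothetical tag names


@dataclass
class Bunsetsu:
    content: List[Tuple[str, str]]   # (surface, pos) of content words
    function: List[Tuple[str, str]]  # (surface, pos) of function words


def extract_candidates(sentence: List[Bunsetsu], predicate_index: int) -> List[str]:
    """Collect the noun phrases of one sentence, skipping the predicate, plus NULL."""
    candidates = []
    for i, b in enumerate(sentence):
        if i == predicate_index:
            continue  # the predicate currently being processed is excluded
        _, head_pos = b.content[-1]  # content head = last content word
        if head_pos in NOMINAL_POS:
            candidates.append("".join(surface for surface, _ in b.content))
    candidates.append(NULL)
    return candidates


# First sentence of Document 1, hand-segmented for illustration.
sentence1 = [
    Bunsetsu([("昨日", "noun")], []),
    Bunsetsu([("彼女", "pronoun")], [("は", "particle")]),
    Bunsetsu([("カレー", "noun")], [("を", "particle")]),
    Bunsetsu([("作っ", "verb")], [("た", "aux")]),
]

print(extract_candidates(sentence1, predicate_index=3))
# ['昨日', '彼女', 'カレー', 'NULL']
```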

<Feature selection>
Next, for each candidate n among the candidate noun phrases, the feature selection unit 4 selects features from the predicate v, the candidate n, and the morphological and dependency analysis result (the analyzed sentence) to create a feature set (s4). As shown in FIG. 5, for example, the features may include features of the predicate v (predicate-related), features of the candidate n (candidate-related), and features of the relationship between the predicate v and the candidate n (relative position of predicate and candidate).

Selecting features in this way for each candidate ("yesterday", "she", "curry", and NULL) with respect to the predicate "make" in the first sentence gives the result shown in FIG. 6.

<Term identification>
Next, using the selected features, the term identification unit 5 refers to the term identification model (storage unit) 6, which is trained in advance for each case, and calculates the term score of candidate n for each case, namely the ga case, the wo case, and the ni case (s5). The term identification model 6 is trained in advance from correct predicate term structure data (a corpus) using, for example, maximum a posteriori (MAP) estimation. FIG. 7 shows an example of the term identification model.

The term score for predicate v, candidate n, and case c is calculated by the following equation (1). This is the same score calculation used by classifiers based on discriminative models: the higher the score, the more plausible candidate n is as the term of case c of predicate v.

$$\mathrm{score}(v, n, c) = \sum_{k} \lambda_{c,k}\, f_k(d_c(n) = 1, X) \qquad (1)$$

Here, $d_c(n)$ is a function that is 1 only when candidate n is the term of case c, and 0 otherwise. X is the feature set. $f_k(d_c(n) = 1, X)$ is a feature function that is 1 only when the feature satisfies the condition given by its arguments and 0 otherwise; if the feature is real-valued, the function instead returns the feature value itself when the condition is satisfied. $\lambda_{c,k}$ is the weight of the feature function $f_k$ for case c.

For example, for the predicate v "make" (作る) and the candidate n "she" (彼女) in the first sentence, FIG. 8 shows the weight of the feature function of each feature and the partial score $\lambda_{c,k}\, f_k(d_c(n) = 1, X)$. Summing the partial scores for each case gives a term score of 6.568 for the ga case, 1.967 for the wo case, and 0.802 for the ni case.

The term identification unit 5 repeats the term score calculation for all candidates n. Then, for each case, it takes the candidate n_max with the highest term score as the term of predicate v for that case (s6). If the candidate NULL is selected, however, that case is left empty.

The term scores of all candidates for the predicate "make" (作る) in the first sentence are as shown in FIG. 9. As a result, the ga-case term is "she" (彼女), the wo-case term is "curry" (カレー), and the ni case is empty, which completes the predicate term structure.
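The scoring and selection steps can be illustrated with a short sketch: a weighted sum of feature values per case, followed by picking the best candidate for each case. The feature names and all weights below are invented for illustration and are not the values of FIG. 7 to FIG. 9. They are chosen only so that the toy example reproduces the outcome described above.

```python
from typing import Dict, Optional

CASES = ("ga", "wo", "ni")
NULL = "NULL"

Features = Dict[str, float]            # feature name -> value (1.0 for binary features)
Weights = Dict[str, Dict[str, float]]  # case -> feature name -> weight


def term_score(weights: Weights, case: str, features: Features) -> float:
    """Weighted sum of the features that fire for this candidate and case."""
    return sum(weights[case].get(name, 0.0) * value for name, value in features.items())


def identify_terms(weights: Weights, candidates: Dict[str, Features]) -> Dict[str, Optional[str]]:
    """For each case, pick the highest-scoring candidate; NULL means the case is empty."""
    terms = {}
    for case in CASES:
        best = max(candidates, key=lambda n: term_score(weights, case, candidates[n]))
        terms[case] = None if best == NULL else best
    return terms


# Hypothetical weights and features (not taken from the patent figures).
weights = {
    "ga": {"particle=は": 2.0, "particle=を": -1.0, "particle=none": -0.5, "is_null": 0.5},
    "wo": {"particle=は": -0.5, "particle=を": 2.5, "particle=none": -1.0, "is_null": 0.3},
    "ni": {"particle=は": -1.0, "particle=を": -1.5, "particle=none": -0.2, "is_null": 1.0},
}
candidates = {
    "昨日": {"particle=none": 1.0},
    "彼女": {"particle=は": 1.0},
    "カレー": {"particle=を": 1.0},
    NULL: {"is_null": 1.0},
}

print(identify_terms(weights, candidates))
# {'ga': '彼女', 'wo': 'カレー', 'ni': None}
```

Representing every feature as a name-to-value mapping lets binary features (value 1.0) and real-valued features share the same summation, which matches the description of the feature function returning the feature value itself for real-valued features.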

The processing described above is repeated for all sentences, and finally the control unit 7 outputs the predicate term structures of all sentences as a list (s7).

Non-Patent Document 1: Christopher D. Manning and Hinrich Schütze, "Foundations of Statistical Natural Language Processing", The MIT Press, 1999, pp. 217-223

Thus, in the prior art, when a predicate and its terms appear in the same sentence, predicate term structure analysis is possible by using features such as the dependency relation between the predicate and a noun phrase. However, because Japanese sentences contain zero pronouns, analyzing a single sentence in isolation sometimes cannot identify the terms of a predicate. For example, in the second sentence of Document 1, "noon" (お昼) is the only noun phrase that appears, so "she" (彼女) cannot be identified as the ga-case term of the predicate "eat" (食べる), nor "curry" (カレー) as its wo-case term.

One possible solution is for the candidate noun phrase extraction step to add to the candidates not only the noun phrases of the sentence being processed but also those of already-processed sentences (sentences preceding the sentence being processed). However, a predicate and a candidate in different sentences have no direct grammatical dependency relation, so it is very difficult to identify a noun phrase that appears in a different sentence from the predicate as a term using only features based on grammatical characteristics, such as its dependency relation to the predicate. For example, with the features shown in FIG. 5, the "dependency relation" feature is empty for every noun phrase that appears in a different sentence from the predicate, so terms cannot be identified by this feature alone.

The present invention solves the above problems by the following methods.

・ As a feature, the generation probability of candidate n in case c given predicate v is used. Consider the wo case of "eat" (食べる): "eat curry" (カレーを食べる) is a natural Japanese phrase, whereas "eat yesterday" (昨日を食べる, with "yesterday" as the direct object) is almost never said. This exploits the property that, in Japanese, once the predicate and the case are fixed, the noun phrases that can serve as the term can be estimated to some extent without looking at the grammatical relation between the two.

・ Similarly, a feature indicating whether the candidate was used as a term of a predicate in a sentence preceding the sentence being processed is used. This exploits the tendency for a noun phrase that has once been used as a term to be used again as a zero pronoun.

・ In the candidate noun phrase extraction step, noun phrases in sentences preceding the sentence being processed are made candidates in addition to those in the sentence itself. If all noun phrases were included, however, the number of candidates would grow as the document gets longer and the accuracy of term identification would fall. To avoid this, rather than taking every noun phrase of every sentence, only noun phrases that were used as terms of predicates in the N sentences immediately preceding the sentence being processed (N is an integer of 1 or more; hereinafter, the preceding N sentences) are made candidates. This allows accurate term identification while keeping the number of candidates small.

According to the present invention, a term can be identified even when it appears in a different sentence from the one containing the predicate being processed. Moreover, by limiting the candidate noun phrases to those used as terms of predicates in the preceding N sentences, the correct term can be identified from a small number of candidates.

FIG. 1: An explanatory diagram showing an example of an input document together with its predicate term structure.
FIG. 2: A configuration diagram showing an example of a conventional predicate term structure analysis apparatus.
FIG. 3: A flowchart showing the processing in the conventional predicate term structure analysis apparatus.
FIG. 4: An explanatory diagram showing an example of morphological analysis and dependency analysis results.
FIG. 5: An explanatory diagram showing examples of features available for selection.
FIG. 6: An explanatory diagram showing an example of a feature set.
FIG. 7: An explanatory diagram showing an example of a term identification model.
FIG. 8: An explanatory diagram showing an example of the partial scores of candidate n for each case.
FIG. 9: An explanatory diagram showing an example of the term scores of each candidate for each case.
FIG. 10: A configuration diagram showing an example of an embodiment of the predicate term structure analysis apparatus of the present invention.
FIG. 11: A flowchart showing an example of the language model construction processing.
FIG. 12: An explanatory diagram showing another example of morphological analysis and dependency analysis results.
FIG. 13: An explanatory diagram showing an example of a language model.
FIG. 14: A flowchart showing the processing in the predicate term structure analysis apparatus of the present invention.
FIG. 15: An explanatory diagram showing examples of features added to those available for selection.
FIG. 16: An explanatory diagram showing an example of a feature set according to the present invention.
FIG. 17: An explanatory diagram showing an example of the term identification model in the present invention.
FIG. 18: An explanatory diagram showing an example of the partial scores of candidate n for each case in the present invention.
FIG. 19: An explanatory diagram showing an example of the term scores of each candidate for each case in the present invention.
FIG. 20: An explanatory diagram showing another example of a feature set according to the present invention.
FIG. 21: An explanatory diagram showing another example of the term scores of each candidate for each case in the present invention.

The present invention is described in detail below with reference to the illustrated embodiment.

FIG. 10 shows an example of an embodiment of the predicate term structure analysis apparatus of the present invention. In the figure, components identical to those of the conventional example are denoted by the same reference numerals. That is, 1 is the morphological analysis and dependency analysis unit, 2 is the predicate identification unit, 11 is the term stack, 12 is the candidate noun phrase extraction unit, 13 is the language model (storage unit), 14 is the feature selection unit, 15 is the term identification model (storage unit), 16 is the term identification unit, and 17 is the control unit.

The term stack 11 stores the terms of predicates processed in sentences preceding the sentence being processed, together with their sentence numbers.

The candidate noun phrase extraction unit 12 extracts all noun phrases from the sentence being processed, obtains from the term stack 11 the noun phrases used as terms of predicates in the preceding N sentences, merges the two sets, and adds the special noun phrase NULL to form the candidate noun phrases.

The language model (storage unit) 13 holds the generation probability P(n|c,v) of noun phrase n given predicate v and case c.

For each candidate n among the candidate noun phrases, the feature selection unit 14 selects, from the predicate v, the candidate n, and the morphological and dependency analysis result (the analyzed sentence), features of the predicate v, features of the candidate n, and features of the relationship between v and n. It additionally selects a feature indicating whether the candidate was used as a term of a predicate in a sentence preceding the one being processed (a preceding sentence), and the language model score of candidate n for predicate v in each case c, calculated from the language model 13. It creates a feature set from all of these.

The term identification model (storage unit) 15 holds, for each feature, its weight for each case c.

Using the selected features and referring to the term identification model 15, the term identification unit 16 calculates the term score of candidate n for each case and, for each case, takes the candidate with the highest term score as the term of predicate v.

The control unit 17 controls the units described above to analyze the predicate term structure of every sentence in the input document, and outputs, for each sentence, a predicate term structure list consisting of the predicates in the sentence and the list of terms corresponding to each.

In this embodiment, the ga case, the wo case, and the ni case are identified as the cases of a predicate, but other cases may also be included.

The construction of the language model 13 used in the present invention is described first, followed by the predicate term structure analysis according to the present invention.

<Language model construction>
FIG. 11 shows an example of the flow of processing for constructing the language model. Any other procedure may be used as long as it can calculate the noun phrase generation probability P(n|c,v).

First, a large number of sentences is collected (typically tens of thousands or hundreds of thousands). This collection is called a plain-text corpus. For each sentence in the plain-text corpus, morphological analysis and dependency analysis are performed in the same way as in <Morphological analysis and dependency analysis> of the prior art to obtain the word sequence, the bunsetsu (文節, phrase unit) sequence, and the bunsetsu dependency relations (s11). All predicates in the sentence are then identified (extracted) in the same way as in <Predicate identification> of the prior art (s12).

Next, for each predicate, the bunsetsu that directly depend on the predicate are obtained from the bunsetsu dependency relations (s13), and a noun phrase and a case particle are obtained from each such bunsetsu (s14). The noun phrase is extracted in the same way as described in <Candidate noun phrase extraction> of the prior art, and the case particle is identified from the part of speech and surface form of the word.

If the bunsetsu contains no case particle (s15: NO), nothing is done. If it does contain a case particle (s15: YES), the noun phrase, case particle, and predicate are formed into a triple, and the frequency (number of occurrences) of the triple in the plain-text corpus is counted (s16). This is repeated for every sentence in the plain-text corpus.

For example, suppose the plain-text corpus contains the sentence 「お昼にカレーを作った人がいた。」 ("There was a person who made curry at noon."). Morphological and dependency analysis of this sentence yields the word sequence, bunsetsu sequence, and bunsetsu dependency relations shown in FIG. 12. Predicate identification on this sentence finds "make" (作る) and "be" (いる). First, extracting the bunsetsu that directly depend on the predicate "make" gives 「お昼に」 and 「カレーを」. For 「お昼に」, the noun phrase is "noon" (お昼) and the case particle is 「に」 (ni). The triple of noun phrase, case particle, and predicate is therefore [お昼, に, 作る]. Doing the same for the bunsetsu 「カレーを」 gives the triple [カレー, を, 作る]. Such triples are created for the entire plain-text corpus and their occurrences are counted.
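The counting step (s13 to s16) can be sketched as below. The sketch assumes the analysis has already been reduced to, for each predicate, the list of (noun phrase, particle) pairs from the bunsetsu that directly depend on it. That input format, the `CASE_PARTICLES` set, and the second sample sentence are assumptions for illustration. The entry for いる reflects a plain reading of the example sentence rather than anything stated in the text.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Assumed set of case particles (only ga, wo, and ni are considered here).
CASE_PARTICLES = {"が", "を", "に"}

Dependent = Tuple[str, Optional[str]]          # (noun phrase, particle or None)
ParsedSentence = List[Tuple[str, List[Dependent]]]  # (predicate, its direct dependents)


def count_triples(corpus: List[ParsedSentence]) -> Counter:
    """Count (noun phrase, case particle, predicate) triples over a parsed corpus."""
    counts: Counter = Counter()
    for sentence in corpus:
        for predicate, dependents in sentence:
            for noun_phrase, particle in dependents:
                if particle in CASE_PARTICLES:  # s15: skip bunsetsu without a case particle
                    counts[(noun_phrase, particle, predicate)] += 1
    return counts


parsed_corpus = [
    # 「お昼にカレーを作った人がいた。」
    [("作る", [("お昼", "に"), ("カレー", "を")]),
     ("いる", [("人", "が")])],
    # Hypothetical extra sentence: 「彼女はカレーを食べた。」 (は is not a case particle)
    [("食べる", [("彼女", None), ("カレー", "を")])],
]

counts = count_triples(parsed_corpus)
print(counts[("お昼", "に", "作る")])    # 1
print(counts[("カレー", "を", "作る")])  # 1
```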

Next, from the triple counts obtained, a language model holding the generation probability (trigram probability) of noun phrase n given predicate v and case particle c is estimated using back-off smoothing (see Non-Patent Document 1) (s17). Back-off smoothing yields the bigram probability P(n|c) at the same time as the trigram probability P(n|c,v).

Finally, the generation probability P(NULL|c) of the special noun phrase NULL for each case c is added to the bigram probabilities (s18), and the language model is output (s19). The NULL generation probability gives, for each case c, the probability that predicates in general generate NULL; when correct predicate term structure data is available, it is the probability that a predicate has no term for that case. FIG. 13 shows an example of the resulting language model.

<Predicate term structure analysis of the present invention>
FIG. 14 shows the flow of processing in the predicate term structure analysis apparatus of FIG. 10. The configuration and operation of each unit are described in detail below using a concrete example.

Because the object of the present invention is predicate term structure analysis of Japanese text containing zero pronouns, the input is a document composed of multiple sentences. This example uses Document 1 from the description of the prior art: 「昨日彼女はカレーを作った。そしてお昼にも食べた。」 ("Yesterday she made curry. And [she] ate [it] at noon too.").

<Initialization>
First, the term stack 11 is emptied (s21). Then the following processing is performed on the document input to the control unit 7, one sentence at a time from the beginning.

<Morphological analysis and dependency analysis>
As in the prior art, the morphological analysis and dependency analysis unit 1 performs morphological analysis sentence by sentence, splitting each sentence into words and identifying the part of speech of each word to obtain a word sequence. Next, based on this morphological analysis result, the same unit divides each sentence into bunsetsu (文節, phrase units) to obtain a bunsetsu sequence, and identifies the dependency structure among the bunsetsu to obtain the bunsetsu dependency relations (s22). The morphological and dependency analysis results for Document 1 are as shown in FIG. 4.

<Predicate identification>
Next, as in the prior art, the predicate identification unit 2 identifies (extracts) all predicates in the sentence being processed (s23).

In the example of Document 1, the predicate of the first sentence is "make" (作る) and the predicate of the second sentence is "eat" (食べる). The processing for the predicate "make" in the first sentence is described first; the processing for the predicate "eat" in the second sentence is described later.

<Candidate noun phrase extraction>
Next, as in the prior art, the candidate noun phrase extraction unit 12 extracts all noun phrases from the sentence being processed, excluding the predicate v currently being processed. As in the prior art, whether something is a noun phrase is decided from parts of speech. The candidate noun phrase extraction unit 12 also retrieves from the term stack 11 the noun phrases used as terms of predicates in the preceding N sentences (N = 1 in this example) and merges them with the noun phrases from the sentence being processed. The special noun phrase NULL is then added to these noun phrases, and together they form the candidate noun phrases (s24).

The special noun phrase NULL is used in the term identification described later to indicate either that the predicate has no such case, or that no noun phrase that should be the term for the case exists in the document (this is called exophora).

For the predicate "make" (作る) in the first sentence, the term stack 11 is empty, so the candidates come only from the sentence being processed: "yesterday" (昨日), "she" (彼女), "curry" (カレー), and NULL, four in all.
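A minimal sketch of the term stack and the extended candidate extraction follows. The `TermStack` class, its method names, and the removal of duplicate candidates are assumptions made for illustration; the text specifies only that terms are stored with their sentence numbers and that terms from the preceding N sentences are merged into the candidates. The second call shows what the same procedure would yield for the second sentence once "she" and "curry" have been stored as terms of the first.

```python
from typing import List, Tuple

NULL = "NULL"


class TermStack:
    """Stores noun phrases used as terms, together with their sentence numbers."""

    def __init__(self) -> None:
        self._entries: List[Tuple[int, str]] = []

    def clear(self) -> None:
        self._entries.clear()

    def push(self, sentence_number: int, noun_phrase: str) -> None:
        self._entries.append((sentence_number, noun_phrase))

    def terms_in_preceding(self, current_sentence: int, n_sentences: int) -> List[str]:
        """Terms used in the n_sentences sentences just before current_sentence."""
        first = current_sentence - n_sentences
        result: List[str] = []
        for number, phrase in self._entries:
            if first <= number < current_sentence and phrase not in result:
                result.append(phrase)
        return result


def extract_candidates(sentence_noun_phrases: List[str], stack: TermStack,
                       current_sentence: int, n_sentences: int = 1) -> List[str]:
    """Noun phrases of the current sentence + terms of the preceding N sentences + NULL."""
    candidates = list(sentence_noun_phrases)
    for phrase in stack.terms_in_preceding(current_sentence, n_sentences):
        if phrase not in candidates:  # assumption: duplicates are merged
            candidates.append(phrase)
    candidates.append(NULL)
    return candidates


stack = TermStack()
stack.clear()  # initialization (s21)

first = extract_candidates(["昨日", "彼女", "カレー"], stack, current_sentence=1)
print(first)   # ['昨日', '彼女', 'カレー', 'NULL']

# After the first sentence is analyzed, its terms are stored.
stack.push(1, "彼女")
stack.push(1, "カレー")

second = extract_candidates(["お昼"], stack, current_sentence=2)
print(second)  # ['お昼', '彼女', 'カレー', 'NULL']
```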

〈素性選択〉
次に、素性選択部１４により、候補名詞句中の各候補ｎについて、素性を選択して素性集合を作成する（ｓ２５）。この際、従来技術の場合と同様に、述語ｖ、候補ｎ、形態素・係り受け解析結果（解析済み文）から、述語ｖに関する素性、候補ｎに関する素性、述語ｖと候補ｎの関係に関する素性を選択するとともに、図１５に示すような、処理中の文より前の文（前方文）の述語の項として使われたかどうかの素性と、言語モデルスコアに関する素性とを追加して選択する。 <Feature selection>
Next, the feature selection unit 14 selects a feature for each candidate n in the candidate noun phrase to create a feature set (s25). At this time, as in the case of the prior art, predicate v, candidate n, morpheme / dependency analysis result (analyzed sentence), predicate v feature, candidate n feature, predicate v and candidate n feature. In addition to selection, a feature as to whether or not it has been used as a predicate term of a sentence before the sentence being processed (forward sentence) and a feature related to the language model score are selected as shown in FIG.

ここで、言語モデルスコアの素性に関しては、言語モデル１３を参照して、述語ｖが与えられたときの、候補ｎが各格、即ちガ格、ヲ格、ニ格であるときの言語モデルスコアを算出して用いる。言語モデル１３の例は図１３に示した通りである。 Here, regarding the feature of the language model score, referring to the language model 13, when the predicate v is given, the language model score when the candidate n is each case, that is, a case, a case, or a case. Is calculated and used. An example of the language model 13 is as shown in FIG.

If the trigram P(n|c,v) exists in the language model 13, its log probability log P(n|c,v) is used as the language model score. If it does not exist, the bigram log probability log P(n|c) and the backoff log probability log bo(c,v) are obtained from the language model 13, and their sum is used as the language model score. For example, when the ga-case language model score is calculated for the predicate v “make” and the candidate n “yesterday”, no trigram exists in the language model of FIG. 13. Therefore, log bo(ga, make) + log P(yesterday|ga) = −0.33 + (−5.00) = −5.33 is calculated, and −5.33 is used as the language model score.
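The backoff rule above can be sketched as follows. The dictionary layout and the function name `lm_score` are assumptions made for illustration. The two log values for “yesterday” are those quoted in the text, while the trigram entry is a hypothetical value, since FIG. 13 is not reproduced here.

```python
def lm_score(n, c, v, trigram, bigram, backoff):
    """Return log P(n | c, v), or log bo(c, v) + log P(n | c) if the trigram is missing.

    Each table maps a tuple of words to a log value (assumed layout).
    """
    if (n, c, v) in trigram:
        return trigram[(n, c, v)]
    return backoff[(c, v)] + bigram[(n, c)]

trigram = {("curry", "wo", "make"): -1.20}   # hypothetical entry
bigram = {("yesterday", "ga"): -5.00}        # log P(yesterday | ga), from the text
backoff = {("ga", "make"): -0.33}            # log bo(ga, make), from the text

score = lm_score("yesterday", "ga", "make", trigram, bigram, backoff)
print(round(score, 2))  # -5.33
```

The sketch raises `KeyError` when neither a trigram nor the bigram and backoff entries exist; how the original system handles that situation is not stated in the text.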

When the features of each candidate noun phrase are selected in this way for the predicate “make” in the first sentence, the result is as shown in FIG. 16.
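A feature set of this kind might be assembled as in the following sketch. The feature names `Used:0` and `Used:1` and the three language model features follow the labels that the description uses for the term identification model, whereas the function `select_features`, the `base_features` argument, and the stub scorer are hypothetical.

```python
CASES = ("ga", "wo", "ni")

def select_features(n, v, base_features, used_before, lm_score_fn):
    """Build the feature set for candidate n and predicate v.

    base_features: prior-art features (predicate, candidate, relation) as a dict.
    used_before:   True if n was a term of a predicate in a preceding sentence.
    lm_score_fn:   callable (n, case, v) -> language model score (assumed interface).
    """
    features = dict(base_features)
    features["Used:1" if used_before else "Used:0"] = 1.0
    for case in CASES:
        features["LM_" + case] = lm_score_fn(n, case, v)
    return features

def stub_lm(n, case, v):
    # Hypothetical scorer standing in for a real language model lookup.
    return -5.33 if (n, case, v) == ("yesterday", "ga", "make") else -4.0

feats = select_features("yesterday", "make", {"pos:noun": 1.0}, False, stub_lm)
print(feats)
```

Binary features are stored with the value 1.0 and the language model features carry their real-valued scores, so that a single weighted sum can later be taken over all of them.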

<Term identification>
Next, the term identification unit 16 refers to the term identification model 15 and uses the selected features to calculate the term score of the candidate n for each case, namely the ga case, the wo case, and the ni case (s26). The term identification model 15 is learned from correct-answer data (a corpus) of predicate term structures using, for example, maximum a posteriori estimation. An example of the term identification model of the present invention is shown in FIG. 17.

The differences from the term identification model of the prior art are the added feature weights for “use” (Used:0, Used:1) and for the “language model score” (LM ga, LM wo, LM ni). Because all weights are optimized during learning, the values of the other feature weights also differ from those of the prior art.

Based on these feature weights, the term score is calculated using equation (1) described above.

The calculated values are as shown in FIG. 18. Summing them for each case gives a ga-case term score of 6.031, a wo-case term score of 0.391, and a ni-case term score of 0.345.

In term identification, the term score calculation is repeated for every candidate n. Then, for each case, the candidate n_max with the highest term score is obtained and taken as the term of the predicate v (s27). If the candidate NULL is obtained, however, that case is left “empty”.

The term scores of all candidates for the predicate “make” in the first sentence are as shown in FIG. 19. As a result, the ga-case term is “she”, the wo-case term is “curry”, and the ni-case term is “empty”, which completes the predicate term structure.
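The scoring and selection steps can be sketched as follows, assuming that equation (1) amounts to a per-case sum of feature weights multiplied by feature values. All weights and feature values below are hypothetical and were chosen only so that the outcome matches the result described for the predicate “make”; they are not the values of FIG. 17 to FIG. 19.

```python
CASES = ("ga", "wo", "ni")

def term_score(features, weights, case):
    """Weighted sum of feature values for one case (assumed form of equation (1))."""
    return sum(weights.get((case, f), 0.0) * value for f, value in features.items())

def identify_terms(candidate_features, weights):
    """For each case, pick the candidate with the highest term score.

    A case whose best candidate is NULL is left empty (None).
    """
    result = {}
    for case in CASES:
        best = max(candidate_features,
                   key=lambda n: term_score(candidate_features[n], weights, case))
        result[case] = None if best == "NULL" else best
    return result

# Hypothetical weights, keyed by (case, feature name).
weights = {
    ("ga", "particle:wa"): 2.0, ("wo", "particle:wo"): 2.0,
    ("ga", "is_null"): 0.5, ("wo", "is_null"): 0.5, ("ni", "is_null"): 0.5,
    ("ga", "LM_ga"): 0.1, ("wo", "LM_wo"): 0.1, ("ni", "LM_ni"): 0.1,
}
# Hypothetical feature sets for the four candidates of the first sentence.
candidate_features = {
    "yesterday": {"LM_ga": -5.33, "LM_wo": -5.0, "LM_ni": -4.0},
    "she": {"particle:wa": 1.0, "LM_ga": -1.0, "LM_wo": -4.0, "LM_ni": -4.0},
    "curry": {"particle:wo": 1.0, "LM_ga": -4.0, "LM_wo": -1.0, "LM_ni": -4.0},
    "NULL": {"is_null": 1.0},
}

terms = identify_terms(candidate_features, weights)
print(terms)  # {'ga': 'she', 'wo': 'curry', 'ni': None}
```

Because NULL competes with the real candidates in every case, a case such as the ni case here is left empty whenever no noun phrase outscores NULL.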

The noun phrases “she” and “curry”, which were identified as terms, are recorded in the term stack 11 together with sentence number 1 (s28).

<<Processing of the second and subsequent sentences>>
For the second and subsequent sentences, the <morphological and dependency analysis> and <predicate identification> described above are performed first.

The morphological analysis and dependency analysis results for document 1 are as shown in FIG. 4. The second sentence of document 1 contains the predicate “eat” (食べる). The <candidate noun phrase extraction>, <feature selection>, and <term identification> processes are described below, focusing on the predicate “eat” in the second sentence.

In <candidate noun phrase extraction>, “noon” (お昼) is extracted as a noun phrase in the second sentence. In addition, the noun phrases used as terms of predicates in the immediately preceding N sentences (N = 1) are extracted from the term stack 11. Two noun phrases were used as terms of the predicate in the first sentence: “she” and “curry”. Since NULL is also added, there are four candidate noun phrases: “noon”, “she”, “curry”, and “NULL”.

In <feature selection>, features are selected for these four candidates n using the language model 13, in the same way as described above. This yields the feature set shown in FIG. 20.

Then, in <term identification>, the term identification model 15 is consulted and the term score of each candidate n is calculated for each case using the selected features. With the above features, the term scores of the candidates n are as shown in FIG. 21.

As a result, for the predicate “eat” in the second sentence, “she” is identified as the ga-case term, “curry” as the wo-case term, and “noon” as the ni-case term, which completes the predicate term structure. The noun phrases “she”, “curry”, and “noon”, which were identified as terms, are recorded in the term stack 11 together with sentence number 2.
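The role of the term stack across sentences can be sketched as follows. The class `TermStack` and its methods `record` and `recent` are hypothetical names; the text only states that identified terms are stored together with their sentence numbers and that terms from the immediately preceding N sentences are retrieved.

```python
class TermStack:
    """Stores noun phrases identified as terms, tagged with their sentence number."""

    def __init__(self):
        self._entries = []  # list of (noun_phrase, sentence_no)

    def record(self, noun_phrases, sentence_no):
        for np in noun_phrases:
            self._entries.append((np, sentence_no))

    def recent(self, current_sentence_no, n_prev=1):
        """Terms used in the n_prev sentences before the current one, without duplicates."""
        oldest = current_sentence_no - n_prev
        found = []
        for np, sent_no in self._entries:
            if oldest <= sent_no < current_sentence_no and np not in found:
                found.append(np)
        return found

stack = TermStack()
stack.record(["she", "curry"], 1)                 # terms of "make" in sentence 1
second = ["noon"] + stack.recent(2) + ["NULL"]    # candidates for "eat" in sentence 2
print(second)  # ['noon', 'she', 'curry', 'NULL']
stack.record(["she", "curry", "noon"], 2)         # terms of "eat" in sentence 2
```

Keeping the sentence number with each entry is what allows the lookup to be limited to a window of N preceding sentences.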

Since the input document contains no further sentences, the control unit 17 finally outputs the predicate term structures of all sentences as a list (s29).

<Other embodiments>
In the embodiment described above, the noun phrases used as terms of predicates in the immediately preceding N sentences are added from the term stack. However, this restriction may be dropped so that all noun phrases stored in the term stack are used. Also, only the features based on the language model score may be used as additional features, omitting the feature that indicates whether the candidate was used as a term of a predicate in a sentence preceding the one being processed.

The present invention can also be realized by installing, on a well-known computer via a medium or a communication line, a program that implements the functions shown in the configuration diagram of FIG. 10 or a program comprising the procedure shown in the flowchart of FIG. 14.

1: morphological analysis and dependency analysis unit, 2: predicate identification unit, 11: term stack, 12: candidate noun phrase extraction unit, 13: language model (storage unit), 14: feature selection unit, 15: term identification model (storage unit), 16: term identification unit, 17: control unit.

Claims

A method for identifying a term, that is, a noun phrase filling a case of a predicate extracted from each sentence in an input document through morphological analysis and dependency analysis, the method comprising:
a step in which a candidate noun phrase extraction unit extracts all noun phrases from the sentence being processed, obtains noun phrases from a term stack that stores the terms of predicates processed in sentences preceding the sentence being processed, combines the two, and adds the special noun phrase NULL to form candidate noun phrases;
a step in which a feature selection unit selects, for each candidate n among the candidate noun phrases, features of the predicate v, features of the candidate n, and features of the relationship between the predicate v and the candidate n, and, using a language model that holds the generation probability of a noun phrase n given the predicate v and a case c, takes a value obtained from the trigram probability as a feature based on the language model score if the trigram probability exists in the language model, or a value obtained from the bigram probability and the backoff probability if the trigram probability does not exist in the language model, thereby creating a feature set; and
a step in which a term identification unit, using the selected features and referring to a term identification model that holds a weight for each case c given a feature, calculates the term score of the candidate n for each case, obtains the candidate with the highest term score for each case, and takes it as the term of the predicate v.
A predicate term structure analysis method characterized by the above.

The predicate term structure analysis method according to claim 1, comprising a step in which the feature selection unit selects, for each candidate n among the candidate noun phrases, features of the predicate v, features of the candidate n, and features of the relationship between the predicate v and the candidate n, together with a feature indicating whether the candidate was used as a term of a predicate in a sentence preceding the sentence being processed, and, using the language model that holds the generation probability of a noun phrase n given the predicate v and a case c, takes a value obtained from the trigram probability as a feature based on the language model score if the trigram probability exists in the language model, or a value obtained from the bigram probability and the backoff probability if the trigram probability does not exist in the language model, thereby creating a feature set.

The predicate term structure analysis method according to claim 1, further comprising a step in which the candidate noun phrase extraction unit extracts all noun phrases from the sentence being processed, obtains, from the term stack that stores the terms of predicates processed in sentences preceding the sentence being processed, the noun phrases used as terms of predicates in the immediately preceding N sentences, combines the two, and adds the special noun phrase NULL to form candidate noun phrases.

An apparatus for identifying a term, that is, a noun phrase filling a case of a predicate extracted from each sentence in an input document through morphological analysis and dependency analysis, the apparatus comprising:
a term stack that stores the terms of predicates processed in sentences preceding the sentence being processed;
a candidate noun phrase extraction unit that extracts all noun phrases from the sentence being processed, obtains noun phrases from the term stack, combines the two, and adds the special noun phrase NULL to form candidate noun phrases;
a language model that holds the generation probability of a noun phrase n given a predicate v and a case c;
a feature selection unit that selects, for each candidate n among the candidate noun phrases, features of the predicate v, features of the candidate n, and features of the relationship between the predicate v and the candidate n, and takes a value obtained from the trigram probability as a feature based on the language model score if the trigram probability exists in the language model, or a value obtained from the bigram probability and the backoff probability if the trigram probability does not exist in the language model, thereby creating a feature set;
a term identification model that holds a weight for each case c given a feature; and
a term identification unit that, using the selected features and referring to the term identification model, calculates the term score of the candidate n for each case, obtains the candidate with the highest term score for each case, and takes it as the term of the predicate v.
A predicate term structure analysis apparatus characterized by the above.

The predicate term structure analysis apparatus according to claim 4, comprising a feature selection unit that selects, for each candidate n among the candidate noun phrases, features of the predicate v, features of the candidate n, and features of the relationship between the predicate v and the candidate n, together with a feature indicating whether the candidate was used as a term of a predicate in a sentence preceding the sentence being processed, and, using the language model, takes a value obtained from the trigram probability as a feature based on the language model score if the trigram probability exists in the language model, or a value obtained from the bigram probability and the backoff probability if the trigram probability does not exist in the language model, thereby creating a feature set.

The predicate term structure analysis apparatus according to claim 4 or 5, comprising a candidate noun phrase extraction unit that extracts all noun phrases from the sentence being processed, obtains from the term stack the noun phrases used as terms of predicates in the immediately preceding N sentences, combines the two, and adds the special noun phrase NULL to form candidate noun phrases.

A program for causing a computer to function as each means of the apparatus according to any one of claims 4 to 6.