JP3972699B2

JP3972699B2 - Natural language processing system, natural language processing method, and computer program

Info

Publication number: JP3972699B2
Application number: JP2002079631A
Authority: JP
Inventors: 博増市; 智子大熊
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2002-03-20
Filing date: 2002-03-20
Publication date: 2007-09-05
Anticipated expiration: 2022-03-20
Also published as: JP2003281136A

Description

【０００１】
【発明の属する技術分野】
本発明は、人間が日常的なコミュニケーションに使用する自然言語を数学的に取り扱うための自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムに係り、特に、日本語構文の統語・意味解析を行なう自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムに関する。
【０００２】
さらに詳しくは、本発明は、文法ルールに従って日本語構文の統語・意味解析を行なう自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムに係り、特に、副助詞を含む日本語文に対して正しい構文解析結果を出力する自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムに関する。
【０００３】
【従来の技術】
日本語や英語など、人間が日常的なコミュニケーションに使用する言葉のことを「自然言語」と呼ぶ。多くの自然言語は、自然発生的な起源を持ち、人類、民族、社会の歴史とともに進化してきた。勿論、人は身振りや手振りなどによっても意思疎通を行なうことが可能であるが、自然言語により最も自然で且つ高度なコミュニケーションを実現することができる。
【０００４】
他方、情報技術の発展に伴い、コンピュータが人間社会に定着し、各種産業や日常生活の中に深く浸透している。いまやコンピュータ・データだけでなく、画像や音響などほとんどすべての情報コンテンツがコンピュータ上で取り扱われ、情報の編集・加工、蓄積、管理、伝達、共有など高度な処理を行なうことが可能となっている。
【０００５】
自然言語は、本来抽象的であいまい性が高い性質を持つが、文章を数学的に取り扱うことにより、コンピュータ処理を行なうことができる。この結果、機械翻訳や対話システム、検索システムなど、自動化処理により自然言語に関するさまざまなアプリケーション／サービスが実現される。
【０００６】
自然言語処理は一般に、形態素解析、構文解析、意味解析、文脈解析という各処理フェーズに区分される。
【０００７】
形態素解析では、文を意味的最小単位である形態素（morpheme）に分節して品詞の認定処理を行なう。構文解析では、文法規則などを基に句構造などの文の構造を解析する。文法規則が木構造であることから、構文解析結果は一般に個々の形態素が係り受け関係などを基にして接合された木構造となる。意味解析では、文中の語の語義（概念）や、語と語の間の意味関係などに基づいて、文が伝える意味を表現する意味構造を求めて、意味構造を合成する。文脈解析では、文の系列である文章（談話）を解析の基本単位とみなして、文間の意味的なまとまりを得て談話構造を構成する。
【０００８】
ところで、数多の自然言語の中でもとりわけ日本語はあいまい性が高いとされている。その一因は、一般に副助詞（「ほど」、「ばかり」、「だけ」…）と呼ばれる品詞の存在に依拠する。日本語において、副助詞を含む表現は極めて標準的に用いられるものであり、そのような表現に対して正しい構文解析結果を出力することは重要な課題である。
【０００９】
例えば、「彼が来るまで辛抱しよう。」や「彼が来るまでが問題だ。」といった文中に現れる「まで」が副助詞の例である。副助詞の用法は、これらの例文のように、文（i.e. 「彼が来る」）に後続する場合、「美しいほど」や「その本なんか」といった例に見られるように形容詞あるいは名詞に後続する場合など、多岐にわたる。このように副助詞の出現位置が広範であることが、副助詞を適正に取り扱うことを困難にする主たる原因であると思料される。
【００１０】
例えば、「毎日報告書を書くくらいが丁度良い。」という日本語文を機械翻訳システムによって英語に翻訳した場合の典型的な翻訳結果は以下の通りとなる。
【００１１】
About which writes a report every day is good exactly.
【００１２】
この文の主たる構造は「About is good」であり、明らかな誤訳となっている。このような誤訳が生成される理由は、副助詞を品詞カテゴリとして定義しておらず、したがって副助詞に関する句構造を取り扱うメカニズムがシステムに組み込まれていないことに由来する。上記の文例では、「くらい」が名詞として取り扱われており、誤訳を導く結果となっている。
【００１３】
【発明が解決しようとする課題】
本発明の目的は、文法ルールに従って日本語構文の統語・意味解析を行なうことができる、優れた自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムを提供することにある。
【００１４】
本発明のさらなる目的は、出現位置が広範である品詞を正しく取り扱うことができる、優れた自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムを提供することにある。
【００１５】
本発明のさらなる目的は、副助詞に関する句構造を正しく取り扱うことができる、優れた自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムを提供することにある。
【００１６】
本発明のさらなる目的は、副助詞を含む日本語文に対して正しい構文解析結果を出力することができる、優れた自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムを提供することにある。
【００１７】
【課題を解決するための手段及び作用】
本発明は、上記課題を参酌してなされたものであり、その第１の側面は、副助詞を含む入力文の構文・意味を解析する自然言語処理システム又は自然言語処理方法であって、
副助詞を含む入力文の句構造を解析する第１の手段又はステップと、
副助詞を含む入力文が所定の文法ルール記述に適合する句構造を持つか否かを判定する第２の手段又はステップと、
副助詞を含む入力文が所定の句構造を持つか否かに応じた構文解析結果を出力する第３の手段又はステップと、
を具備することを特徴とする自然言語処理システム又は自然言語処理方法である。
【００１８】
前記第１の手段又はステップは、例えば、文脈自由文法に従って記述されたさまざまな文法ルール記述を適用して構文解析を行なう。例えば、所定の句の連結を文として取り扱う文法ルール記述を適用してもよい。
【００１９】
前記第２の手段又はステップは、構文解析結果を基に、入力文が文、動詞句、形容詞句、又は名詞句のいずれかの句の後に副助詞が存在するという文法ルール記述に適合するか否かを判定する。ここでは、文の判定を動詞句、形容詞句、及び名詞句の判定よりも優先させることがより好ましい。
【００２０】
前記第３の手段又はステップは、副助詞を含む入力文が所定の句構造を持つことに応答して、該入力文を名詞句又は副詞句であるとする構文解析結果を出力する。より具体的には、入力文が文の後に副助詞が存在するという句構造を持つことに応答して、該文の先頭にとりたてられていない主語が存在するか否かをさらに判定する。そして、該主語が存在する場合には該主語を含んだ文及び副助詞の連結を名詞句又は副詞句とする構文解析結果を出力する。他方、該主語が存在しない場合にはとりたてられた主語を除外した文及び副助詞の連結を名詞句又は副詞句とする構文解析結果を出力する。
【００２１】
ここで、主語のとりたてとは、主語に対応する名詞句において主格を示す格助詞の代わりに係助詞を用いることによって強調表現がなされていることを言う。
【００２２】
このような主語のとりたての判定は、各句が文中において果たす文法役割を同定する格構造解析又は意味解析処理を適用した後に行なうことができる。
【００２３】
また、前記第３の手段又はステップは、副助詞を含む入力文が文を含むことがあり得ないときには、動詞句、形容詞句、又は名詞句のいずれかの句の後に副助詞が存在するという文法ルール記述に適合することに応答して、該入力文を名詞句又は副詞句であるとする構文解析結果を出力する。
【００２４】
このように、本発明に係る自然言語処理システム又は自然言語処理方法によれば、構文解析と格構造解析を併用するとともに、副助詞のカテゴリ及び構文ルールを利用した解析を行なうことにより、出現位置が広範である副助詞を含んだ文の句構造を正しく取り扱うことができる。
【００２５】
本発明に係る自然言語処理システムによれば、日本語文に副助詞が含まれている場合であっても、日本語文の構文構造の特徴を考慮した正しい構文解析結果を得ることができる。
【００２６】
例えば、本発明に係る自然言語処理システムを機械翻訳システムに適用した場合、副助詞を含む日本語文を適切な英語文へと翻訳することが可能になる。
【００２７】
また、本発明の第２の側面は、副助詞を含む入力文の構文・意味を解析する自然言語処理をコンピュータ・システム上で実行するようにコンピュータ可読形式で記述されたコンピュータ・プログラムであって、
副助詞を含む入力文の句構造を解析する第１のステップと、
副助詞を含む入力文が所定の文法ルール記述に適合する句構造を持つか否かを判定する第２のステップと、
副助詞を含む入力文が所定の句構造を持つか否かに応じた構文解析結果を出力する第３のステップと、
を具備することを特徴とするコンピュータ・プログラムである。
【００２８】
本発明の第２の側面に係るコンピュータ・プログラムは、コンピュータ・システム上で所定の処理を実現するようにコンピュータ可読形式で記述されたコンピュータ・プログラムを定義したものである。換言すれば、本発明の第２の側面に係るコンピュータ・プログラムをコンピュータ・システムにインストールすることによって、コンピュータ・システム上では協働的作用が発揮され、本発明の第１の側面に係る自然言語処理システム及び自然言語処理方法と同様の作用効果を得ることができる。
【００２９】
本発明のさらに他の目的、特徴や利点は、後述する本発明の実施形態や添付する図面に基づくより詳細な説明によって明らかになるであろう。
【００３０】
【発明の実施の形態】
以下、図面を参照しながら本発明の実施形態について詳解する。
【００３１】
自然言語の構文解析手法は、統計処理に基づく方法と文法ルール記述に基づく方法に大別することができる。本発明は、とりわけ文法ルール記述に基づく統語・意味解析に適用することで顕著な効果を奏することができる。
【００３２】
本発明に係る自然言語処理システムは、例えば、ＬＦＧ（Lexical-Functional Grammar）文法理論に基づく統語・意味解析処理に組み込んで実装することができる。ＬＦＧでは、ネイティブ・スピーカの言語知識すなわち文法を、コンピュータ処理や、コンピュータの処理動作に影響を及ぼすその他の非文法的な処理パラメータとは切り離したコンポーネントとして構成している。まず、自然言語処理システムの全体像について簡単に説明する。なお、本実施形態ではＬＦＧ文法理論に基づいて説明するが,勿論、他の文法ルールを備えた解析システムにおいても本発明を同様に適用することができる。
【００３３】
図４には、ＬＦＧに基づく自然言語処理システム１の構成を模式的に示している。
【００３４】
形態素解析部２は、日本語など特定の言語に関する形態素ルール２Ａと形態素辞書２Ｂを持ち、入力文を意味的最小単位である形態素に分節して品詞の認定処理を行なう。例えば、「私の娘は英語を話します。」という文が入力された場合、形態素解析結果として、「私{Noun} の{up} 娘{Noun} は{up} 英語{Noun} を{up} 話す{Verb1}{tr} ます{jp} 。{pt}」が出力される。
【００３５】
このような形態素解析結果は、次いで、統語・意味解析部３に入力される。統語・意味解析部は、文法ルール３Ａや結合価辞書３Ｂなどの辞書を持ち、文法ルールなどに基づく句構造の解析や、文中の語の語義や語と語の間の意味関係などに基づいて文が伝える意味を表現する意味構造の解析を行なう（結合価辞書は動詞と主語などの文中の他の構成要素との関係を記述したものであり、述部とそれに係る語の意味関係を抽出することができる）。
【００３６】
そして、構文解析した結果として、単語や形態素などからなる文章の句構造を木構造として表した"ｃ−ｓｔｒｕｃｔｕｒｅ（constituent structure）"と、主語、目的語などの格構造に基づいて入力文を疑問文、過去形、丁寧文など意味的・機能的に解析した結果として"ｆ−ｓｔｒｕｃｔｕｒｅ（functional structure）"を出力する。
【００３７】
図５及び図６には、入力文「私の娘は英語を話します。」を統語・意味解析部１により処理した結果として得られるｃ−ｓｔｒｕｃｔｕｒｅ及びｆ−ｓｔｒｕｃｔｕｒｅをそれぞれ示している。
【００３８】
ｃ−ｓｔｒｕｃｔｕｒｅは、文中の単語や句の構造を木構造形式で表したものであり、構文カテゴリーによって定義される。例えば音素列を生成するための音韻学的な解釈を、ｃ−ｓｔｒｕｃｔｕｒｅを基に行なうことができる。一方、ｆ−ｓｔｒｕｃｔｕｒｅは、文法的な機能を明確に表現したものであり、文法的な機能名、意味的形式、並びに特徴シンボルにより構成される。ｆ−ｓｔｒｕｃｔｕｒｅを参照することにより、主語（subject）、目的語（object）、補語（complement）、修飾語（adjunct）といった意味理解を得ることができる。ｆ−ｓｔｒｕｃｔｕｒｅは、ｃ−ｓｔｒｕｃｔｕｒｅの各節点に付随する素性の集合であり、図９に示すように属性−属性値のマトリックスの形で表現される。すなわち、［］で囲まれた中の左側は素性（属性）の名前であり、右側は素性の値（属性値）である。
【００３９】
なお、ＬＦＧの詳細に関しては、例えばR. M. Kaplan及びJ. Bresnan共著の論文"Lexical-Functional Grammar: A Formal System for Grammatical Representation"（The MIT Press, Cambridge (1982). Reprinted in Formal Issues in Lexical-Functional Grammar, pp. 29-130. CSLI publications, Stanford University(1995).）に記述されている。
【００４０】
ところで、文法ルール記述に基づく構文解析システムにおいては、文単位の文法ルールをあらかじめ記述しておくことが必要である。例えば、「私が本を書いた。」という文を取り扱うために、「Ｓ→ＮＰ，ＮＰ，ＶＰ」といった文法ルールを記述する。これは、「名詞句ＮＰ（e.g. 「私が」，「本を」）が２つ連続した後に、動詞句ＶＰ（e.g. 「書いた」）が存在する場合、それは文Ｓであると認める。」という意味を表す文法ルールである。
【００４１】
このように「→」の左辺が単一の記号で表現されている文法は、「文脈自由文法」とも呼ばれる。文脈自由文法は、要するに、所定の句の連結を文として取り扱う文法ルールであり、構文解析システムにおいて最もよく使用される文法表現の１つである。
【００４２】
さらに、名詞句ＮＰなどが文中においてどのような文法役割を持っているかを同定する処理のことを、格構造解析、あるいは意味解析と呼ぶ。ここでは、文法役割とは、例えば「主語」、「目的語」、「補語」といったものである。
【００４３】
本実施形態に係る構文解析システムでは、副助詞に対応する品詞カテゴリＰを定義する。
【００４４】
また、この品詞カテゴリＰを含んだ以下の２式の文法ルールを設定する。
【００４５】
【数１】

【００４６】
上記の文法ルールは文脈自由文法に基づいて記述されている。ここで、Ｓ、ＮＰ、ＶＰ、ＡＰ、ＡＤＶＰはそれぞれ、文、名詞句、動詞句、形容詞句、副詞句に相当するカテゴリである。したがって、文法ルール（ａ）は、文、動詞句、形容詞句、又は名詞句のいずれかの句の後に品詞カテゴリＰが連結している場合には名詞句ＮＰであると認める、とする文法ルールである。また、文法ルール（ｂ）は、文、動詞句、形容詞句、または名詞句のいずれかの後に品詞カテゴリＰが連結している場合には副詞句であると認める、とする文法ルールである。
【００４７】
さらに、以下の条件を設定する。
【００４８】
【数２】

【００４９】
ここで、「主語のとりたて」とは、主語に対応する名詞句において、主格を示す格助詞（典型的には「が」、「を」、「に」、「から」…）の代わりに係助詞（「は」、「こそ」、「も」）を用いることによってなされる強調表現のことである。（格は、主として補足成分と述語成分の関係のあり方の類型である。格助詞は格の関係を表す助詞である。）
【００５０】
例えば、「彼が驚くほど」（「ほど」は副助詞である）という表現においては、Ｓ「彼が驚く」中の主語はとりたてておらず、「彼は驚くほど」においては、Ｓ「彼は驚く」中の主語は係助詞「は」によってとりたてられている。
【００５１】
したがって、前者の「彼が驚くほど」は、上記の各条件（ａ）（ｂ）（ｃ）によって、「Ｓ，Ｐ」をその順に構成要素とするＮＰあるいはＡＤＶＰとして解析され得る。これに対し、後者の「彼は驚くほど」は、「驚くほど」の部分が「ＶＰ，Ｐ」を構成要素とするＮＰあるいはＡＤＶＰとして解析されることになり、条件（ａ）及び（ｂ）の適用対象とはならない。
【００５２】
条件（ａ）及び（ｂ）の判定は、文の句構造を解析した構文解析結果を基に行なうことができる。これに対し、条件（ｃ）を判定するためには、構文解析処理だけでなく、各句が文中において果たす文法役割を同定する格構造解析又は意味解析処理も行なう必要がある。
【００５３】
さらに、上記の条件（ａ）（ｂ）（ｃ）に加えて、条件（ａ）（ｂ）の文法ルールにおいて、以下の条件を設定する。
【００５４】
【数３】

【００５５】
これは、例えば「Ｓ→ＮＰ，ＮＰ，ＶＰ」などのように、文脈自由文法において、所定の句の連結を文として取り扱うという文法ルールを適用するとともに、このような文法ルールに従って判定されたＳを、ＶＰ，ＡＰ，ＮＰのような他の品詞カテゴリの判定よりも優先させることによって実現される。
【００５６】
図１には、本実施形態に係る構文・意味解析システムの処理手順をフローチャートの形式で示している。
【００５７】
まず、入力文に対して、文法ルール（ａ）及び（ｂ）を含む構文解析処理を施す（ステップＳ１）。
【００５８】
次いで、入力文が文法ルール（ａ）又は（ｂ）の句構造を含む文であるかどうかを判断する（ステップＳ２）。ここで、これらの句構造を含まないと判定された場合には、そのまま構文解析結果を出力する。
【００５９】
一方、入力文が文法ルール（ａ）又は（ｂ）の句構造を含むと判定された場合には、さらに入力文を対象とする格構造解析を実施して（ステップＳ３）、各句が文中において果たす文法役割を同定する。
【００６０】
次いで、文法ルール（ａ）又は（ｂ）に相当する文中のＳに相当する品詞カテゴリを含む解析結果があり得るかどうかを判定する（ステップＳ４）。この判定処理は、上記の条件（ｄ）に相当する処理である。
【００６１】
ここで、文法ルール（ａ）又は（ｂ）に相当する文中のＳに相当する品詞カテゴリを含む解析結果があり得ない場合のみ、以下の（ａ'）又は（ｂ'）の文法ルールに従う構文解析結果を出力する。
【００６２】
【数４】

【００６３】
他方、文法ルール（ａ）又は（ｂ）に相当する文中のＳに相当する品詞カテゴリを含む解析結果があり得る場合には、さらに、このＳに相当する句構造の先頭にとりたてられていない主語が存在するかどうかを判断する（ステップＳ５）。主語のとりたてとは、主語に対応する名詞句において、主格を示す格助詞（典型的には「が」）の代わりに係助詞（「は」、「こそ」、「も」）を用いることによってなされる強調表現のことである（前述）。
【００６４】
ステップＳ５における判断結果がどちらの場合であっても、以下の文法ルール（ａ"）又は（ｂ"）に従う構文解析結果を出力する。但し、Ｓの先頭にとりたてられていない主語が存在する場合は、その主語をＳに含める解析結果を出力し、そうでなければ、とりたてられている主語が存在していたとしてもその主語をＳに含めない解析結果を出力する。
【００６５】
【数５】

【００６６】
図２には、本実施形態に係る構文・意味解析システムによって副助詞「くらい」を含んだ入力文「毎日報告書を書くくらいがちょうど良い。」を解析した構文解析結果を示している。
【００６７】
「毎日報告書を書く」という句は、「Ｓ→ＡＤＶ，ＮＰ，ＶＰ」という文法ルールをあらかじめ設定しておくことにより、ステップＳ１の構文解析結果により文Ｓとして取り扱われる。
【００６８】
さらに、格構造解析により各句の文中における文法的役割を同定した結果、入力文中にはＳが含まれるという解析結果は有り得るが、このＳの先頭にはとりたてられた主語がないことが判る。
【００６９】
この結果、条件（ａ"）で規定されている文法ルール「ＮＰ→Ｓ，Ｐ」に従い、「毎日報告書を書くくらい」を名詞句ＮＰとした構文解析結果が得られる。
【００７０】
また、図３には、本実施形態に係る構文・意味解析システムによって副助詞「ほど」を含んだ入力文「彼は驚くほど頑張った。」を解析した構文解析結果を示している。
【００７１】
構文解析の結果、「彼は驚くほど」は、条件（ａ）又は（ｂ）で規定されている文法ルールに対応する句構造であることが判る。
【００７２】
さらに格構造解析を施した結果、文法ルール（ａ）又は（ｂ）中にＳが含まれる解析結果があり得るが、このＳの先頭の主語がとりたてられている。すなわち「彼」は、主格を示す格助詞「が」の代わりに係助詞「は」を用いることによって強調表現がなされている。
【００７３】
したがって、この主語をＳに含めないで、条件（ｂ"）で規定されている文法ルール「ＡＤＶＰ→ＶＰ，Ｐ」に従う構文解析結果を出力する。
【００７４】
図１からも分るように、本実施形態では、構文解析と格構造解析を併用する点に大きな特徴がある。さらに、副助詞のカテゴリ及び構文ルールを利用した解析を行なう点にも特徴がある。
【００７５】
したがって、本実施形態に係る構文解析によれば、日本語文に副助詞が含まれている場合であっても、日本語文の構文構造の特徴を考慮した正しい構文解析結果を得ることができる。
【００７６】
例えば、本実施形態に係る構文解析処理を機械翻訳システムに適用した場合、「毎日報告書を書くくらいが丁度良い。」というような副助詞を含む日本語文を適切な英語文へと翻訳することが可能になる。
【００７７】
［追補］
以上、特定の実施形態を参照しながら、本発明について詳解してきた。しかしながら、本発明の要旨を逸脱しない範囲で当業者が該実施形態の修正や代用を成し得ることは自明である。すなわち、例示という形態で本発明を開示してきたのであり、本明細書の記載内容を限定的に解釈するべきではない。本発明の要旨を判断するためには、冒頭に記載した特許請求の範囲の欄を参酌すべきである。
【００７８】
【発明の効果】
以上詳記したように、本発明によれば、出現位置が広範である品詞を正しく取り扱うことができる、優れた自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムを提供することができる。
【００７９】
また、本発明によれば、副助詞を品詞カテゴリとして明確に定義して、副助詞に関する句構造を正しく取り扱うことができる、優れた自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムを提供することができる。
【００８０】
また、本発明によれば、副助詞を品詞カテゴリとして明確に定義して、副助詞を含む日本語文に対して正しい構文解析結果を出力することができる、優れた自然言語処理システム及び自然言語処理方法、並びにコンピュータ・プログラムを提供することができる。
【００８１】
本発明では、構文解析と格構造解析を併用する点に大きな特徴がある。さらに、副助詞のカテゴリ及び構文ルールを利用した解析を行なう点にも特徴がある。
【００８２】
本発明に係る自然言語処理システムによれば、日本語文に副助詞が含まれている場合であっても、日本語文の構文構造の特徴を考慮した正しい構文解析結果を得ることができる。
【００８３】
例えば、本発明に係る自然言語処理システムを機械翻訳システムに適用した場合、「毎日報告書を書くくらいが丁度良い。」というような副助詞を含む日本語文を適切な英語文へと翻訳することが可能になる。
【図面の簡単な説明】
【図１】本実施形態に係る構文・意味解析システムの処理手順を示したフローチャートである。
【図２】本実施形態に係る構文・意味解析システムにより副助詞「くらい」を含んだ文の解析結果を例示した図である。
【図３】本実施形態に係る構文・意味解析システムにより副助詞「ほど」を含んだ文の解析結果を例示した図である。
【図４】ＬＦＧに基づく自然言語処理システム１の構成を模式的に示した図である。
【図５】入力文「私の娘は英語を話します。」を統語・意味解析部１により処理した結果として得られるｆ−ｓｔｒｕｃｔｕｒｅを示した図である。
【図６】入力文「私の娘は英語を話します。」を統語・意味解析部１により処理した結果として得られるｃ−ｓｔｒｕｃｔｕｒｅを示した図である。
【符号の説明】
１…自然言語処理システム
２…形態素解析部
２Ａ…形態素ルール，２Ｂ…形態素辞書
３…統語・意味解析部
３Ａ…文法ルール，３Ｂ…結合価辞書
[0001]
[Technical field of the invention]
The present invention relates to a natural language processing system, a natural language processing method, and a computer program for handling mathematically the natural language that humans use in everyday communication, and in particular to a natural language processing system, a natural language processing method, and a computer program that perform syntactic and semantic analysis of Japanese sentences.
[0002]
More specifically, the present invention relates to a natural language processing system, a natural language processing method, and a computer program that perform syntactic and semantic analysis of Japanese sentences according to grammar rules, and in particular to a system, method, and program that output a correct parsing result for a Japanese sentence containing an adverbial particle (fukujoshi, 副助詞).
[0003]
[Prior art]
The languages that humans use in everyday communication, such as Japanese and English, are called “natural languages”. Most natural languages arose spontaneously and have evolved along with the history of humankind, peoples, and societies. People can of course also communicate through gestures, but natural language makes possible the most natural and most sophisticated communication.
[0004]
Meanwhile, with the development of information technology, computers have become established in human society and have penetrated deeply into industry and daily life. Today not only computer data but almost all information content, such as images and sound, is handled on computers, which makes possible advanced processing such as editing, storage, management, transmission, and sharing of information.
[0005]
Natural language is inherently abstract and highly ambiguous, but it can be processed by computer when sentences are handled mathematically. As a result, various natural-language applications and services, such as machine translation, dialogue systems, and search systems, are realized through automated processing.
[0006]
Natural language processing is generally divided into processing phases of morphological analysis, syntax analysis, semantic analysis, and context analysis.
[0007]
In morphological analysis, a sentence is segmented into morphemes, the smallest meaningful units, and a part of speech is assigned to each. In syntactic analysis, the structure of the sentence, such as its phrase structure, is analyzed on the basis of grammar rules and the like. Because grammar rules have a tree structure, the parsing result is generally a tree in which the individual morphemes are joined on the basis of dependency relations and the like. In semantic analysis, a semantic structure expressing the meaning conveyed by the sentence is derived from the senses (concepts) of the words in the sentence and the semantic relations between words, and the semantic structures are composed. In context analysis, a text (discourse), which is a sequence of sentences, is taken as the basic unit of analysis, and a discourse structure is built by identifying semantic groupings across sentences.
[0008]
Incidentally, among the many natural languages, Japanese is regarded as especially ambiguous. One cause is the existence of the part of speech generally called the adverbial particle (「ほど」 hodo, 「ばかり」 bakari, 「だけ」 dake, and so on). Expressions containing adverbial particles are entirely standard in Japanese, and outputting correct parsing results for such expressions is an important problem.
[0009]
For example, 「まで」 (made, “until”), as it appears in sentences such as 「彼が来るまで辛抱しよう。」 (“Let's be patient until he comes.”) and 「彼が来るまでが問題だ。」 (“The time until he comes is the problem.”), is an adverbial particle. Adverbial particles are used in many ways: they may follow a sentence (i.e., 「彼が来る」, “he comes”), as in these examples, or follow an adjective or a noun, as in 「美しいほど」 (“to the point of being beautiful”) and 「その本なんか」 (“that book or the like”). This wide range of positions in which adverbial particles can appear is thought to be the main reason they are difficult to handle properly.
[0010]
For example, when the Japanese sentence 「毎日報告書を書くくらいが丁度良い。」 (roughly, “Writing a report every day is about right.”) is translated into English by a machine translation system, a typical result is as follows.
[0011]
About which writes a report every day is good exactly.
[0012]
The main structure of this output is “About is good”, which is clearly a mistranslation. Such mistranslations arise because the adverbial particle is not defined as a part-of-speech category, so the system has no mechanism for handling phrase structures that involve adverbial particles. In the example above, 「くらい」 (kurai) is treated as a noun, which leads to the mistranslation.
[0013]
[Problems to be solved by the invention]
An object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program capable of performing syntactic and semantic analysis of Japanese syntax according to grammatical rules.
[0014]
A further object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program capable of correctly handling parts of speech having a wide range of appearance positions.
[0015]
A further object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program capable of correctly handling phrase structures that involve adverbial particles.
[0016]
A further object of the present invention is to provide an excellent natural language processing system, natural language processing method, and computer program capable of outputting a correct parsing result for a Japanese sentence containing an adverbial particle.
[0017]
[Means and Actions for Solving the Problems]
The present invention has been made in view of the above problems. Its first aspect is a natural language processing system or natural language processing method for analyzing the syntax and meaning of an input sentence containing an adverbial particle, characterized by comprising:
a first means or step for analyzing the phrase structure of the input sentence containing the adverbial particle;
a second means or step for determining whether the input sentence containing the adverbial particle has a phrase structure that conforms to a predetermined grammar rule description; and
a third means or step for outputting a parsing result according to whether the input sentence containing the adverbial particle has the predetermined phrase structure.
[0018]
The first means or step performs parsing by applying various grammar rule descriptions written, for example, in accordance with a context-free grammar. For example, a grammar rule description that treats a concatenation of predetermined phrases as a sentence may be applied.
[0019]
The second means or step determines, on the basis of the parsing result, whether the input sentence conforms to a grammar rule description stating that an adverbial particle follows a sentence, verb phrase, adjective phrase, or noun phrase. Here it is preferable to give the determination of a sentence priority over the determination of a verb phrase, adjective phrase, or noun phrase.
[0020]
The third means or step, in response to the input sentence containing the adverbial particle having the predetermined phrase structure, outputs a parsing result in which that input is a noun phrase or an adverb phrase. More specifically, in response to the input sentence having a phrase structure in which an adverbial particle follows a sentence, it further determines whether a subject that is not focus-marked (toritate) is present at the head of that sentence. If such a subject is present, it outputs a parsing result in which the concatenation of the sentence, including that subject, and the adverbial particle is a noun phrase or adverb phrase. If no such subject is present, it outputs a parsing result in which the concatenation of the sentence, excluding any focus-marked subject, and the adverbial particle is a noun phrase or adverb phrase.
[0021]
Here, focus marking (toritate) of the subject means that, in the noun phrase corresponding to the subject, emphasis is expressed by using a binding particle (kakarijoshi) in place of the case particle that marks the nominative.
[0022]
Whether the subject is focus-marked can be determined after applying case structure analysis or semantic analysis, which identifies the grammatical role that each phrase plays in the sentence.
[0023]
Further, when the input sentence containing the adverbial particle cannot contain a sentence, the third means or step, in response to the input conforming to a grammar rule description stating that an adverbial particle follows a verb phrase, adjective phrase, or noun phrase, outputs a parsing result in which the input is a noun phrase or an adverb phrase.
[0024]
Thus, the natural language processing system or natural language processing method according to the present invention combines syntactic analysis with case structure analysis and performs analysis using an adverbial-particle category and syntax rules, and can thereby correctly handle the phrase structure of sentences containing adverbial particles, which appear in a wide range of positions.
[0025]
According to the natural language processing system of the present invention, a correct parsing result that takes into account the characteristics of Japanese syntactic structure can be obtained even when the Japanese sentence contains an adverbial particle.
[0026]
For example, when the natural language processing system according to the present invention is applied to a machine translation system, a Japanese sentence containing an adverbial particle can be translated into an appropriate English sentence.
[0027]
A second aspect of the present invention is a computer program written in a computer-readable format so as to execute, on a computer system, natural language processing for analyzing the syntax and meaning of an input sentence containing an adverbial particle, characterized by comprising:
a first step of analyzing the phrase structure of the input sentence containing the adverbial particle;
a second step of determining whether the input sentence containing the adverbial particle has a phrase structure that conforms to a predetermined grammar rule description; and
a third step of outputting a parsing result according to whether the input sentence containing the adverbial particle has the predetermined phrase structure.
[0028]
The computer program according to the second aspect of the present invention defines a computer program written in a computer-readable format so as to realize predetermined processing on a computer system. In other words, installing the computer program according to the second aspect in a computer system produces a cooperative action on the computer system and yields the same effects as the natural language processing system and natural language processing method according to the first aspect of the present invention.
[0029]
Other objects, features, and advantages of the present invention will become apparent from more detailed description based on embodiments of the present invention described later and the accompanying drawings.
[0030]
DETAILED DESCRIPTION OF THE INVENTION
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0031]
Natural language parsing techniques can be broadly divided into methods based on statistical processing and methods based on grammatical rule description. The present invention can achieve a remarkable effect when applied to syntactic / semantic analysis based on grammar rule description.
[0032]
The natural language processing system according to the present invention can be implemented, for example, by incorporating it into syntactic/semantic analysis processing based on LFG (Lexical-Functional Grammar) theory. In LFG, the linguistic knowledge of native speakers, that is, the grammar, is organized as a component separate from computational processing and from other non-grammatical processing parameters that affect the computer's processing behavior. First, the overall natural language processing system is described briefly. Although the present embodiment is described on the basis of LFG theory, the present invention can of course be applied in the same way to analysis systems with other grammar rules.
[0033]
FIG. 4 schematically shows the configuration of the natural language processing system 1 based on LFG.
[0034]
The morphological analysis unit 2 has morphological rules 2A and a morpheme dictionary 2B for a particular language such as Japanese; it segments the input sentence into morphemes, the smallest meaningful units, and assigns a part of speech to each. For example, when the sentence 「私の娘は英語を話します。」 (“My daughter speaks English.”) is input, the morphological analysis result 「私{Noun} の{up} 娘{Noun} は{up} 英語{Noun} を{up} 話す{Verb1}{tr} ます{jp} 。{pt}」 is output.
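As an illustration only, the tagged output quoted above can be read back into (surface form, tags) pairs with a few lines of Python. The function name `parse_tagged` and the assumption that tokens are separated by spaces and tags are written in braces are taken from the quoted string, not from any interface described in this specification.

```python
import re

# Assumed format (from the quoted example): a surface form followed by one or
# more {Tag} annotations, with tokens separated by whitespace.
TOKEN_RE = re.compile(r"([^\s{}]+)((?:\{[^{}]+\})+)")

def parse_tagged(line):
    """Return a list of (surface, [tags]) pairs for one tagged sentence."""
    tokens = []
    for surface, tag_blob in TOKEN_RE.findall(line):
        tags = re.findall(r"\{([^{}]+)\}", tag_blob)
        tokens.append((surface, tags))
    return tokens

tagged = "私{Noun} の{up} 娘{Noun} は{up} 英語{Noun} を{up} 話す{Verb1}{tr} ます{jp} 。{pt}"
tokens = parse_tagged(tagged)
for surface, tags in tokens:
    print(surface, tags)
```

A morpheme may carry more than one tag, as 「話す」 does here, which is why each token maps to a list of tags.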
[0035]
The morphological analysis result is then input to the syntactic/semantic analysis unit 3. This unit has grammar rules 3A and dictionaries such as a valence dictionary 3B. It analyzes phrase structure on the basis of the grammar rules and the like, and analyzes the semantic structure that expresses the meaning conveyed by the sentence on the basis of the senses of the words in the sentence and the semantic relations between words. (The valence dictionary describes the relations between a verb and other constituents of the sentence, such as the subject, and makes it possible to extract the semantic relations between a predicate and the words that depend on it.)
[0036]
As the result of parsing, the unit outputs a “c-structure (constituent structure)”, which represents the phrase structure of the sentence, made up of words, morphemes, and the like, as a tree, and an “f-structure (functional structure)”, which is the result of analyzing the input sentence semantically and functionally, for example as interrogative, past tense, or polite, on the basis of case structure such as subject and object.
[0037]
FIGS. 5 and 6 show the c-structure and f-structure, respectively, obtained by processing the input sentence 「私の娘は英語を話します。」 (“My daughter speaks English.”) with the syntactic/semantic analysis unit 3.
[0038]
The c-structure represents the structure of the words and phrases in a sentence as a tree and is defined by syntactic categories. For example, phonological interpretation for generating a phoneme string can be carried out on the basis of the c-structure. The f-structure, on the other hand, expresses grammatical functions explicitly and consists of grammatical function names, semantic forms, and feature symbols. By referring to the f-structure, one can obtain a semantic understanding in terms of subject, object, complement, and adjunct. The f-structure is a set of features attached to each node of the c-structure and is expressed as an attribute-value matrix, as shown in the figure. That is, within the square brackets [ ], the left side is the name of a feature (attribute) and the right side is its value (attribute value).
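To make the attribute-value matrix concrete, the sketch below encodes a small f-structure as nested Python dictionaries. The attribute names (PRED, SUBJ, OBJ, ADJUNCT, POLITE) and their values are hypothetical and chosen for illustration; they are not taken from the figures of this specification.

```python
# Hypothetical f-structure for 「私の娘は英語を話します。」.
# Each dictionary is one matrix: keys are attributes, values are attribute values,
# and a value may itself be another matrix.
f_structure = {
    "PRED": "話す",
    "SUBJ": {"PRED": "娘", "ADJUNCT": [{"PRED": "私"}]},
    "OBJ": {"PRED": "英語"},
    "POLITE": "+",
}

def get_path(fs, path):
    """Follow a sequence of attribute names through nested matrices."""
    value = fs
    for attr in path:
        value = value[attr]
    return value

print(get_path(f_structure, ["SUBJ", "PRED"]))  # the head of the subject
```

Nesting is what lets a grammatical function such as SUBJ carry its own internal features.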
[0039]
For details of LFG, see, for example, R. M. Kaplan and J. Bresnan, "Lexical-Functional Grammar: A Formal System for Grammatical Representation" (The MIT Press, Cambridge (1982); reprinted in Formal Issues in Lexical-Functional Grammar, pp. 29-130, CSLI Publications, Stanford University (1995)).
[0040]
Incidentally, a parsing system based on grammar rule descriptions requires sentence-level grammar rules to be written in advance. For example, to handle the sentence 「私が本を書いた。」 (“I wrote a book.”), a grammar rule such as “S → NP, NP, VP” is written. This rule means: “If two consecutive noun phrases NP (e.g., 「私が」, 「本を」) are followed by a verb phrase VP (e.g., 「書いた」), the sequence is recognized as a sentence S.”
[0041]
A grammar in which the left-hand side of “→” is a single symbol, as here, is called a “context-free grammar”. In short, such a rule treats a concatenation of predetermined phrases as a sentence, and context-free grammar is one of the grammar formalisms most commonly used in parsing systems.
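A minimal sketch of how a rule such as “S → NP, NP, VP” can be checked against a sequence of phrase categories is given below. The table `RULES` and the function `reduce_to` are hypothetical names, and the phrases are assumed to be already labeled with their categories; a real parser would build these labels bottom-up.

```python
# Each left-hand symbol maps to the right-hand sides that can form it.
RULES = {"S": [("NP", "NP", "VP")]}

def reduce_to(categories, rules=RULES):
    """Return the left-hand symbols whose right-hand side equals the sequence."""
    seq = tuple(categories)
    return [lhs for lhs, rhss in rules.items() if seq in rhss]

# 「私が本を書いた。」 with its phrases already labeled (assumed input).
phrases = [("私が", "NP"), ("本を", "NP"), ("書いた", "VP")]
result = reduce_to(cat for _, cat in phrases)
print(result)
```

Because the left-hand side is a single symbol, the rule applies wherever the right-hand sequence occurs, regardless of surrounding context.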
[0042]
Furthermore, the process of identifying what grammatical role a noun phrase NP or the like has in a sentence is called case structure analysis or semantic analysis. Here, the grammatical role is, for example, “subject”, “object”, “complement”.
[0043]
In the parsing system according to the present embodiment, a part-of-speech category P corresponding to adverbial particles is defined.
[0044]
Further, the following two grammatical rules including the part of speech category P are set.
[0045]
[Expression 1]

[0046]
The grammar rules above are written as a context-free grammar. Here S, NP, VP, AP, and ADVP are the categories for sentence, noun phrase, verb phrase, adjective phrase, and adverb phrase, respectively. Grammar rule (a) therefore states that when the part-of-speech category P follows a sentence, verb phrase, adjective phrase, or noun phrase, the whole is recognized as a noun phrase NP. Grammar rule (b) states that when the part-of-speech category P follows a sentence, verb phrase, adjective phrase, or noun phrase, the whole is recognized as an adverb phrase ADVP.
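Read from this prose description alone, grammar rules (a) and (b) can be written roughly as follows. This is a reconstruction under the assumption that the alternatives are grouped with braces and separated by bars; the notation used in the original expression may differ.

```latex
\begin{align*}
\text{(a)}\quad & \mathrm{NP} \rightarrow \{\, \mathrm{S} \mid \mathrm{VP} \mid \mathrm{AP} \mid \mathrm{NP} \,\},\ \mathrm{P} \\
\text{(b)}\quad & \mathrm{ADVP} \rightarrow \{\, \mathrm{S} \mid \mathrm{VP} \mid \mathrm{AP} \mid \mathrm{NP} \,\},\ \mathrm{P}
\end{align*}
```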
[0047]
Furthermore, the following conditions are set.
[0048]
[Expression 2]

[0049]
Here, “focus marking (toritate) of the subject” is an emphatic expression formed by using, in the noun phrase corresponding to the subject, a binding particle (「は」 wa, 「こそ」 koso, 「も」 mo) in place of the case particle marking the nominative (typically 「が」, 「を」, 「に」, 「から」, and so on). (Case is chiefly a classification of the ways in which a complement relates to the predicate. Case particles are particles that express case relations.)
[0050]
For example, in the expression 「彼が驚くほど」 (where 「ほど」 hodo is an adverbial particle), the subject in the S 「彼が驚く」 is not focus-marked, whereas in 「彼は驚くほど」 the subject in the S 「彼は驚く」 is focus-marked by the binding particle 「は」.
[0051]
Therefore the former, 「彼が驚くほど」, can be analyzed under conditions (a), (b), and (c) above as an NP or ADVP whose constituents are “S, P” in that order. In the latter, 「彼は驚くほど」, by contrast, the part 「驚くほど」 is analyzed as an NP or ADVP whose constituents are “VP, P”, so conditions (a) and (b) do not apply.
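The contrast between 「彼が」 and 「彼は」 can be sketched as a simple check on the particle that closes the subject phrase. This is only an illustration: it assumes the phrase has already been identified as the subject by case structure analysis, and the function name `subject_marking` is hypothetical. A suffix test alone would not be reliable on arbitrary phrases.

```python
BINDING_PARTICLES = ("は", "こそ", "も")  # binding particles listed in the text
NOMINATIVE_PARTICLE = "が"               # the typical nominative case particle

def subject_marking(subject_phrase):
    """Classify a phrase already known to be the subject by its final particle."""
    if subject_phrase.endswith(BINDING_PARTICLES):
        return "focus-marked"
    if subject_phrase.endswith(NOMINATIVE_PARTICLE):
        return "plain"
    return "unknown"

for phrase in ("彼が", "彼は"):
    print(phrase, subject_marking(phrase))
```

Only a "plain" subject is kept inside S when S is followed by an adverbial particle.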
[0052]
The conditions (a) and (b) can be determined based on the syntax analysis result obtained by analyzing the phrase structure of the sentence. On the other hand, in order to determine the condition (c), it is necessary to perform not only syntax analysis processing but also case structure analysis or semantic analysis processing for identifying a grammatical role that each phrase plays in a sentence.
[0053]
Further, in addition to the above conditions (a), (b), and (c), the following conditions are set in the grammar rules of the conditions (a) and (b).
[0054]
[Expression 3]

[0055]
This is realized by applying, within the context-free grammar, grammar rules that treat a concatenation of predetermined phrases as a sentence, such as “S → NP, NP, VP”, and by giving an S determined according to such rules priority over the determination of other part-of-speech categories such as VP, AP, and NP.
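One way to picture this priority is as a filter over the candidate categories found for the constituent that precedes P. The sketch below assumes such a candidate list is available; the function name `apply_s_priority` is hypothetical, and no ordering among VP, AP, and NP is implied because the text states none.

```python
def apply_s_priority(candidates):
    """Keep only the S reading when one exists; otherwise keep every candidate."""
    cands = list(candidates)
    return ["S"] if "S" in cands else cands

print(apply_s_priority(["VP", "S"]))   # an S reading wins over VP
print(apply_s_priority(["VP", "NP"]))  # no S reading: candidates are unchanged
```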
[0056]
FIG. 1 shows a processing procedure of the syntax / semantic analysis system according to this embodiment in the form of a flowchart.
[0057]
First, a syntax analysis process including the grammar rules (a) and (b) is performed on the input sentence (step S1).
[0058]
Next, it is determined whether or not the input sentence is a sentence including the phrase structure of the grammar rule (a) or (b) (step S2). If it is determined that these phrase structures are not included, the syntax analysis result is output as it is.
[0059]
On the other hand, if the input sentence is determined to contain the phrase structure of grammar rule (a) or (b), case structure analysis is further performed on the input sentence (step S3) to identify the grammatical role that each phrase plays in the sentence.
[0060]
Next, it is determined whether there can be an analysis result in which the part of the sentence corresponding to grammar rule (a) or (b) contains a category corresponding to S (step S4). This determination corresponds to condition (d) above.
[0061]
Here, only when no analysis result including a part-of-speech category corresponding to S exists in the portion of the sentence that matches the grammar rule (a) or (b), a syntax analysis result according to the following grammar rule (a′) or (b′) is output.
[0062]
[Equation 4]

[0063]
On the other hand, if an analysis result including a part-of-speech category corresponding to S can exist in the portion of the sentence that matches the grammar rule (a) or (b), it is then determined whether a non-topicalized subject exists at the head of the phrase structure corresponding to S (step S5). Here, a topicalized subject is a noun phrase corresponding to the subject that has been given emphasis by using a particle (“ha”, “sano”, “mo”) in place of the case particle indicating the nominative case (typically “ga”), as described above.
[0064]
Regardless of the determination result in step S5, a syntax analysis result according to the following grammar rule (a″) or (b″) is output. However, if a non-topicalized subject exists at the head of S, an analysis result in which that subject is included in S is output; otherwise, even if a topicalized subject exists, an analysis result in which that subject is not included in S is output.
[0065]
[Equation 5]

[0066]
FIG. 2 shows a syntax analysis result obtained by analyzing the input sentence “The daily report is just right” including the auxiliary particle “about” by the syntax / semantic analysis system according to the present embodiment.
[0067]
Because the grammar rule “S → ADV, NP, VP” is defined in advance, the phrase “write a daily report” is treated as a sentence S in the syntax analysis result of step S1.
[0068]
Further, when the grammatical role of each phrase in the sentence is identified by case structure analysis, it is found that an analysis result in which the input sentence includes S exists, and that no topicalized subject is present at the head of this S.
[0069]
As a result, in accordance with the grammar rule “NP → S, P” defined by the condition (a″), a syntax analysis result is obtained in which “no matter how much a report is written every day” is treated as a noun phrase NP.
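The two reductions in this example can be traced with a short sketch. The helper `reduce_once`, the tuple-based tree representation, and the segmentation into four glossed phrases (ADV, NP, VP, and the auxiliary particle P) are assumptions made for illustration; only the rules “S → ADV, NP, VP” and “NP → S, P” come from the text.

```python
def reduce_once(nodes, lhs, rhs):
    """Replace the first run of nodes whose categories equal rhs by one lhs node.

    Each node is a (category, content) tuple; the content of a reduced node
    is the list of nodes it was built from.
    """
    n = len(rhs)
    for i in range(len(nodes) - n + 1):
        if tuple(node[0] for node in nodes[i:i + n]) == tuple(rhs):
            return nodes[:i] + [(lhs, nodes[i:i + n])] + nodes[i + n:]
    return nodes


# Hypothetical segmentation of the input sentence (English glosses only).
nodes = [("ADV", "every day"), ("NP", "daily report"),
         ("VP", "write"), ("P", "about")]

nodes = reduce_once(nodes, "S", ("ADV", "NP", "VP"))  # S  -> ADV, NP, VP
nodes = reduce_once(nodes, "NP", ("S", "P"))          # NP -> S, P

print(nodes[0][0])  # NP
```

The resulting single node is an NP whose children are the sentence S and the auxiliary particle, which is the shape of analysis described for FIG. 2.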
[0070]
Further, FIG. 3 shows a syntax analysis result obtained by analyzing the input sentence “he has worked surprisingly” including the auxiliary particle “so” by the syntax / semantic analysis system according to the present embodiment.
[0071]
As a result of the parsing, it is understood that “he is surprising” is a phrase structure corresponding to the grammatical rule defined in the condition (a) or (b).
[0072]
As a result of further case structure analysis, an analysis result including S can exist in the portion matching the grammar rule (a) or (b), but the subject at the head of this S is topicalized. That is, “he” is emphasized by using the particle “ha” instead of the case particle “ga” indicating the nominative case.
[0073]
Therefore, without including this subject in S, a syntax analysis result according to the grammatical rule “ADVP → VP, P” defined by the condition (b ″) is output.
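How a topicalized subject is kept outside the constituent to which the auxiliary particle attaches can be sketched as follows. The dictionary fields, the function name `split_scope`, and the glosses are hypothetical; the role label is assumed to have been supplied by a prior case structure analysis.

```python
NOMINATIVE = "ga"  # case particle indicating the nominative case


def split_scope(phrases):
    """Split the phrases preceding the auxiliary particle into (outside, inside).

    A head subject marked with the nominative particle stays inside the
    constituent the auxiliary particle attaches to; a head subject marked
    with another particle (a topicalized subject) is left outside.
    """
    head = phrases[0]
    if head.get("role") == "SUBJ" and head.get("particle") != NOMINATIVE:
        return phrases[:1], phrases[1:]
    return [], phrases


# Hypothetical phrases preceding the auxiliary particle in the FIG. 3 example.
phrases = [
    {"cat": "NP", "gloss": "he", "role": "SUBJ", "particle": "ha"},
    {"cat": "VP", "gloss": "surprise"},
]

outside, inside = split_scope(phrases)
advp = ("ADVP", [p["cat"] for p in inside] + ["P"])

print([p["gloss"] for p in outside])  # ['he']
print(advp)                           # ('ADVP', ['VP', 'P'])
```

With “ga” in place of “ha”, the same function returns an empty `outside` list and keeps the subject inside the constituent.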
[0074]
As can be seen from FIG. 1, the present embodiment is chiefly characterized by the combined use of syntactic analysis and case structure analysis. Another feature is that the analysis is performed using the category of the auxiliary particle and the syntax rule.
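The way the two analyses are combined in steps S2 to S5 of FIG. 1 can be summarized as a decision function. This is a minimal sketch, not the embodiment itself: the enumeration, the function name, and its three inputs are assumptions, and the results of syntax analysis (step S1) and case structure analysis (step S3) are taken as given.

```python
from enum import Enum
from typing import Optional


class Output(Enum):
    AS_IS = "syntax analysis result output as it is"
    RULE_PRIME = "rule (a') or (b')"
    RULE_DPRIME_SUBJECT_IN_S = "rule (a'') or (b''), subject included in S"
    RULE_DPRIME_SUBJECT_NOT_IN_S = "rule (a'') or (b''), subject not included in S"


NOMINATIVE = "ga"


def select_output(matches_rule: bool, has_s_reading: bool,
                  head_subject_particle: Optional[str]) -> Output:
    """Choose the kind of result to output.

    matches_rule: the sentence contains the phrase structure of rule (a) or (b).
    has_s_reading: an analysis including category S can exist (condition (d)).
    head_subject_particle: particle marking the subject at the head of S,
        or None when there is no such subject.
    """
    if not matches_rule:                          # step S2
        return Output.AS_IS
    if not has_s_reading:                         # step S4
        return Output.RULE_PRIME
    if head_subject_particle == NOMINATIVE:       # step S5: non-topicalized
        return Output.RULE_DPRIME_SUBJECT_IN_S
    return Output.RULE_DPRIME_SUBJECT_NOT_IN_S    # topicalized or absent


print(select_output(True, True, "ha").name)  # RULE_DPRIME_SUBJECT_NOT_IN_S
```

Under these assumptions, the FIG. 2 example corresponds to `select_output(True, True, None)` and the FIG. 3 example to `select_output(True, True, "ha")`.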
[0075]
Therefore, according to the syntax analysis according to the present embodiment, it is possible to obtain a correct syntax analysis result in consideration of the characteristics of the syntax structure of the Japanese sentence even if the Japanese sentence includes an auxiliary particle.
[0076]
For example, when the parsing process according to the present embodiment is applied to a machine translation system, a Japanese sentence including an auxiliary particle, such as “It is better to write a daily report”, can be translated into an appropriate English sentence.
[0077]
[Supplement]
The present invention has been described in detail above with reference to specific embodiments. However, it is obvious that those skilled in the art can make modifications and substitutions of the embodiment without departing from the gist of the present invention. That is, the present invention has been disclosed in the form of exemplification, and the contents described in the present specification should not be interpreted in a limited manner. In order to determine the gist of the present invention, the claims section described at the beginning should be considered.
[0078]
[Effects of the invention]
As described in detail above, according to the present invention, it is possible to provide an excellent natural language processing system, natural language processing method, and computer program capable of correctly handling parts of speech having a wide range of appearance positions.
[0079]
The present invention can also provide an excellent natural language processing system, natural language processing method, and computer program that can clearly define an auxiliary particle as a part-of-speech category and correctly handle the phrase structure related to the auxiliary particle.
[0080]
Further, according to the present invention, it is possible to provide an excellent natural language processing system, natural language processing method, and computer program that can clearly define an auxiliary particle as a part-of-speech category and output a correct parsing result for a Japanese sentence including the auxiliary particle.
[0081]
The present invention is chiefly characterized by the combined use of syntactic analysis and case structure analysis. Another feature is that the analysis is performed using the category of the auxiliary particle and the syntax rule.
[0082]
According to the natural language processing system of the present invention, it is possible to obtain a correct parsing result in consideration of the characteristics of the syntactic structure of a Japanese sentence even if the Japanese sentence includes an auxiliary particle.
[0083]
For example, when the natural language processing system according to the present invention is applied to a machine translation system, a Japanese sentence including an auxiliary particle, such as “It is better to write a daily report”, can be translated into an appropriate English sentence.
[Brief description of the drawings]
FIG. 1 is a flowchart showing a processing procedure of a syntax / semantic analysis system according to an embodiment.
FIG. 2 is a diagram exemplifying an analysis result of a sentence including an auxiliary particle “about” by the syntax / semantic analysis system according to the embodiment.
FIG. 3 is a diagram exemplifying an analysis result of a sentence including an auxiliary particle “sodo” by the syntax / semantic analysis system according to the embodiment.
FIG. 4 is a diagram schematically showing a configuration of a natural language processing system 1 based on LFG.
FIG. 5 is a diagram showing f-structure obtained as a result of processing an input sentence “My daughter speaks English” by the syntactic / semantic analysis unit 1.
FIG. 6 is a diagram showing c-structure obtained as a result of processing the input sentence “My daughter speaks English” by the syntactic / semantic analysis unit 1.
[Explanation of symbols]
1 ... Natural language processing system
2 ... Morphological analysis part
2A ... Morphological rule
2B ... Morphological dictionary
3 ... Syntactic / semantic analysis part
3A ... Grammar rule
3B ... Valency dictionary

Claims

A natural language processing system that analyzes the syntax and meaning of an input sentence including an auxiliary particle, the system comprising:
a first means for analyzing a phrase structure of the input sentence including the auxiliary particle;
a second means for determining whether the input sentence including the auxiliary particle conforms to a grammar rule description stating that an auxiliary particle exists after any of a sentence, a verb phrase, an adjective phrase, or a noun phrase; and
a third means for, in response to the second means determining that the input sentence including the auxiliary particle conforms to the grammar rule description, determining whether a non-topicalized subject exists at the head of the sentence, outputting, if such a subject exists, a syntax analysis result in which the concatenation of the sentence including the subject and the auxiliary particle forms a noun phrase or an adverb phrase, and outputting, if such a subject does not exist, a syntax analysis result in which the concatenation of the sentence excluding the topicalized subject and the auxiliary particle forms a noun phrase or an adverb phrase.

The natural language processing system according to claim 1, wherein the first means performs syntax analysis by applying a grammar rule description that treats a concatenation of predetermined phrases as a sentence.

The natural language processing system according to claim 1, wherein the third means determines whether the subject is topicalized depending on whether an emphasis expression is made by using a binding particle instead of a case particle indicating the nominative case in a noun phrase corresponding to the subject.

The natural language processing system according to claim 1, wherein the third means determines whether the subject of the input sentence is topicalized after applying case structure analysis or semantic analysis processing that identifies the grammatical role that each phrase plays in the sentence.

The natural language processing system according to claim 1, wherein the second means applies a grammar rule that treats a concatenation of predetermined phrases as a sentence, and gives the determination of a sentence priority over the determination of a verb phrase, an adjective phrase, and a noun phrase.

A natural language processing method for analyzing the syntax and meaning of an input sentence including an auxiliary particle on a natural language processing system constructed using a computer, the method comprising:
a first step in which a first means of the computer analyzes a phrase structure of the input sentence including the auxiliary particle;
a second step in which a second means of the computer determines whether the input sentence including the auxiliary particle conforms to a grammar rule description stating that an auxiliary particle exists after any of a sentence, a verb phrase, an adjective phrase, or a noun phrase; and
a third step in which a third means of the computer, in response to the second means determining that the input sentence including the auxiliary particle conforms to the grammar rule description, determines whether a non-topicalized subject exists at the head of the sentence, outputs, if such a subject exists, a syntax analysis result in which the concatenation of the sentence including the subject and the auxiliary particle forms a noun phrase or an adverb phrase, and outputs, if such a subject does not exist, a syntax analysis result in which the concatenation of the sentence excluding the topicalized subject and the auxiliary particle forms a noun phrase or an adverb phrase.

The natural language processing method according to claim 6, wherein in the first step, syntax analysis is performed by applying a grammar rule description that treats a concatenation of predetermined phrases as a sentence.

The natural language processing method according to claim 6, wherein in the third step, whether the subject is topicalized is determined depending on whether an emphasis expression is made by using a binding particle instead of a case particle indicating the nominative case in a noun phrase corresponding to the subject.

The natural language processing method according to claim 6, wherein in the third step, whether the subject of the input sentence is topicalized is determined after applying case structure analysis or semantic analysis processing that identifies the grammatical role that each phrase plays in the sentence.

The natural language processing method according to claim 6, wherein in the second step, a grammar rule that treats a concatenation of predetermined phrases as a sentence is applied, and the determination of a sentence is given priority over the determination of a verb phrase, an adjective phrase, and a noun phrase.