JP2012185790A

JP2012185790A - Dependency analysis support device

Info

Publication number: JP2012185790A
Application number: JP2011063173A
Authority: JP
Inventors: Hiroshi Yasuhara; 宏安原
Original assignee: Individual
Current assignee: Individual
Priority date: 2011-03-04
Filing date: 2011-03-04
Publication date: 2012-09-27

Abstract

PROBLEM TO BE SOLVED: To provide a dependency analysis support device that determines the dependency structure of a long sentence accurately and efficiently through interaction with a person. SOLUTION: The device divides a sentence at punctuation marks and performs morphological and syntactic analysis on each resulting part. The user visually checks the dependency analysis result for each part. If an error is found, another dependency rule is selected from a previously prepared rule set. These steps are repeated until every part is analyzed correctly. Once the intra-part analysis is complete, the device performs dependency analysis between the parts. The same cycle of visual checking and rule reselection is then repeated until the inter-part dependencies are also correct.

Description

The present invention relates to a dependency analysis support device suited to the dependency analysis of long natural-language sentences.

Systems for natural language applications such as machine translation and natural language understanding must analyze their natural-language input correctly.

Japanese text is usually analyzed in two stages:

1. Morphological analysis segments the text into words, determines their parts of speech, and groups them into phrases (bunsetsu).
2. Syntactic analysis determines the dependency relations between those phrases.

After these stages, semantic analysis may further extract semantic relations between phrases. As a rough estimate, the number of possible dependency combinations grows as the factorial of the number of phrases. Constraints reduce this number in practice, but analysis is said to become very difficult once a sentence exceeds about 10 phrases. Raising the accuracy of dependency analysis to a practical level is therefore a key issue in current Japanese language analysis.
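The "roughly factorial" estimate, and how much constraints cut it down, can be checked with a small count. The sketch below makes two assumptions that the text does not state explicitly:

- Every phrase except the last depends on exactly one phrase to its right, the usual head-final model for Japanese.
- The only constraint applied is the non-crossing principle.

Under these assumptions, allowing any rightward head gives (n - 1)! structures. Requiring non-crossing arcs gives the Catalan number C(n - 1), which is used here only as a cross-check.

```python
import itertools
import math


def unconstrained_count(n: int) -> int:
    """Head assignments if each phrase may depend on any later phrase.

    Phrase i (0-based) has n - 1 - i possible heads, so the total is (n - 1)!.
    """
    return math.factorial(n - 1)


def crosses(arc1: tuple[int, int], arc2: tuple[int, int]) -> bool:
    """True if two arcs (dependent, head), each with dependent < head, cross."""
    a, b = arc1
    c, d = arc2
    return a < c < b < d or c < a < d < b


def non_crossing_count(n: int) -> int:
    """Brute-force count of head-final dependency structures with no crossing arcs."""
    # Every phrase except the last (the root) picks a head somewhere to its right.
    choices = [range(i + 1, n) for i in range(n - 1)]
    total = 0
    for heads in itertools.product(*choices):
        arcs = list(enumerate(heads))
        if not any(crosses(x, y) for x, y in itertools.combinations(arcs, 2)):
            total += 1
    return total


def catalan(k: int) -> int:
    """k-th Catalan number, used only to cross-check the brute-force count."""
    return math.comb(2 * k, k) // (k + 1)


for n in range(2, 8):
    print(n, unconstrained_count(n), non_crossing_count(n), catalan(n - 1))
```

For a 10-phrase sentence, the unconstrained count is 9! = 362,880. The non-crossing count is C(9) = 4,862. Constraints shrink the space a great deal, but it still grows quickly with sentence length.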

Many attempts have been made toward this goal:

- [Patent Document 1] focuses on the predicates in a sentence. It uses rules to extract patterns that can be grouped around each predicate and analyzes each group. It then uses rules to determine the dependencies between groups.
- [Patent Document 2] turns the semantic statement units of a sentence into patterns and structures the sentence on that basis.
- One reason sentences become long is that they contain coordinate (parallel) phrases. [Non-Patent Document 1] extracts coordinate phrases by looking at how similar the word sequences in a sentence are.
- [Patent Document 3] targets patent claims, which often consist of long sentences. It extracts claim description patterns and analyzes claims based on them.
- [Patent Document 4] discloses a method of dividing long sentences as pre-editing for machine translation.

[Patent Document 1] JP H07-056919 A
[Patent Document 2] JP H06-295308 A
[Patent Document 3] Japanese Patent No. 3908261
[Patent Document 4] JP H08-87504 A

[Non-Patent Document 1] Sadao Kurohashi and Makoto Nagao, "Syntactic analysis of long Japanese sentences based on the detection of parallel structures" (in Japanese), Journal of Natural Language Processing, Vol. 1, No. 1, pp. 35-58, 1994.

The conventional dependency analysis methods described above have the following problems:

- [Patent Document 1] must extract predicates using predicate patterns accumulated in advance. Nothing guarantees that this extraction is correct, so nothing guarantees highly accurate analysis.
- [Patent Document 2] needs context patterns that cover natural language sentences, which are hard to collect. It gives no evidence that the method works on general text.
- [Non-Patent Document 1] improves long-sentence accuracy only through coordinate phrases. General Japanese text contains long sentences whose difficulty cannot be resolved by analyzing coordinate phrases alone.
- [Patent Document 3] targets a restricted kind of text, namely patent claims. Its applicability to long general Japanese sentences is not assured.
- [Patent Document 4] identifies division patterns to find the points at which a long sentence can be split. Finding those division points is itself processing on the long sentence, so the effort of selecting dependencies is not reduced, and the approach must be judged difficult.

To achieve the above object, the present invention provides two-stage dependency analysis support. A long sentence is divided at punctuation marks into multiple short units called "parts", and the dependencies within each part are analyzed. The dependencies between the parts are then analyzed.
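The first stage, dividing a sentence into parts at punctuation marks, can be sketched as follows.

The delimiter set is an assumption for illustration: Japanese 、 and 。 plus full-width ， and ．. The function name `split_into_parts` is hypothetical. Each mark stays attached to the part it closes, since main phrases in the worked example below, such as 「後で、」, end with a comma.

```python
import re

# Assumed boundary marks: Japanese 、 and 。 plus full-width ， and ．.
_PART = re.compile(r"[^、。，．]+[、。，．]*")


def split_into_parts(sentence: str) -> list[str]:
    """Split a sentence into parts at punctuation marks.

    Each punctuation mark stays attached to the part it closes, because the
    main (rightmost) phrase of a part ends with it.
    """
    parts = (m.group(0).strip() for m in _PART.finditer(sentence))
    return [p for p in parts if p]


print(split_into_parts("株価が下がり、景気が崩れる。"))
```

For this made-up sentence, the call prints `['株価が下がり、', '景気が崩れる。']`, giving two parts.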

To increase analysis accuracy, the intra-part and inter-part analysis results are presented in an easy-to-read form such as a syntax tree, and the user checks them visually. If a dependency is wrong, the user specifies the dependent phrase. The device then offers one of two corrections:

- It presents another candidate head phrase for that dependent.
- When the user selects another rule set, it presents the head phrase that the selected rule specifies.

If the presented head is still wrong, the device repeatedly lets the user select a new candidate. The device can also present the selectable rules in order of priority.

The device improves analysis accuracy for long sentences by dividing them into short parts. It also ensures accurate analysis both within each part and between parts. As a result, the overall accuracy of long-sentence dependency analysis can be raised to a practical level.

When visual checking and correction are needed, the user selects a head phrase, selects a dependency rule, or edits the sentence. Because the device learns how rules are applied, the number of dependency rule selections should fall over time. The manual correction workload can therefore be expected to decrease progressively.

FIG. 1: Overall configuration of the dependency analysis support device
FIG. 2: Schematic of an example representation of a sentence's partial analysis results and its correspondence to dependency analysis
FIG. 3: Outline of the steps of sentence dependency analysis
FIG. 4: List of operation keys for dependency analysis support
FIG. 5: Input example sentence for dependency analysis support
FIG. 6: Result of dividing the example sentence
FIG. 7: Intra-part dependency analysis result
FIG. 8: Inter-part dependency analysis result
FIG. 9: Example inter-part dependency rules matching a main phrase
FIG. 10: Screen after selecting another inter-part dependency rule
FIG. 11: Dependency analysis result obtained with the selected rule
FIG. 12: Interim screen of the inter-part dependency analysis result
FIG. 13: Analysis result screen at the end of dependency analysis support for one sentence
FIG. 14: Explanation of intra-part dependency correction
FIG. 15: Analysis example of an ungrammatical sentence
FIG. 16: Example of the intra-part dependency analysis rule set
FIG. 17: Example of the inter-part dependency analysis rule set

An embodiment of claim 1 is described below.

FIG. 1 shows the hardware of the dependency analysis support device in this embodiment, together with the data and programs held in its storage device.

The hardware consists of the following elements:

- 1: a computer including a central processing unit and a storage device
- 2: a display device
- 3: a keyboard
- 4: a pointing device such as a mouse

Although 1 to 4 are drawn as separate units, an integrated touch-panel computer may be used instead.

Reference numeral 10 denotes the main components of the storage device:

- input buffer 11, which stores the input text
- output buffer 12, which stores the analysis results
- sentence-to-part division unit 13, a program that divides sentences into parts
- morphological/syntactic analysis unit 14, a program that executes morphological and syntactic analysis
- intra-part dependency analysis unit 15, a program that performs dependency analysis for a specified phrase within a part
- intra-part dependency analysis rule set 16, used by that analysis
- inter-part dependency analysis unit 17, a program that performs dependency analysis between parts
- inter-part dependency analysis rule set 18, used by that analysis
- dependency rule management unit 19, a program that manages the addition, updating, and priority of rules

FIG. 16 shows part of the intra-part dependency analysis rule set 16, and FIG. 17 shows part of the inter-part dependency analysis rule set 18. Both rule sets have the same format, with three columns:

- a dependent side (50, 53)
- a head side (51, 54)
- a dependency relation (52, 55)

When a dependent-side phrase and a head-side phrase match a rule, a dependency relation is established between the two phrases. 'IR#i' in FIG. 16 and 'OR#j' in FIG. 17 are internal indexes. In the rules, '<>' and '[]' are markers for parts of speech and symbols, and character strings are written literally. Neither the representation format of the rules nor the way they are written is limited to this example. The rule table can also be extended with additional columns that carry information for managing the rules.
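The three-column rule format can be modeled as a small data structure.

This sketch makes simplifying assumptions: a phrase is reduced to its surface string plus the part of speech of its content word. A rule side then checks an optional part of speech (standing in for a '<...>' marker) and an optional literal suffix. All class and field names are hypothetical. The example rule is invented for illustration and is not taken from FIG. 16.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phrase:
    surface: str  # the phrase as written, e.g. "経済の"
    pos: str      # part of speech of its content word, e.g. "noun"


@dataclass(frozen=True)
class Pattern:
    """One side of a rule; None means 'matches anything'."""
    pos: Optional[str] = None     # stands in for a '<...>' POS marker
    suffix: Optional[str] = None  # literal string the phrase must end with

    def matches(self, phrase: Phrase) -> bool:
        return ((self.pos is None or phrase.pos == self.pos)
                and (self.suffix is None or phrase.surface.endswith(self.suffix)))


@dataclass(frozen=True)
class Rule:
    rule_id: str        # internal index such as "IR#i" or "OR#j"
    dependent: Pattern  # dependent-side column (50 / 53)
    head: Pattern       # head-side column (51 / 54)
    relation: str       # dependency-relation column (52 / 55)

    def applies(self, dep: Phrase, head: Phrase) -> bool:
        """A dependency holds when both phrases match their sides of the rule."""
        return self.dependent.matches(dep) and self.head.matches(head)


# Hypothetical rule: "<noun>の" modifies a following noun.
adnominal = Rule("IR#example", Pattern(pos="noun", suffix="の"), Pattern(pos="noun"), "adnominal")
print(adnominal.applies(Phrase("経済の", "noun"), Phrase("回復が", "noun")))
```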

FIG. 2 is a conceptual diagram of the data structure obtained by dividing sentence S#n into parts, each consisting of a word string delimited by punctuation marks. The sentence is enclosed in '{}', and Sn#i (here i takes values from 1 to 3) denotes the i-th part.

Given that the non-crossing principle applies to dependencies, a delimited word string qualifies as a part only if it satisfies two conditions:

1. Analysis within the part is closed: only one dependency leaves the part for an outside part.
2. Multiple dependencies from outside parts onto words in the part are allowed. A part that receives no dependency from outside is also allowed.

The main phrase of a part is assumed to contain a function word (also called an ancillary word). The main phrase is the phrase that does not depend on any other phrase within the part; in Japanese it is the rightmost phrase of the part. The first condition can therefore be restated as follows: only the main phrase modifies an outside part. The intra-part dependency analysis 20 and inter-part dependency analysis 21 shown in FIG. 2 are described later.

The '{}' at the arrow ends of the intra-part analyses 22, 23, 24 for each Sn#i are shown blank. In practice they record analysis results whose form depends on the morphological and syntactic analysis used, and the representation is not limited to any particular form. Dependency analysis often expresses modification relations as a tree structure. The '[]' at the arrow ends of the inter-part analyses 25, 26 record the dependency information of the main phrase of the corresponding part.
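The closure condition can be checked mechanically. A part that fails the check is the kind of ungrammatical division that later triggers edit mode (FIG. 15).

This is a minimal sketch, assuming a sentence-level array `heads` of global head indices, with None for the sentence's final phrase, and a part given as the half-open range [start, end). The function names are hypothetical. Only outgoing arcs are checked, since incoming dependencies from outside are allowed.

```python
from typing import Optional


def closure_violations(heads: list[Optional[int]], start: int, end: int) -> list[int]:
    """Indices of phrases in the part [start, end) that break the closure condition.

    heads[i] is the global index of the phrase that phrase i depends on, or None
    for the final phrase of the sentence. Every phrase except the part's main
    phrase (index end - 1) must depend on a later phrase inside the same part.
    """
    return [i for i in range(start, end - 1)
            if heads[i] is None or not (i < heads[i] < end)]


def is_valid_part(heads: list[Optional[int]], start: int, end: int) -> bool:
    # Arcs coming into the part from outside are allowed, so only outgoing
    # arcs from non-main phrases are examined.
    return not closure_violations(heads, start, end)


# Five phrases split into parts [0, 2) and [2, 5).
good = [1, 4, 3, 4, None]  # phrase 0 stays inside its part; main phrase 1 leaves it
bad = [4, 4, 3, 4, None]   # phrase 0 (not the main phrase) escapes its part
print(is_valid_part(good, 0, 2), closure_violations(bad, 0, 2))
```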

FIG. 3 outlines the dependency analysis support steps, which are divided into intra-part and inter-part dependency analysis. Assume first that the sentences to be processed are in the input buffer 11; there may be more than one.

- Step 31 (division into parts): each sentence is divided into parts at punctuation marks, so that the n-th input sentence S#n becomes parts Sn#1, Sn#2, Sn#3, and so on.
- Step 32 (morphological and syntactic analysis within parts): the morphological/syntactic analysis unit 14 processes each Sn#i and outputs its dependency structure.
- Step 33 (checking intra-part results and correcting them if needed): the user visually checks the analysis result of each part. If a grammatical error is detected, the device switches to sentence editing mode. Otherwise, the user checks each part in order from Sn#1. If a dependency error is found, the user specifies the dependent phrase and another candidate head phrase is presented.
- Step 34 (checking inter-part results and correcting them if needed): when the intra-part analysis is finished, inter-part dependency analysis is performed for each main phrase. These dependencies are checked in the same way as within parts, and dependencies are redone or the sentence is edited as needed.
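Steps 32 and 34 produce two kinds of results: each part's internal arcs (the '{}' records of FIG. 2) and one outgoing arc per main phrase (the '[]' records). These can be combined into a single sentence-level structure.

The representation below is an assumption made for illustration:

- Intra-part heads are local indices for the non-main phrases of each part.
- Inter-part heads are global indices.

The function name `merge_analyses` is hypothetical.

```python
from typing import Optional


def merge_analyses(part_lengths: list[int],
                   intra_heads: list[list[int]],
                   inter_heads: list[Optional[int]]) -> list[Optional[int]]:
    """Combine per-part and between-part results into one sentence-level head array.

    part_lengths[p]  number of phrases in part p
    intra_heads[p]   local head index (within part p) of each non-main phrase,
                     so len(intra_heads[p]) == part_lengths[p] - 1
    inter_heads[p]   global index of the head of part p's main phrase,
                     or None for the last part (the sentence root)
    """
    heads: list[Optional[int]] = []
    offset = 0
    for n, local, main_head in zip(part_lengths, intra_heads, inter_heads):
        if len(local) != n - 1:
            raise ValueError("each non-main phrase needs exactly one local head")
        heads.extend(offset + h for h in local)  # intra-part arcs, made global
        heads.append(main_head)                  # the main phrase's inter-part arc
        offset += n
    return heads


print(merge_analyses([2, 3], [[1], [1, 2]], [4, None]))
```

This prints `[1, 4, 3, 4, None]`: phrase 0 depends on phrase 1 inside part 1, and that part's main phrase depends on phrase 4 in part 2.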

FIG. 4 lists the keys pressed when operating the dependency analysis support device. The figure assumes a taskbar or task-menu style, but the functions may also be assigned to keyboard keys. The functions are as follows:
[Sentence Division]: divides a sentence into parts.
[Intra-part Analysis]: runs morphological and syntactic analysis within each divided part.
[Inter-part Analysis]: analyzes the dependencies between parts.
[Dependency]: performs dependency analysis for the specified phrase.
[Next Candidate]: searches for a new candidate head phrase.
[Advance Within Part]: forcibly moves the head phrase forward by one position; movement is restricted to the current part.
[Back Within Part]: the reverse of [Advance Within Part]; moves the head phrase back by one position.
[Advance Between Parts]: in inter-part dependency analysis, moves the head to the main phrase of the next part.
[Back Between Parts]: the reverse of [Advance Between Parts]; moves the head back to the main phrase of the previous part.
[Edit Mode]: suspends dependency analysis and switches to sentence editing mode.
[Confirm]: displays the dependency relations analyzed so far.
These keys are examples only. For instance, keys for reading and writing files can be added to the interface, or key functions can be combined into new composite keys.
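The four head-moving keys can be modeled as operations on a cursor that points at the current head of one dependent phrase.

This is a hypothetical sketch, not the device's implementation, under three assumptions:

- Parts are half-open (start, end) ranges of global phrase indices.
- A head always lies to the right of its dependent.
- An inter-part head must lie in a later part than the dependent.

At a boundary, each key leaves the head unchanged.

```python
class HeadCursor:
    """Hypothetical model of the head-moving keys for one dependent phrase."""

    def __init__(self, parts: list[tuple[int, int]], dependent: int, head: int):
        self.parts = parts          # (start, end) phrase ranges in sentence order
        self.dependent = dependent  # global index of the dependent phrase
        self.head = head            # global index of its current head

    def _part_of(self, i: int) -> int:
        for idx, (start, end) in enumerate(self.parts):
            if start <= i < end:
                return idx
        raise ValueError(f"phrase {i} is outside every part")

    def advance_within_part(self) -> int:
        """[Advance Within Part]: one phrase right, never past the part's end."""
        _, end = self.parts[self._part_of(self.head)]
        if self.head + 1 < end:
            self.head += 1
        return self.head

    def back_within_part(self) -> int:
        """[Back Within Part]: one phrase left, staying in the part and right of the dependent."""
        start, _ = self.parts[self._part_of(self.head)]
        if self.head - 1 >= start and self.head - 1 > self.dependent:
            self.head -= 1
        return self.head

    def advance_between_parts(self) -> int:
        """[Advance Between Parts]: jump to the main (last) phrase of the next part."""
        p = self._part_of(self.head)
        if p + 1 < len(self.parts):
            self.head = self.parts[p + 1][1] - 1
        return self.head

    def back_between_parts(self) -> int:
        """[Back Between Parts]: jump to the previous part's main phrase, if still after the dependent's part."""
        p = self._part_of(self.head)
        if p - 1 > self._part_of(self.dependent):
            self.head = self.parts[p - 1][1] - 1
        return self.head


cursor = HeadCursor([(0, 2), (2, 5), (5, 7)], dependent=1, head=4)
print(cursor.advance_between_parts(), cursor.back_between_parts())
```

Here the main phrase of the first part moves its head from phrase 4 to phrase 6 and back.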

Steps 31 to 34 of FIG. 3 are now described in detail using an example sentence and the device screens shown in FIGS. 5 to 13.

In step 31, pressing [Sentence Division] divides the text in the input buffer 11 into parts at punctuation marks. FIG. 6 shows the resulting screen for the input example sentence of FIG. 5.

In step 32, pressing [Intra-part Analysis] runs the morphological/syntactic analysis unit 14 on the divided parts. Pressing [Confirm] then displays the syntax trees of the intra-part analysis results shown in FIG. 7; the display method is not limited to this one.

In step 33, the user first checks the intra-part analysis results visually. All parts of this example sentence have been analyzed correctly, so the process moves on.

In step 34, pressing [Inter-part Analysis] starts the inter-part dependency analysis unit 17, which analyzes the dependencies between parts using the inter-part dependency analysis rule set 18. Pressing [Confirm] then displays the syntax tree of the whole-sentence analysis result shown in FIG. 8. Visual inspection of this result reveals errors in the inter-part dependencies of Sn#2 and Sn#3.

The corrections are as follows. When the cursor is placed on the main phrase 「後で、」 ("later,") of Sn#2 and [Dependency] is pressed, part of the matching rule set is displayed, as shown in FIG. 9. The underlined rule OR#8 is the currently selected rule, but the correct head is 「下がり、」 ("falls,"). [Dependency] is therefore pressed again to advance to the matching rule OR#9 of FIG. 9. This yields 「後、→Sn#3 下がり、[時間After]」 (later → Sn#3 "falls", relation Time-After), shown in FIG. 11. Pressing [Confirm] displays the tree structure of FIG. 12.

The other error in FIG. 8 is at 「下がり、」 ("falls,"), the main phrase of Sn#3. Placing the cursor on it and pressing [Dependency] displays part of the matching rule set, namely rules OR#1, OR#2, and OR#3 of FIG. 17. OR#1 is the currently selected rule, but the correct head is 「崩れる」 ("collapses"). [Dependency] is therefore pressed twice to select the matching rule OR#3. This yields 「下がり、→Sn#4 崩れる [連用中止]」 (falls → Sn#4 "collapses", relation continuative-form clause ending). Pressing [Confirm] displays the tree structure analyzed so far, shown in FIG. 13. Both dependency errors are now resolved and the correct dependency structure has been obtained. The result is stored in the output buffer 12.
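The repeated [Dependency] presses in these two corrections step through the rules that match the selected dependent phrase: OR#8 to OR#9 with one press, and OR#1 to OR#3 with two.

A minimal sketch of that selection behavior follows. The class name is hypothetical, and wrapping back to the first rule after the last one is an assumption; the text does not say what happens at the end of the list.

```python
class RuleSelector:
    """Hypothetical model of repeated [Dependency] presses on one dependent phrase."""

    def __init__(self, matching_rule_ids: list[str], selected: int = 0):
        if not matching_rule_ids:
            raise ValueError("no rule matches this dependent phrase")
        self.rule_ids = list(matching_rule_ids)  # rules matching the dependent, in presentation order
        self.index = selected                    # the currently selected (underlined) rule

    @property
    def current(self) -> str:
        return self.rule_ids[self.index]

    def press(self) -> str:
        # Assumption: after the last matching rule, selection wraps to the first.
        self.index = (self.index + 1) % len(self.rule_ids)
        return self.current


selector = RuleSelector(["OR#1", "OR#2", "OR#3"])
selector.press()
print(selector.press())
```

Starting from OR#1, two presses select OR#3, matching the correction of Sn#3 above.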

The following concrete example shows how an intra-part dependency error is corrected, a case that did not arise in the example above. FIG. 14 shows an incorrect intra-part dependency analysis, and there are two ways to correct result 41.

The first way is to press [Next Candidate] to find the next phrase that matches the same rule. Here 「経済の」 ("of the economy") modifies the noun 「一刻」 ("a moment"). Pressing [Next Candidate] makes it modify the next noun, 「回復」 ("recovery"), instead. Alternatively, the user can move the head by force to the correct phrase with [Advance Within Part] or [Back Within Part].

The second way is to change the selected rule. Pressing [Dependency] displays, at 42, the rules IR#2 and IR#3 from the rule set that match this dependent phrase. Selecting IR#3 likewise makes it modify 「回復」.

With either method, pressing [Confirm] gives the correct result shown at 43.
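[Next Candidate] keeps the current rule and looks further right, within the part, for the next phrase that satisfies the rule's head side.

This sketch assumes the head side is just a part-of-speech test, here a noun. The phrase list is a made-up sentence in the spirit of FIG. 14, not the figure's actual content, and the function name is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Phrase:
    surface: str
    pos: str  # part of speech of the phrase's content word


def next_candidate(phrases: list[Phrase], current_head: int,
                   part_end: int, head_pos: str) -> Optional[int]:
    """Index of the next phrase after current_head, inside the part, whose POS
    satisfies the current rule's head side; None if no such phrase exists."""
    for j in range(current_head + 1, part_end):
        if phrases[j].pos == head_pos:
            return j
    return None


# Hypothetical part: 経済の / 一刻も / 早い / 回復が / 望まれる
part = [Phrase("経済の", "noun"), Phrase("一刻も", "noun"), Phrase("早い", "adjective"),
        Phrase("回復が", "noun"), Phrase("望まれる", "verb")]
print(next_candidate(part, current_head=1, part_end=len(part), head_pos="noun"))
```

Starting from 「一刻も」 (index 1), the search skips the adjective and returns index 3, 「回復が」.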

Another kind of failure occurs when a division violates the conditions for a part. FIG. 15 shows an example: part 44, 「ここで述べた安全機能は情報システムの信頼性の評価においても、」 ("the safety function described here is, also in evaluating the reliability of information systems,").

When this part is analyzed, phrase 45 modifies phrase 46. However, 45 should modify a phrase in an outside part. Forcing it to depend inside the part violates the condition that analysis within a part is closed, which reveals a grammatical error. Pressing [Edit Mode] suspends dependency analysis so that the sentence can be edited.

Embodiments of claims 2 and 3 are described below.

The two dependency analyses, within parts and between parts, apply the rules shown in FIGS. 16 and 17, respectively. Both rule sets work the same way: each applies rules that match the dependent phrase of a dependency.

As intra-part and inter-part rules are accumulated from many example sentences, the number of rules matching a single phrase grows. This problem is avoided by ordering the presentation of rules by priority. Possible orderings include:

- presenting the most recently used rule first
- favoring frequency or statistical data accumulated from past dependency analyses and from analyses of text in a specific field
- using each user's history

The appropriate priority depends on the target field of the analysis and on lexical and grammatical categories. The dependency rule management unit 19 therefore manages several priorities for each rule and lets the user choose among them. This makes the presentation order of rules learnable and progressively reduces the burden of manual dependency correction.
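The priority orderings listed above can be sketched as a small usage log that re-sorts candidate rules on request.

This is a hypothetical model, not the management unit 19 itself. The class name, the strategy names, and the idea of recording the rule the user finally accepts are all assumptions. Ties keep the rule-set order, because Python's `sorted` stays stable even with `reverse=True`.

```python
from collections import Counter, defaultdict
from typing import Optional


class RulePriority:
    """Hypothetical sketch of priority-ordered rule presentation."""

    def __init__(self) -> None:
        self.clock = 0
        self.last_used: dict[str, int] = {}             # rule_id -> time of last acceptance
        self.frequency: Counter = Counter()             # rule_id -> total acceptances
        self.per_user: defaultdict = defaultdict(Counter)  # user -> rule_id -> acceptances

    def record(self, rule_id: str, user: str) -> None:
        """Log that `user` accepted `rule_id` for some dependency."""
        self.clock += 1
        self.last_used[rule_id] = self.clock
        self.frequency[rule_id] += 1
        self.per_user[user][rule_id] += 1

    def order(self, candidates: list[str], strategy: str = "recent",
              user: Optional[str] = None) -> list[str]:
        """Return the matching rules in presentation order for the chosen strategy."""
        if strategy == "recent":
            def key(r: str) -> int: return self.last_used.get(r, 0)
        elif strategy == "frequency":
            def key(r: str) -> int: return self.frequency[r]
        elif strategy == "user":
            def key(r: str) -> int: return self.per_user[user][r]
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        # sorted() is stable, so equally ranked rules keep their rule-set order.
        return sorted(candidates, key=key, reverse=True)


p = RulePriority()
p.record("OR#3", "alice")
p.record("OR#3", "bob")
p.record("OR#1", "bob")
print(p.order(["OR#1", "OR#2", "OR#3"], "recent"), p.order(["OR#1", "OR#2", "OR#3"], "frequency"))
```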

Embodiments 1 and 2 above use Japanese sentences as examples. However, the invention is not limited to Japanese and applies equally to any natural language for which dependency analysis can be performed.

The present invention can provide a natural language processing system that ensures highly accurate analysis of long sentences. It therefore supports the practical use of a wide range of natural language processing applications, such as machine translation and natural language understanding systems. Because sentences are analyzed in short units, grammatical errors are also easier to find, so the invention can additionally support proofreading.

DESCRIPTION OF SYMBOLS
1 Computer including a central processing unit and a storage device
2 Display device
3 Keyboard
4 Pointing device
10 Main components of the storage device
11 Input buffer
12 Output buffer
13 Sentence-to-part division unit
14 Morphological/syntactic analysis unit
15 Intra-part dependency analysis unit
16 Intra-part dependency analysis rule set
17 Inter-part dependency analysis unit
18 Inter-part dependency analysis rule set
19 Dependency analysis rule management unit
20 Explanatory heading for intra-part dependency analysis
21 Explanatory heading for inter-part dependency analysis
22 Sn#1 intra-part dependency analysis
25 Sn#1 inter-part dependency analysis
31 Step of dividing into parts
32 Step of morphological and syntactic analysis within parts
33 Step of checking the intra-part analysis results and correcting them if needed
34 Step of inter-part dependency analysis, checking its results, and correcting them if needed
40 Input example sentence for explaining dependency correction
41 Analysis result of the input example sentence for explaining dependency correction
42 Inter-part dependency rule selection screen
43 Screen showing the dependency analysis result obtained with the selected rule
44 Example of an ungrammatical sentence
50 Dependent-side phrase column of the intra-part dependency analysis rules
51 Head-side phrase column of the intra-part dependency analysis rules
52 Dependency relation column of the intra-part dependency analysis rules
53 Dependent-side phrase column of the inter-part dependency analysis rules
54 Head-side phrase column of the inter-part dependency analysis rules
55 Dependency relation column of the inter-part dependency analysis rules

図３のステップ３１〜３４を図５〜図１３を参照しながら例文によって係り受け解析支援装置の画面を例示しながら詳細に示す。先ず、部分に分割３１では、［文分割］を押下して入力バッファ１１の文章を句読点により部分に分割する。図６に図５の入力例文の分割結果の画面を示す。次に部分内の形態素解析・構文解析３２では、［部分内解析］を押下することで前記分割された部分に対して形態素解析・構文解析部１４を実行する。ここで［確定］を押下すると図７に示す部分内解析結果の構文木が表示される。この表示方法はこれに限定したものではない。部分内の解析結果の確認および必要なら訂正３３では、先ず、前記部分内解析結果を目視で確認する。この例文の部分は全て正しく解析が行われているので次に進む。部分間の解析結果の確認および必要なら訂正３４では、先ず［部分間解析］を押下すると部分間係り受け解析部１７が起動し、部分間係り受け解析規則集合１８を用いて部分間の係り受け解析が実行される。ここで［確定］を押下すると図８に示す文の解析結果の構文木が表示される。当該解析結果の表示を目視で確認し、Ｓｎ＃２、Ｓｎ＃３の部分間の係り受けで失敗が見つかる。以下、係り受けの失敗の訂正を示す。Ｓｎ＃２の主文節「後で、」にカーソルを置き、［係り受け］を押下すると合致する規則集合の一部が、図９に示されている。下線部分ＯＲ＃８は現在選択された規則であるが、この正解は「下がり、」に係るので、それに合致する図９の規則ＯＲ＃９に進めるため［係り受け］を押下する。その結果として図１１に示す「後、→Ｓｎ＃３下がり、［時間Ａｆｔｅｒ］」が得られる。ここで［確定］を押下すると木構造図１２が表示される。 Steps 31 to 34 in FIG. 3 will be described in detail with reference to FIG. 5 to FIG. First, in the division into parts 31, [Sentence Division] is pressed to divide the text in the input buffer 11 into parts by punctuation marks. FIG. 6 shows a screen of the division result of the input example sentence of FIG. Next, in the morphological analysis / syntactic analysis 32 in the part, the morpheme analysis / syntax analysis unit 14 is executed on the divided part by pressing [Internal analysis]. If [OK] is pressed here, the syntax tree of the partial analysis result shown in FIG. 7 is displayed. This display method is not limited to this. In confirmation of the analysis result in the portion and correction 33 if necessary, first, the analysis result in the portion is visually confirmed. Since all parts of this example sentence have been correctly analyzed, the process proceeds. In confirmation of the analysis result between the parts and correction 34 if necessary, first, when [Partial analysis] is pressed, the part-part dependency analysis unit 17 is activated, and the part-part dependency analysis rule set 18 is used to determine the dependency between parts. Analysis is performed. 
When [OK] is pressed here, the syntax tree of the sentence analysis result shown in FIG. 8 is displayed. The display of the analysis result is visually confirmed, and a failure is found by the dependency between the Sn # 2 and Sn # 3 portions. The correction of dependency failure is shown below. FIG. 9 shows a part of a rule set that matches when the cursor is placed on the main phrase “Later,” of Sn # 2 and [Dependency] is pressed. The underlined portion OR # 8 is the currently selected rule, but since this correct answer is related to “decline,” press [Dependency] to advance to the rule OR # 9 of FIG. As a result, “After, → Sn # 3 drop, [Time After]” shown in FIG. 11 is obtained. If [OK] is pressed here, a tree structure diagram 12 is displayed.

The other failure in FIG. 8 is at Sn#3. When the cursor is placed on its main phrase 「下がり、」 ("falls,") and [Dependency] is pressed, part of the matching rule set is displayed: rules OR#1, OR#2, and OR#3 of FIG. 17. OR#1 is the currently selected rule, but the correct attachment is to 「崩れる」 ("collapses"), so [Dependency] is pressed twice to select the matching rule OR#3. The result is "下がり、→ Sn#4 崩れる [連用中止 (continuative suspension)]". Pressing [OK] then displays the tree structure analyzed so far, shown in FIG. 13. Both dependency failures are now resolved and the correct dependency structure is obtained. The result is stored in the output buffer 12.
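Both corrections amount to stepping through an ordered list of matching rules, one press of [Dependency] per step. The sketch below models that under stated assumptions: each candidate records a rule identifier, the receiver phrase, and a relation label, and pressing past the last candidate wraps around to the first (the patent does not say what happens at the end of the list). `Rule` and `CandidateCycler` are hypothetical names, and all candidates except the third in the demo are made up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    rule_id: str   # hypothetical identifier, e.g. "OR#3"
    head: str      # receiver phrase the rule attaches the modifier to
    relation: str  # relation label, e.g. "時間After" or "連用中止"


class CandidateCycler:
    """Ordered list of rules that match one modifier phrase.

    Index 0 is the rule currently applied; each call to press() stands in
    for one press of the [Dependency] button and moves to the next rule,
    wrapping around after the last one (the wrap-around is an assumption).
    """

    def __init__(self, rules: list[Rule]) -> None:
        if not rules:
            raise ValueError("no rule matches this phrase")
        self.rules = list(rules)
        self.index = 0

    @property
    def current(self) -> Rule:
        return self.rules[self.index]

    def press(self) -> Rule:
        self.index = (self.index + 1) % len(self.rules)
        return self.current


if __name__ == "__main__":
    # Hypothetical candidates for one modifier phrase; only the third
    # mirrors a result quoted in the text.
    cycler = CandidateCycler([
        Rule("rule-1", "head-A", "relation-A"),
        Rule("rule-2", "head-B", "relation-B"),
        Rule("rule-3", "Sn#4 崩れる", "連用中止"),
    ])
    cycler.press()
    chosen = cycler.press()  # two presses: rule-1 -> rule-2 -> rule-3
    print(chosen.rule_id, chosen.head, chosen.relation)
```

With three candidates, two presses from the first reach the third, just as two presses move the selection from OR#1 to OR#3 above.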

The two dependency analyses, intra-part and inter-part, apply the rules shown in FIGS. 16 and 17 respectively. The two rule sets are of the same type: in both, a rule is selected by matching it against the modifier-side phrase of a dependency. As intra-part and inter-part dependency rules are accumulated from many example sentences, the number of rules matching any one phrase grows; this problem is avoided by applying priorities to the order in which the rules are presented. Examples include presenting the most recently used rule first, giving precedence to frequency or statistical data accumulated from past dependency analyses and from analyses of texts in a specific field, and using each user's own history. Because the appropriate priority depends on the target domain of the analysis and on lexical and grammatical categories, the dependency rule management unit 19 manages several priorities for each rule and lets the user specify which one to apply. This makes the presentation order of the rules learnable and progressively reduces the burden of manual, visual dependency correction.
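One way to realize user-selectable presentation priorities is to keep simple usage statistics per rule and sort the matching candidates by whichever policy the user chooses. The sketch below is an illustration under assumptions: `RuleStats`, the policy names `"recency"`, `"frequency"`, and `"user"`, and the logical clock used for recency are all hypothetical, and the frequency counter merely stands in for the domain-specific statistical data the text mentions.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class RuleStats:
    """Hypothetical usage statistics that a rule manager could keep."""

    clock: int = 0                                           # logical time
    last_used: dict[str, int] = field(default_factory=dict)  # rule -> time of last use
    freq: Counter = field(default_factory=Counter)           # uses across all users
    per_user: dict[str, Counter] = field(default_factory=dict)

    def record_use(self, rule_id: str, user: str) -> None:
        """Call whenever a user confirms a rule, so the ordering learns."""
        self.clock += 1
        self.last_used[rule_id] = self.clock
        self.freq[rule_id] += 1
        self.per_user.setdefault(user, Counter())[rule_id] += 1

    def order(self, rule_ids: list[str], policy: str,
              user: str | None = None) -> list[str]:
        """Return the matching rules in presentation order under `policy`."""
        if policy == "recency":
            scores = self.last_used
        elif policy == "frequency":
            scores = self.freq
        elif policy == "user":
            if user is None:
                raise ValueError("the 'user' policy needs a user name")
            scores = self.per_user.get(user, Counter())
        else:
            raise ValueError(f"unknown policy: {policy}")
        # sorted() is stable even with reverse=True, so ties keep rule-set order.
        return sorted(rule_ids, key=lambda r: scores.get(r, 0), reverse=True)
```

Because the sort is stable, rules that tie under the chosen policy keep their original order in the rule set, so a rule set with no recorded history is presented unchanged.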

Claims

1. A dependency analysis support apparatus that divides a sentence at punctuation marks and performs morphological analysis and dependency analysis on each of the divided parts, wherein: the user visually checks the dependency analysis result of each part and, if a phrase has an incorrect dependency, decides whether to correct the dependency or to edit the sentence; if the sentence is to be edited, dependency analysis is interrupted and an editing mode for editing the sentence is entered; if the dependency is to be corrected, the apparatus searches for the next dependency candidate matching the same rule that was most recently applied, or lets the user select another dependency rule from a previously prepared rule set, and, while the dependency remains wrong, the selection of another dependency rule is repeated, or the receiver phrase is forcibly specified at any stage of the selection, so that the correct dependency is determined and the dependency analysis of each part is corrected; when the user judges that the dependency analysis of every part is complete, dependency analysis between the divided parts is performed using a previously prepared rule set; the user visually checks the inter-part dependency analysis result and, if a phrase has an incorrect dependency, decides whether to correct the dependency or to edit the sentence; if the sentence is to be edited, dependency analysis is interrupted and the editing mode is entered; and if the dependency is to be corrected, the apparatus searches for the next dependency candidate matching the same rule that was most recently applied, or presents another dependency rule from the previously prepared rule set to the user, and, while the dependency remains wrong, the selection of another dependency rule is repeated, or the receiver phrase is forcibly specified at any stage of the selection, so that the correct dependency is determined; the dependency analysis support apparatus being characterized in that it supports the user in correcting the dependency analysis between the parts.
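The claim lets the user forcibly specify the receiver phrase at any stage. The patent does not say whether such a choice is checked; the sketch below assumes it is validated against two constraints commonly used in Japanese dependency analysis: every phrase except the last modifies a phrase to its right, and no two dependencies cross. Phrases are indexed 0 to n−1 and `heads[i]` is the index of the phrase that phrase i modifies (None for the last phrase); `violates_constraints` and `force_head` are hypothetical names.

```python
from __future__ import annotations


def violates_constraints(heads: list[int | None]) -> bool:
    """True if `heads` breaks either assumed constraint:
    (1) every phrase except the last modifies a phrase to its right, and
    (2) no two dependencies cross."""
    n = len(heads)
    for i, h in enumerate(heads):
        if i == n - 1:
            if h is not None:  # the final phrase modifies nothing
                return True
        elif h is None or not (i < h < n):
            return True
    # Crossing: some phrase j strictly between i and heads[i]
    # modifies a phrase beyond heads[i].
    for i in range(n - 1):
        for j in range(i + 1, heads[i]):
            if heads[j] > heads[i]:
                return True
    return False


def force_head(heads: list[int | None], modifier: int,
               head: int) -> list[int | None]:
    """Return a copy of `heads` with `modifier` forcibly attached to `head`,
    refusing choices that would violate the constraints."""
    updated = list(heads)
    updated[modifier] = head
    if violates_constraints(updated):
        raise ValueError(f"phrase {modifier} cannot modify phrase {head}")
    return updated
```

For example, with heads [1, 3, 3, None], forcing phrase 1 onto phrase 2 is accepted, while forcing phrase 0 onto phrase 2 is rejected because it would cross the dependency 1 → 3.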

2. The dependency analysis support apparatus according to claim 1, wherein, in selecting a dependency rule for a divided part, rules are presented taking their priority into account.

3. The dependency analysis support apparatus according to claim 1, wherein, in selecting a dependency rule for the divided parts, rules are presented taking their priority into account.