JP2018198002A

JP2018198002A - Document processing device, document processing method and program

Info

Publication number: JP2018198002A
Application number: JP2017102967A
Authority: JP
Inventors: Akihiro Morita (森田 明宏); Hiromasa Tanaka (田中 宏征); Tsutomu Matsunaga (松永 務)
Original assignee: NTT Data Corp
Current assignee: NTT Data Group Corp
Priority date: 2017-05-24
Filing date: 2017-05-24
Publication date: 2018-12-13
Anticipated expiration: 2037-05-24
Also published as: JP6813432B2

Abstract

PROBLEM TO BE SOLVED: To extract rules accurately from a rule description document.
SOLUTION: A dependency analysis unit of a document processing device extracts the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document. A likelihood ratio calculation unit calculates, for each dependency-destination clause in the rule description document, a likelihood ratio indicating the degree to which that clause appears disproportionately in the rule description document relative to the comparison target document, based on the two dependency structures extracted by the dependency analysis unit. A dependency-destination clause extraction unit extracts, from the dependency-destination clauses of the rule description document, either a predetermined number of clauses whose likelihood ratios are relatively large or the clauses whose likelihood ratios exceed a threshold.
SELECTED DRAWING: Figure 1

Description

The present invention relates to an apparatus, a method, and a program for extracting clauses that appear disproportionately in one document of a pair of documents.

In the field of natural language processing, methods are known for extracting words that appear disproportionately in one document of a pair of documents. For example, Non-Patent Document 1 describes a method for extracting, from among technical terms, those used mainly when addressing the general public. The method quantifies the bias in the use of technical terms between a corpus aimed at the general public and a corpus aimed at specialists, and extracts the technical terms used disproportionately in the general-public corpus as technical terms for the general public.

Non-Patent Document 1: Kim Aeran (金愛蘭), Rika Kiryu (桐生りか), Asuko Kondo (近藤明日子), Makiro Tanaka (田中牧郎), "An Attempt to Extract 'Technical Terms for the General Public': The Case of Medical Terms," Proceedings of the 2008 Spring Meeting of the Society for Japanese Linguistics, Society for Japanese Linguistics, May 2008, pp. 199-206

It has generally been difficult to quickly understand the contents of an examination viewpoint description document, that is, a document used in examination work that describes the viewpoints of the examination. To help readers understand such a document quickly, one could use the conventional technique above to extract and list only the portions of the document that describe examination viewpoints. However, methods based on the conventional technique could not perform this extraction accurately.

The present invention has been made in view of these circumstances, and its object is to extract rules accurately from a rule description document.

To solve the above problem, a document processing apparatus according to the present invention comprises: a dependency analysis unit that extracts the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document; a likelihood ratio calculation unit that calculates, for each dependency-destination clause constituting the rule description document, a likelihood ratio indicating the degree to which the dependency-destination clause appears disproportionately in the rule description document relative to the comparison target document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document by the dependency analysis unit; and a dependency-destination clause extraction unit that extracts, from among the dependency-destination clauses constituting the rule description document, either a predetermined number of dependency-destination clauses whose likelihood ratios calculated by the likelihood ratio calculation unit are relatively large, or the dependency-destination clauses whose likelihood ratios calculated by the likelihood ratio calculation unit exceed a threshold.

In a preferred aspect, the document processing apparatus further comprises a dependency-source clause extraction unit that extracts, for each dependency-destination clause extracted by the dependency-destination clause extraction unit, dependency-source clauses based on the dependency structure extracted from the rule description document by the dependency analysis unit.

In a further preferred aspect, the document processing apparatus further comprises a dependency-source clause classification unit that classifies the dependency-source clauses, extracted by the dependency-source clause extraction unit for each dependency-destination clause extracted by the dependency-destination clause extraction unit, based on the meanings of the words contained in the dependency-source clauses.

In a further preferred aspect, the dependency-source clause extraction unit extracts dependency-source clauses that express a condition.

In a further preferred aspect, the document processing apparatus further comprises a document structure analysis unit that analyzes the structure of the rule description document and the structure of the comparison target document and identifies, for one passage constituting the rule description document, the corresponding other passage constituting the comparison target document. The likelihood ratio calculation unit calculates, for each dependency-destination clause constituting the one passage, a likelihood ratio indicating the degree to which the dependency-destination clause appears disproportionately in the one passage relative to the other passage. The dependency-destination clause extraction unit extracts, from among the dependency-destination clauses constituting the one passage, either a predetermined number of dependency-destination clauses whose likelihood ratios calculated by the likelihood ratio calculation unit are relatively large, or the dependency-destination clauses whose likelihood ratios calculated by the likelihood ratio calculation unit exceed a threshold.

A document processing method according to the present invention is executed by a document processing apparatus and comprises: a dependency analysis step of extracting the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document; a likelihood ratio calculation step of calculating, for each dependency-destination clause constituting the rule description document, a likelihood ratio indicating the degree to which the dependency-destination clause appears disproportionately in the rule description document relative to the comparison target document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document in the dependency analysis step; and a dependency-destination clause extraction step of extracting, from among the dependency-destination clauses constituting the rule description document, either a predetermined number of dependency-destination clauses whose likelihood ratios calculated in the likelihood ratio calculation step are relatively large, or the dependency-destination clauses whose likelihood ratios calculated in the likelihood ratio calculation step exceed a threshold.

A program according to the present invention causes a computer to function as: a dependency analysis unit that extracts the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document; a likelihood ratio calculation unit that calculates, for each dependency-destination clause constituting the rule description document, a likelihood ratio indicating the degree to which the dependency-destination clause appears disproportionately in the rule description document relative to the comparison target document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document by the dependency analysis unit; and a dependency-destination clause extraction unit that extracts, from among the dependency-destination clauses constituting the rule description document, either a predetermined number of dependency-destination clauses whose likelihood ratios calculated by the likelihood ratio calculation unit are relatively large, or the dependency-destination clauses whose likelihood ratios calculated by the likelihood ratio calculation unit exceed a threshold.

According to the present invention, rules can be extracted accurately from a rule description document.

FIG. 1 is a diagram showing an example of the configuration of the document processing apparatus 1.
FIG. 2 is a flowchart showing an example of the examination viewpoint extraction process.
FIG. 3 is a diagram showing an example of a table stored in the document storage unit 103.
FIG. 4 is a diagram showing an example of the result of analysis by the dependency analysis unit 104.
FIG. 5 is a diagram showing an example of a table stored in the dependency structure storage unit 105.
FIG. 6 is a diagram showing an example of a table stored in the examination viewpoint description document receiving clause storage unit 107.
FIG. 7 is a diagram showing an example of a table stored in the comparison target document receiving clause storage unit 108.
FIG. 8 is a diagram showing an example of a table stored in the receiving clause list storage unit 111.
FIG. 9 is a diagram showing an example of a table stored in the dependent clause list storage unit 113.
FIG. 10 is a diagram showing an example of a table stored in the examination logic list storage unit 116.

1. Embodiment
1-1. Configuration
FIG. 1 is a diagram showing an example of the configuration of a document processing apparatus 1 according to an embodiment of the present invention. The document processing apparatus 1 is a computer that includes an arithmetic processing device such as a CPU, a storage device such as an HDD, and a communication device such as a NIC, and that extracts examination viewpoints from an examination viewpoint description document. As shown in FIG. 1, the document processing apparatus 1 has the following functions: a document input unit 101, a document structure analysis unit 102, a document storage unit 103, a dependency analysis unit 104, a dependency structure storage unit 105, a receiving clause extraction unit 106, an examination viewpoint description document receiving clause storage unit 107, a comparison target document receiving clause storage unit 108, an LLR (Log-Likelihood Ratio) calculation unit 109, a receiving clause list creation unit 110, a receiving clause list storage unit 111, a dependent clause list creation unit 112, a dependent clause list storage unit 113, a dependent clause classification unit 114, an examination logic list creation unit 115, an examination logic list storage unit 116, and an output unit 117. Of these functions, the document storage unit 103, the dependency structure storage unit 105, the examination viewpoint description document receiving clause storage unit 107, the comparison target document receiving clause storage unit 108, the receiving clause list storage unit 111, the dependent clause list storage unit 113, and the examination logic list storage unit 116 are implemented by the storage device. The other functions are implemented by the arithmetic processing device executing a program stored in the storage device. The document processing apparatus 1 may also be composed of a plurality of server devices connected to one another via communication lines.

Among the functions of the document processing apparatus 1, the document input unit 101 acquires examination viewpoint description document data and comparison target document data from the storage device or the communication device. An examination viewpoint description document is a document that describes the viewpoints of an examination; in other words, it is a procedure manual or a rule book. It consists mainly of a set of sentences each made up of a conditional clause and a main clause (for example, "In the case of A, do B."). A specific example is a notification of points to note, which is a document describing points to note when implementing the contents of a public notice. Examples of such notifications include: points to note in implementation accompanying a partial revision of the method for calculating medical fees; points to note regarding the calculation of material prices for specified insured medical materials; points to note in implementation accompanying a revision of the standards for calculating the amount of expenses required for specified education, childcare, and the like; and points to note in implementation accompanying a revision of the standards for calculating the amount of expenses required for designated disability welfare services and the like.

A comparison target document is a document that is compared with the examination viewpoint description document. Because it is consulted in order to extract only the examination viewpoints from the contents of the examination viewpoint description document, it preferably covers the same field as the examination viewpoint description document and has the same author. It also preferably has a document structure corresponding to that of the examination viewpoint description document; specifically, mutually related contents are preferably described in the same order in both documents. A specific example of a comparison target document is a public notice. When the points to note in implementation accompanying a partial revision of the method for calculating medical fees are input as the examination viewpoint description document, the public notice on partially revising the method for calculating medical fees is input as the comparison target document. When the points to note regarding the calculation of material prices for specified insured medical materials are input, the public notice on partially revising the material prices of specified insured medical materials is input. When the points to note in implementation accompanying a revision of the standards for calculating the amount of expenses required for specified education, childcare, and the like are input, the public notice on partially revising those standards is input. When the points to note in implementation accompanying a revision of the standards for calculating the amount of expenses required for designated disability welfare services and the like are input, the public notice on partially revising those standards is input.

The document structure analysis unit 102 analyzes the document structure of both the examination viewpoint description document and the comparison target document represented by the data acquired by the document input unit 101, and identifies, for each passage of the examination viewpoint description document, the corresponding passage of the comparison target document. To identify the correspondence between passages, the document structure analysis unit 102 uses the numbers assigned to passage headings and the blank lines in each document as clues. Here, a heading means the heading of a division such as a part, chapter, section, or paragraph. Having identified the corresponding passage in the comparison target document, the document structure analysis unit 102 stores each pair of corresponding passages in the document storage unit 103 in association with the identification information of the section to which the passages belong. If no passage-level correspondence can be identified between the two documents, the document structure analysis unit 102 stores the entire examination viewpoint description document and the entire comparison target document in the document storage unit 103 as a single pair of corresponding passages.
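The heading-number alignment described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the heading pattern (a line beginning with a number such as `1` or `1-2`), the fallback identifier `ALL`, and the function names `split_sections` and `align_sections` are all hypothetical.

```python
import re

# Hypothetical heading pattern: a line that starts with a number such as "1" or "1-2".
HEADING = re.compile(r"^(\d+(?:-\d+)*)\s+\S")
WHOLE = "ALL"  # hypothetical ID used when no section-level correspondence is found


def split_sections(text):
    """Split a document into {heading number: body text}, using numbered headings as cues."""
    sections, current = {}, None
    for line in text.splitlines():
        match = HEADING.match(line)
        if match:
            current = match.group(1)
            sections[current] = []
        elif current is not None and line.strip():
            # Blank lines are skipped; other lines belong to the current section.
            sections[current].append(line.strip())
    return {key: "\n".join(body) for key, body in sections.items()}


def align_sections(rule_doc, comparison_doc):
    """Pair sections that share a heading number; otherwise pair the whole documents."""
    rule, comp = split_sections(rule_doc), split_sections(comparison_doc)
    shared = [key for key in rule if key in comp]
    if not shared:
        return {WHOLE: (rule_doc, comparison_doc)}
    return {key: (rule[key], comp[key]) for key in shared}
```

The fallback branch mirrors the case in which no passage-level correspondence can be found and the two documents are stored as one pair.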

The dependency analysis unit 104 splits the passages of the examination viewpoint description document and of the comparison target document stored in the document storage unit 103 for each section into individual sentences, performs dependency analysis, and extracts the dependency structure (in other words, the dependency tree) of the clauses constituting each passage. A well-known dependency parser such as CaboCha (https://taku910.github.io/cabocha/) or KNP (http://nlp.ist.i.kyoto-u.ac.jp/index.php?KNP) may be used for this analysis. Having extracted the dependency structures for the passages of the examination viewpoint description document and the comparison target document corresponding to one section, the dependency analysis unit 104 stores the pair of extracted dependency structures in the dependency structure storage unit 105 in association with the identification information of that section.

The receiving clause extraction unit 106 refers to the dependency structures of the examination viewpoint description document and the comparison target document stored in the dependency structure storage unit 105 for each section, and extracts the clauses that are the destinations of dependency relations (hereinafter simply "receiving clauses"). The receiving clauses extracted here are, in particular, sentence-final verbs. When the receiving clause extraction unit 106 extracts a receiving clause from the dependency structure of the examination viewpoint description document for one section, it stores the receiving clause and its frequency (in other words, its number of occurrences) in that section in the examination viewpoint description document receiving clause storage unit 107, in association with the identification information of the section. Likewise, when it extracts a receiving clause from the dependency structure of the comparison target document for one section, it stores the receiving clause and its frequency in that section in the comparison target document receiving clause storage unit 108, in association with the identification information of the section.
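As a rough illustration of this step, the sketch below counts sentence-final receiving clauses for one section. It assumes the dependency structure has already been produced by a parser and is represented as a list of clauses, each holding the index of the clause it depends on (`-1` for the sentence-final clause). The `Clause` type and function names are hypothetical, and the part-of-speech check for verbs is omitted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Clause:
    text: str  # surface form of the clause
    head: int  # index of the clause this one depends on; -1 for the sentence-final clause


def receiving_clauses(sentence):
    """Return the sentence-final clause if at least one other clause depends on it."""
    found = []
    for index, clause in enumerate(sentence):
        is_final = clause.head == -1
        has_dependents = any(other.head == index for other in sentence)
        if is_final and has_dependents:
            found.append(clause.text)
    return found


def count_receiving_clauses(sentences):
    """Count how often each receiving clause appears in one section."""
    counts = Counter()
    for sentence in sentences:
        counts.update(receiving_clauses(sentence))
    return counts


# Example: two sentences that both end in the clause "算定する".
section = [
    [Clause("入院の場合は", 2), Clause("所定点数を", 2), Clause("算定する", -1)],
    [Clause("加算を", 1), Clause("算定する", -1)],
]
frequencies = count_receiving_clauses(section)
```

Running the same counting once over the rule-side section and once over the comparison-side section yields the two frequency tables that the later scoring step compares.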

For each receiving clause stored in the examination viewpoint description document receiving clause storage unit 107, the LLR calculation unit 109 calculates a log-likelihood ratio indicating the degree to which the receiving clause appears disproportionately in the section of the examination viewpoint description document from which it was extracted, compared with the same section of the comparison target document. The calculation uses Equation 1 and refers to the examination viewpoint description document receiving clause storage unit 107 and the comparison target document receiving clause storage unit 108.

In Equation 1, a is the frequency with which the receiving clause whose log-likelihood ratio is being calculated appears in the section of the examination viewpoint description document from which it was extracted; b is the frequency with which that receiving clause appears in the same section of the comparison target document; c is the frequency with which receiving clauses other than that one appear in the same section of the examination viewpoint description document; and d is the frequency with which receiving clauses other than that one appear in the same section of the comparison target document. N is the number of documents ("2" in this embodiment), and log is the common logarithm. Having calculated the log-likelihood ratio of a receiving clause, the LLR calculation unit 109 stores it in the examination viewpoint description document receiving clause storage unit 107 in association with that receiving clause.
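Equation 1 itself is not reproduced in this text, so the following sketch should not be read as the formula of the embodiment. It shows one widely used log-likelihood ratio for a 2x2 frequency table (the G-squared statistic), computed from the same four frequencies a, b, c, and d with the common logarithm. Two points are assumptions introduced here: the total `n = a + b + c + d` is used where the standard statistic needs a grand total (the description instead defines N as the number of documents), and the score is given a negative sign when the clause is skewed toward the comparison target document.

```python
import math


def xlogx(x):
    """x * log10(x), with the usual convention that 0 * log(0) = 0."""
    return x * math.log10(x) if x > 0 else 0.0


def log_likelihood_ratio(a, b, c, d):
    """G-squared statistic for the 2x2 table [[a, b], [c, d]] (a stand-in for Equation 1).

    a: frequency of the clause in the rule-side section
    b: frequency of the clause in the comparison-side section
    c: frequency of all other receiving clauses in the rule-side section
    d: frequency of all other receiving clauses in the comparison-side section
    """
    n = a + b + c + d  # assumption: grand total of the table, not the document count
    g2 = 2.0 * (
        xlogx(a) + xlogx(b) + xlogx(c) + xlogx(d)
        - xlogx(a + b) - xlogx(c + d) - xlogx(a + c) - xlogx(b + d)
        + xlogx(n)
    )
    # Assumption: clauses skewed toward the comparison side receive a negative score.
    skewed_to_rule_side = a * (b + d) >= b * (a + c)
    return g2 if skewed_to_rule_side else -g2
```

With this convention, a clause that is proportionally more frequent on the rule side scores above zero, and a clause with identical proportions on both sides scores approximately zero.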

The receiving clause list creation unit 110 extracts, from the receiving clauses stored in the examination viewpoint description document receiving clause storage unit 107, a predetermined number of receiving clauses whose log-likelihood ratios are relatively large. It then stores each extracted receiving clause in the receiving clause list storage unit 111, in order of log-likelihood ratio, in association with the identification information of the section from which the receiving clause was extracted and with its log-likelihood ratio. The receiving clause list creation unit 110 is an example of the "dependency-destination clause extraction unit" according to the present invention.
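The selection step can be sketched in a few lines. The embodiment keeps a predetermined number of top-scoring receiving clauses, and the summary of the invention also allows a threshold; the sketch supports both. The record layout `(section_id, clause, llr)` and the function name are assumptions made for illustration.

```python
def select_receiving_clauses(records, top_n=None, threshold=None):
    """Sort (section_id, clause, llr) records by LLR, then keep the top N and/or those above a threshold."""
    ranked = sorted(records, key=lambda record: record[2], reverse=True)
    if threshold is not None:
        ranked = [record for record in ranked if record[2] > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Example records with made-up scores, for illustration only.
records = [("1", "算定する", 12.5), ("1", "定める", 0.8), ("2", "届け出る", 7.1)]
top_two = select_receiving_clauses(records, top_n=2)
```

Because the list is sorted before truncation, the stored order matches the descending log-likelihood-ratio order described above.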

For each receiving clause stored in the receiving clause list storage unit 111, the dependent clause list creation unit 112 refers to the dependency structure of the examination viewpoint description document stored in the dependency structure storage unit 105 for the section from which the receiving clause was extracted, and extracts the clauses that are the sources of dependency relations with that receiving clause (hereinafter simply "dependent clauses"). The dependent clauses extracted here are, in particular, those that are in the same sentence as the receiving clause, depend on it directly, and contain an object. Having extracted a dependent clause, the dependent clause list creation unit 112 stores it in the dependent clause list storage unit 113 in association with the receiving clause on which it depends, the identification information of the section from which it was extracted, and its frequency (in other words, its number of occurrences) in that section. The dependent clause list creation unit 112 is an example of the "dependency-source clause extraction unit" according to the present invention.
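A minimal sketch of this extraction is shown below. It assumes each sentence is a list of `(text, head_index)` pairs, with `-1` marking the sentence-final clause, and it treats a clause ending in the particle "を" as one that contains an object. That surface heuristic and the function names are assumptions; a real system would more likely rely on the parser's part-of-speech output.

```python
from collections import Counter


def dependent_clauses(sentence, receiving_clause):
    """Return clauses that depend directly on the receiving clause and look like objects.

    `sentence` is a list of (text, head_index) pairs; head_index is -1 for the final clause.
    """
    targets = {i for i, (text, _) in enumerate(sentence) if text == receiving_clause}
    return [
        text
        for text, head in sentence
        if head in targets and text.endswith("を")  # assumption: "を" marks the object
    ]


def build_dependent_clause_list(sentences, receiving_clause):
    """Count the dependent clauses of one receiving clause across a section."""
    counts = Counter()
    for sentence in sentences:
        counts.update(dependent_clauses(sentence, receiving_clause))
    return counts
```

Clauses that reach the receiving clause only indirectly, through another clause, are left out because their head index points elsewhere.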

For each dependent clause stored in the dependent clause list storage unit 113, the dependent clause classification unit 114 classifies it, among the dependent clauses extracted from the same section and sharing the same receiving clause, based on the meanings of the words it contains. To do so, the dependent clause classification unit 114 refers to a synonym dictionary (not shown) and assigns the same classification code to a group of dependent clauses that contain mutually synonymous words. Alternatively, it calculates the distances between dependent clauses using a well-known similarity calculation method and assigns the same classification code to a group of dependent clauses whose calculated distances are within a threshold. The classification code assigned to a dependent clause is stored in the dependent clause list storage unit 113 in association with that dependent clause.
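The synonym-dictionary variant of this classification might look like the sketch below. It is deliberately simplified: each dependent clause is assumed to come with its content words already extracted, the dictionary maps a word to a single canonical representative, and two clauses share a code only when their canonicalized word sets are identical. The dictionary contents and function name are hypothetical.

```python
def classify_dependent_clauses(clauses, synonyms):
    """Assign the same classification code to clauses whose words map to the same synonym set.

    `clauses` maps each dependent clause to its list of content words.
    `synonyms` maps a word to a canonical representative (hypothetical dictionary).
    """
    codes, next_code, result = {}, 1, {}
    for clause, words in clauses.items():
        # Replace each word by its canonical form; unknown words stand for themselves.
        key = frozenset(synonyms.get(word, word) for word in words)
        if key not in codes:
            codes[key] = next_code
            next_code += 1
        result[clause] = codes[key]
    return result


# Example with a toy dictionary in which "経費" is treated as a synonym of "費用".
example = classify_dependent_clauses(
    {"費用を": ["費用"], "経費を": ["経費"], "点数を": ["点数"]},
    {"経費": "費用"},
)
```

The distance-based alternative mentioned above would replace the exact-match key with a similarity comparison against a threshold.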

For each receiving clause stored in the dependent clause list storage unit 113, the examination logic list creation unit 115 refers to the dependency structure of the examination viewpoint description document stored in the dependency structure storage unit 105 for the section from which the receiving clause was extracted, and attempts to extract dependent clauses that form a dependency relation with the receiving clause and express a condition. The dependent clauses extracted here are, in particular, those that are in the same sentence as the receiving clause and depend on it directly. To extract dependent clauses that express a condition, the examination logic list creation unit 115 uses keywords that express conditions, such as 「場合は」 ("in the case of"), 「ときは」 ("when"), and 「時は」 ("when"), as clues. Having extracted a dependent clause that expresses a condition, the examination logic list creation unit 115 further extracts the dependent clauses that depend on it and identifies these together as the condition part. It also extracts any dependent clause that is in the same sentence as the extracted condition clause, depends directly on the receiving clause stored in the dependent clause list storage unit 113, and contains an object, and identifies it together with the receiving clause as the processing part. If no such dependent clause exists, this extraction may be omitted. Having identified the condition part and the processing part, the examination logic list creation unit 115 stores them in the examination logic list storage unit 116 in association with the identification information of the section from which they were extracted. The examination logic list creation unit 115 is an example of the "dependency-source clause extraction unit" according to the present invention.

The output unit 117 refers to the receiving clause list storage unit 111 and outputs the receiving clause list. It also refers to the dependent clause list storage unit 113 and outputs the dependent clause list. When outputting the dependent clause list, the output unit 117 groups together the dependent clauses that share the same classification code. In addition, the output unit 117 refers to the examination logic list storage unit 116 and outputs the list of examination logic. Here, "output" means displaying information on a screen, printing it, or transmitting data that represents it.

1-2. Operation
The document processing method executed by the document processing apparatus 1 is described below. Specifically, this section describes the examination viewpoint extraction process, which extracts examination viewpoints from an examination viewpoint description document. FIG. 2 is a flowchart showing an example of the examination viewpoint extraction process. This operation example assumes that the implementation notes accompanying a partial revision of the medical fee calculation method are input as the examination viewpoint description document, and that the public notice of the partial revision of the medical fee calculation method is input as the comparison target document.

When the document input unit 101 acquires the examination viewpoint description document data and the comparison target document data, the document structure analysis unit 102 analyzes the structure of the examination viewpoint description document and of the comparison target document represented by those data, and identifies, for each passage of the examination viewpoint description document, the corresponding passage of the comparison target document (S1). It then stores each pair of corresponding passages in the document storage unit 103 in association with the identification information of the section to which the passages belong. FIG. 3 shows an example of a table stored in the document storage unit 103. In this table, for example, a pair of mutually corresponding passages, one from the examination viewpoint description document and one from the comparison target document, is stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".

Once the correspondence between passages of the examination viewpoint description document and the comparison target document has been identified, the dependency analysis unit 104 splits the passages stored for each section in the document storage unit 103 into sentences, performs dependency analysis, and extracts the dependency structure of the clauses that make up each passage (S2). FIG. 4 shows, as an example, the dependency structure output when the dependency analysis unit 104 analyzes the sentence 「医学的に初診といわれる診療行為があった場合に、初診料を算定する。」 ("When a medical act that is medically regarded as a first visit has taken place, the first-visit fee is calculated."). In this structure, lines 1, 4, 7, 10, 14, 17, 21, and 24, which begin with an asterisk, each consist of a clause number and the number of the clause it depends on ("-1" if it depends on none). The other lines consist of a word's surface form, part of speech, part-of-speech subcategories 1 to 3, conjugation type, conjugation form, base form, reading, and pronunciation. An asterisk in these other lines indicates that the corresponding information is not registered in the dictionary. The structure shows, for example, that the clause 「医学的に」 ("medically") depends on the clause 「いわれる」 ("is regarded as"). After extracting the dependency structures for the passages of both documents in each section, the dependency analysis unit 104 stores each pair of extracted dependency structures in the dependency structure storage unit 105 in association with the identification information of the section. FIG. 5 shows an example of a table stored in the dependency structure storage unit 105. In this table, for example, a pair consisting of the dependency structure of the examination viewpoint description document and that of the comparison target document is stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".
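The line layout described for FIG. 4 can be read into a simple data structure as sketched below in Python. The sketch assumes a CaboCha-style lattice layout in which an asterisk line reads `* <clause number> <head number>D` and a word line separates the surface form from its comma-separated features with a tab; the text above does not name the tool or spell out these delimiters, so they are assumptions. The two-clause sample is hand-written for illustration and is not actual analyzer output.

```python
def parse_dependency_output(text):
    """Parse asterisk-delimited dependency output into a list of clause dicts."""
    chunks = []
    for line in text.splitlines():
        if not line or line == "EOS":
            continue
        if line.startswith("* "):
            # Clause header: clause number, then head clause number (e.g. "1D" or "-1D").
            fields = line.split()
            chunks.append({
                "id": int(fields[1]),
                "head": int(fields[2].rstrip("D")),
                "tokens": [],
            })
        else:
            # Word line: surface form, then comma-separated features.
            surface, _, features = line.partition("\t")
            chunks[-1]["tokens"].append((surface, features.split(",")))
    return chunks


def clause_text(chunk):
    """Concatenate the surface forms of the words in one clause."""
    return "".join(surface for surface, _ in chunk["tokens"])


# Hand-written sample in the assumed layout (hypothetical, for illustration only).
SAMPLE = (
    "* 0 1D\n"
    "初診料\t名詞,一般,*,*,*,*,初診料,ショシンリョウ,ショシンリョー\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "* 1 -1D\n"
    "算定\t名詞,サ変接続,*,*,*,*,算定,サンテイ,サンテイ\n"
    "する\t動詞,自立,*,*,サ変・スル,基本形,する,スル,スル\n"
    "EOS\n"
)

chunks = parse_dependency_output(SAMPLE)
texts = [clause_text(c) for c in chunks]
```

In this sample, clause 0 (「初診料を」) depends on clause 1 (「算定する」), and clause 1 has head -1 because it depends on nothing.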

Once the dependency structures have been extracted for the passages of both documents in each section, the receiving clause extraction unit 106 refers to the dependency structures of the examination viewpoint description document and the comparison target document stored for each section in the dependency structure storage unit 105, and extracts the receiving clauses that participate in dependency relations (S3). When it extracts a receiving clause from the dependency structure of the examination viewpoint description document for a given section, it stores the receiving clause and the frequency with which that clause appears in the section in the examination viewpoint description document receiving clause storage unit 107, in association with the identification information of the section. FIG. 6 shows an example of a table stored in the examination viewpoint description document receiving clause storage unit 107. In this table, for example, the receiving clause 「記載する」 ("describe"), the frequency of occurrence "48", and the LLR "80.78" are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules". The LLR "80.78" is stored as a result of step S4, described later. Likewise, when the receiving clause extraction unit 106 extracts a receiving clause from the dependency structure of the comparison target document for a given section, it stores the receiving clause and the frequency with which that clause appears in the section in the comparison target document receiving clause storage unit 108, in association with the identification information of the section. FIG. 7 shows an example of a table stored in the comparison target document receiving clause storage unit 108. In this table, for example, the receiving clause 「記載する」 ("describe") and the frequency of occurrence "8" are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".

Once receiving clauses have been extracted from the examination viewpoint description document and the comparison target document, the LLR calculation unit 109 calculates, for each receiving clause stored in the examination viewpoint description document receiving clause storage unit 107, a log likelihood ratio indicating the degree to which the clause appears more heavily in the section of the examination viewpoint description document from which it was extracted than in the same section of the comparison target document. It does so by referring to the examination viewpoint description document receiving clause storage unit 107 and the comparison target document receiving clause storage unit 108 (S4). The calculated log likelihood ratio is stored in the examination viewpoint description document receiving clause storage unit 107 in association with the receiving clause.

Once the log likelihood ratio has been calculated for each receiving clause, the receiving clause list creation unit 110 extracts, from the receiving clauses stored in the examination viewpoint description document receiving clause storage unit 107, a predetermined number of receiving clauses with the largest log likelihood ratios (S5). It then stores the extracted receiving clauses in the receiving clause list storage unit 111 in order of log likelihood ratio, each in association with the identification information of the section from which it was extracted and its log likelihood ratio. FIG. 8 shows an example of a table stored in the receiving clause list storage unit 111. In this table, for example, the receiving clause 「実施する」 ("carry out") and the LLR "326.54" are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".
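Step S5 amounts to ranking receiving clauses by log likelihood ratio and keeping the top entries. The Python sketch below shows one way to do this; the function name and parameters are hypothetical, and the optional `threshold` argument corresponds to the threshold-based alternative given in Modification 5. The scores for 「実施する」 and 「記載する」 are the values quoted for FIG. 8 and FIG. 6, while the entry for 「行う」 is invented for illustration.

```python
def select_receiving_clauses(llr_by_clause, top_n=None, threshold=None):
    """Return (clause, LLR) pairs sorted by descending LLR.

    top_n: keep only this many highest-scoring clauses (step S5).
    threshold: keep only clauses whose LLR exceeds this value (Modification 5).
    """
    ranked = sorted(llr_by_clause.items(), key=lambda item: item[1], reverse=True)
    if threshold is not None:
        ranked = [(clause, score) for clause, score in ranked if score > threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


scores = {
    "記載する": 80.78,   # value quoted for FIG. 6
    "実施する": 326.54,  # value quoted for FIG. 8
    "行う": 3.2,         # hypothetical value for illustration
}
top_two = select_receiving_clauses(scores, top_n=2)
above_50 = select_receiving_clauses(scores, threshold=50.0)
```

Both calls keep 「実施する」 and 「記載する」 in that order, since only those two scores exceed 50.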

Once the receiving clause list has been created, the dependent clause list creation unit 112 refers, for each receiving clause stored in the receiving clause list storage unit 111, to the dependency structure of the examination viewpoint description document stored in the dependency structure storage unit 105 for the section from which the receiving clause was extracted, and extracts the dependent clauses that are in a dependency relation with the receiving clause (S6). It stores each extracted dependent clause in the dependent clause list storage unit 113 in association with the receiving clause it depends on, the identification information of the section from which it was extracted, and the frequency with which it appears in that section. FIG. 9 shows an example of a table stored in the dependent clause list storage unit 113. A table is created for each receiving clause stored in the receiving clause list storage unit 111. In the table shown, for example, the dependent clause 「理由を」 ("the reason"), the frequency of occurrence "27", and the classification code "1" are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules". The classification code "1" is stored as a result of step S7, described later.
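Step S6 can be sketched in Python as a frequency count over direct dependents. The sketch assumes each sentence is given as a list of `(clause text, head index)` pairs, with `-1` meaning the clause depends on nothing; that layout, the function name, and the two sample sentences are assumptions for illustration. For simplicity it counts every direct dependent and does not filter for clauses that contain an object.

```python
from collections import Counter


def count_dependent_clauses(sentences, receiving_clauses):
    """Count, per receiving clause, the clauses that depend on it directly.

    sentences: list of sentences, each a list of (clause_text, head_index).
    receiving_clauses: iterable of receiving clause texts to collect for.
    """
    counts = {clause: Counter() for clause in receiving_clauses}
    for chunks in sentences:
        for text, head in chunks:
            if head < 0:
                continue  # this clause depends on nothing
            head_text = chunks[head][0]
            if head_text in counts:
                counts[head_text][text] += 1
    return counts


# Two hypothetical sentences with hand-assigned dependency heads.
sentences = [
    [("理由を", 1), ("記載する", -1)],
    [("緊急の", 1), ("場合は", 3), ("理由を", 3), ("記載する", -1)],
]
dependents = count_dependent_clauses(sentences, ["記載する"])
```

For the receiving clause 「記載する」, the result counts 「理由を」 twice and 「場合は」 once; 「緊急の」 is not counted because it depends on 「場合は」, not on the receiving clause.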

Once the dependent clause list has been created, the dependent clause classification unit 114 classifies each dependent clause stored in the dependent clause list storage unit 113, based on the meanings of the words it contains, among the dependent clauses that were extracted from the same section and share the same receiving clause (S7). The classification code assigned to each dependent clause is stored in the dependent clause list storage unit 113 in association with that clause.

Once the dependent clauses have been classified, the examination logic list creation unit 115 refers, for each receiving clause stored in the dependent clause list storage unit 113, to the dependency structure of the examination viewpoint description document stored in the dependency structure storage unit 105 for the section from which the receiving clause was extracted, and attempts to extract a dependent clause that is in a dependency relation with the receiving clause and expresses a condition. Having extracted such a clause, it further extracts the clauses that depend on it and identifies them together as the condition part. It also extracts a dependent clause that is in the same sentence as the conditional clause, depends directly on the receiving clause stored in the dependent clause list storage unit 113, and contains an object, and identifies it together with the receiving clause as the processing part (S8). The condition part and the processing part are then stored in the examination logic list storage unit 116 in association with the identification information of the section from which they were extracted. FIG. 10 shows an example of a table stored in the examination logic list storage unit 116. In this table, for example, the condition part 「緊急の場合は」 ("in case of emergency") and the processing part 「理由を記載する」 ("describe the reason") are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".
This concludes the description of the examination viewpoint extraction process.
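The extraction of a condition part and a processing part in step S8 can be illustrated with the following Python sketch. It assumes a sentence is given as a list of `(clause text, head index)` pairs and treats a clause ending in 「を」 as one that contains an object; both are simplifying assumptions, as are the function names. The keyword tuple uses the three keywords named in the description of the examination logic list creation unit 115.

```python
CONDITION_KEYWORDS = ("場合は", "ときは", "時は")


def collect_subtree(chunks, root):
    """Indices of `root` and of every clause depending on it, in sentence order."""
    members = {root}
    changed = True
    while changed:
        changed = False
        for index, (_, head) in enumerate(chunks):
            if head in members and index not in members:
                members.add(index)
                changed = True
    return sorted(members)


def extract_logic(chunks, receiving_clause):
    """Return (condition part, processing part) pairs for one sentence."""
    results = []
    for r, (text, _) in enumerate(chunks):
        if text != receiving_clause:
            continue
        # Clauses that depend directly on the receiving clause.
        direct = [i for i, (_, head) in enumerate(chunks) if head == r]
        conditions = [
            i for i in direct
            if chunks[i][0].rstrip("、。").endswith(CONDITION_KEYWORDS)
        ]
        if not conditions:
            continue
        # Condition part: each conditional clause plus everything depending on it.
        condition_part = "".join(
            chunks[i][0] for c in conditions for i in collect_subtree(chunks, c)
        )
        # Processing part: direct dependents with an object marker, plus the receiving clause.
        objects = [i for i in direct if chunks[i][0].endswith("を")]
        processing_part = "".join(chunks[i][0] for i in objects) + text
        results.append((condition_part, processing_part))
    return results


sentence = [("緊急の", 1), ("場合は", 3), ("理由を", 3), ("記載する", -1)]
logic = extract_logic(sentence, "記載する")
```

For this hand-built sentence the sketch yields the pair 「緊急の場合は」 and 「理由を記載する」, matching the example given for FIG. 10.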

In the document processing apparatus 1 described above, the receiving clause list creation unit 110 creates a list of receiving clauses that appear characteristically in the examination viewpoint description document. By consulting this list, a reader about to read the document can learn which clauses to focus on in order to grasp the examination viewpoints quickly. In addition, because the log likelihood ratio is calculated by comparing mutually related passages of the examination viewpoint description document and the comparison target document, receiving clauses are extracted more accurately than they would be otherwise. Because extraction is performed per clause rather than per word, it is possible to distinguish, for example, 「記載する」 ("describe") from 「記載しない」 ("do not describe"). Furthermore, because extraction is limited to receiving clauses that are sentence-final verbs rather than covering all clauses, less noise is extracted than would be otherwise.

In the document processing apparatus 1, the dependent clause list creation unit 112 also creates a list of the dependent clauses that depend on the receiving clauses appearing characteristically in the examination viewpoint description document. Consulting this list makes the examination viewpoints easier to understand for a reader about to read the document. In addition, because the dependent clause classification unit 114 classifies the dependent clauses and those with the same classification code are grouped in the output, an overview of the examination viewpoints is easier to grasp.

In the document processing apparatus 1, the examination logic list creation unit 115 also extracts a list of examination logic, each entry consisting of a condition part and a processing part. By consulting this list, a reader can grasp an overview of the examination viewpoints without reading the examination viewpoint description document.

2. Modifications
The above embodiment may be modified as described below. Two or more of the following modifications may be combined with one another.

2-1. Modification 1
The document structure analysis unit 102 may be omitted. In that case, the entire examination viewpoint description document and the entire comparison target document, as represented by the data acquired by the document input unit 101, are stored in the document storage unit 103 as a single pair of corresponding passages.

2-2. Modification 2
The document processing apparatus 1 described above is assumed to process Japanese documents, but it may be made capable of processing other languages. For example, to process English documents, the dependency analysis unit 104 may use a parser such as MaltParser (http://www.maltparser.org/) to extract the dependency structure of an English document.

2-3. Modification 3
The receiving clause extraction unit 106 described above extracts receiving clauses that are sentence-final verbs, but the extracted receiving clauses need not be limited to sentence-final receiving clauses or to verb receiving clauses.

2-4. Modification 4
Formula 1 above for calculating the log likelihood ratio takes the common logarithm, but the natural logarithm may be used instead. Alternatively, no logarithm need be taken at all.

The formula for calculating the log likelihood ratio is not limited to Formula 1 above. Instead of Formula 1, the LLR calculation unit 109 may calculate the log likelihood ratio using, for example, Formula 2.

In Formula 2, a is the frequency with which the receiving clause whose log likelihood ratio is being calculated appears in the section of the examination viewpoint description document from which it was extracted. b is the frequency with which that receiving clause appears in the same section of the comparison target document. c is the total frequency of all receiving clauses in that section of the examination viewpoint description document (in other words, the total number of receiving clauses). d is the total frequency of all receiving clauses in the same section of the comparison target document (again, the total number of receiving clauses).
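Formula 2 itself is not reproduced in this text, so the following Python sketch does not attempt to restate it. As an assumed stand-in, it computes the standard log-likelihood ratio statistic (G²) for a 2×2 contingency table from the same four counts a, b, c, and d, using the natural logarithm. The function names are hypothetical, and the totals c = d = 1000 in the example are invented; only a = 48 and b = 8 come from the frequencies quoted for FIG. 6 and FIG. 7.

```python
import math


def g2_statistic(a, b, c, d):
    """G^2 log-likelihood ratio for the table [[a, b], [c - a, d - b]].

    a, b: frequency of the target clause in each document section.
    c, d: total number of receiving clauses in each document section.
    """
    observed = [[a, b], [c - a, d - b]]
    total = c + d
    row_sums = [a + b, total - (a + b)]
    col_sums = [c, d]
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            o = observed[i][j]
            if o == 0:
                continue  # a zero cell contributes nothing to the sum
            expected = row_sums[i] * col_sums[j] / total
            g2 += o * math.log(o / expected)
    return 2.0 * g2


def signed_g2(a, b, c, d):
    """G^2 with a sign: positive when the clause is relatively more frequent
    in the first document (a/c >= b/d), negative otherwise."""
    value = g2_statistic(a, b, c, d)
    return value if a * d >= b * c else -value


# a and b follow the quoted frequencies; c and d are hypothetical totals.
score = signed_g2(48, 8, 1000, 1000)
```

Because G² by itself does not indicate direction, the signed variant is one way to express "appears more heavily in the examination viewpoint description document". The resulting value depends on the assumed totals and is not expected to reproduce the LLR shown in FIG. 6.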

2-5. Modification 5
Instead of extracting a predetermined number of receiving clauses with the largest log likelihood ratios, the receiving clause list creation unit 110 may extract the receiving clauses whose log likelihood ratio is larger than a threshold.

2-6. Modification 6
The dependent clause list creation unit 112 described above extracts dependent clauses that depend directly on a receiving clause and contain an object, but the extracted dependent clauses need not be limited to those that depend directly on the receiving clause or to those that contain an object.

2-7. Modification 7
The dependent clause classification unit 114 described above classifies among dependent clauses that were extracted from the same section and share the same receiving clause. However, it may classify among dependent clauses extracted from different sections, or among dependent clauses that do not share a receiving clause.

2-8. Modification 8
A program for realizing the functions of the document processing apparatus 1 may be provided on a computer-readable recording medium. Examples of such a recording medium include magnetic recording media such as magnetic tape and magnetic disks, optical recording media such as optical discs, magneto-optical recording media, and semiconductor memory. The program may also be provided over a network such as the Internet.

2-9. Modification 9
The examination viewpoint description document is an example of a rule description document. A rule description document is a document in which rules are described; more specifically, it is, for example, a document describing matters that serve as standards for people's actions or for administrative procedures.

Description of reference signs: 1: document processing apparatus; 101: document input unit; 102: document structure analysis unit; 103: document storage unit; 104: dependency analysis unit; 105: dependency structure storage unit; 106: receiving clause extraction unit; 107: examination viewpoint description document receiving clause storage unit; 108: comparison target document receiving clause storage unit; 109: LLR calculation unit; 110: receiving clause list creation unit; 111: receiving clause list storage unit; 112: dependent clause list creation unit; 113: dependent clause list storage unit; 114: dependent clause classification unit; 115: examination logic list creation unit; 116: examination logic list storage unit; 117: output unit

Claims

A document processing apparatus comprising:
a dependency analysis unit that extracts the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document;
a likelihood ratio calculation unit that calculates, for each destination clause constituting the rule description document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document by the dependency analysis unit, a likelihood ratio indicating the degree to which the destination clause appears more heavily in the rule description document than in the comparison target document; and
a destination clause extraction unit that extracts, from among the destination clauses constituting the rule description document, a predetermined number of destination clauses whose likelihood ratio calculated by the likelihood ratio calculation unit is relatively large, or the destination clauses whose likelihood ratio calculated by the likelihood ratio calculation unit is larger than a threshold.

The document processing apparatus according to claim 1, further comprising a source clause extraction unit that extracts, for each destination clause extracted by the destination clause extraction unit, the source clauses that depend on it, based on the dependency structure extracted from the rule description document by the dependency analysis unit.

The document processing apparatus according to claim 2, further comprising a source clause classification unit that classifies each source clause extracted by the source clause extraction unit based on the meanings of the words contained in that source clause.

The document processing apparatus according to claim 2, wherein the source clause extraction unit extracts a source clause that expresses a condition.

The document processing apparatus according to claim 1, further comprising a document structure analysis unit that analyzes the structure of the rule description document and the structure of the comparison target document and identifies, for one passage constituting the rule description document, a corresponding other passage constituting the comparison target document, wherein
the likelihood ratio calculation unit calculates, for each destination clause constituting the one passage, a likelihood ratio indicating the degree to which the destination clause appears more heavily in the one passage than in the other passage, and
the destination clause extraction unit extracts, from among the destination clauses constituting the one passage, a predetermined number of destination clauses whose likelihood ratio calculated by the likelihood ratio calculation unit is relatively large, or the destination clauses whose likelihood ratio calculated by the likelihood ratio calculation unit is larger than a threshold.

A document processing method executed by a document processing apparatus, the method comprising:
a dependency analysis step of extracting the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document;
a likelihood ratio calculation step of calculating, for each destination clause constituting the rule description document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document in the dependency analysis step, a likelihood ratio indicating the degree to which the destination clause appears more heavily in the rule description document than in the comparison target document; and
a destination clause extraction step of extracting, from among the destination clauses constituting the rule description document, a predetermined number of destination clauses whose likelihood ratio calculated in the likelihood ratio calculation step is relatively large, or the destination clauses whose likelihood ratio calculated in the likelihood ratio calculation step is larger than a threshold.

A program for causing a computer to function as:
a dependency analysis unit that extracts the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document;
a likelihood ratio calculation unit that calculates, for each destination clause constituting the rule description document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document by the dependency analysis unit, a likelihood ratio indicating the degree to which the destination clause appears more heavily in the rule description document than in the comparison target document; and
a destination clause extraction unit that extracts, from among the destination clauses constituting the rule description document, a predetermined number of destination clauses whose likelihood ratio calculated by the likelihood ratio calculation unit is relatively large, or the destination clauses whose likelihood ratio calculated by the likelihood ratio calculation unit is larger than a threshold.