JP6813432B2

JP6813432B2 - Document processing device, document processing method, and program

Info

Publication number: JP6813432B2
Application number: JP2017102967A
Authority: JP
Inventors: 森田　明宏; 田中　宏征; 松永　務
Original assignee: NTT Data Corp
Current assignee: NTT Data Corp
Priority date: 2017-05-24
Filing date: 2017-05-24
Publication date: 2021-01-13
Anticipated expiration: 2037-05-24
Also published as: JP2018198002A

Description

The present invention relates to a device, a method, and a program for extracting phrases (bunsetsu) that appear disproportionately in one document of a pair of documents.

Conventionally, in the field of natural language processing, methods are known for extracting words that appear disproportionately in one document of a pair of documents. For example, Non-Patent Document 1 describes a method for extracting, from among technical terms, those that are used mainly when addressing the general public. The method quantifies the bias in the usage of technical terms between a corpus for the general public and a corpus for specialists, and extracts the technical terms that are used disproportionately in the general-public corpus as technical terms for the general public.

金愛蘭, 桐生りか, 近藤明日子, 田中牧郎, "An Attempt to Extract 'Technical Terms for the General Public': The Case of Medical Terms" (「『一般向け専門用語』抽出の試み−医療用語を例に−」), Proceedings of the 2008 Spring Meeting of the Society for Japanese Linguistics (日本語学会2008年度春季大会予稿集), 日本語学会, May 2008, pp. 199-206

Conventionally, it has generally been difficult to quickly understand the content of an examination viewpoint description document, that is, a document used in examination work that describes the viewpoints of the examination. To help readers understand such a document quickly, one could use the above prior art to extract and list only the portions of the document that describe examination viewpoints. However, methods based on the prior art could not extract those portions accurately.

The present invention has been made in view of these circumstances, and its object is to accurately extract rules from a rule description document.

To solve the above problem, a document processing device according to the present invention comprises: a dependency analysis unit that extracts the dependency structure of the phrases (bunsetsu) constituting a rule description document and the dependency structure of the phrases constituting a comparison target document; a likelihood ratio calculation unit that, for each dependency-target phrase constituting the rule description document, calculates a likelihood ratio indicating the degree to which the dependency-target phrase appears disproportionately in the rule description document relative to the comparison target document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document by the dependency analysis unit; and a dependency-target phrase extraction unit that extracts, from among the dependency-target phrases constituting the rule description document, either a predetermined number of dependency-target phrases whose likelihood ratio values calculated by the likelihood ratio calculation unit are relatively large, or the dependency-target phrases whose likelihood ratio values calculated by the likelihood ratio calculation unit are greater than a threshold.

In a preferred aspect, the above document processing device further comprises a dependency-source phrase extraction unit that, for each dependency-target phrase extracted by the dependency-target phrase extraction unit, extracts dependency-source phrases based on the dependency structure extracted from the rule description document by the dependency analysis unit.

In a further preferred aspect, the above document processing device further comprises a dependency-source phrase classification unit that classifies the dependency-source phrases extracted by the dependency-source phrase extraction unit for each dependency-target phrase extracted by the dependency-target phrase extraction unit, based on the meanings of the words contained in the dependency-source phrases.

In a further preferred aspect, the dependency-source phrase extraction unit extracts dependency-source phrases that express a condition.

In a further preferred aspect, the above document processing device further comprises a document structure analysis unit that analyzes the structure of the rule description document and the structure of the comparison target document and identifies, for one passage constituting the rule description document, the corresponding other passage constituting the comparison target document. The likelihood ratio calculation unit calculates, for each dependency-target phrase constituting the one passage, a likelihood ratio indicating the degree to which the dependency-target phrase appears disproportionately in the one passage relative to the other passage. The dependency-target phrase extraction unit extracts, from among the dependency-target phrases constituting the one passage, either a predetermined number of dependency-target phrases whose likelihood ratio values calculated by the likelihood ratio calculation unit are relatively large, or the dependency-target phrases whose likelihood ratio values calculated by the likelihood ratio calculation unit are greater than a threshold.

A document processing method according to the present invention is a document processing method executed by a document processing device, and comprises: a dependency analysis step of extracting the dependency structure of the phrases constituting a rule description document and the dependency structure of the phrases constituting a comparison target document; a likelihood ratio calculation step of calculating, for each dependency-target phrase constituting the rule description document, a likelihood ratio indicating the degree to which the dependency-target phrase appears disproportionately in the rule description document relative to the comparison target document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document in the dependency analysis step; and a dependency-target phrase extraction step of extracting, from among the dependency-target phrases constituting the rule description document, either a predetermined number of dependency-target phrases whose likelihood ratio values calculated in the likelihood ratio calculation step are relatively large, or the dependency-target phrases whose likelihood ratio values calculated in the likelihood ratio calculation step are greater than a threshold.

A program according to the present invention causes a computer to function as: a dependency analysis unit that extracts the dependency structure of the phrases constituting a rule description document and the dependency structure of the phrases constituting a comparison target document; a likelihood ratio calculation unit that, for each dependency-target phrase constituting the rule description document, calculates a likelihood ratio indicating the degree to which the dependency-target phrase appears disproportionately in the rule description document relative to the comparison target document, based on the dependency structure extracted from the rule description document and the dependency structure extracted from the comparison target document by the dependency analysis unit; and a dependency-target phrase extraction unit that extracts, from among the dependency-target phrases constituting the rule description document, either a predetermined number of dependency-target phrases whose likelihood ratio values calculated by the likelihood ratio calculation unit are relatively large, or the dependency-target phrases whose likelihood ratio values calculated by the likelihood ratio calculation unit are greater than a threshold.

According to the present invention, rules can be accurately extracted from a rule description document.

A diagram showing an example of the configuration of the document processing device 1.
A flow diagram showing an example of the examination viewpoint extraction process.
A diagram showing an example of the table stored in the document storage unit 103.
A diagram showing an example of the result of analysis by the dependency analysis unit 104.
A diagram showing an example of the table stored in the dependency structure storage unit 105.
A diagram showing an example of the table stored in the examination viewpoint description document head phrase storage unit 107.
A diagram showing an example of the table stored in the comparison target document head phrase storage unit 108.
A diagram showing an example of the table stored in the head phrase list storage unit 111.
A diagram showing an example of the table stored in the dependent phrase list storage unit 113.
A diagram showing an example of the table stored in the examination logic list storage unit 116.

1. Embodiment
1-1. Configuration
FIG. 1 is a diagram showing an example of the configuration of a document processing device 1 according to an embodiment of the present invention. The document processing device 1 is a computer that includes an arithmetic processing unit such as a CPU, a storage device such as an HDD, and a communication device such as a NIC, and that extracts examination viewpoints from an examination viewpoint description document. As shown in FIG. 1, the document processing device 1 has the following functions: a document input unit 101, a document structure analysis unit 102, a document storage unit 103, a dependency analysis unit 104, a dependency structure storage unit 105, a head phrase extraction unit 106, an examination viewpoint description document head phrase storage unit 107, a comparison target document head phrase storage unit 108, an LLR (Log-Likelihood Ratio) calculation unit 109, a head phrase list creation unit 110, a head phrase list storage unit 111, a dependent phrase list creation unit 112, a dependent phrase list storage unit 113, a dependent phrase classification unit 114, an examination logic list creation unit 115, an examination logic list storage unit 116, and an output unit 117. Of these functions, the document storage unit 103, the dependency structure storage unit 105, the examination viewpoint description document head phrase storage unit 107, the comparison target document head phrase storage unit 108, the head phrase list storage unit 111, the dependent phrase list storage unit 113, and the examination logic list storage unit 116 are implemented by the storage device. The other functions are implemented by the arithmetic processing unit executing a program stored in the storage device. The document processing device 1 may be composed of a plurality of server devices interconnected by communication lines.

Among the functions of the document processing device 1, the document input unit 101 acquires examination viewpoint description document data and comparison target document data from the storage device or the communication device. Here, an examination viewpoint description document is a document that describes the viewpoints of an examination; in other words, it is a procedure manual or a rulebook. An examination viewpoint description document consists mainly of a set of sentences, each made up of a conditional clause and a main clause (for example, "In the case of A, do B."). A specific example of an examination viewpoint description document is a notice of points to note (留意事項通知), which is a document describing points to note when implementing the content of a public notice. Examples of such notices include the points to note on implementation accompanying the partial revision of the medical fee calculation method, the points to note on calculating the material prices of specified insured medical materials, the points to note on implementation accompanying the revision of the calculation standards for the amount of expenses required for specified education and childcare, and the points to note on implementation accompanying the revision of the calculation standards for the amount of expenses required for designated disability welfare services.

A comparison target document is a document that is compared with the examination viewpoint description document. Because it is referred to in order to extract only the examination viewpoints from the content of the examination viewpoint description document, it preferably covers the same subject area as the examination viewpoint description document and has the same author. It also preferably has a document structure that corresponds to that of the examination viewpoint description document. Specifically, mutually related content is preferably described in the same order in the comparison target document and in the examination viewpoint description document. A specific example of a comparison target document is a public notice (告示). The pairing of input documents is as follows. When the points to note on implementation accompanying the partial revision of the medical fee calculation method are input as the examination viewpoint description document, the public notice partially revising the medical fee calculation method is input as the comparison target document. When the points to note on calculating the material prices of specified insured medical materials are input, the public notice partially revising the material prices of specified insured medical materials is input. When the points to note on implementation accompanying the revision of the calculation standards for the amount of expenses required for specified education and childcare are input, the public notice partially revising those calculation standards is input. When the points to note on implementation accompanying the revision of the calculation standards for the amount of expenses required for designated disability welfare services are input, the public notice partially revising those calculation standards is input.

The document structure analysis unit 102 analyzes the document structure of both the examination viewpoint description document represented by the examination viewpoint description document data and the comparison target document represented by the comparison target document data acquired by the document input unit 101, and identifies, for each passage constituting the examination viewpoint description document, the corresponding passage in the comparison target document. In doing so, the document structure analysis unit 102 identifies the correspondence between passages using, as clues, the numbers assigned to the headings of the passages and the blank lines in the documents. Here, a heading is the heading of a division such as a part, chapter, section, or paragraph. Once the document structure analysis unit 102 has identified the passage in the comparison target document that corresponds to a passage in the examination viewpoint description document, it stores the pair of corresponding passages in the document storage unit 103 in association with the identification information of the division to which the passages belong. If the document structure analysis unit 102 cannot identify a passage-level correspondence between the examination viewpoint description document and the comparison target document, it stores the entire examination viewpoint description document and the entire comparison target document in the document storage unit 103 as a pair of corresponding passages.
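The heading-based alignment described above can be sketched roughly as follows. This is a minimal illustration and not the patented implementation: the heading pattern, the function names `split_sections` and `align`, and the flat (non-nested) treatment of divisions are all assumptions made for the example.

```python
import re
import unicodedata

# Hypothetical heading pattern: "第" + number + a division marker (編/部/章/節/款).
HEADING = re.compile(r"^(第[0-9０-９一二三四五六七八九十]+[編部章節款])")


def split_sections(text):
    """Return {normalized heading: body text}, using heading lines as boundaries."""
    sections = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        match = HEADING.match(stripped)
        if match:
            # NFKC folds full-width digits to ASCII so "第１章" and "第1章" match.
            current = unicodedata.normalize("NFKC", match.group(1))
            sections[current] = []
        elif current is not None and stripped:
            # Blank lines are skipped; other lines belong to the current division.
            sections[current].append(stripped)
    return {key: "\n".join(body) for key, body in sections.items()}


def align(viewpoint_text, comparison_text):
    """Pair passages that share a heading; fall back to whole-document pairing."""
    viewpoint = split_sections(viewpoint_text)
    comparison = split_sections(comparison_text)
    common = [key for key in viewpoint if key in comparison]
    if not common:
        # No passage-level correspondence: treat both documents as one pair.
        return [("(whole document)", viewpoint_text, comparison_text)]
    return [(key, viewpoint[key], comparison[key]) for key in common]
```

A real document would have nested divisions (chapter, part, and so on), so the key would more plausibly be a tuple of heading levels; the flat key here only keeps the sketch short.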

The dependency analysis unit 104 splits the passage of the examination viewpoint description document and the passage of the comparison target document stored in the document storage unit 103 for each division into sentences, performs dependency analysis, and extracts the dependency structure (in other words, the dependency tree) of the phrases constituting each passage. For the dependency analysis, a well-known dependency parser such as CaboCha (https://taku910.github.io/cabocha/) or KNP (http://nlp.ist.i.kyoto-u.ac.jp/index.php?KNP) may be used. Once the dependency analysis unit 104 has extracted the dependency structures for the passage of the examination viewpoint description document and the passage of the comparison target document that correspond to one division, it stores the pair of extracted dependency structures in the dependency structure storage unit 105 in association with the identification information of that division.

The head phrase extraction unit 106 refers to the dependency structures of the examination viewpoint description document and the comparison target document stored in the dependency structure storage unit 105 for each division, and extracts the phrases that are the target of a dependency relation (hereinafter simply "head phrases", 受け節). The head phrases extracted here are, in particular, sentence-final verbs. When the head phrase extraction unit 106 extracts a head phrase from the dependency structure of the examination viewpoint description document for one division, it stores the head phrase and the frequency (in other words, the number of occurrences) with which the head phrase appears in that division in the examination viewpoint description document head phrase storage unit 107, in association with the identification information of that division. Likewise, when it extracts a head phrase from the dependency structure of the comparison target document for one division, it stores the head phrase and the frequency with which the head phrase appears in that division in the comparison target document head phrase storage unit 108, in association with the identification information of that division.
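As a rough illustration of this step, the sketch below counts sentence-final head phrases in one division. The data representation is an assumption made for the example: each sentence is a list of `(phrase_text, head_index)` pairs, where `head_index` is the position of the phrase it depends on and `-1` marks the root. Dependency parsers produce comparable information, but the names `head_phrases` and `count_head_phrases` are hypothetical.

```python
from collections import Counter


def head_phrases(sentence):
    """Return root phrases that have at least one dependent.

    `sentence` is a list of (phrase_text, head_index) pairs; head_index == -1
    marks the root, which in Japanese is normally the sentence-final phrase.
    """
    has_dependent = {head for _, head in sentence if head >= 0}
    return [
        text
        for index, (text, head) in enumerate(sentence)
        if head == -1 and index in has_dependent
    ]


def count_head_phrases(sentences):
    """Count how often each head phrase occurs across the sentences of a division."""
    return Counter(phrase for s in sentences for phrase in head_phrases(s))
```

Running `count_head_phrases` once on the examination viewpoint description document and once on the comparison target document for the same division gives the two frequency tables that the later likelihood-ratio step compares.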

For each head phrase stored in the examination viewpoint description document head phrase storage unit 107, the LLR calculation unit 109 calculates a log-likelihood ratio indicating the degree to which the head phrase appears disproportionately in the division of the examination viewpoint description document from which it was extracted, compared with the same division of the comparison target document. The calculation refers to the examination viewpoint description document head phrase storage unit 107 and the comparison target document head phrase storage unit 108 and uses Equation 1.

In Equation 1, a represents the frequency with which the head phrase whose log-likelihood ratio is being calculated appears in the division of the examination viewpoint description document from which it was extracted. b represents the frequency with which that head phrase appears in the same division of the comparison target document. c represents the frequency with which head phrases other than that head phrase appear in the same division of the examination viewpoint description document. d represents the frequency with which head phrases other than that head phrase appear in the same division of the comparison target document. N represents the number of documents ("2" in this embodiment). Here, log is the common logarithm. Once the LLR calculation unit 109 has calculated the log-likelihood ratio for a head phrase, it stores the calculated value in the examination viewpoint description document head phrase storage unit 107 in association with that head phrase.
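Equation 1 itself does not appear in this text, so the following sketch does not reproduce it. As a stand-in, it computes the widely used log-likelihood ratio (G²) statistic for a 2x2 contingency table built from the same counts a, b, c, and d, using the common logarithm as the description specifies. The sign convention (positive when the head phrase is relatively more frequent in the examination viewpoint description document) and the function name `signed_llr` are assumptions, and N is not used here.

```python
import math


def _xlogx(x):
    """x * log10(x), with the usual convention that 0 * log(0) = 0."""
    return 0.0 if x == 0 else x * math.log10(x)


def signed_llr(a, b, c, d):
    """Signed G^2 statistic for the 2x2 table [[a, b], [c, d]].

    a: count of the head phrase in the examination viewpoint description document
    b: count of the head phrase in the comparison target document
    c: count of all other head phrases in the examination viewpoint description document
    d: count of all other head phrases in the comparison target document
    """
    total = a + b + c + d
    g2 = 2.0 * (
        _xlogx(a) + _xlogx(b) + _xlogx(c) + _xlogx(d)
        - _xlogx(a + b) - _xlogx(c + d)
        - _xlogx(a + c) - _xlogx(b + d)
        + _xlogx(total)
    )
    g2 = max(0.0, g2)  # guard against tiny negative values from rounding
    # Negative sign when the phrase is relatively rarer in the viewpoint document.
    biased_to_comparison = a * (b + d) < b * (a + c)
    return -g2 if biased_to_comparison else g2
```

With this form, a head phrase that occurs at the same relative rate in both documents scores close to zero, and the score grows as the occurrence rates diverge.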

The head phrase list creation unit 110 extracts, from among the head phrases stored in the examination viewpoint description document head phrase storage unit 107, a predetermined number of head phrases whose log-likelihood ratio values are relatively large. It then stores each extracted head phrase in the head phrase list storage unit 111, in order of log-likelihood ratio, in association with the identification information of the division from which the head phrase was extracted and the log-likelihood ratio of the head phrase. The head phrase list creation unit 110 is an example of the "dependency-target phrase extraction unit" according to the present invention.
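The selection step can be sketched as below. The embodiment takes a predetermined number of top-scoring head phrases, while the summary of the invention also allows a threshold, so the sketch supports both. The function name `select_head_phrases` and its parameters are hypothetical.

```python
def select_head_phrases(scores, top_n=None, threshold=None):
    """Rank head phrases by score and keep the top N or those above a threshold.

    `scores` maps each head phrase to its log-likelihood ratio.
    Returns a list of (head_phrase, score) pairs in descending score order.
    """
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    if top_n is not None:
        return ranked[:top_n]
    if threshold is not None:
        return [(phrase, score) for phrase, score in ranked if score > threshold]
    return ranked
```

For example, `select_head_phrases(scores, top_n=20)` would correspond to the embodiment's behavior with a predetermined number of 20, a value chosen here purely for illustration.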

For each head phrase stored in the head phrase list storage unit 111, the dependent phrase list creation unit 112 refers to the dependency structure of the examination viewpoint description document stored in the dependency structure storage unit 105 for the division from which the head phrase was extracted, and extracts the phrases that depend on that head phrase (hereinafter simply "dependent phrases", 係り節). The dependent phrases extracted here are, in particular, dependent phrases that are in the same sentence as the head phrase, depend on it directly, and contain an object. Once the dependent phrase list creation unit 112 has extracted a dependent phrase, it stores the dependent phrase in the dependent phrase list storage unit 113 in association with the head phrase on which it depends, the identification information of the division from which it was extracted, and the frequency (in other words, the number of occurrences) with which it appears in that division. The dependent phrase list creation unit 112 is an example of the "dependency-source phrase extraction unit" according to the present invention.
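A minimal sketch of this extraction follows, under two assumptions that are not stated in the description: sentences are represented as lists of `(phrase_text, head_index)` pairs with `-1` for the root, and a phrase is treated as containing an object when it ends with the particle を. The function name is hypothetical.

```python
from collections import Counter


def direct_object_dependents(sentences, head_phrase, object_markers=("を",)):
    """Count phrases that directly depend on `head_phrase` and look like objects.

    Returns a Counter keyed by (dependent_phrase, head_phrase).
    """
    counts = Counter()
    for sentence in sentences:
        for index, (text, _) in enumerate(sentence):
            if text != head_phrase:
                continue
            for dependent_text, dependent_head in sentence:
                # Direct dependency only: the dependent's head is this phrase.
                if dependent_head == index and dependent_text.endswith(object_markers):
                    counts[(dependent_text, head_phrase)] += 1
    return counts
```

Checking the final particle is only a heuristic; a parser's part-of-speech or case information would be a more reliable way to detect objects.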

The dependent phrase classification unit 114 classifies the dependent phrases stored in the dependent phrase list storage unit 113. Classification is performed among dependent phrases that were extracted from the same division and share the same head phrase, and is based on the meanings of the words contained in the dependent phrases. Specifically, the dependent phrase classification unit 114 refers to a synonym dictionary (not shown) and assigns the same classification code to a group of dependent phrases that contain mutually synonymous words. Alternatively, it calculates the distance between dependent phrases using a well-known similarity calculation method and assigns the same classification code to a group of dependent phrases whose calculated distance is within a threshold. The classification code assigned to a dependent phrase is stored in the dependent phrase list storage unit 113 in association with that dependent phrase.
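The synonym-dictionary variant of this classification might look like the sketch below. The dictionary format (each word mapped to a canonical representative of its synonym group) and the substring test used to find a word inside a phrase are simplifying assumptions; the function name `classify` is hypothetical.

```python
def classify(dependent_phrases, synonyms):
    """Assign an integer classification code to each dependent phrase.

    `synonyms` maps a word to the canonical word of its synonym group.
    Phrases containing words from the same group receive the same code;
    a phrase that matches no dictionary word forms its own group.
    """
    group_codes = {}
    result = {}
    for phrase in dependent_phrases:
        group = next(
            (canonical for word, canonical in synonyms.items() if word in phrase),
            phrase,
        )
        if group not in group_codes:
            group_codes[group] = len(group_codes) + 1
        result[phrase] = group_codes[group]
    return result
```

The distance-based alternative mentioned above would replace the dictionary lookup with a similarity measure and a clustering rule, which this sketch does not attempt.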

For each head phrase stored in the dependent phrase list storage unit 113, the examination logic list creation unit 115 refers to the dependency structure of the examination viewpoint description document stored in the dependency structure storage unit 105 for the division from which the head phrase was extracted, and attempts to extract a dependent phrase that depends on that head phrase and expresses a condition. The dependent phrase extracted here is, in particular, a dependent phrase that is in the same sentence as the head phrase and depends on it directly. To extract dependent phrases that express a condition, the examination logic list creation unit 115 uses condition keywords such as 「場合は」 ("in the case of"), 「ときは」 ("when"), and 「時は」 ("at the time of") as clues. Once it has extracted a dependent phrase expressing a condition, the examination logic list creation unit 115 further extracts the phrases that depend on the extracted dependent phrase and identifies them together as the condition part. It also extracts a dependent phrase that is in the same sentence as the extracted condition phrase, depends directly on the head phrase stored in the dependent phrase list storage unit 113, and contains an object, and identifies it together with the head phrase as the processing part. If no such dependent phrase exists, this extraction may be omitted. Once the condition part and the processing part have been identified, the examination logic list creation unit 115 stores them in the examination logic list storage unit 116 in association with the identification information of the division from which they were extracted. The examination logic list creation unit 115 is an example of the "dependency-source phrase extraction unit" according to the present invention.
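The condition and processing parts can be sketched for a single sentence as follows. The keyword list comes from the description; everything else is an assumption for illustration: the `(phrase_text, head_index)` representation with `-1` for the root, the use of a trailing を to detect objects, and the function name `extract_logic`.

```python
CONDITION_KEYWORDS = ("場合は", "ときは", "時は")


def extract_logic(sentence, head_index, object_markers=("を",)):
    """Split a sentence into (condition_part, processing_part) around a head phrase.

    Returns None when no direct dependent of the head phrase expresses a condition.
    """
    # Phrases that depend directly on the head phrase.
    direct = [i for i, (_, head) in enumerate(sentence) if head == head_index]
    conditions = [
        i for i in direct
        if any(keyword in sentence[i][0] for keyword in CONDITION_KEYWORDS)
    ]
    if not conditions:
        return None
    # Condition part: each condition phrase plus the phrases depending on it.
    condition_indices = set(conditions)
    for c in conditions:
        condition_indices.update(
            i for i, (_, head) in enumerate(sentence) if head == c
        )
    # Processing part: object-bearing direct dependents followed by the head phrase.
    objects = [i for i in direct if sentence[i][0].endswith(object_markers)]
    condition_part = "".join(sentence[i][0] for i in sorted(condition_indices))
    processing_part = "".join(sentence[i][0] for i in objects + [head_index])
    return condition_part, processing_part
```

When a sentence has no object-bearing dependent, the processing part here reduces to the head phrase alone, which is one way to read the statement that the extraction may be omitted.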

The output unit 117 refers to the receiving clause list storage unit 111 and outputs the receiving clause list. It also refers to the dependent clause list storage unit 113 and outputs the dependent clause list. When outputting the dependent clause list, the output unit 117 groups together the dependent clauses that share the same classification code, described later. In addition, the output unit 117 refers to the examination logic list storage unit 116 and outputs the list of examination logic. Here, "output" means displaying information on a screen, printing it, or transmitting data that represents it.

1-2. Operation
The document processing method executed by the document processing device 1 will now be described, specifically the examination viewpoint extraction process, which extracts examination viewpoints from an examination viewpoint description document. FIG. 2 is a flowchart showing an example of the examination viewpoint extraction process. This operation example assumes that the implementation notes accompanying a partial revision of the medical fee calculation method are input as the examination viewpoint description document, and that the public notice partially revising the medical fee calculation method is input as the comparison target document.

When the document input unit 101 acquires the examination viewpoint description document data and the comparison target document data, the document structure analysis unit 102 analyzes the document structure of both the examination viewpoint description document and the comparison target document represented by the acquired data and, for each passage of the examination viewpoint description document, identifies the corresponding passage of the comparison target document (S1). Having identified the corresponding passages, it stores each pair of corresponding passages in the document storage unit 103 in association with the identification information of the section to which they belong. FIG. 3 shows an example of a table stored in the document storage unit 103. In this table, for example, a pair of mutually corresponding passages, one from the examination viewpoint description document and one from the comparison target document, is stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".

Once the correspondence between passages of the examination viewpoint description document and the comparison target document has been identified, the dependency analysis unit 104 splits the passages of both documents stored for each section in the document storage unit 103 into sentences, performs dependency analysis, and extracts the dependency structure of the clauses that make up each passage (S2). FIG. 4 shows an example of the dependency structure output when the dependency analysis unit 104 analyzes the sentence "医学的に初診といわれる診療行為があった場合に、初診料を算定する。" ("When a medical act that is medically regarded as an initial consultation has taken place, the initial consultation fee is calculated."). In that dependency structure, lines 1, 4, 7, 10, 14, 17, 21, and 24, which begin with an asterisk, each consist of a clause number and the number of the clause it depends on ("-1" if it depends on no clause). The other lines each consist of a word's surface form, part of speech, part-of-speech subcategories 1 to 3, conjugation type, conjugated form, base form, reading, and pronunciation. An asterisk in one of these other lines indicates that the information is not registered in the dictionary. The dependency structure shows, for example, that the clause "医学的に" ("medically") depends on the clause "いわれる" ("regarded as"). Having extracted the clause dependency structures for the passages of both documents in each section, the dependency analysis unit 104 stores each pair of extracted dependency structures in the dependency structure storage unit 105 in association with the identification information of the section. FIG. 5 shows an example of a table stored in the dependency structure storage unit 105. In this table, for example, a pair consisting of the clause dependency structure of the examination viewpoint description document and that of the comparison target document is stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".
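To make the line layout described for FIG. 4 concrete, the following is a minimal sketch of reading such output into clause records. It assumes a CaboCha-style lattice layout (a header line of the form `* <clause number> <head number>D`, token lines with a tab between the surface form and comma-separated features, and an `EOS` terminator); the text here only describes the fields, so the exact layout is an assumption. The names `Clause` and `parse_dependency_output` and the two-clause sample are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Clause:
    number: int   # clause number within the sentence
    head: int     # number of the clause it depends on (-1 if none)
    tokens: list = field(default_factory=list)  # (surface, features) pairs

    @property
    def text(self):
        return "".join(surface for surface, _ in self.tokens)


def parse_dependency_output(output):
    """Parse one sentence of lattice-style output into Clause objects."""
    clauses = []
    for line in output.splitlines():
        if not line or line == "EOS":
            continue
        if line.startswith("* "):
            # Header line: "* <clause number> <head number>D ..."
            parts = line.split()
            clauses.append(Clause(int(parts[1]), int(parts[2].rstrip("D"))))
        else:
            # Token line: surface form, a tab, then comma-separated features.
            surface, _, features = line.partition("\t")
            clauses[-1].tokens.append((surface, features.split(",")))
    return clauses


# Hypothetical parser output for "初診料を算定する"
SAMPLE = (
    "* 0 1D\n"
    "初診料\t名詞,一般,*,*,*,*,初診料,ショシンリョウ,ショシンリョー\n"
    "を\t助詞,格助詞,一般,*,*,*,を,ヲ,ヲ\n"
    "* 1 -1D\n"
    "算定\t名詞,サ変接続,*,*,*,*,算定,サンテイ,サンテイ\n"
    "する\t動詞,自立,*,*,サ変・スル,基本形,する,スル,スル\n"
    "EOS\n"
)
clauses = parse_dependency_output(SAMPLE)
```

Each `Clause` keeps its head number, so the relation "clause 0 depends on clause 1" can be read directly from `clauses[0].head`.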

Once the dependency structures have been extracted for the passages of the examination viewpoint description document and the comparison target document in each section, the receiving clause extraction unit 106 refers to the clause dependency structures of both documents stored for each section in the dependency structure storage unit 105 and extracts the receiving clauses that form dependency relations (S3). When it extracts a receiving clause from the clause dependency structure of the examination viewpoint description document for a given section, it stores the receiving clause and the frequency with which it appears in that section in the examination viewpoint description document receiving clause storage unit 107, in association with the identification information of that section. FIG. 6 shows an example of a table stored in the examination viewpoint description document receiving clause storage unit 107. In this table, for example, the receiving clause "記載する" ("state"), the frequency of appearance "48", and the LLR "80.78" are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules". The LLR "80.78" is stored as a result of executing step S4, described later. Likewise, when the receiving clause extraction unit 106 extracts a receiving clause from the clause dependency structure of the comparison target document for a given section, it stores the receiving clause and the frequency with which it appears in that section in the comparison target document receiving clause storage unit 108, in association with the identification information of that section. FIG. 7 shows an example of a table stored in the comparison target document receiving clause storage unit 108. In this table, for example, the receiving clause "記載する" and the frequency of appearance "8" are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".
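The frequency counting in step S3 can be illustrated with a short sketch. It assumes each sentence has already been reduced to a list of (clause text, head index) pairs; the function name `count_receiving_clauses` and the sample section are hypothetical. As a simplification, every clause that has at least one dependent is counted, whereas the embodiment restricts extraction to sentence-final verbs, which would additionally need part-of-speech information.

```python
from collections import Counter


def count_receiving_clauses(sentences):
    """Count how often each receiving clause occurs in one section.

    `sentences` is a list of sentences; each sentence is a list of
    (clause_text, head_index) pairs, with head_index == -1 for no head.
    """
    counts = Counter()
    for sentence in sentences:
        # A clause is a receiving clause if some other clause depends on it.
        head_indices = {head for _, head in sentence if head >= 0}
        for index in sorted(head_indices):
            counts[sentence[index][0]] += 1
    return counts


# Hypothetical section made of two short sentences.
section = [
    [("理由を", 1), ("記載する", -1)],
    [("緊急の", 1), ("場合は", 3), ("理由を", 3), ("記載する", -1)],
]
receiving_counts = count_receiving_clauses(section)
```

Running the same function over the matching section of the comparison target document would give the second set of counts that step S4 compares against.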

Once receiving clauses have been extracted from the examination viewpoint description document and the comparison target document, the LLR calculation unit 109 refers to the examination viewpoint description document receiving clause storage unit 107 and the comparison target document receiving clause storage unit 108 and calculates, for each receiving clause stored in the former, a log-likelihood ratio (LLR) indicating the degree to which the receiving clause appears disproportionately in the section of the examination viewpoint description document from which it was extracted, compared with the same section of the comparison target document (S4). It then stores each calculated log-likelihood ratio in the examination viewpoint description document receiving clause storage unit 107 in association with the receiving clause.

Once the log-likelihood ratio has been calculated for each receiving clause, the receiving clause list creation unit 110 extracts, from the receiving clauses stored in the examination viewpoint description document receiving clause storage unit 107, a predetermined number of receiving clauses with relatively large log-likelihood ratios (S5). It then stores the extracted receiving clauses in the receiving clause list storage unit 111 in order of log-likelihood ratio, each in association with the identification information of the section from which it was extracted and with its log-likelihood ratio. FIG. 8 shows an example of a table stored in the receiving clause list storage unit 111. In this table, for example, the receiving clause "実施する" ("carry out") and the LLR "326.54" are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".
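As a small illustration of step S5, the selection of a predetermined number of receiving clauses can be written as a sort followed by a slice. The function name `select_top_receiving_clauses` is hypothetical. The first two scores are the values quoted for FIGS. 8 and 6; the third entry is made up for the example.

```python
def select_top_receiving_clauses(llr_by_clause, n):
    """Return the n (clause, LLR) pairs with the largest LLR, largest first."""
    ranked = sorted(llr_by_clause.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


scores = {
    "記載する": 80.78,   # value quoted for FIG. 6
    "実施する": 326.54,  # value quoted for FIG. 8
    "行う": 12.3,        # hypothetical value for illustration
}
top_two = select_top_receiving_clauses(scores, 2)
```

The result is already in descending LLR order, which matches the order in which the list is stored.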

Once the receiving clause list has been created, the dependent clause list creation unit 112, for each receiving clause stored in the receiving clause list storage unit 111, refers to the clause dependency structures of the examination viewpoint description document stored in the dependency structure storage unit 105, specifically those for the section from which the receiving clause was extracted, and extracts the dependent clauses that form a dependency relation with that receiving clause (S6). It stores each extracted dependent clause in the dependent clause list storage unit 113 in association with the receiving clause it depends on, the identification information of the section from which it was extracted, and the frequency with which it appears in that section. FIG. 9 shows an example of a table stored in the dependent clause list storage unit 113. A table is created in this storage unit for each receiving clause stored in the receiving clause list storage unit 111. In the table shown in FIG. 9, for example, the dependent clause "理由を" ("the reason", with the object marker), the frequency of appearance "27", and the classification code "1" are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules". The classification code "1" is stored as a result of executing step S7, described later.
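Step S6 can be sketched in the same spirit: for one receiving clause, gather the clauses that depend on it directly and count them. The sentence representation as (clause text, head index) pairs, the function name `collect_dependent_clauses`, and the sample section are assumptions made for the example. The embodiment's restriction to dependent clauses containing an object is left out here.

```python
from collections import Counter


def collect_dependent_clauses(sentences, receiving_clause):
    """Count the clauses that depend directly on `receiving_clause`.

    `sentences` is a list of sentences; each sentence is a list of
    (clause_text, head_index) pairs, with head_index == -1 for no head.
    """
    counts = Counter()
    for sentence in sentences:
        for text, head in sentence:
            if head >= 0 and sentence[head][0] == receiving_clause:
                counts[text] += 1
    return counts


# Hypothetical section made of two short sentences.
section_sentences = [
    [("理由を", 1), ("記載する", -1)],
    [("緊急の", 1), ("場合は", 3), ("理由を", 3), ("記載する", -1)],
]
dependents = collect_dependent_clauses(section_sentences, "記載する")
```

Here "緊急の" is not counted because it depends on "場合は" rather than on the receiving clause itself.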

Once the dependent clause list has been created, the dependent clause classification unit 114 classifies the dependent clauses stored in the dependent clause list storage unit 113 based on the meanings of the words they contain. Each dependent clause is classified among the dependent clauses that were extracted from the same section and share the same receiving clause (S7). Having assigned a classification code to each dependent clause, the unit stores the code in the dependent clause list storage unit 113 in association with that dependent clause.

Once the dependent clauses have been classified, the examination logic list creation unit 115, for each receiving clause stored in the dependent clause list storage unit 113, refers to the clause dependency structures of the examination viewpoint description document stored in the dependency structure storage unit 105, specifically those for the section from which the receiving clause was extracted, and attempts to extract a dependent clause that forms a dependency relation with the receiving clause and expresses a condition. On extracting a dependent clause that expresses a condition, it further extracts the dependent clauses that depend on it and identifies them together as the condition part. It also extracts a dependent clause that is contained in the same sentence as the extracted condition clause, depends directly on the receiving clause stored in the dependent clause list storage unit 113, and contains an object, and identifies it together with that receiving clause as the processing part (S8). Once the condition part and the processing part have been identified, it stores them in the examination logic list storage unit 116 in association with the identification information of the section from which they were extracted. FIG. 10 shows an example of a table stored in the examination logic list storage unit 116. In this table, for example, the condition part "緊急の場合は" ("in an emergency") and the processing part "理由を記載する" ("state the reason") are stored in association with the section identification information "Chapter 1", "Part 1", and "General Rules".
This concludes the description of the examination viewpoint extraction process.

In the document processing device 1 described above, the receiving clause list creation unit 110 creates a list of receiving clauses that appear characteristically in the examination viewpoint description document. By consulting this list, a reader about to read the examination viewpoint description document can see which clauses to focus on in order to grasp the examination viewpoints quickly. In addition, because the document processing device 1 calculates the log-likelihood ratio by comparing mutually related passages of the examination viewpoint description document and the comparison target document, receiving clauses are extracted more accurately than they would be otherwise. Because extraction is performed per clause rather than per word, it is possible to distinguish, for example, "記載する" ("state") from "記載しない" ("do not state"). Furthermore, because extraction is limited to receiving clauses that are sentence-final verbs rather than covering all clauses, less noise is extracted than would be otherwise.

In the document processing device 1, the dependent clause list creation unit 112 also creates a list of the dependent clauses that depend on the receiving clauses appearing characteristically in the examination viewpoint description document. Consulting this dependent clause list makes the examination viewpoints easier to understand for a reader about to read the examination viewpoint description document. In addition, because each dependent clause is classified by the dependent clause classification unit 114 and dependent clauses sharing the same classification code are grouped in the output, an overview of the examination viewpoints is easier to grasp.

In the document processing device 1, the examination logic list creation unit 115 also extracts a list of examination logic, each item consisting of a condition part and a processing part. By consulting this list, a reader can grasp an overview of the examination viewpoints without reading the examination viewpoint description document.

2. Modifications
The above embodiment may be modified as described below. One or more of the modifications described below may also be combined with one another.

2-1. Modification 1
The document structure analysis unit 102 may be omitted. In that case, the pair consisting of the entire examination viewpoint description document represented by the examination viewpoint description document data acquired by the document input unit 101 and the entire comparison target document represented by the comparison target document data is stored in the document storage unit 103 as a pair of corresponding passages.

2-2. Modification 2
The document processing device 1 described above is assumed to process Japanese documents, but it may also be made capable of processing languages other than Japanese. For example, to process English documents, the dependency analysis unit 104 may use a parser such as MaltParser (http://www.maltparser.org/) to extract the dependency structure of the clauses that make up the English document.

2-3. Modification 3
The receiving clause extraction unit 106 described above extracts receiving clauses that are sentence-final verbs, but the extracted receiving clauses need not be limited to sentence-final receiving clauses or to verb receiving clauses.

2-4. Modification 4
Equation 1 above, which is used to calculate the log-likelihood ratio, takes the common logarithm, but the natural logarithm may be taken instead. Alternatively, the logarithm may be omitted altogether.

The formula for calculating the log-likelihood ratio is not limited to Equation 1 above. Instead of Equation 1, the LLR calculation unit 109 may calculate the log-likelihood ratio using, for example, Equation 2.

In Equation 2, a is the frequency with which the receiving clause whose log-likelihood ratio is being calculated appears in the section of the examination viewpoint description document from which it was extracted. b is the frequency with which that receiving clause appears in the same section of the comparison target document. c is the frequency with which all receiving clauses appear in that section of the examination viewpoint description document (in other words, the total number of receiving clauses there). d is the frequency with which all receiving clauses appear in the same section of the comparison target document (again, the total number of receiving clauses there).
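Equation 2 itself is not reproduced in this text. For orientation only, the sketch below computes one log-likelihood statistic that is widely used in corpus comparison and takes exactly these four counts: with expected frequencies E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), the statistic is 2(a ln(a/E1) + b ln(b/E2)). Whether this coincides with Equation 2 is an assumption, and the function name `log_likelihood` is hypothetical. The counts a = 48 and b = 8 are the frequencies quoted for FIGS. 6 and 7; the totals c and d are made up.

```python
import math


def log_likelihood(a, b, c, d):
    """Two-corpus log-likelihood statistic (natural logarithm).

    a, b: frequency of the receiving clause in each document's section
    c, d: total number of receiving clauses in each document's section
    """
    expected_a = c * (a + b) / (c + d)
    expected_b = d * (a + b) / (c + d)
    total = 0.0
    for observed, expected in ((a, expected_a), (b, expected_b)):
        if observed > 0:  # a zero count contributes nothing (0 * ln 0 -> 0)
            total += observed * math.log(observed / expected)
    return 2.0 * total


# a and b as quoted for FIGS. 6 and 7; c and d are hypothetical totals.
score = log_likelihood(48, 8, 1000, 1000)
```

This statistic is symmetric in the two documents, so a caller would also compare a/c with b/d to confirm that the bias is toward the examination viewpoint description document.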

2-5. Modification 5
Instead of extracting a predetermined number of receiving clauses with relatively large log-likelihood ratios, the receiving clause list creation unit 110 described above may extract the receiving clauses whose log-likelihood ratio is greater than a threshold.

2-6. Modification 6
The dependent clause list creation unit 112 described above extracts dependent clauses that depend directly on the receiving clause and contain an object, but the extracted dependent clauses need not be limited to ones that depend directly on the receiving clause or to ones that contain an object.

2-7. Modification 7
The dependent clause classification unit 114 described above classifies dependent clauses among those extracted from the same section and sharing the same receiving clause, but it may instead classify among dependent clauses extracted from different sections, or among dependent clauses that do not share a receiving clause.

2-8. Modification 8
The program for implementing the functions of the document processing device 1 may be provided on a recording medium readable by a computer device. Examples of such a recording medium include magnetic recording media such as magnetic tapes and magnetic disks, optical recording media such as optical discs, magneto-optical recording media, and semiconductor memory. The program may also be provided over a network such as the Internet.

2-9. Modification 9
The examination viewpoint description document is an example of a rule description document. Here, a rule description document is a document in which rules are written. More specifically, it is, for example, a document setting out matters that serve as standards for people's conduct or for administrative procedures.

1…document processing device, 101…document input unit, 102…document structure analysis unit, 103…document storage unit, 104…dependency analysis unit, 105…dependency structure storage unit, 106…receiving clause extraction unit, 107…examination viewpoint description document receiving clause storage unit, 108…comparison target document receiving clause storage unit, 109…LLR calculation unit, 110…receiving clause list creation unit, 111…receiving clause list storage unit, 112…dependent clause list creation unit, 113…dependent clause list storage unit, 114…dependent clause classification unit, 115…examination logic list creation unit, 116…examination logic list storage unit, 117…output unit

Claims

A document processing device comprising:
a dependency analysis unit that extracts the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document;
a likelihood ratio calculation unit that, for each dependency-destination clause constituting the rule description document, calculates, based on the clause dependency structure extracted from the rule description document and the clause dependency structure extracted from the comparison target document by the dependency analysis unit, a likelihood ratio indicating the degree to which the dependency-destination clause appears disproportionately in the rule description document compared with the comparison target document; and
a dependency-destination clause extraction unit that extracts, from the dependency-destination clauses constituting the rule description document, either a predetermined number of dependency-destination clauses for which the likelihood ratio calculated by the likelihood ratio calculation unit is relatively large, or the dependency-destination clauses for which that likelihood ratio is greater than a threshold.

The document processing device according to claim 1, further comprising a dependency-source clause extraction unit that, for each dependency-destination clause extracted by the dependency-destination clause extraction unit, extracts the dependency-source clauses that depend on it, based on the clause dependency structure extracted from the rule description document by the dependency analysis unit.

The document processing device according to claim 2, further comprising a classification unit that, for each dependency-destination clause extracted by the dependency-destination clause extraction unit, classifies the dependency-source clauses extracted by the dependency-source clause extraction unit based on the meanings of the words contained in those dependency-source clauses.

The document processing device according to claim 2, wherein the dependency-source clause extraction unit extracts dependency-source clauses that express a condition.

The document processing device according to any one of claims 1 to 4, further comprising a document structure analysis unit that analyzes the structure of the rule description document and the structure of the comparison target document and identifies another passage, constituting the comparison target document, that corresponds to one passage constituting the rule description document, wherein
the likelihood ratio calculation unit calculates, for each dependency-destination clause constituting the one passage, a likelihood ratio indicating the degree to which the dependency-destination clause appears disproportionately in the one passage compared with the other passage, and
the dependency-destination clause extraction unit extracts, from the dependency-destination clauses constituting the one passage, either a predetermined number of dependency-destination clauses for which the likelihood ratio calculated by the likelihood ratio calculation unit is relatively large, or the dependency-destination clauses for which that likelihood ratio is greater than a threshold.

A document processing method executed by a document processing device, the method comprising:
a dependency analysis step of extracting the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document;
a likelihood ratio calculation step of calculating, for each dependency-destination clause constituting the rule description document, based on the clause dependency structure extracted from the rule description document and the clause dependency structure extracted from the comparison target document in the dependency analysis step, a likelihood ratio indicating the degree to which the dependency-destination clause appears disproportionately in the rule description document compared with the comparison target document; and
a dependency-destination clause extraction step of extracting, from the dependency-destination clauses constituting the rule description document, either a predetermined number of dependency-destination clauses for which the likelihood ratio calculated in the likelihood ratio calculation step is relatively large, or the dependency-destination clauses for which that likelihood ratio is greater than a threshold.

A program for causing a computer to function as:
a dependency analysis unit that extracts the dependency structure of the clauses constituting a rule description document and the dependency structure of the clauses constituting a comparison target document;
a likelihood ratio calculation unit that, for each dependency-destination clause constituting the rule description document, calculates, based on the clause dependency structure extracted from the rule description document and the clause dependency structure extracted from the comparison target document by the dependency analysis unit, a likelihood ratio indicating the degree to which the dependency-destination clause appears disproportionately in the rule description document compared with the comparison target document; and
a dependency-destination clause extraction unit that extracts, from the dependency-destination clauses constituting the rule description document, either a predetermined number of dependency-destination clauses for which the likelihood ratio calculated by the likelihood ratio calculation unit is relatively large, or the dependency-destination clauses for which that likelihood ratio is greater than a threshold.