JP5417359B2

JP5417359B2 - Document evaluation support system and document evaluation support method

Info

Publication number: JP5417359B2
Application number: JP2011041118A
Authority: JP
Inventors: 薫川端; 毅横田; 君吉待井; 義行小林; 正和藤尾
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2011-02-28
Filing date: 2011-02-28
Publication date: 2014-02-12
Anticipated expiration: 2031-02-28
Also published as: JP2012178079A

Description

The present invention relates to a document evaluation support system. To show which parts of a standard knowledge network structure for a given kind of document are related to the content of an input document, the system represents the relationships between phrases in the input document as a knowledge network structure of the same form, matches it against the standard knowledge network structure to evaluate their similarity, and supports the display of the related parts according to the evaluation result.

Conventionally, sentences in a document have been described as structured data and used to retrieve information from other databases, with the user judging the search results on the basis of experience and discretion and then making use of them.

An ontology is a known technique for systematically representing the concepts of a knowledge base, such as the phrases concerning some content and the relationships between those phrases. Formats for describing structured data that represents an ontology include XML (Extensible Markup Language) and RDF (Resource Description Framework).

Methods of retrieving information with a query written in an ontology description format include the information retrieval system and the structured data retrieval program disclosed in [Patent Document 1] and [Patent Document 2].

The information retrieval system disclosed in [Patent Document 3] retrieves documents by a search request that uses an ontology and lets the user feed back evaluation information on the search results.

Patent documents: JP 2005-165958 A, JP 2010-79857 A, JP 2003-108597 A

Conventional search methods retrieve search phrases or search sentences, and information related to them, from other documents or databases.

However, a user may want to know which parts of a standard knowledge network structure for the document's content (or of a knowledge network structure representing what the document should normally contain) the input document relates to and to what degree, which parts it describes, and which parts it omits. This requires more than searching for the phrases in the input document and the relationships between them. It is necessary to compare the standard knowledge network structure with the knowledge network structure of the input document, search for related parts, evaluate their similarity, and change the display of each phrase in the standard knowledge network structure according to the evaluation result.

To solve the above problem, the present invention comprises: means for storing a standard knowledge network structure that represents pairs of phrases concerning the content of a given kind of document and the relationship between the two phrases of each pair; means for converting an input document into a knowledge network structure of the same form, composed of phrases contained in that document and the relationships between pairs of those phrases; means for comparing the knowledge network structure of a sentence selected from the input document with the standard knowledge network structure and searching for matching or similar parts; means for evaluating the similarity as a score for each phrase; and means for changing the display of each phrase in the standard knowledge network structure according to the evaluation result.

In the document evaluation support system of the present invention, the means for converting the input document into a knowledge network structure parses each sentence of the input document, represents the relationships among subject, predicate, object, and so on as pairs of phrases and their relationship, and then converts these into relationships of the same kind as those used in the standard knowledge network structure.

In the document evaluation support system of the present invention, the knowledge network structure of the input document matches the standard knowledge network structure where the same knowledge network structure as that of a sentence in the input document exists in the standard knowledge network structure. The two structures are similar in either of two cases: where the standard knowledge network structure contains a structure obtained by interpolating other phrases and relationships between two phrases of the sentence's knowledge network structure while preserving the relationship between those two phrases; or where it contains a structure obtained by deleting phrases from the sentence's knowledge network structure while preserving the relationship between the remaining phrases.

According to the document evaluation support system of the present invention, the related parts and characteristics of the content of an input document can be grasped within the standard knowledge network structure for the content of a given kind of document.

FIG. 1: Basic configuration diagram.
FIG. 2: Processing procedure for extracting, evaluating, and displaying the parts of the standard document structured data that match or are similar to the input document structured data.
FIG. 3: Example of standard document structured data.
FIG. 4: Example in which points requiring attention are incorporated into the standard document structured data.
FIG. 5: Processing procedure for extracting points requiring attention in the input document structured data from the standard document structured data and outputting reference information.
FIG. 6: Example of converting an input document into input document structured data.
FIG. 7: Example of the processing procedure in the structured data comparison and evaluation device.
FIG. 8: Example of a part of the standard document structured data that matches the structured data of a sentence in the input document.
FIG. 9: Example of a part of the standard document structured data that is similar to the structured data of a sentence in the input document.
FIG. 10: Example of a part of the standard document structured data that is similar to the structured data of a sentence in the input document.
FIG. 11: Evaluation example for a part of the standard document structured data that matches the structured data of a sentence in the input document.
FIG. 12: Evaluation example for a part of the standard document structured data that is similar to the structured data of a sentence in the input document.
FIG. 13: Evaluation example for a part of the standard document structured data that is similar to the structured data of a sentence in the input document.
FIG. 14: Example of a screen that displays, on the standard document structured data, the evaluation result for a selected range of the input document.

An embodiment of the information reference support system according to the present invention is described below with reference to FIGS. 1 to 13.

FIG. 1 shows the basic configuration of the document evaluation support system of this embodiment. The system comprises: a standard document structured database (101) that stores standard network structure data for the content of a given kind of document (hereinafter, standard document structured data); a structured document conversion device (104) that converts an input document (103) into knowledge network structure data (hereinafter, input document structured data); an input document structured database (105) that stores the input document structured data produced by the structured document conversion device; an attention point database (102) that stores points requiring attention for the content of a given kind of document; a structured data comparison and evaluation device (106) that compares and evaluates the standard document structured data and the input document structured data and thereby extracts points requiring attention; an evaluation result database (109) that accumulates the results of evaluating each sentence of the input document structured data; a reference information database (107) that stores reference information on the points requiring attention; a reference information creation device (108) that creates the corresponding reference information when a point requiring attention is extracted from the input document; and an evaluation result display device (110) that changes the display of graphs and phrases according to the evaluation results, shows them on a screen or the like, and displays the reference information.

FIG. 2 is a flowchart showing an example of the processing procedure in the document evaluation support system of FIG. 1 for extracting, evaluating, and displaying the parts of the standard document structured data that match or are similar to the input document structured data.

After the start (step 201), the standard document structured database is read (step 202). The standard document structured database is defined in advance according to the content of the target document. For example, for a contract document such as the technical specification of a product, the phrases required for components, implementation items, and so on, together with the predefined mutual associations between those phrases (called relationships), are described as a knowledge network structure from the viewpoint of what the technical specification of that product should contain.

Next, the input document is read (step 203), and the range to be evaluated this time is designated within it (step 204). A document consists of one or more sentences, and the designated range likewise consists of one or more sentences. The sentences in the designated range are converted into a knowledge network structure of the same form as the standard document structured data, producing the input document structured data (step 205). The conversion method is described later.

Next, to evaluate how well the input document structured data matches or resembles the related parts of the standard document structured data, the knowledge network structure data of every sentence in the input document structured data is searched for in the standard document structured data (step 206) and evaluated on the basis of the search outcome (step 207). The search method is described later with reference to FIG. 8, and the evaluation method (evaluation of matches and similarity) with reference to FIGS. 8 to 13. For phrases that the evaluation of a sentence finds to be highly similar, the evaluation score of the corresponding phrase in the standard document structured data is raised. When all sentences in the input document structured data have been evaluated (step 208), the evaluation scores of each phrase in the standard document structured data are totaled, the result is displayed (step 209), and the process ends (step 210).

FIG. 3 shows an example of the standard document structured data 101 of FIG. 1 in the document evaluation support system of this embodiment.

This example 300 represents, as tree-shaped data, the phrases and the relationships between pairs of phrases for the content that should appear in the technical specification of a product called "T100". The ellipses enclose phrases that a technical specification normally contains; lines connect related phrases, and a definition word naming the relationship is written next to each line. The definition words, which express relationships conceptually, are defined in advance. In this example, p/o (part_of) indicates equipment composition, and a/o (attribute_of) indicates an attribute, a type, or an implementation item. Thus, for example, the data shows that "Compressor" 302 is one of the components in the technical specification of "T100" 301. The standard document structured data is shown here as a tree, but any format that can define two phrases and the relationship between them may be used, for example XML, RDF, or a table.
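One way to hold such data in memory is as a set of (parent, relation, child) triples. The following is a minimal sketch under that assumption; the `Relation` type, the `STANDARD` fragment, and the `children` helper are illustrative names that do not appear in the patent, and only the "T100" and "Heater" entries are taken from the examples in the text.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    """One edge of the knowledge network: two phrases and their relationship."""
    parent: str
    rel: str    # "p/o" (part_of) or "a/o" (attribute_of)
    child: str


# Illustrative fragment only; the real standard document structured data
# would be defined in advance for the document type.
STANDARD = {
    Relation("T100", "p/o", "Compressor"),
    Relation("Heater", "a/o", "Certificate"),
    Relation("Certificate", "a/o", "KOSHA"),
}


def children(data, parent, rel):
    """Return the child phrases linked to `parent` by relationship `rel`."""
    return sorted(r.child for r in data if r.parent == parent and r.rel == rel)


print(children(STANDARD, "T100", "p/o"))
```

A flat set of triples maps directly onto RDF-style or tabular storage, which the text names as acceptable alternatives to the tree drawing.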

FIG. 4 shows an example in which the information of the attention point database 102 is marked within the standard document structured data 101 of FIG. 3 in the document evaluation support system of this embodiment.

In this example 400, if the technical specification contains content in which the attribute of "Fuel" 401 is "Gas" 402, this is a point requiring attention (403), and reference information (404) is linked to it. One conceivable format attaches a reference information number or the like to the data representing the phrase "Fuel", the phrase "Gas", and the relationship (a/o) between them. When the phrase "Fuel", the phrase "Gas", and the relationship (a/o) between them are extracted from the input document structured data, the linked reference information stored in the reference information database can then be presented. In this way, points requiring attention can be searched for while the parts that match or are similar to the input document structured data are being extracted from the standard document structured data.
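The link between a point requiring attention and its reference information can be sketched as a lookup keyed by the triple. This is only an illustration: the reference number `REF-001`, the placeholder text, and the function name `find_reference_info` are hypothetical and are not given in the patent.

```python
# Points requiring attention, keyed by (parent, relation, child).
# The reference number "REF-001" is a made-up placeholder.
ATTENTION_POINTS = {("Fuel", "a/o", "Gas"): "REF-001"}

# Stand-in for the reference information database (107).
REFERENCE_INFO = {"REF-001": "Placeholder reference text for gas fuel."}


def find_reference_info(input_triples):
    """Return (triple, reference text) for every input triple that is flagged."""
    hits = []
    for triple in input_triples:
        ref_no = ATTENTION_POINTS.get(triple)
        if ref_no is not None:
            hits.append((triple, REFERENCE_INFO[ref_no]))
    return hits


sentence_triples = [("Fuel", "a/o", "Gas"), ("panel", "p/o", "device")]
for triple, text in find_reference_info(sentence_triples):
    print(triple, "->", text)
```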

FIG. 5 is a flowchart showing an example of the processing procedure in the document evaluation support system of this embodiment when the points requiring attention 102 are marked within the standard document structured data 101 as shown in FIG. 4.

FIG. 5 adds two processes to the flowchart of FIG. 2: determining whether a point requiring attention is present (step 211), and outputting reference information when one is found (step 212).

FIG. 6 shows an example of the structured document conversion method carried out by the structured document conversion device 104 of FIG. 1 in the document evaluation support system of this embodiment.

For example, parsing a sentence such as sentence example 1 (601) in the input document, using syntactic analysis of the kind used in natural language processing and formal language analysis, reveals the grammatical dependency relationships between phrases. In this example, suppose the parse extracts the subject "panel" (602), the predicate "be provided" (603), and the object "device" (604) (605). The predicate (verb) is then converted using a verb-relation conversion table (606), which is defined from the standard document structured data according to the content of the target document, yielding the phrase "panel", the phrase "device", and the relationship "p/o" between them (607). Likewise for sentence example 2 (611): when the parse extracts the subject "heater" (612), the predicate "have" (613), and the objects "KOSHA" (614) and "ASME" (615) (616), the verb-relation conversion table (606) yields the phrase "heater", the phrase "KOSHA", and the relationship "a/o" between them, together with the phrase "heater", the phrase "ASME", and the relationship "a/o" between them (617). As a result, the input document takes a form that is easy to compare with and evaluate against the standard document structured data.
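The conversion step can be sketched as a dictionary lookup applied to the output of a parser. The sketch below assumes the parse has already produced a subject, a predicate, and a list of objects; the names `VERB_TO_RELATION` and `to_triples` are illustrative, and the table holds only the two verb mappings given in the examples above.

```python
# Verb-relation conversion table (606); only the two entries from the
# examples in the text are included here.
VERB_TO_RELATION = {
    "be provided": "p/o",
    "have": "a/o",
}


def to_triples(subject, predicate, objects):
    """Convert one parsed sentence into (parent, relation, child) triples.

    Returns an empty list when the predicate is not in the table, which is
    an assumption of this sketch rather than behavior stated in the patent.
    """
    rel = VERB_TO_RELATION.get(predicate)
    if rel is None:
        return []
    return [(subject, rel, obj) for obj in objects]


print(to_triples("panel", "be provided", ["device"]))
print(to_triples("heater", "have", ["KOSHA", "ASME"]))
```

A sentence with several objects, as in sentence example 2, produces one triple per object.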

FIG. 7 is a flowchart showing an example of the processing procedure in the structured data comparison and evaluation device 106 of FIG. 1 in the document evaluation support system of the present invention.

After the start (step 701), the input document structured data is read (step 702), and the structured data of the sentence to be processed is read from it (step 703). The structured data of a sentence is, for example, the phrase "panel", the phrase "device", and the relationship "p/o" between them (607), converted from sentence example 1, or the phrase "heater", the phrase "KOSHA", and the relationship "a/o" between them together with the phrase "heater", the phrase "ASME", and the relationship "a/o" between them (617), converted from sentence example 2. Next, all parent-child relationships are extracted from the structured data of the sentence (704). Sentence example 1 has one parent-child relationship, for instance, while sentence example 2 has two. A parent-child relationship consists of a parent phrase, a child phrase, and the relationship between the two. Each parent-child relationship is then searched for in the standard document structured data (705, 706). This is repeated for the parent-child relationships extracted from every sentence in the target document (707, 708, 709), and the process ends.
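The loop of FIG. 7 can be sketched as two nested iterations, over sentences and over the parent-child relationships of each sentence. This sketch covers exact matches only and assumes phrases are compared case-insensitively, since the examples write "heater" in the input and "Heater" in the standard data; the function names are illustrative.

```python
def normalize(triple):
    """Lower-case both phrases so that 'heater' and 'Heater' compare equal."""
    parent, rel, child = triple
    return (parent.lower(), rel, child.lower())


def search_exact(sentences, standard):
    """For each sentence, list the parent-child triples found in `standard`.

    `sentences` maps a sentence id to its list of triples.
    """
    index = {normalize(t) for t in standard}
    results = {}
    for sentence_id, triples in sentences.items():   # loop over sentences
        results[sentence_id] = [t for t in triples   # loop over relations
                                if normalize(t) in index]
    return results


standard = [
    ("panel", "p/o", "device"),
    ("Heater", "a/o", "Certificate"),
    ("Certificate", "a/o", "KOSHA"),
]
sentences = {
    1: [("panel", "p/o", "device")],
    2: [("heater", "a/o", "KOSHA"), ("heater", "a/o", "ASME")],
}
print(search_exact(sentences, standard))
```

Sentence 2 yields no exact hit here because the standard data links "Heater" to "KOSHA" only through "Certificate"; that case is what the similarity search handles.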

FIG. 8 shows an example in which the structured data comparison and evaluation device 106 of FIG. 1 searches the standard document structured data for input document structured data, following the processing procedure of FIG. 7, in the document evaluation support system of the present invention.

Sentence example 1 (601) of FIG. 6 is converted into the phrase "panel", the phrase "device", and the relationship "p/o" between them (607, 810). The standard document structured data is searched for a matching pair of phrases and relationship. In this example, the search finds in the standard document structured data a parent-child relationship whose two phrases and relationship match those of the sentence example (821).

FIG. 9 shows another example in which the structured data comparison and evaluation device 106 of FIG. 1 searches the standard document structured data for input document structured data, following the processing procedure of FIG. 7, in the document evaluation support system of the present invention.

Sentence example 2 (611) of FIG. 6 is converted into the phrase "heater", the phrase "KOSHA", and the relationship "a/o" between them, together with the phrase "heater", the phrase "ASME", and the relationship "a/o" between them (617, 910). The standard document structured data is searched for matching pairs of phrases and relationships. The standard document structured data contains the phrase "Heater", the phrase "Certificate", and the relationship "a/o" between them, as well as the phrase "Certificate", the phrase "KOSHA", and the relationship "a/o" between them (921). Because these two relationships are the same, the relationship is inherited, so that the relationship between "Heater" and "KOSHA" becomes "a/o" (931, 911). That is, when one of the attributes (implementation items) of "Heater" is "Certificate" and one of the attributes (types) of "Certificate" is "KOSHA", one of the attributes (implementation items) of "Heater" is taken to be "KOSHA" (911). As a result, in this example the search finds in the standard document structured data a parent-child relationship whose two phrases and relationship match those of the sentence example (931). The same applies to the relationship between "Heater" and "ASME" (912, 922, 932).
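The inheritance rule described here, where a chain of edges that all carry the same relationship is treated as one edge of that relationship, can be sketched as a path search restricted to a single relationship type. This is one possible reading of the rule, not the patent's stated algorithm; `find_chain` is an illustrative name, and case-insensitive comparison is an assumption.

```python
def find_chain(standard, parent, rel, child):
    """Return the phrases on a path parent -> ... -> child that uses only
    edges labeled `rel`, or None if no such path exists.

    A returned path of length 2 is an exact match; a longer path means
    intermediate phrases had to be interpolated.
    """
    adjacency = {}
    for p, r, c in standard:
        if r == rel:
            adjacency.setdefault(p.lower(), []).append(c)

    stack = [(parent, [parent])]
    seen = {parent.lower()}
    while stack:
        node, path = stack.pop()
        for nxt in adjacency.get(node.lower(), []):
            if nxt.lower() == child.lower():
                return path + [nxt]
            if nxt.lower() not in seen:
                seen.add(nxt.lower())
                stack.append((nxt, path + [nxt]))
    return None


standard = [
    ("Heater", "a/o", "Certificate"),
    ("Certificate", "a/o", "KOSHA"),
    ("Certificate", "a/o", "ASME"),
]
print(find_chain(standard, "heater", "a/o", "KOSHA"))
```

The intermediate phrase "Certificate" in the returned path is the phrase interpolated between "heater" and "KOSHA".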

In this way, for the structured data of any sentence in the input document structured data, a part is treated as similar when deleting or omitting a phrase lying between two phrases, in a way that preserves the mutual association between those two phrases, makes the two closely related phrases and their relationship match a part of the standard document structured data.

Displaying the evaluation result of the input document in this way makes it possible to grasp the related parts and characteristics of the input document, as well as where its content is sufficient and where it is insufficient.

FIG. 10 shows another example in which the structured data comparison and evaluation device 106 of FIG. 1 searches the standard document structured data for input document structured data, following the processing procedure of FIG. 7, in the document evaluation support system of the present invention.

Suppose the structured data of sentence example 3 in the input document structured data consists of the relationship "p/o" between the phrase "Fuel" and the phrase "system", and the relationship "p/o" between the phrase "system" and the phrase "Oil" (1010). The standard document structured data is searched for matching pairs of phrases and relationships. The parent-child patterns searched for are: example 3(a), the phrase "Fuel", the phrase "system", and the relationship "p/o" between them (1011); example 3(b), the phrase "system", the phrase "Oil", and the relationship "p/o" between them (1012); and, because both relationships are the same "p/o", example 3(c), the phrase "Fuel", the phrase "Oil", and the relationship "p/o" between them (1013). As a result, in this example the search finds in the standard document structured data a parent-child relationship whose two phrases and relationship match example 3(c) (1021).
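Generating the search patterns (a), (b), and (c) can be sketched as follows: keep every original edge, and for every run of consecutive edges that share one relationship, add a shortcut edge that skips the phrases in between. The function name `candidate_patterns` is illustrative, and the sketch assumes the input triples already form a chain in order (each child is the next parent).

```python
def candidate_patterns(chain):
    """Return the original triples plus shortcut triples obtained by
    deleting intermediate phrases between edges of the same relationship.

    `chain` is an ordered list of (parent, relation, child) triples in which
    each child is the parent of the next triple.
    """
    patterns = list(chain)
    for i in range(len(chain)):
        for j in range(i + 1, len(chain)):
            segment = chain[i:j + 1]
            relations = {rel for _, rel, _ in segment}
            if len(relations) == 1:  # all edges in the run share one relation
                first_parent, rel, _ = segment[0]
                last_child = segment[-1][2]
                patterns.append((first_parent, rel, last_child))
    return patterns


example_3 = [("Fuel", "p/o", "system"), ("system", "p/o", "Oil")]
for pattern in candidate_patterns(example_3):
    print(pattern)
```

Each generated pattern can then be looked up in the standard document structured data in the same way as an ordinary parent-child relationship.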

FIGS. 11, 12, and 13 show examples of calculating evaluation scores when the structured data comparison and evaluation device 106 of FIG. 1 searches the standard document structured data for input document structured data in the document evaluation support system of the present invention.

FIG. 11 shows an example in which evaluation scores are calculated for sentence example 1 retrieved in FIG. 8.
Parent phrase: 1.0 × α₀₁ (α₀₁: weight of the parent for a "match")
Child phrase: 1.0 × α₀₂ (α₀₂: weight of the child for a "match")
With α₀₁ = α₀₂ = 1.0, this gives:
Parent phrase "panel": 1.0 × 1.0 = 1.0
Child phrase "device": 1.0 × 1.0 = 1.0
FIG. 12 shows an example in which evaluation scores are calculated for sentence example 2 retrieved in FIG. 9.
Parent phrase: 1.0 × (1/n) × β₁₁ (β₁₁: weight of the parent when "similar" through interpolation of phrases)
Child phrase: 1.0 × (1/n) × β₁₂ (β₁₂: weight of the child when "similar" through interpolation of phrases)
With β₁₁ = β₁₂ = 1.0 and n = 2, this gives:
Parent phrase "Heater": 1.0 × 0.5 × 1.0 = 0.5 (1)
Child phrase "KOSHA": 1.0 × 0.5 × 1.0 = 0.5
Parent phrase "Heater": 1.0 × 0.5 × 1.0 = 0.5 (2)
Child phrase "ASME": 1.0 × 0.5 × 1.0 = 0.5
From (1) and (2), the score of "Heater" is 0.5 + 0.5 = 1.0.

Thus, for any sentence structured data in the input document structured data, a portion is regarded as similar when adding or interpolating a phrase between two phrases, in a way that preserves the mutual relationship between them, makes the phrases in the sentence structured data and the relationship between two highly interrelated phrases match a part of the standard document structured data.
FIG. 13 shows an example in which an evaluation score is calculated for sentence example 3 retrieved in FIG. 10.
Parent phrase: 1.0 × (1 / m) × γ₂₁ (γ₂₁: weight of the parent when “similar” through deletion of a phrase)
Child phrase: 1.0 × (1 / m) × γ₂₂ (γ₂₂: weight of the child when “similar” through deletion of a phrase)
As a result, when γ₂₁ = γ₂₂ = 1.0 and m = 2:
Parent phrase “Fuel”: 1.0 × 0.5 × 1.0 = 0.5
Child phrase “Oil”: 1.0 × 0.5 × 1.0 = 0.5
Thus, for any sentence structured data in the input document structured data, a portion is likewise regarded as similar when deleting or omitting a phrase between two phrases, in a way that preserves the mutual relationship between them, makes the phrases in the sentence structured data and the relationship between two highly interrelated phrases match a part of the standard document structured data.
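One way to picture the deletion case is as a search for a parent and child that become directly linked once an intermediate phrase is dropped. The sketch below is an assumption-laden illustration, not the patent's procedure: it represents structured data as (parent, child) pairs, removes only a single intermediate phrase, and uses a hypothetical intermediate phrase “Heavy” that does not come from the source. The value m = 2 is taken from the example as given.

```python
def deletion_similar_pairs(sentence_edges, standard_edges):
    """Find (parent, removed, child) triples where deleting the middle
    phrase from the sentence yields a pair present in the standard data."""
    children = {}
    for parent, child in sentence_edges:
        children.setdefault(parent, []).append(child)

    found = []
    for parent, mids in children.items():
        for mid in mids:
            for child in children.get(mid, []):
                direct = (parent, child)
                # Skip pairs that already match exactly in the sentence.
                if direct in standard_edges and direct not in sentence_edges:
                    found.append((parent, mid, child))
    return found


def deletion_score(m, weight=1.0):
    """Score of one phrase: 1.0 x (1/m) x weight, as in the FIG. 13 formula."""
    return 1.0 * (1.0 / m) * weight


# Standard data holds Fuel -> Oil; the input sentence has an extra phrase
# in between ("Heavy" is a made-up placeholder for this sketch).
standard = {("Fuel", "Oil")}
sentence = [("Fuel", "Heavy"), ("Heavy", "Oil")]

hits = deletion_similar_pairs(sentence, standard)

scores = {}
for parent, _removed, child in hits:
    scores[parent] = scores.get(parent, 0.0) + deletion_score(m=2)
    scores[child] = scores.get(child, 0.0) + deletion_score(m=2)

print(hits)
print(scores)
```

With both weights at 1.0 and m = 2, “Fuel” and “Oil” each receive 0.5, which agrees with the FIG. 13 figures.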

Note that the weights shown here are changed according to the structure and level of detail of the standard document structured data and the input document.

FIG. 14 shows an example of an output screen in which the evaluation result display device 110 shown in FIG. 1 changes the display of the phrases constituting the standard structured data according to the evaluation result. In this example, when a target range of the document is selected and the standard document structure is displayed, the phrases related to the content of the target sentences are shown within the whole of the standard document structured data with the shade of the ellipse representing each phrase varied according to the magnitude of its evaluation score. This makes it easy to see, for example, the characteristics of the content of the selected sentences in a contract document, which parts of a standard contract document they relate to, and where the content is insufficient.
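The text says only that the shade of each ellipse varies with the magnitude of the evaluation score. As a rough sketch of one possible mapping, the following assumes a grayscale fill normalized by the largest score on screen; the function name `shade_for_score`, the normalization, and the hex color output are all assumptions made for illustration.

```python
def shade_for_score(score, max_score):
    """Map a score to a gray hex color: higher score gives a darker fill.

    Returns white when there is nothing to normalize against.
    """
    if max_score <= 0:
        return "#ffffff"
    ratio = min(max(score / max_score, 0.0), 1.0)  # clamp to [0, 1]
    level = int(255 * (1.0 - ratio))               # 0 = black, 255 = white
    return "#{0:02x}{0:02x}{0:02x}".format(level)


# Per-phrase totals taken from the FIG. 12 example.
scores = {"Heater": 1.0, "KOSHA": 0.5, "ASME": 0.5}
top = max(scores.values())
shades = {phrase: shade_for_score(s, top) for phrase, s in scores.items()}

print(shades)
```

A phrase with no related content would keep a score of 0 and stay white, which is one way the "insufficient" areas of the standard structure could stand out.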

Embodiment 1 assumed that conversion is performed by the structured document conversion device 104. However, if data that has already been converted is recorded in the input document structured database 105, the analysis processing is not necessarily required. That is, the screens of FIGS. 3, 4, and 8 to 14 are displayed by having the structured data comparison/evaluation device 106 compare the standard document structured data and the input document structured data recorded in those databases.

That is, standard knowledge network structure data (hereinafter, standard document structured data), which describes phrases and the relationships between two highly interrelated phrases for the content of a specified type of document, is held in a database (101) recorded in advance. Knowledge network structure data (hereinafter, input document structured data), which represents any input document in the same format as the standard document structured data, namely as phrases and the relationships between two highly interrelated phrases, is held in a database (105). The match or similarity between the standard document structured data (101) and the input document structured data (105) is evaluated for each phrase, and the display of the phrases constituting the standard document structured data is changed according to the evaluation result, so that the portions of the standard document structured data related to the content described in the input document are visualized on the display means. With this display method of the document evaluation support system, the related portions and characteristics of the content of the input document can be grasped within the standard knowledge network structure for the content of any document.

Further, with the display method of the document evaluation support system in which an evaluation score is displayed in association with each location where the display of a phrase has been changed, the operator can easily grasp, together with the related portions, the degree of similarity between the standard document structured data and the input document structured data.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the embodiments above have been described in detail to explain the present invention clearly, and the invention is not necessarily limited to configurations that include every element described. Part of the configuration of one embodiment can be replaced with the configuration of another embodiment, and the configuration of another embodiment can be added to the configuration of one embodiment. For part of the configuration of each embodiment, other configurations can be added, deleted, or substituted.

Each of the above configurations, functions, processing units, processing means, and the like may be realized partly or wholly in hardware, for example by designing them as an integrated circuit. Each of the above configurations, functions, and the like may also be realized in software, with a processor interpreting and executing a program that implements each function. Information such as programs, tables, files, measurement information, and calculation information that implements each function can be placed in memory, in a recording device such as a hard disk or SSD (Solid State Drive), or on a recording medium such as an IC card, SD card, or DVD. Accordingly, each process and each configuration can implement its function as a processing section, a processing unit, a program module, or the like.

The control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines in a product are necessarily shown. In practice, almost all of the components may be considered to be connected to one another.

101 Standard document structured database
102 Database of points requiring attention
103 Input document
104 Structured document conversion device
105 Input document structured database
106 Structured data comparison/evaluation device
107 Reference information database
108 Reference information creation device
109 Evaluation result database

Claims

1. A document evaluation support system comprising: a database storing standard knowledge network structure data (hereinafter, standard document structured data) that describes, for the content of a specified type of document, phrases and the relationships between two highly interrelated phrases;
a structured document conversion device (104) that converts any input document into knowledge network structure data (hereinafter, input document structured data) that is in the same format as the standard document structured data and indicates phrases and the relationships between two highly interrelated phrases; and
a structured data comparison/evaluation device that evaluates, for each phrase, the match or similarity between the standard document structured data and the input document structured data, changes the display of the phrases constituting the standard document structured data according to the evaluation result, and visualizes the portions of the standard document structured data related to the content described in the input document.

2. The document evaluation support system according to claim 1,
wherein the standard document structured data describes points requiring attention in the specified type of document as phrases and relationships between two highly interrelated phrases, and
the structured data comparison/evaluation device searches for the points requiring attention when extracting, from the standard document structured data, portions that match or are similar to the input document structured data.

3. The document evaluation support system according to claim 1,
wherein the input document consists of a plurality of sentences, each sentence is converted into knowledge network structure data (hereinafter, sentence structured data) indicating phrases and the relationships between two highly interrelated phrases, and the input document structured data consists of one or more items of sentence structured data, and
the structured data comparison/evaluation device selects the sentences to be evaluated when evaluating the match or similarity between the standard document structured data and the input document structured data.

4. The document evaluation support system according to claim 1,
wherein a portion of the standard document structured data matches the input document structured data when the phrases of any sentence structured data in the input document structured data, and the relationship between two highly interrelated phrases, also exist in the standard document structured data; and a portion is similar when, for any sentence structured data in the input document structured data, either (1) adding and interpolating a phrase between two phrases so that the mutual relationship between the two phrases is preserved, or (2) deleting or omitting a phrase between two phrases so that the mutual relationship between the two phrases is preserved, causes the phrases in the sentence structured data and the relationship between two highly interrelated phrases to match a part of the standard document structured data.

5. The document evaluation support system according to claim 1,
wherein the structured data comparison/evaluation device determines the degree to which the standard document structured data and the input document structured data match or are similar by using an evaluation score for each phrase, the score being determined by how the phrases of the sentence structured data in the input document structured data, and the relationships between two highly interrelated phrases, match the phrases and the relationships between two highly interrelated phrases in the standard document structured data, and obtains, for each phrase, the evaluation score over all sentences of the input document structured data to be evaluated.

6. The document evaluation support system according to claim 1,
wherein, to display the portions of the standard document structured data related to any input document on the basis of the match or similarity result between the standard document structured data and the input document structured data, the structured data comparison/evaluation device visualizes the phrases or mutual relationships shown in a network diagram or tree structure representing the standard document structured data by highlighting them, distinguished by color, shape, or the like, according to the total evaluation score of each phrase.

7. A display method for a document evaluation support system, comprising: holding, in a database recorded in advance, standard knowledge network structure data (hereinafter, standard document structured data) that describes, for the content of a specified type of document, phrases and the relationships between two highly interrelated phrases;
holding, in a database, knowledge network structure data (hereinafter, input document structured data) that is in the same format as the standard document structured data and indicates phrases and the relationships between two highly interrelated phrases; and
evaluating, for each phrase, the match or similarity between the standard document structured data and the input document structured data, and changing the display of the phrases constituting the standard document structured data according to the evaluation result, so that the portions of the standard document structured data related to the content described in the input document are visualized on a display means.

8. The display method for a document evaluation support system according to claim 7, wherein an evaluation score is displayed in association with a place where the display of the phrase is changed.