JP5644087B2

JP5644087B2 - Component highlighting apparatus, program, and method

Info

Publication number: JP5644087B2
Application number: JP2009252341A
Authority: JP
Inventors: 田中　一成
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2009-11-02
Filing date: 2009-11-02
Publication date: 2014-12-24
Anticipated expiration: 2029-11-02
Also published as: JP2011096200A

Description

The present invention relates to technology for supporting the understanding of patent documents, and more particularly to a component highlighting device.

Patent documents must be read not only by patent specialists but also by inventors, for purposes such as avoiding infringement of other companies' patents.
However, the claims of a patent document are generally written in words, or combinations of words, that carry a specific meaning but are abstract. It is therefore difficult for an inventor who is unfamiliar with patent documents to grasp the point of the invention described in a claim.

In addition, a patent specification describes not only the components that are the point of the invention but also the peripheral components. A reader therefore has to search for the descriptions of the components that are the point of the patent while reading the specification.

Meanwhile, research has reported that problem expressions and effect expressions can be extracted from text describing the effects of an invention by using expression patterns such as 「〜でき、・・・できる。」 ("... can be done, and ... can be done") (see, for example, Non-Patent Document 1).
Research has also reported that the relationship (causal relationship) between the description of a means and the description of an effect can be extracted from such text by using expression patterns such as 「〜ことにより・・・」 ("by doing ..., ...") (see, for example, Non-Patent Document 2).

JP 2002-63192 A

Non-Patent Document 1: 坂地泰紀, 野中尋史, 酒井浩之, 増山繁, "Extraction of problem/effect expression pairs from patent documents using a bootstrap method," IPSJ SIG Technical Report (情報処理学会研究報告), vol. 2009-NL-192, no. 14, pp. 85-92, 2009.
Non-Patent Document 2: 石川大介, 石塚英弘, 宇陀則彦, 藤原譲, "Extraction and integration of causal relationships in patent literature: overview and subsequent developments," Journal of Japan Society of Information and Knowledge (情報知識学会誌), vol. 15, pp. 98-106, 2005.

However, no technology has been known that can present the components that are the point of a patent document.
An object of the present invention is therefore to identify and present the point component in a claim, so that the reader can concentrate on the description of that component.

A component highlighting device according to one aspect of the present invention is a device for supporting the understanding of patent documents. It includes: a component name extraction unit that extracts each component name from the text data of a claim in a patent document retrieved from a patent document database, using a component expression pattern dictionary that records character string patterns appearing before or after the names of the components making up a claim; a claim-based component description extraction unit that extracts, from the text data of the claim, the text data of the description of the component corresponding to each component name, based on the component names and the character string patterns recorded in the component expression pattern dictionary; an effect reason extraction unit that extracts the text data stating why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect reason expression pattern dictionary that records character string patterns linking a character string stating an effect to a character string stating why that effect is obtained; a similarity calculation unit that calculates the similarity between the text data of each component description and the text data stating why the effect is obtained; a point component specifying unit that specifies, as the point component, the component corresponding to the description with the highest similarity; and an output unit that outputs the point component.

This makes it possible to identify, from the text describing the effects of the invention in a patent document, the component whose operation is strongly related to the reason the effect is obtained, and to present it to the user as the point component.

FIG. 1 is a block diagram of the embodiment.
FIG. 2 shows an example of a patent document (part 1: JP 2002-278562 A).
FIG. 3 illustrates the operation of the embodiment (part 1: extracting component names from a claim).
FIG. 4 illustrates the operation of the embodiment (part 2: extracting sentences that describe components from a claim).
FIG. 5 illustrates the operation of the embodiment (part 3: extracting the portion that states why the effect is obtained).
FIG. 6 illustrates the operation of the embodiment (part 4: finding the component description most similar to the reason the effect is obtained and identifying the component).
FIG. 7 shows an example of a patent document (part 2: Japanese Patent No. 2741566).
FIG. 8 illustrates the operation of the embodiment (part 5: extracting components from a claim).
FIG. 9 illustrates the operation of the embodiment (part 6: extracting sentences that describe components from a claim).
FIG. 10 illustrates the operation of the embodiment (part 7: determining whether the description extracted from the claim contains sufficient information).
FIG. 11 illustrates the operation of the embodiment (part 8: extracting sentences that describe components from the embodiment).
FIG. 12 illustrates the operation of the embodiment (part 9: extracting the portion that states why the effect is obtained).
FIG. 13 illustrates the operation of the embodiment (part 10: finding the component description most similar to the reason the effect is obtained and identifying the component).
FIG. 14 is a flowchart of the embodiment.
FIG. 15 is a detailed flowchart of step S1406 in FIG. 14 (determining whether the extracted description contains a sufficient amount of information).
FIG. 16 is a detailed flowchart of step S1409 in FIG. 14 (extracting the reason the effect is obtained).
FIG. 17 is a detailed flowchart of step S1410 in FIG. 14 (calculating the similarity between the reason the effect is obtained and the description of each component).
FIG. 18 shows an example of the patent database.
FIG. 19 shows an example of the analyzed document table.
FIG. 20 shows an example of the component expression pattern dictionary.
FIG. 21 shows an example of the embodiment explanation expression pattern dictionary.
FIG. 22 shows an example of the component table.
FIG. 23 shows an example of the effect reason expression pattern dictionary.
FIG. 24 shows an example of the morpheme weight table.
FIG. 25 shows an example of the similarity table.
FIG. 26 shows a display example of the display unit.
FIG. 27 is an operation flowchart of the dependency analysis used to extract each component together with its modifiers.
FIG. 28 illustrates the dependency analysis processing.
FIG. 29 is a detailed flowchart showing the specific operation of the pattern matching processing.
FIG. 30 shows an example of hardware capable of implementing the system of the embodiment.

Embodiments for carrying out the present invention are described in detail below with reference to the drawings.
FIG. 1 is a block diagram of an embodiment of the component highlighting device. The system of this embodiment includes a patent document search unit 101, a patent document structure analysis unit 102, a component name extraction unit 103, a claim-based component description extraction unit 104, an information amount determination unit 105, an embodiment-based description extraction unit 106, an effect reason extraction unit 107, a similarity calculation unit 108, a point component specifying unit 109, and a display unit 110. The system also includes a patent database 111, a component expression pattern dictionary 112, a component table 113, an analyzed document table 114, an embodiment explanation expression pattern dictionary 115, an effect reason expression pattern dictionary 116, and a similarity table 117.

The basic operation of this embodiment, configured as described above, is explained below.
Text that describes the effects of the invention, such as the "Effects of the Invention" section of the specification, is a useful clue for identifying the point component of a claim. Such text is considered to state the most characteristic effects of the invention. If the components of the invention can be extracted from it and associated with the components in the claim, the point component can be identified.

This embodiment therefore identifies and presents the component whose operation is strongly related to the reason the effect is obtained, as stated in the text describing the effects of the invention in the patent document.
In such text, however, the point component is sometimes referred to not by its component name but by a description of its operation.

For example, the expression pattern 「〜できるので、・・・できる。」 ("since ... can be done, ... can be done") can be extracted from the "Effects of the Invention" section of a published patent application shown in FIG. 2(b). The first underlined portion, corresponding to 「〜できるので」, is then recognized as stating the reason the effect is obtained, and the second underlined portion, corresponding to 「・・・できる」, as stating the effect. As a result, the first underlined portion, 「聴取した時刻と、放送会社、放送チャンネル、あるいは店コードにより、放送されている曲を特定してダウンロードすることができるので、」 ("since the song being broadcast can be identified and downloaded from the time of listening and the broadcasting company, broadcast channel, or store code"), can be judged to be strongly related to a component in the claim. However, this portion does not directly mention any of the components recited in "Claim 1" in FIG. 2(a), namely "mobile phone" (携帯電話), "download server" (ダウンロードサーバー), and "broadcast display means" (放送表示手段). The point component of "Claim 1" therefore cannot be identified directly from this portion.

In this embodiment, therefore, computer processing based on first procedures 1 to 4 below calculates, for a patent document, the similarity between the description of each component of the invention and the text stating why the effect of the invention is obtained, and specifies the most similar component as the point component. These procedures are described below using the example patent document of FIG. 2 and the operation diagrams of FIGS. 3 to 6.

First procedure 1: Component names are extracted from the text data of a claim in a patent document retrieved from the patent document database 111 (FIG. 1), based on the expression patterns recorded in the component expression pattern dictionary 112 (FIG. 1 and FIG. 20, described later). The resulting text data of each component name is registered in the component table 113 (FIG. 1 and FIG. 22, described later).

For example, for "Claim 1" in FIG. 2(a), the component names "mobile phone" (携帯電話), "download server" (ダウンロードサーバー), and "broadcast display means" (放送表示手段) are extracted and registered in the component table 113, as shown by 301 and 302 in FIG. 3. Each component is identified using, for example, the expression pattern 「〜と、〜と、〜とを有する」 ("having ..., ..., and ...") recorded in the component expression pattern dictionary 112 shown in FIG. 20.

First procedure 2: From the text data of the claim, the text data describing the component that corresponds to each component name extracted in first procedure 1 is extracted. The resulting description text for each component is registered in the component table 113 (FIG. 1 and FIG. 22, described later) in association with its component name.

For example, for "Claim 1" in FIG. 2(a), the text data describing the component name "mobile phone" is extracted, as shown by 401 and 402 in FIG. 4, and registered in the component table 113 as the description associated with "mobile phone". The same applies to the other component names, "download server" and "broadcast display means".
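First procedures 1 and 2 can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patented implementation: the claim string is invented, the list pattern is the 「〜と、〜と、〜とを有する」 example from the text, and the rule that a component name is the trailing run of kanji and katakana in each list item is a hypothetical stand-in for whatever analysis the device actually performs.

```python
import re

# Hypothetical stand-in for one entry of the component expression pattern
# dictionary 112: components are listed as "Aと、Bと、Cとを有する".
LIST_END = "とを有する"
SEPARATOR = "と、"
# Assumption: a component name is the trailing run of kanji/katakana
# in each list item; the text before it is treated as its description.
NAME_AT_END = re.compile(r"[一-龥ァ-ヶー]+$")


def extract_components(claim):
    """Return {component name: description text preceding the name}."""
    head, found, _ = claim.partition(LIST_END)
    if not found:
        return {}
    table = {}
    for segment in head.split(SEPARATOR):
        match = NAME_AT_END.search(segment)
        if match:
            table[match.group()] = segment[: match.start()]
    return table


# Invented claim text, loosely modeled on the component names in FIG. 2(a).
claim = (
    "曲を聴取する携帯電話と、曲を配信するダウンロードサーバーと、"
    "放送を表示する放送表示手段とを有するシステム"
)
components = extract_components(claim)
for name, description in components.items():
    print(name, "<-", description)
```

The returned dictionary plays the role of the component table 113, mapping each component name to its description text.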

First procedure 3: From the text data describing the operation and effects of the invention in the patent document retrieved in first procedure 1, the text data stating why the effect is obtained is extracted. The portion stating the reason is extracted by referring to the expression patterns recorded in the effect reason expression pattern dictionary 116 (FIG. 1 and FIG. 23, described later).

For example, for the "Effects of the Invention" in FIG. 2(b), the text data stating why the effect is obtained is extracted, as shown by 501 and 502 in FIG. 5. Here, for example, the expression pattern 「ので」 ("since") recorded in the effect reason expression pattern dictionary 116 shown in FIG. 23 is used, and the text data preceding that pattern (the underlined portion of 502 in FIG. 5) is extracted as the portion stating why the effect is obtained.
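The reason extraction of first procedure 3 can be sketched as a search for the earliest dictionary pattern, keeping the text in front of it. The two patterns below are the examples named in this document (「ので」 and 「ことにより、」); the actual contents of dictionary 116 are not reproduced here. The reason clause in the sample sentence is the one quoted from FIG. 2(b), while the effect clause after 「ので、」 is invented for illustration.

```python
# Hypothetical contents of the effect reason expression pattern dictionary 116,
# limited to the two example patterns mentioned in the text.
REASON_PATTERNS = ["ので", "ことにより、"]


def extract_reason(effect_text):
    """Return the text before the earliest reason pattern, or None if absent."""
    best = None
    for pattern in REASON_PATTERNS:
        index = effect_text.find(pattern)
        if index != -1 and (best is None or index < best):
            best = index
    return None if best is None else effect_text[:best]


effect_text = (
    "聴取した時刻と、放送会社、放送チャンネル、あるいは店コードにより、"
    "放送されている曲を特定してダウンロードすることができるので、"
    "所望の曲を入手できる。"  # invented effect clause
)
reason = extract_reason(effect_text)
print(reason)
```

A plain substring search is used here for simplicity; a real dictionary entry could equally be a regular expression.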

First procedure 4: Among the description texts registered in the component table 113 in first procedure 2, the one most similar to the reason text extracted in first procedure 3 is determined. The component corresponding to that description text is then specified as the point component.

The most similar description is determined as follows.
For example, for "Claim 1" in FIG. 2(a) and the "Effects of the Invention" in FIG. 2(b), nouns are extracted by morphological analysis from the description text of each component extracted from Claim 1 in first procedure 2 ("mobile phone", "download server", and "broadcast display means"), as shown by 602-1, 602-2, and 602-3 in FIG. 6. As shown in FIG. 22, for example, these nouns can be obtained by morphological analysis in advance and registered in the component table 113 in association with each description. Likewise, as shown by 601 in FIG. 6, nouns are extracted by morphological analysis from the reason text extracted in first procedure 3. Then, for each of the description texts 602-1, 602-2, and 602-3, the number of nouns it shares with the reason text 601 is counted as the similarity. The count can be weighted so that a noun appearing less frequently in the specification of the patent document is treated as more significant (more distinctive) and given a higher weight. As a result, text data 602-1, which shares the largest number of significant nouns with the reason text, is extracted as the description with the highest similarity. The component name "mobile phone" corresponding to text data 602-1 is then retrieved from the component table 113 (FIG. 1 and FIG. 22, described later) and specified as the point component.
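The similarity count of first procedure 4 can be sketched as a weighted overlap of noun sets. Several things here are assumptions made for illustration: morphological analysis is taken as already done (the noun lists are invented), and the weight of a noun is set to the reciprocal of its frequency in the specification, which is just one simple way to make rarer nouns count more. The text only states that less frequent nouns receive higher weight.

```python
from collections import Counter


def noun_weights(specification_nouns):
    """Assumed weighting: 1 / frequency of the noun in the specification."""
    counts = Counter(specification_nouns)
    return {noun: 1.0 / count for noun, count in counts.items()}


def similarity(description_nouns, reason_nouns, weights):
    """Sum the weights of the nouns shared by a description and the reason."""
    shared = set(description_nouns) & set(reason_nouns)
    return sum(weights.get(noun, 1.0) for noun in shared)


def point_component(descriptions, reason_nouns, weights):
    """Return the component whose description is most similar to the reason."""
    return max(
        descriptions,
        key=lambda name: similarity(descriptions[name], reason_nouns, weights),
    )


# Invented noun lists standing in for the output of morphological analysis.
specification_nouns = ["放送"] * 4 + ["曲"] * 2 + ["聴取", "時刻", "表示", "配信", "送信"]
reason_nouns = ["聴取", "時刻", "放送", "曲", "特定", "ダウンロード"]
descriptions = {
    "携帯電話": ["聴取", "時刻", "放送", "送信"],
    "ダウンロードサーバー": ["曲", "配信"],
    "放送表示手段": ["放送", "表示"],
}

weights = noun_weights(specification_nouns)
print(point_component(descriptions, reason_nouns, weights))
```

With these toy inputs the description of 携帯電話 shares three nouns with the reason text, two of them rare, so it scores highest.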

Through first procedures 1 to 4 described above, the component whose operation is strongly related to the reason the effect is obtained can be identified from the text describing the effects of the invention in the patent document, and presented to the user as the point component.

Even when the text describing the effects of the invention refers to a component by a description of its operation rather than by its component name, the similarity between the description text of each component in the claim and the reason text is still evaluated. The point component can therefore be identified accurately.

In first procedures 1 to 4, the description of a component in the claim alone may not contain enough information for the similarity determination of first procedure 4. For example, the component "audio conversion means" (音声変換手段) in "Claim 1" of the patent document shown in FIG. 7(a) is described only as 「読み出し手段（６）から読み出されたデジタル音声にデジタルアナログ変換を施しアナログ音声出力する音声変換手段（８）と」 ("audio conversion means (8) that applies digital-to-analog conversion to the digital audio read by the reading means (6) and outputs analog audio"). This description contains almost no specific content. The similarity with the portion of the "Effects of the Invention" in FIG. 7(b) that states the reason for the effect (the first underlined portion in FIG. 7(b)) therefore cannot be determined accurately from the number of significant nouns.

In this embodiment, therefore, first procedures 1 to 4 are refined, and computer processing based on second procedures 1 to 7 below is executed.

Second procedure 1: Component names are extracted from the text data of a claim in a patent document retrieved from the patent document database 111 (FIG. 1), based on the expression patterns recorded in the component expression pattern dictionary 112 (FIG. 1 and FIG. 20, described later). The resulting text data of each component name is registered in the component table 113 (FIG. 1 and FIG. 22, described later). This processing is the same as first procedure 1 described above.

For example, for "Claim 1" in FIG. 7(a), the component names "reproduction means" (再生手段), "digital audio storage means" (デジタル音声格納手段), "reading means" (読み出し手段), and "audio conversion output means" (音声変換出力手段) are extracted, as shown by 801 and 802 in FIG. 8.

Second procedure 2: From the text data of the claim, the text data describing the component that corresponds to each component name extracted in second procedure 1 is extracted.
For example, from the text data of "Claim 1" in FIG. 7(a), the text data of the sentence describing the component name "reproduction means" (再生手段) is extracted, as shown by 901 and 902 in FIG. 9.

Second procedure 3: For each component in the claim, it is determined whether the description text extracted in second procedure 2 contains sufficient information.
Specifically, for "Claim 1" in FIG. 7(a), for example, the nouns contained in the description of the component "audio conversion means" (音声変換手段), namely 「手段」 (means), 「デジタル」 (digital), 「音声」 (audio), 「アナログ」 (analog), 「変換」 (conversion), and 「出力」 (output), are extracted by morphological analysis, as shown by 1001 in FIG. 10.

Next, the component name "audio conversion means" (音声変換手段) itself is morphologically analyzed to extract the nouns it contains, 「音声」 (audio), 「変換」 (conversion), and 「手段」 (means), and these nouns are deleted from the noun group 1001. This yields the noun group 「デジタル」 (digital), 「アナログ」 (analog), and 「出力」 (output) shown as 1002 in FIG. 10.

Next, the noun 「デジタル」 (digital), which appears in the descriptions of all the components of "Claim 1" in FIG. 7(a) ("reproduction means", "digital audio storage means", "reading means", and "audio conversion output means"), is deleted from the noun group shown as 1002 in FIG. 10. This yields the noun group 「アナログ」 (analog) and 「出力」 (output) shown as 1003 in FIG. 10.

It is then determined whether the number of nouns remaining in the noun group 1003 in FIG. 10 after these deletions is equal to or greater than a predetermined threshold.
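The information amount check of second procedure 3 can be sketched as two set subtractions followed by a threshold test. The noun lists for 音声変換手段 follow the FIG. 10 example in the text; the noun lists for the other three components and the threshold value of 3 are invented, since the document does not give them.

```python
def remaining_nouns(description_nouns, name_nouns, all_description_nouns):
    """Drop nouns found in the component name or in every component's description."""
    common_to_all = set.intersection(*(set(nouns) for nouns in all_description_nouns))
    dropped = set(name_nouns) | common_to_all
    return [noun for noun in description_nouns if noun not in dropped]


def has_enough_information(remaining, threshold):
    """True when the remaining noun count reaches the threshold."""
    return len(remaining) >= threshold


# Noun group 1001 and the nouns of the component name, as given in the text.
description_nouns = ["手段", "デジタル", "音声", "アナログ", "変換", "出力"]
name_nouns = ["音声", "変換", "手段"]

# Invented noun lists for the other components; each contains "デジタル".
all_description_nouns = [
    ["デジタル", "音声", "再生"],        # 再生手段
    ["デジタル", "音声", "格納"],        # デジタル音声格納手段
    ["デジタル", "格納", "読み出し"],    # 読み出し手段
    description_nouns,                   # 音声変換手段
]

THRESHOLD = 3  # hypothetical value of the predetermined threshold

remaining = remaining_nouns(description_nouns, name_nouns, all_description_nouns)
print(remaining, has_enough_information(remaining, THRESHOLD))
```

With these inputs the remaining nouns match noun group 1003 (「アナログ」 and 「出力」), which falls below the assumed threshold, so the description would be judged insufficient.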

Second procedure 4: A component whose noun group contains at least the predetermined threshold number of nouns is judged to have a sufficient amount of information, and the text data in the claim describing that component is registered in the component table 113 (FIG. 1 and FIG. 22, described later) in association with the component name. Second procedures 2 and 4 together correspond to first procedure 2 described above.

Second procedure 5: A component whose noun group contains fewer nouns than the predetermined threshold is judged to have an insufficient amount of information. In this case, the following text is extracted from the "Examples" (実施例) or "Mode for Carrying Out the Invention" portion of the patent document retrieved in second procedure 1. First, a text portion is found in which the component name is located before the character string of any expression pattern recorded in the embodiment explanation expression pattern dictionary 115 (FIG. 1 and FIG. 21, described later). The text following that expression pattern string is then extracted as the description corresponding to the component name. An example of such an expression pattern is 「であり、」 ("is ..., and"). The resulting text data is registered in the component table 113 (FIG. 1 and FIG. 22, described later) as text data describing the component, in association with the component name. In the component table of FIG. 22, when the amount of information is judged sufficient in second procedure 4, only the description from the claim, shown in the upper part, is registered as the component description. When the amount of information is judged insufficient in second procedure 5, the description from sections other than the claims, shown in the lower part, is registered as well.

For example, when the description text for the component name "reproduction means" (再生手段) extracted as 902 in FIG. 9 is judged in second procedure 3 to contain too little information, the processing shown in FIG. 11, for example, is executed. That is, the text portion shown as 1102 is extracted from the embodiment text shown as 1101 and additionally registered in the component table 113 (FIG. 1 and FIG. 22, described later) as a description corresponding to the component name "reproduction means".
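The fallback of second procedure 5 can be sketched as a pattern search over the embodiment text. Only the pattern 「であり、」 comes from the document. The following are assumptions for illustration: the component name is required to sit immediately before the pattern, the description is cut at the next 「。」, and the embodiment sentences are invented.

```python
import re

# Hypothetical contents of the embodiment explanation expression pattern
# dictionary 115; only "であり、" is named in the text.
EXPLANATION_PATTERNS = ["であり、"]


def explanations_from_embodiment(component_name, embodiment_text):
    """Collect the text that follows '<component name><pattern>'.

    Assumption: each description ends at the next full stop "。".
    """
    found = []
    for pattern in EXPLANATION_PATTERNS:
        regex = re.compile(re.escape(component_name + pattern) + r"([^。]*)")
        found.extend(regex.findall(embodiment_text))
    return found


# Invented embodiment text for illustration.
embodiment_text = (
    "６は再生手段であり、記録媒体から音声データを読み出して再生する。"
    "８は音声変換手段であり、デジタル音声をアナログ音声に変換する。"
)
extra = explanations_from_embodiment("再生手段", embodiment_text)
print(extra)
```

The extracted strings would then be appended to the entry for that component in the component table 113, alongside the description taken from the claim.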

Second procedure 6: From the text data describing the operation and effects of the invention in the patent document retrieved in second procedure 1, the text data stating why the effect is obtained is extracted. The portion stating the reason is extracted by referring to the expression patterns recorded in the effect reason expression pattern dictionary 116 (FIG. 1 and FIG. 23, described later). This processing is the same as first procedure 3 described above.

For example, for the “Effects of the Invention” in FIG. 7(b), text data describing the reason why the effect is obtained is extracted as shown by 1201 to 1202 in FIG. 12. Here, using for example the expression pattern 「ことにより、」 (“by ...ing,”) recorded in the effect-reason expression pattern dictionary 116 shown in FIG. 23, the text data preceding that expression pattern (1202 in FIG. 12) is extracted as the portion describing the reason why the effect is obtained.

Second procedure 7: Among the explanatory texts of the components registered in the component table 113 in the second procedure 4 or 5, the one most similar to the text data of the reason for the effect extracted in the second procedure 6 is determined. The component corresponding to that explanatory text is then identified as the key component. This processing is the same as the first procedure 4 described above.

For example, for “Claim 1” in FIG. 7(a) and the “Effects of the Invention” in FIG. 7(b), noun text data is extracted by morphological analysis from the explanatory text of each component in the claim extracted in the second procedure 4 or 5, as shown by 1302-1, 1302-2, 1302-3, and 1302-4 in FIG. 13. Likewise, as shown by 1301 in FIG. 13, noun text data is extracted by morphological analysis from the text data of the reason for the effect extracted in the second procedure 6. Then, for each of the explanatory texts 1302-1, 1302-2, 1302-3, and 1302-4, the number of nouns it shares with the reason text 1301 is counted as the similarity. The count can be configured so that nouns appearing less frequently in the specification of the patent document are treated as more significant (more distinctive) and given a higher weight. As a result, text data 1302-4, which shares the largest number of significant nouns with the reason text, is extracted as the text data with the highest similarity. The component name “speech conversion means” corresponding to text data 1302-4 is then extracted from the component table 113 (FIG. 1 and FIG. 22 described later) and identified as the key component.
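The selection described above can be sketched as follows. This is a minimal illustration, assuming the nouns have already been extracted by a morphological analyzer (not shown) and using plain, unweighted counts; the function name `pick_key_component` and the sample nouns are hypothetical and are not taken from the figures.

```python
def pick_key_component(component_nouns, reason_nouns):
    """Return the component whose description shares the most nouns with the reason text."""
    reason = set(reason_nouns)
    # Similarity = number of distinct nouns shared with the reason text.
    scores = {name: len(set(nouns) & reason) for name, nouns in component_nouns.items()}
    best = max(scores, key=scores.get)
    return best, scores


# Illustrative noun lists only.
component_nouns = {
    "input means": ["text", "input"],
    "speech conversion means": ["text", "speech", "conversion", "speed"],
}
reason_nouns = ["speech", "conversion", "speed"]
best, scores = pick_key_component(component_nouns, reason_nouns)
```

When two components tie, `max` returns the first one encountered; the text does not say how ties are resolved, so that behavior is an artifact of this sketch.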

With the second procedure described above, when the description of a component in the claims alone does not contain enough information for the similarity determination in the second procedure 4, the explanatory text for that component is also extracted from the embodiments and the like. This makes it possible to extract the key component with high accuracy and for a larger number of patents.

FIG. 14 is an operation flowchart showing the control operation of an embodiment of the component highlighting apparatus having the configuration shown in FIG. 1. The processing of this flowchart is executed by, for example, a computer system equipped with a CPU (central processing unit), memory, and an external storage device.

First, the patent database 109 (FIG. 1) is searched to retrieve the file of the patent document designated by the user (step S1401). This processing implements the function of the patent document search unit 101 in FIG. 1. The patent database 109 is stored on a storage device (for example, a hard disk storage device). FIG. 18 shows an example of the data structure of the patent database 111. The record of each patent document registered in the patent database 111 consists of the following fields: application number, publication number, abstract index, claim index, embodiment index, title-of-invention data, and the body text data of the patent document. The embodiment index may instead be an index of the mode for carrying out the invention or of the means for solving the problem. Each index is, for example, a set of index strings obtained, based on a bigram algorithm, by joining each pair of adjacent characters in the text data of the corresponding part of the patent document.
The user can search for a patent document file using keyword data such as an application number, a publication number, words in the abstract, words in the claims, words in the embodiments, or words in the title of the invention. When an application number is specified, the application number field of the patent database 111 is searched for a record matching the specified application number string. When a publication number is specified, the publication number field of the patent database 111 is searched for a record matching the specified publication number string. When words contained in the abstract, claims, embodiments, or title of the invention are specified, the abstract, claim, or embodiment index field, or the title-of-invention field, of the patent database 111 is searched to determine whether the specified word string is registered. The text data registered in the body field of the retrieved record is then extracted.
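As a rough illustration of the bigram index mentioned above, the sketch below builds the set of adjacent two-character strings for a text and tests whether a keyword could occur in it. The text does not describe how the index is queried, so the lookup rule (every bigram of the keyword must be present) and the names `bigrams` and `may_contain` are assumptions.

```python
def bigrams(text):
    """Return the set of all adjacent two-character substrings of text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def may_contain(index, keyword):
    """True if every bigram of keyword is in the index.

    This is a candidate test only: the bigrams may come from different
    places in the text, so a hit still needs checking against the body.
    """
    grams = bigrams(keyword)
    return bool(grams) and grams <= index


claim_index = bigrams("音声変換手段を備える")
hit = may_contain(claim_index, "変換手段")
miss = may_contain(claim_index, "再生手段")
```

A one-character keyword has no bigrams and is reported as not found here; a real index would need a separate rule for that case.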

Next, the structure of the patent document is analyzed for the text data of the patent document extracted in step S1401 (step S1402). This processing implements the function of the patent document structure analysis unit 102 in FIG. 1. It determines which part of the text data corresponds to each section, such as “Claims”, “Examples” (or “Mode for Carrying Out the Invention”, “Best Mode for Carrying Out the Invention”, and so on), and “Effects of the Invention”. Specifically, the text data of the patent document is searched for section headings enclosed in black lenticular brackets (【 】), and the text from each heading to the next heading is extracted as the text data of that section. The resulting pairs of section and text data are held in work memory as an analyzed document table having, for example, the data structure shown in FIG. 19.
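The heading-to-heading split can be sketched with a regular expression. This is a simplified illustration: the function name `split_sections`, the dictionary layout, and the sample text are assumptions, and real patent text also uses the same brackets for paragraph numbers, which this sketch does not filter out.

```python
import re

# A heading is any run of characters enclosed in 【 】.
HEADING = re.compile(r"【([^】]+)】")


def split_sections(text):
    """Map each heading to the text between it and the next heading."""
    sections = {}
    matches = list(HEADING.finditer(text))
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # A repeated heading overwrites the earlier entry in this sketch.
        sections[m.group(1)] = text[m.end():end].strip()
    return sections


doc = "【請求項１】入力手段と、変換手段とを備える装置。【発明の効果】変換することにより、理解が容易になる。"
table = split_sections(doc)
```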

Next, the text data corresponding to the claims section (see FIG. 19), among the sections analyzed in step S1402, is obtained from the analyzed document table in work memory. Component names are then extracted from that text data based on the expression patterns recorded in the component expression pattern dictionary 112 (FIG. 1) (step S1403). This processing implements the function of the component name extraction unit 103 in FIG. 1 and corresponds to the first procedure 1 or the second procedure 1 described above.
FIG. 20 shows an example of the data structure of the component expression pattern dictionary 112. The dictionary records, as expression patterns, character string patterns that appear after (or before) a component name. For example, the word located immediately before such a pattern string is the component name, and the text from immediately before that component name back to the preceding pattern string (or to the beginning) is the description of the corresponding component. Examples of pattern strings in the component expression pattern dictionary 112 are 「〜と、〜と、〜とを有する」 (“having ..., ..., and ...”), 「〜と、〜と、〜とを具備する」 (“equipped with ..., ..., and ...”), and 「〜と、〜と、〜とを備える」 (“comprising ..., ..., and ...”). The component expression pattern dictionary 112 is held in work memory or on a storage device (such as a hard disk device).
In step S1403, pattern matching or dependency parsing is performed on the claims text data in work memory using the pattern string recorded in each entry of the component expression pattern dictionary 112. Then, for example by morphological analysis, the noun located at the end of each text portion 「〜」 delimited by 「〜と、」, or a compound word formed by concatenating nouns, or a noun phrase including its dependent modifiers, is extracted as a component name. The text data of each component name obtained in this way is registered by creating a new entry in the component table 113 (FIG. 1) held in work memory.
FIG. 22 shows an example of the data structure of the component table 113. Each entry consists of a “component name” field, a “component description” field, and a “nouns in the component description” field. The “component name” field holds the text data of the component name. The “component description” field holds the explanatory text for each component, described later. The “nouns in the component description” field holds the group of nouns obtained by morphological analysis of the text registered in the “component description” field. As a specific example, the processing of extracting components from the claim, shown as 801 to 802 in FIG. 8 described above, is executed.
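One way to picture the three-field entry of the component table is a small record type keyed by component name. The sketch below is illustrative only; the class name `ComponentEntry`, the field names, and the helper `register_component` are hypothetical, and Python 3.9 or later is assumed for the `list[str]` annotation.

```python
from dataclasses import dataclass, field


@dataclass
class ComponentEntry:
    name: str                      # "component name" field
    description: str = ""          # "component description" field
    description_nouns: list[str] = field(default_factory=list)  # nouns found in the description


# The table itself, keyed by component name.
component_table: dict[str, ComponentEntry] = {}


def register_component(name):
    """Create the entry for name if it does not exist yet, and return it."""
    return component_table.setdefault(name, ComponentEntry(name))


entry = register_component("再生手段")
entry.description = "音声データを再生する"
```

Later steps would fill `description` and `description_nouns` for the same entry rather than creating a new one.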

Next, from the text data corresponding to the claims section (see FIG. 19) obtained in work memory in step S1403, the text data describing the component corresponding to each component name extracted in step S1403 is extracted (step S1404). This processing implements the function of the unit 104 for extracting component descriptions from the claims in FIG. 1, and corresponds to the first half of the first procedure 2 or to the second procedure 2 described above. More specifically, in step S1404, the explanatory text is obtained from each text portion 「〜」 delimited by 「〜と、」 in step S1403 by deleting the text data of the component name extracted in step S1403. The extracted text data is held in work memory.

Next, the explanatory text corresponding to the current component name, obtained in work memory in step S1404, is registered in the “component description” field of the entry for that component name in the component table 113 (FIGS. 1 and 22) in work memory (step S1405). This processing corresponds to the second half of the first procedure 2 or to the second procedure 4 described above.

Next, it is determined whether the text data describing each component in the claims, extracted in step S1404, contains sufficient information (step S1406). This processing implements the function of the information amount determination unit 105 in FIG. 1 and corresponds to the second procedure 3 described above. FIG. 15 is an operation flowchart showing the details of step S1406.

In FIG. 15, first, for each component name extracted in step S1403, the corresponding explanatory text extracted in step S1404 is morphologically analyzed, and the noun morphemes in the analysis result are held in work memory for each component name (step S1501).

Next, among the noun groups obtained for the component names in step S1501, the nouns that are common to all component names are assigned to the noun list Y, a variable in work memory (step S1502).

Next, the text data of the first of the component names extracted in step S1403 is obtained in work memory (step S1503).
Next, the text data of the component name obtained in step S1503, or in step S1513 described later, is morphologically analyzed. The resulting noun morphemes are assigned to the noun list X, a variable in work memory (step S1504).

Next, the group of noun morphemes contained in the explanatory text, obtained in work memory in step S1501 for the component name acquired in step S1503 or in step S1513 described later, is assigned to the noun list Z, a variable in work memory (step S1505).

Next, each noun morpheme in the noun list X obtained in step S1504 is deleted from the noun list Z obtained in step S1505 (step S1506). This corresponds to the processing example 1001 to 1002 in FIG. 10 described above, that is, deletion of the nouns contained in the component name.

Next, each noun morpheme in the noun list Y obtained in step S1502 is deleted from the noun list Z obtained in step S1505 (step S1507). This corresponds to the processing example 1002 to 1003 in FIG. 10 described above, that is, deletion of the nouns contained in the descriptions of all components.

Subsequently, the number of nouns contained in the noun list Z in work memory after step S1507 is counted (step S1508).
It is then determined whether the number of nouns in the noun list Z counted in step S1508 is equal to or greater than a predetermined threshold (step S1509).

If the number of nouns in the noun list Z is equal to or greater than the predetermined threshold, the component name is judged to have a sufficient amount of information, and this result is held in work memory in association with the current component name (step S1510).

If, on the other hand, the number of nouns in the noun list Z is less than the predetermined threshold, the component name is judged not to have a sufficient amount of information, and this result is held in work memory in association with the current component name (step S1511).

When the information amount determination for one component name has been completed by the series of steps S1504 to S1511 above, it is determined whether processing has been completed for all component names (step S1512).

If processing has not been completed for all component names and the determination in step S1512 is NO, the text data of the next unprocessed component name among those extracted in step S1403 is obtained in work memory (step S1513).

The series of steps S1504 to S1511 is then executed again for that component name, so that the information amount determination is repeated for each subsequent component name.
When processing has been completed for all component names and the determination in step S1512 is YES, the processing of the flowchart in FIG. 15, that is, step S1406 in FIG. 14, ends.
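The loop of FIG. 15 can be summarized in a short sketch. It assumes the nouns of each component name and of each description have already been produced by a morphological analyzer and contain no duplicates; the function name `judge_information`, the argument names, and the sample data are hypothetical.

```python
def judge_information(name_nouns, description_nouns, threshold):
    """Return {component name: True if its description has enough information}."""
    if not description_nouns:
        return {}
    # Y: nouns that appear in the description of every component (step S1502).
    y = set.intersection(*(set(v) for v in description_nouns.values()))
    verdicts = {}
    for name, nouns in description_nouns.items():
        x = set(name_nouns.get(name, []))                    # X: nouns in the name (S1504)
        z = [n for n in nouns if n not in x and n not in y]  # Z minus X minus Y (S1505-S1507)
        verdicts[name] = len(z) >= threshold                 # count and compare (S1508-S1509)
    return verdicts


# Illustrative data only.
name_nouns = {
    "playback means": ["playback", "means"],
    "conversion means": ["conversion", "means"],
}
description_nouns = {
    "playback means": ["speech", "data", "playback"],
    "conversion means": ["text", "data", "speech", "speed", "conversion"],
}
verdicts = judge_information(name_nouns, description_nouns, threshold=2)
```

Note that with a single component every noun is "common to all", so Z is always empty; that follows from the procedure as described rather than from this sketch.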

Next, the determination result held in work memory for each component name by step S1406 is checked (step S1407). For each component, if it has been judged to have a sufficient amount of information, step S1409 is executed.

On the other hand, when the result that the amount of information is insufficient was obtained in work memory in step S1511 of FIG. 15, and the determination in step S1407 is therefore NO, step S1409 is executed after the following step S1408. In step S1408, the text data corresponding to the Examples section (see FIG. 19) or the Mode for Carrying Out the Invention section, among the sections analyzed in step S1402, is obtained from the analyzed document table in work memory. Next, a text data portion that contains the current component name and is written using an expression pattern recorded in the explanatory expression pattern dictionary 115 for embodiments (FIG. 1) is extracted from that text data. The resulting text data is registered, as text data explaining the component, in the “component description” field of the entry for that component name in the component table 113 (FIGS. 1 and 22) in work memory. This processing implements the function of the unit 106 for extracting descriptions from the embodiments in FIG. 1 and corresponds to the second procedure 5 described above. FIG. 21 shows an example of the data structure of the explanatory expression pattern dictionary 115 for embodiments. This dictionary records expression pattern strings, for example 「であり、」, that link a component name to its explanatory text in the Examples (see FIG. 19) or the Mode for Carrying Out the Invention. The explanatory text is detected on the assumption that the text before this string corresponds to the component name and the text after it corresponds to the explanation.
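A minimal sketch of step S1408 follows. Splitting the embodiment text into sentences at 「。」 before matching is an assumption made here to bound the extracted explanation; the function name `extract_explanations` and the sample text are likewise hypothetical.

```python
def extract_explanations(embodiment_text, component_name, patterns=("であり、",)):
    """Collect the text that follows a pattern when the component name precedes it."""
    found = []
    for sentence in embodiment_text.split("。"):
        for pattern in patterns:
            head, sep, tail = sentence.partition(pattern)
            # sep is empty when the pattern does not occur in this sentence.
            if sep and component_name in head and tail:
                found.append(tail)
                break
    return found


text = "再生手段は、スピーカを含む装置であり、変換された音声データを出力する。入力手段はキーボードである。"
explanations = extract_explanations(text, "再生手段")
```

Only the first occurrence of a pattern in each sentence is used, which is a simplification.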

Next, text data describing the reason why the effect is obtained is extracted from the text data corresponding to the Effects of the Invention section (see FIG. 19) or the Operation of the Invention section, among the sections analyzed in step S1402 (step S1409). This processing implements the function of the effect-reason extraction unit 107 in FIG. 1 and corresponds to the first procedure 3 or the second procedure 6 described above. FIG. 16 is a flowchart showing the details of step S1409. The processing of this flowchart is executed by, for example, a computer system equipped with a CPU (central processing unit), memory, and an external storage device.

In FIG. 16, first, the effect-reason expression patterns are read into work memory from the effect-reason expression pattern dictionary 116 (FIG. 1) (step S1601). FIG. 23 shows an example of the data structure of the effect-reason expression pattern dictionary 116. This dictionary records expression pattern strings that, in a passage of a patent document describing the effects of the invention, separate the part stating the effect from the part stating the reason why the effect is obtained. The pattern string field of the dictionary stores, for example, 「ので」 (“because”), 「ことにより」 (“by ...ing”), and 「ため」 (“since”). The part following such a pattern string is the part stating the effect, and the part preceding it is the part stating the reason why the effect is obtained. The dictionary is held in work memory, on a hard disk device, or the like. In step S1601, the pattern string recorded in each entry of the effect-reason expression pattern dictionary 116 is read into work memory.

Next, the text data corresponding to the Effects of the Invention section (see FIG. 19) or the Operation of the Invention section, among the sections analyzed in step S1402, is obtained from the analyzed document table in work memory (step S1602).

Next, the obtained text data of the Effects of the Invention or similar section is split at Japanese full stops (「。」) into one or more strings (step S1603).
Next, the first of the strings produced in step S1603 is assigned to the string r, a variable in work memory (step S1604).

Next, it is determined whether the string r contains any of the pattern strings (see FIG. 23) of the effect-reason expression patterns read into work memory in step S1601 (step S1605).

If the determination in step S1605 is YES, the portion of the string r preceding the pattern string found in step S1605 is extracted as the text data of the reason why the effect is obtained (step S1606). This text data is held in work memory. The operation flowchart of FIG. 16, that is, the processing of step S1409 of FIG. 15, then ends.

If the determination in step S1605 is NO, it is determined whether all the strings produced in step S1603 have been processed (step S1607).
If not all strings have been processed and the determination in step S1607 is NO, the next unprocessed string among those produced in step S1603 is assigned to the string r in work memory (step S1608).

The series of steps S1605 to S1607 is then repeated.
If, during this repetition, all the strings produced in step S1603 have been processed and the determination in step S1607 becomes YES, no text data of the reason for the effect is extracted, and the flowchart of FIG. 16, that is, the processing of step S1409 of FIG. 15, ends.

Through the above processing, for example, text data describing the reason why the effect is obtained is extracted for the “Effects of the Invention” in FIG. 7(b), as shown by 1201 to 1202 in FIG. 12.
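The flow of FIG. 16 can be sketched in a few lines. The three pattern strings are the examples given for dictionary 116; the function name `extract_reason`, the order in which patterns are tried when a sentence contains more than one, and the sample sentence are assumptions of this sketch.

```python
REASON_PATTERNS = ("ので", "ことにより", "ため")


def extract_reason(effect_text, patterns=REASON_PATTERNS):
    """Return the text before the first matching pattern, or None if nothing matches."""
    for r in effect_text.split("。"):       # split at full stops (S1603)
        for pattern in patterns:            # does r contain a pattern? (S1605)
            pos = r.find(pattern)
            if pos != -1:
                return r[:pos]              # text before the pattern is the reason (S1606)
    return None                             # all strings processed, nothing extracted (S1607: YES)


reason = extract_reason("テキストを音声に変換することにより、内容を容易に理解できる。")
```

As in the flowchart, the search stops at the first sentence that contains a pattern, so at most one reason is returned.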

Next, in FIG. 14, the similarity is calculated between the explanatory text of each component registered in the component table 113 (in step S1405 when the determination in step S1407 is YES, or in step S1408 when it is NO) and the text data of the reason for the effect extracted in step S1409 (step S1410). This processing implements the function of the similarity calculation unit 108 in FIG. 1 and corresponds to the first procedure 4 or to the first half of the second procedure 7. If no reason text was extracted in step S1409 (that is, the determination in step S1607 of FIG. 16 was YES), the component highlighting processing of steps S1410 to S1412 is not executed and the processing ends. FIG. 17 is a flowchart showing the details of step S1410. The processing of this flowchart is executed by, for example, a computer system equipped with a CPU (central processing unit), memory, and an external storage device.

First, morphological analysis is performed on all explanatory texts registered in the “component description” field of the component table 113 illustrated in FIG. 22, and the nouns among the resulting morphemes are registered in the “nouns in the component description” field of the component table 113 (FIG. 22) (step S1701).

Next, for each noun morpheme obtained in step S1701, its frequency of occurrence in the full text of the patent document read in step S1401 of FIG. 14 is counted (step S1702).

Next, the weight of each noun morpheme obtained in step S1701 is calculated as the reciprocal of the frequency counted in step S1702 (step S1703). The resulting weight values are held in a morpheme weight table in work memory. FIG. 24 shows an example of the data structure of the morpheme weight table. This table can be implemented, for example, as two associative arrays that return the frequency of occurrence and the weight value, respectively, keyed by the morpheme string.
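The two associative arrays can be sketched with ordinary dictionaries. The sketch assumes the full document has already been reduced to a list of nouns by a morphological analyzer; the function name `build_weight_table` and the sample nouns are hypothetical.

```python
from collections import Counter


def build_weight_table(nouns, document_nouns):
    """Return (frequency, weight) dictionaries keyed by noun."""
    counts = Counter(document_nouns)
    # Frequency of each description noun in the whole document (S1702).
    frequency = {n: counts[n] for n in set(nouns)}
    # Weight is the reciprocal of the frequency (S1703); nouns with zero count are skipped.
    weight = {n: 1.0 / f for n, f in frequency.items() if f > 0}
    return frequency, weight


# Illustrative data only.
document_nouns = ["speech", "data", "data", "data", "data", "speech"]
frequency, weight = build_weight_table(["speech", "data"], document_nouns)
```

A noun that occurs rarely in the document therefore receives a weight close to 1, while a frequent noun receives a small weight.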

Next, the text data corresponding to the reason why the effect is obtained, extracted in step S1409 of FIG. 14, is morphologically analyzed to obtain its nouns, which are assigned to the noun string α held as a variable in the work memory (step S1704).

Next, the text data of the explanatory text registered in the "explanatory text of the component" field of the first component entry is read from the component table 113 into the work memory (step S1705).

Next, the nouns included in the explanatory text read in step S1705 are read from the "nouns included in the explanatory text of the component" field of the corresponding entry in the component table 113 into the noun string β, a variable in the work memory (step S1706).

Next, the nouns common to the noun string α and the noun string β in the work memory are extracted. Then, with reference to the morpheme weight table (FIG. 24), the sum of the weight values calculated in step S1703 for the extracted nouns is computed (step S1707).

Next, the sum calculated in step S1707 is held in the similarity table 117 in the work memory as the similarity of the component currently being processed (step S1708). FIG. 25 shows an example of the data structure of the similarity table 117. For each component name, the morpheme character strings shared with the reason why the effect is obtained and the similarity value are registered.

Thereafter, it is determined whether processing has been completed for all component entries in the component table 113 (step S1709).
If the determination in step S1709 is NO, the text data of the explanatory text registered in the "explanatory text of the component" field of the next component entry is read from the component table 113 into the work memory (step S1710). The series of steps S1706 to S1708 is then repeated to calculate the similarity for that component.

When processing has been completed for all component entries in the component table 113 and the determination in step S1709 becomes YES, the flowchart of FIG. 17, that is, the process of step S1410 in FIG. 14, ends.
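The loop of steps S1704 to S1710 can be sketched as follows, under stated assumptions: the nouns have already been extracted, duplicate nouns are collapsed by treating α and β as sets (the description does not say how duplicates are handled), and the names `component_similarities`, `weight`, and the sample data are hypothetical.

```python
def component_similarities(reason_nouns, component_nouns, weight):
    """Build a similarity table: component name -> shared nouns and score.

    reason_nouns    : nouns of the "reason why the effect is obtained" text
    component_nouns : dict mapping component name -> nouns of its explanation
    weight          : dict mapping noun -> weight (reciprocal of frequency)
    """
    alpha = set(reason_nouns)                       # noun string alpha (S1704)
    table = {}
    for name, nouns in component_nouns.items():    # S1705 / S1709 / S1710
        beta = set(nouns)                           # noun string beta (S1706)
        common = sorted(alpha & beta)               # shared nouns (S1707)
        # Assumption: a noun missing from the weight table contributes 0.
        score = sum(weight.get(noun, 0.0) for noun in common)
        table[name] = {"common": common, "similarity": score}  # S1708
    return table


# Hypothetical sample data.
weight = {"pattern": 1.0 / 3.0, "dictionary": 1.0, "claim": 0.5}
reason = ["dictionary", "pattern"]
components = {
    "extraction unit": ["pattern", "claim"],
    "storage unit": ["dictionary", "pattern"],
}
table = component_similarities(reason, components, weight)
```

Here "storage unit" shares both the rare noun "dictionary" and the frequent noun "pattern" with the reason text, so it scores higher than "extraction unit", which shares only "pattern".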

Thereafter, in FIG. 14, the similarity table 117 illustrated in FIG. 25, obtained in the work memory in step S1410, is referred to, and the component with the highest similarity among all components is identified as the component serving as the point of the invention (step S1411). This process realizes the function of the point component identifying unit 109 in FIG. 1 and corresponds to the second half of the first procedure 4 or the second procedure 7.
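Step S1411 amounts to taking the entry with the largest similarity value. A minimal sketch follows; the function name `select_point_component` is hypothetical, and the tie-breaking rule (the first entry encountered wins) is an assumption, since the description does not say what happens when two components have the same similarity.

```python
def select_point_component(similarity_table):
    """Return the name of the component with the highest similarity.

    similarity_table maps component name -> similarity value.
    Returns None for an empty table. On ties, the first entry in
    iteration order is returned (an assumption, not stated in the text).
    """
    if not similarity_table:
        return None
    return max(similarity_table, key=lambda name: similarity_table[name])


# Hypothetical similarity values.
best = select_point_component(
    {"extraction unit": 0.33, "storage unit": 1.33, "display unit": 0.5}
)
```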

Finally, in the patent document read in step S1401, the character strings corresponding to the component identified in step S1411 as serving as the point are highlighted and shown on a display (for example, the output unit of the input/output device 3003 in FIG. 30). This process realizes the function of the display unit 110 in FIG. 1.
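The description does not specify how the highlighting is rendered. As one possible illustration, the sketch below wraps every occurrence of the identified component name in marker strings; the function name `highlight` and the markers `[[` and `]]` are hypothetical, and a real display would more likely use a colour or font attribute.

```python
def highlight(text, component_name, start="[[", end="]]"):
    """Wrap every occurrence of component_name in text with markers."""
    if not component_name:
        # Nothing was identified, so the text is returned unchanged.
        return text
    return text.replace(component_name, start + component_name + end)


marked = highlight(
    "the extraction unit reads; the extraction unit writes",
    "extraction unit",
)
```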

In the embodiment described above, only nouns are processed, but other predetermined parts of speech, such as verbs, adjectives, adjectival verbs, and adnominals, may also be used.

As the component name, not only a compound word such as 「抽出手段」 ("extraction means") but also a phrase such as 「抽出する手段」 ("means for extracting") can be used. When phrases are used, syntactic analysis is performed, and a noun phrase is generated by joining the noun immediately before 「と、」 (the particle "to", meaning "and", followed by a comma) with the modifiers that depend on that noun; this noun phrase is used as the component name. Not all modifiers need to be joined; for example, only the single closest modifier may be joined.

As one form of the syntactic analysis mentioned above, dependency analysis is described with reference to the flowchart of FIG. 27 and to the tables of FIG. 28: the morpheme dictionary table (a), the part-of-speech connection probability table (b), the phrase connection rule table (c), and the phrase connection probability table (d). Each process in this flowchart is executed by, for example, a computer system equipped with a CPU, a memory, and an external storage device.

Step S2701: First, morphological analysis is performed on the components and their explanatory texts. Using the morpheme dictionary (a), whose entries consist of a part of speech (noun, particle, verb, and so on) and a character string, and the table (b) that defines part-of-speech connection probabilities, the morpheme sequence with the highest probability is selected. The morpheme dictionary (a) and the table (b) are stored, for example, in the work memory or in a storage device.

Step S2702: Next, phrase segmentation is performed. That is, morphemes are joined according to the phrase connection rules (c) to generate phrases.

Step S2703: Next, dependency analysis is performed. That is:
(1) The dependency target with the highest probability is selected according to the phrase connection probabilities (d) and the constraint rules.
(2) The connection probabilities (d) are defined by the phrase type, the proximity relationship, and the like.
(3) Proximity relationships are processed; for example, two morphemes are adjacent, or two morphemes are one phrase apart.

In this process, the dependency relations that satisfy the following constraints and maximize the probability are determined.
・Every phrase other than the last has exactly one dependency target phrase located after it.
・Dependency relations do not cross.
Finally, dependency pairs between two morphemes are generated.
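The two constraints above can be illustrated with a small exhaustive search. This is only a sketch for short inputs and is not the method of the embodiment: it assumes that the phrases are indexed 0 to n-1, that the probability of a whole analysis is the product of the individual connection probabilities (the description only says the probability is maximized), and that pairs missing from the table have probability 0. The names `crosses`, `best_dependencies`, and `prob` are hypothetical.

```python
from itertools import product


def crosses(a, b):
    """Return True if dependency arcs a=(i, h) and b=(j, k) cross."""
    (i, h), (j, k) = sorted([a, b])
    return i < j < h < k


def best_dependencies(n, prob):
    """Find the best head for each of n phrases by exhaustive search.

    prob[(i, h)] is the probability that phrase i depends on phrase h (i < h).
    Returns (heads, score), where heads[i] is the head of phrase i for every
    phrase except the last, which has no head.
    """
    best_score, best_heads = -1.0, None
    # Constraint 1: each phrase except the last has one head to its right.
    choices = [range(i + 1, n) for i in range(n - 1)]
    for heads in product(*choices):
        arcs = list(enumerate(heads))
        # Constraint 2: no two dependency arcs may cross.
        if any(crosses(a, b) for x, a in enumerate(arcs) for b in arcs[x + 1:]):
            continue
        score = 1.0
        for arc in arcs:
            score *= prob.get(arc, 0.0)
        if score > best_score:
            best_score, best_heads = score, list(heads)
    return best_heads, best_score


# Hypothetical probabilities for four phrases.
prob = {(0, 1): 0.1, (0, 2): 0.9, (1, 2): 0.2, (1, 3): 0.8, (2, 3): 1.0}
heads, score = best_dependencies(4, prob)
```

In this example the individually most probable choices (phrase 0 to phrase 2, phrase 1 to phrase 3) would cross, so the search settles on phrase 1 depending on phrase 2 instead. The search space grows factorially with the number of phrases, so a practical analyzer would use dynamic programming or a similar technique.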

In the dependency analysis described above, dependency pairs are generated by pattern matching against the dictionary and by calculating the probabilities attached to the matched patterns.
Through this dependency analysis, for example, a noun phrase is generated by joining the noun immediately before 「と、」 with the modifiers that depend on that noun, and is used as the component name.

FIG. 29 is an operation flowchart showing the specific operation of the pattern matching and character string position detection performed in processes such as steps S1403 and S1404 in FIG. 14, steps S1502, S1506, and S1507 in FIG. 15, steps S1603, S1605, and S1606 in FIG. 16, and steps S1702 and S1707 in FIG. 17. Each process in this flowchart is executed by, for example, a computer system equipped with a CPU, a memory, and an external storage device.

In FIG. 29, first, the data of the pattern character string to be searched for is read from the work memory or a storage device into a predetermined area of the work memory (step S2901). For example, in step S1403 of FIG. 14, this pattern character string is the character string of an expression pattern recorded in the component expression pattern dictionary 112 (FIG. 1) on the storage device.

Next, the data of the matching target character string, that is, the character string to be subjected to pattern matching, is read from the work memory or a storage device into a predetermined area of the work memory (step S2902). For example, in step S1403 of FIG. 14, the matching target character string is the text data corresponding to the claims section, obtained from the analyzed document table illustrated in FIG. 14 in the work memory.

Next, the pattern pointer p1, held in a variable area of the work memory, is set to the address of the first character of the pattern character string read into the predetermined area in step S2901 (step S2903).

Next, the matching base pointer p2 and the matching pointer p3, both held in the variable area of the work memory, are set to the address of the first character of the matching target character string read into the predetermined area of the work memory in step S2902 (steps S2904 and S2905).

Next, while the address of the matching pointer p3 is incremented by one at a time (step S2907), the matching target character string is searched for a character that matches the first character of the pattern character string indicated by the pattern pointer p1 (the loop of steps S2906 → S2907 → S2908 → S2906).

If, as a result of incrementing the matching pointer p3, the value of p3 exceeds the address of the end of the matching target character string read into the predetermined area of the work memory, "no match" is output and the process of FIG. 29 ends (steps S2908 → S2909).

When the first character of the pattern character string indicated by the pattern pointer p1 is found in the matching target character string and the determination in step S2906 becomes YES, the following series of steps S2810 to S2817 is executed. Here, it is determined whether the characters that follow the matched position in the matching target character string match all of the second and subsequent characters of the pattern character string.

First, the matching base pointer p2 in the work memory is set to the address held by the matching pointer p3 when the match was detected in step S2906 (step S2810).

Next, the address value of the pattern pointer p1 in the work memory is incremented by one (step S2811).
Next, it is determined whether the address value of the pattern pointer p1 has exceeded the address of the end of the pattern character string read into the predetermined area of the work memory (step S2812).

If the determination in step S2812 is NO, the address value of the matching pointer p3 is incremented by one (step S2814).
Next, it is determined whether the address value of the matching pointer p3 has exceeded the address of the end of the matching target character string read into the predetermined area of the work memory (step S2815).

If the determination in step S2815 is NO, it is determined whether the character at the address in the matching target character string pointed to by the matching pointer p3 matches the character at the address in the pattern character string pointed to by the pattern pointer p1 (step S2817). In other words, the first time step S2817 is executed after step S2906 becomes YES, it is determined whether the second character counted from the position matched in step S2906 in the matching target character string matches the second character of the pattern character string.

If the determination in step S2817 is YES, the process returns to step S2811 and the series of steps up to step S2817 is executed again. Here, the address values of the pattern pointer p1 and the matching pointer p3 are each incremented by one (steps S2811 and S2814). As a result, it is then determined whether the third character counted from the position matched in step S2906 in the matching target character string matches the third character of the pattern character string.

By repeating the above process, it is determined whether the characters that follow the position matched in step S2906 in the matching target character string match all of the second and subsequent characters of the pattern character string.

If, in the above series of steps, the value of the pattern pointer p1 incremented in step S2811 exceeds the address of the end of the pattern character string read into the predetermined area of the work memory, the pattern character string has been found in the matching target character string. In this case, the address value of the matching base pointer p2, which indicates the position in the matching target character string where the match was detected in step S2906, is output as the matching result, and the process of FIG. 29 ends (steps S2812 → S2813).

On the other hand, if the value of the matching pointer p3 incremented in step S2814 exceeds the address of the end of the matching target character string read into the predetermined area of the work memory, "no match" is output and the process of FIG. 29 ends (steps S2815 → S2816).

If, in the above series of steps, the character at the address in the matching target character string pointed to by the matching pointer p3 does not match the character at the address in the pattern character string pointed to by the pattern pointer p1, so that the determination in step S2817 is NO, the search is restarted. That is, the pattern pointer p1 is set to the address of the first character of the pattern character string read into the predetermined area of the work memory in step S2901 (step S2818). In addition, the matching pointer p3 is set to the address of the position that follows the position, pointed to by the matching base pointer p2, at which the match was detected in step S2906 (step S2819). The process then returns to step S2906, and while the address of the matching pointer p3 is incremented by one at a time from that following position (step S2907), the matching target character string is searched again for a character that matches the first character of the pattern character string indicated by the pattern pointer p1 (the loop of steps S2906 → S2907 → S2908 → S2906).

When the first character of the pattern character string indicated by the pattern pointer p1 is found again in the matching target character string and the determination in step S2906 becomes YES, the series of steps S2810 to S2817 is executed. It is thus determined again whether the characters that follow the matched position in the matching target character string match all of the second and subsequent characters of the pattern character string.

By repeating the above series of steps, the pattern character string is searched for in the matching target character string, and if the matching succeeds, the start position of the match is output as the matching result (step S2813).
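The search of FIG. 29 is a straightforward character-by-character string search and can be sketched as follows. The sketch uses list-style indices in place of memory addresses and returns -1 in place of the "no match" output; the function name `find_pattern` and the handling of an empty pattern are assumptions. The variables `p1`, `p2`, and `p3` correspond to the pattern pointer, the matching base pointer, and the matching pointer.

```python
def find_pattern(pattern, target):
    """Return the index of the first occurrence of pattern in target, or -1."""
    if not pattern:
        return -1  # Assumption: an empty pattern is treated as "no match".
    p3 = 0  # matching pointer into target
    while p3 < len(target):
        # Scan for the first character of the pattern.
        if target[p3] != pattern[0]:
            p3 += 1
            continue
        p2 = p3  # matching base pointer: where this candidate match starts
        p1 = 1   # pattern pointer: next pattern character to compare
        while True:
            if p1 >= len(pattern):
                return p2   # every pattern character matched
            p3 += 1
            if p3 >= len(target):
                return -1   # target exhausted in the middle of a candidate
            if target[p3] != pattern[p1]:
                break       # mismatch: abandon this candidate
            p1 += 1
        # Restart the scan one position after the failed candidate.
        p3 = p2 + 1
    return -1


position = find_pattern("手段と、", "抽出する抽出手段と、表示する表示手段と、")
```

For valid non-empty patterns the result agrees with Python's built-in `str.find`, which would normally be used instead; the explicit pointers are kept here only to mirror the flowchart.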

FIG. 30 is a diagram showing an example of the hardware configuration of a computer that can implement the system of each of the embodiments described above.
The computer shown in FIG. 30 includes a CPU 3001, a memory 3002, an input/output device 3003, an external storage device 3005, a portable recording medium drive device 3006 into which a portable recording medium 3009 is inserted, and a communication interface 3007, which are interconnected by a bus 3008.

The CPU 3001 controls the entire computer. The memory 3002 has a work area that temporarily stores programs or data held in the external storage device 3005 (or the portable recording medium 3009) during program execution, data updates, and the like. The CPU 3001 performs overall control by reading programs into the memory 3002 and executing them.

The input/output device 3003 detects input operations performed by the user with a keyboard, mouse, or the like, notifies the CPU 3001 of the detection results, and outputs data sent under the control of the CPU 3001 to a display device or a printing device.

The external storage device 3005 is, for example, a hard disk storage device. It is used mainly for storing various data and programs.
The portable recording medium drive device 3006 accommodates a portable recording medium 3009 such as an optical disk, SDRAM, or CompactFlash (registered trademark), and plays an auxiliary role to the external storage device 3005.

The communication interface 3007 is a device for connecting to a communication line of, for example, a LAN (local area network) or a WAN (wide area network).
The system according to the present embodiment is realized by the CPU 3001 executing programs corresponding to the operation flowcharts that implement the functions of the embodiment. The programs may be recorded on and distributed via, for example, the external storage device 3005 or the portable recording medium 3009, or may be obtained from a network through the communication interface 3007.

Regarding the above embodiments, the following appendices are further disclosed.
(Appendix 1)
A device for supporting understanding of patent documents,
The text data of the claims in the patent document extracted from the patent document database using the component expression pattern dictionary that records the character string pattern described before or after the component name corresponding to the component constituting the claim A component name extraction unit that extracts each component name from
Description that explains the component corresponding to each component name from the text data of the claim based on each component name and the character string pattern recorded in the component expression pattern dictionary A component description extractor from the claim for extracting sentence text data;
Using a reason expression pattern dictionary that provides an effect of recording a character string pattern that links a character string portion that describes an effect and a character string portion that describes the reason for obtaining the effect. A reason extraction unit capable of obtaining the effect of extracting the text data describing the reason for obtaining the effect from the text data describing the effect of the invention;
A similarity calculation unit that calculates the similarity between the text data of the description of each component and the text data describing the reason why the effect is obtained;
A point component identifying unit that identifies a component corresponding to the text data of the explanatory text having the highest similarity as a component serving as a point;
An output unit for outputting the constituent elements serving as the points;
A component highlighting device comprising:
(Appendix 2)
A component highlighting device for supporting the understanding of patent documents, comprising:
a component name extraction unit that extracts each component name from the text data of the claims of a patent document retrieved from a patent document database, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
a component description extraction unit (from claims) that extracts, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
an information amount determination unit that determines whether the text data of the explanatory text describing each component contains sufficient information;
a description extraction unit (from embodiments) that, for each component name whose explanatory text has been determined not to contain sufficient information, extracts the text data of the corresponding explanatory text from the text data of the detailed description of the invention in the patent document, using a description expression pattern dictionary for embodiments that records character string patterns linking a component name to its description;
an effect-reason extraction unit that extracts text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
a similarity calculation unit that calculates the similarity between the text data of the explanatory text of each component and the text data describing the reason why the effect is obtained;
a point component identifying unit that identifies, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
an output unit that outputs the component serving as the point.
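For illustration only, the fallback in Appendix 2 from the claim text to the detailed description can be sketched in Python as below. The function name `select_descriptions` and the dictionary-based inputs are hypothetical, and replacing (rather than supplementing) the claim text with the embodiment text is an assumption; the appendix itself only states that explanatory text is extracted from the detailed description for components whose claim text lacks sufficient information.

```python
from typing import Dict


def select_descriptions(
    claim_descriptions: Dict[str, str],
    is_sufficient: Dict[str, bool],
    embodiment_descriptions: Dict[str, str],
) -> Dict[str, str]:
    """Pick, per component name, the explanatory text passed on to the similarity step."""
    selected = {}
    for name, claim_text in claim_descriptions.items():
        if is_sufficient.get(name, False):
            # The claim text already carries enough information.
            selected[name] = claim_text
        else:
            # Assumed behavior: use the embodiment text when one was found,
            # otherwise keep the (short) claim text.
            selected[name] = embodiment_descriptions.get(name, claim_text)
    return selected


claim_descriptions = {
    "storage unit": "stores data",
    "analysis unit": "analyzes the claim structure using a pattern dictionary",
}
is_sufficient = {"storage unit": False, "analysis unit": True}
embodiment_descriptions = {"storage unit": "holds the analyzed document table on disk"}

selected = select_descriptions(claim_descriptions, is_sufficient, embodiment_descriptions)
```

In this example only the "storage unit" entry is replaced by text from the embodiment; the "analysis unit" entry keeps its claim text.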
(Appendix 3)
The component highlighting device according to appendix 2, wherein the information amount determination unit determines, for each component, whether the text data of the explanatory text describing the component contains sufficient information by: obtaining morpheme data of a predetermined part of speech through morphological analysis of that text data; deleting from it the morpheme data included in the component name corresponding to the component and the morpheme data commonly included in the explanatory texts of all other components; and determining whether the number of remaining morpheme data of the predetermined part of speech is equal to or greater than a predetermined threshold.
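The determination in Appendix 3 can be sketched as follows, as a minimal illustration under stated assumptions. Morphological analysis and part-of-speech filtering are assumed to have been done already, so each input is a set of morpheme strings; the function name, the threshold value of 2, and the reading of "commonly included in the explanations of all other components" as the intersection of the other components' morpheme sets are all assumptions, not taken from the patent text.

```python
from typing import Dict, Set


def has_sufficient_information(
    name_morphemes: Dict[str, Set[str]],
    description_morphemes: Dict[str, Set[str]],
    threshold: int = 2,  # assumed value; the text only says "predetermined threshold"
) -> Dict[str, bool]:
    """Return, per component, whether enough distinctive morphemes remain."""
    result = {}
    for component, morphemes in description_morphemes.items():
        others = [m for c, m in description_morphemes.items() if c != component]
        # Morphemes shared by the explanatory texts of all other components.
        common_to_others = set.intersection(*others) if others else set()
        remaining = morphemes - name_morphemes.get(component, set()) - common_to_others
        result[component] = len(remaining) >= threshold
    return result


name_morphemes = {
    "extraction unit": {"extraction", "unit"},
    "output unit": {"output", "unit"},
    "storage unit": {"storage", "unit"},
}
description_morphemes = {
    "extraction unit": {"extraction", "unit", "pattern", "dictionary", "claim", "text"},
    "output unit": {"output", "unit", "text"},
    "storage unit": {"storage", "unit", "text"},
}

sufficiency = has_sufficient_information(name_morphemes, description_morphemes)
```

Here the "extraction unit" description keeps three distinctive morphemes (pattern, dictionary, claim) and passes, while the other two descriptions keep none and would be looked up in the detailed description instead.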
(Appendix 4)
The component highlighting device according to appendix 1 or 2, wherein the similarity calculation unit:
extracts morpheme data of a predetermined part of speech by morphological analysis from the text data of the explanatory text corresponding to each component name;
extracts morpheme data of the predetermined part of speech by morphological analysis from the text data describing the reason why the effect is obtained; and
calculates, as the similarity between the text data of the explanatory text corresponding to each component name and the text data describing the reason why the effect is obtained, the number of morpheme data of the predetermined part of speech extracted in common from both.
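As a minimal sketch of the count-based similarity in Appendix 4, the Python below takes morphemes that have already been analyzed into (surface form, part-of-speech tag) pairs. The tag names and the choice of nouns and verbs as the "predetermined part of speech" are assumptions made for the example; a real system would obtain the pairs from a morphological analyzer.

```python
from typing import Iterable, Set, Tuple

Morpheme = Tuple[str, str]  # (surface form, part-of-speech tag)

# Assumed "predetermined part of speech"; the patent text does not fix the set here.
TARGET_POS = {"NOUN", "VERB"}


def content_morphemes(morphemes: Iterable[Morpheme]) -> Set[str]:
    """Keep only the surface forms whose part of speech is in TARGET_POS."""
    return {surface for surface, pos in morphemes if pos in TARGET_POS}


def similarity(description: Iterable[Morpheme], reason: Iterable[Morpheme]) -> int:
    """Number of target-POS morphemes common to the description and the reason text."""
    return len(content_morphemes(description) & content_morphemes(reason))


description = [("similarity", "NOUN"), ("of", "ADP"), ("text", "NOUN"), ("calculate", "VERB")]
reason = [("the", "DET"), ("text", "NOUN"), ("similarity", "NOUN"), ("of", "ADP"), ("identify", "VERB")]

score = similarity(description, reason)
```

The shared function word "of" is ignored because its tag is outside the assumed target set, so only "similarity" and "text" are counted.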
(Appendix 5)
The component highlighting device according to appendix 4, wherein the similarity calculation unit sets, for each morpheme data of the predetermined part of speech, a weight that increases as its frequency of appearance in the patent document decreases, and calculates the number of morpheme data of the predetermined part of speech extracted in common by adding up the weights set for the respective morpheme data.
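The weighting described above can be sketched as follows. The text only requires that the weight grow as a morpheme's frequency of appearance in the patent document falls; the reciprocal of the frequency used here is one assumed choice among many (a logarithmic weight would satisfy the same condition), and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, Set


def weighted_similarity(
    description_morphemes: Set[str],
    reason_morphemes: Set[str],
    document_morphemes: Iterable[str],
) -> float:
    """Sum of per-morpheme weights over the morphemes common to both texts."""
    frequency = Counter(document_morphemes)
    common = description_morphemes & reason_morphemes
    # Assumed weight: 1 / frequency, so rarer morphemes contribute more.
    # Morphemes that never occur in the document are skipped.
    return sum(1.0 / frequency[m] for m in common if frequency[m] > 0)


document_morphemes = ["text", "text", "text", "text", "dictionary", "pattern", "pattern"]
description_morphemes = {"text", "dictionary"}
reason_morphemes = {"text", "dictionary", "effect"}

weighted_score = weighted_similarity(description_morphemes, reason_morphemes, document_morphemes)
```

With these inputs the frequent morpheme "text" contributes 0.25 and the rare morpheme "dictionary" contributes 1.0, so a match on a rare term counts for more than a match on a term that appears throughout the document.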
(Appendix 6)
A program for supporting the understanding of patent documents, the program causing a computer to execute functions of:
extracting each component name from the text data of the claims of a patent document retrieved from a patent document database, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
extracting, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
extracting text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
calculating the similarity between the text data of the explanatory text of each component and the text data describing the reason why the effect is obtained;
identifying, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
outputting the component serving as the point.
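The last three functions of Appendix 6 (similarity calculation, identification of the highest-scoring component, and output) can be sketched as below. This is a simplified illustration, not the patented implementation: lowercase whitespace splitting stands in for morphological analysis, every token is counted regardless of part of speech, and the function name and sample texts are made up for the example.

```python
from typing import Dict, Tuple


def identify_point_component(
    descriptions: Dict[str, str], reason: str
) -> Tuple[str, Dict[str, int]]:
    """Return the component whose description shares the most tokens with the reason."""
    reason_tokens = set(reason.lower().split())
    scores = {
        name: len(set(text.lower().split()) & reason_tokens)
        for name, text in descriptions.items()
    }
    # The component with the highest similarity is treated as the point.
    point = max(scores, key=scores.get)
    return point, scores


descriptions = {
    "name extraction unit": "extracts each component name from the claim text",
    "similarity calculation unit": "calculates similarity between description text and reason text",
}
reason = "because similarity between each description and the reason is calculated"

point, scores = identify_point_component(descriptions, reason)
print(point)
```

When several components tie, `max` returns the first one encountered; how ties should be handled is not specified in the appendix.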
(Appendix 7)
A program for supporting the understanding of patent documents, the program causing a computer to execute functions of:
extracting each component name from the text data of the claims of a patent document retrieved from a patent document database, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
extracting, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
determining whether the text data of the explanatory text describing each component contains sufficient information;
for each component name whose explanatory text has been determined not to contain sufficient information, extracting the text data of the corresponding explanatory text from the text data of the detailed description of the invention in the patent document, using a description expression pattern dictionary for embodiments that records character string patterns linking a component name to its description;
extracting text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
calculating the similarity between the text data of the explanatory text of each component and the text data describing the reason why the effect is obtained;
identifying, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
outputting the component serving as the point.
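The reason-extraction step can be illustrated with a small pattern dictionary. The two English patterns below ("By ..., ..." and "... because ...") are hypothetical stand-ins for the Japanese expression patterns that the actual dictionary would record; only the idea of a pattern that links an effect part to a reason part comes from the text.

```python
import re
from typing import Optional

# Hypothetical effect-reason expression patterns (English stand-ins).
REASON_PATTERNS = [
    re.compile(r"^By (?P<reason>.+?), (?P<effect>.+)$", re.IGNORECASE),
    re.compile(r"^(?P<effect>.+?) because (?P<reason>.+)$", re.IGNORECASE),
]


def extract_reason(effect_sentence: str) -> Optional[str]:
    """Return the reason part of an effect sentence, or None if no pattern matches."""
    sentence = effect_sentence.strip().rstrip(".")
    for pattern in REASON_PATTERNS:
        match = pattern.match(sentence)
        if match:
            return match.group("reason")
    return None


reason_a = extract_reason(
    "By comparing each description with the stated reason, the key component can be identified."
)
reason_b = extract_reason(
    "The reader can find the point quickly because the key component is highlighted."
)
reason_c = extract_reason("The device is small.")
```

Sentences that match no pattern, such as the third example, yield `None` and would contribute nothing to the similarity calculation.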
(Appendix 8)
The program according to appendix 7, wherein, in determining the amount of information, it is determined for each component whether the text data of the explanatory text describing the component contains sufficient information by: obtaining morpheme data of a predetermined part of speech through morphological analysis of that text data; deleting from it the morpheme data included in the component name corresponding to the component and the morpheme data commonly included in the explanatory texts of all other components; and determining whether the number of remaining morpheme data of the predetermined part of speech is equal to or greater than a predetermined threshold.
(Appendix 9)
The program according to appendix 6 or 7, wherein, in calculating the similarity:
morpheme data of a predetermined part of speech is extracted by morphological analysis from the text data of the explanatory text corresponding to each component name;
morpheme data of the predetermined part of speech is extracted by morphological analysis from the text data describing the reason why the effect is obtained; and
the number of morpheme data of the predetermined part of speech extracted in common from both is calculated as the similarity between the text data of the explanatory text corresponding to each component name and the text data describing the reason why the effect is obtained.
(Appendix 10)
The program according to appendix 9, wherein, in calculating the similarity, a weight that increases as the frequency of appearance in the patent document decreases is set for each morpheme data of the predetermined part of speech, and the number of morpheme data of the predetermined part of speech extracted in common is calculated by adding up the weights set for the respective morpheme data.
(Appendix 11)
A component highlighting method by which a computer supports the understanding of patent documents, wherein the computer executes:
extracting each component name from the text data of the claims of a patent document retrieved from a patent document database, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
extracting, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
extracting text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
calculating the similarity between the text data of the explanatory text of each component and the text data describing the reason why the effect is obtained;
identifying, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
outputting the component serving as the point.
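The first two steps of the method (extracting component names and their explanatory text with a pattern dictionary) can be sketched as follows. The single English pattern, the splitting of a claim on semicolons, and the assumption that every component name ends in "unit" are illustrative simplifications; the actual dictionary would record Japanese character string patterns that appear before or after a component name.

```python
import re
from typing import Dict, List

# Hypothetical component expression patterns: "<article> <name> unit that|for <description>".
COMPONENT_PATTERNS: List[re.Pattern] = [
    re.compile(
        r"^(?:an?|the)\s+(?P<name>.+?\bunit)\s+(?:that|for)\s+(?P<description>.+)$",
        re.IGNORECASE,
    ),
]


def extract_components(claim_text: str) -> Dict[str, str]:
    """Map each component name found in the claim to its explanatory text."""
    components = {}
    for element in claim_text.split(";"):
        element = element.strip().rstrip(".")
        for pattern in COMPONENT_PATTERNS:
            match = pattern.match(element)
            if match:
                components[match.group("name")] = match.group("description")
                break
    return components


claim_text = (
    "A name extraction unit that extracts each component name from the claims; "
    "an output unit for outputting the point component."
)
components = extract_components(claim_text)
```

Claim elements that match no pattern are simply skipped in this sketch, which is why the dictionary needs to cover the phrasing conventions actually used in claims.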
(Appendix 12)
A component highlighting method by which a computer supports the understanding of patent documents, wherein the computer executes:
extracting each component name from the text data of the claims of a patent document retrieved from a patent document database, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
extracting, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
determining whether the text data of the explanatory text describing each component contains sufficient information;
for each component name whose explanatory text has been determined not to contain sufficient information, extracting the text data of the corresponding explanatory text from the text data of the detailed description of the invention in the patent document, using a description expression pattern dictionary for embodiments that records character string patterns linking a component name to its description;
extracting text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
calculating the similarity between the text data of the explanatory text of each component and the text data describing the reason why the effect is obtained;
identifying, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
outputting the component serving as the point.
(Appendix 13)
The component highlighting method according to appendix 12, wherein, in determining the amount of information, it is determined for each component whether the text data of the explanatory text describing the component contains sufficient information by: obtaining morpheme data of a predetermined part of speech through morphological analysis of that text data; deleting from it the morpheme data included in the component name corresponding to the component and the morpheme data commonly included in the explanatory texts of all other components; and determining whether the number of remaining morpheme data of the predetermined part of speech is equal to or greater than a predetermined threshold.
(Appendix 14)
The component highlighting method according to appendix 11 or 12, wherein, in calculating the similarity, the computer:
extracts morpheme data of a predetermined part of speech by morphological analysis from the text data of the explanatory text corresponding to each component name;
extracts morpheme data of the predetermined part of speech by morphological analysis from the text data describing the reason why the effect is obtained; and
calculates, as the similarity between the text data of the explanatory text corresponding to each component name and the text data describing the reason why the effect is obtained, the number of morpheme data of the predetermined part of speech extracted in common from both.
(Appendix 15)
The component highlighting method according to appendix 14, wherein, in calculating the similarity, the computer sets, for each morpheme data of the predetermined part of speech, a weight that increases as its frequency of appearance in the patent document decreases, and calculates the number of morpheme data of the predetermined part of speech extracted in common by adding up the weights set for the respective morpheme data.

Description of Symbols
101 Patent document search unit
102 Patent document structure analysis unit
103 Component name extraction unit
104 Component description extraction unit (from claims)
105 Information amount determination unit
106 Description extraction unit (from embodiments)
107 Effect-reason extraction unit
108 Similarity calculation unit
109 Point component identifying unit
110 Display unit
111 Patent database
112 Component expression pattern dictionary
113 Component table
114 Analyzed document table
115 Description expression pattern dictionary for embodiments
116 Effect-reason expression pattern dictionary
117 Similarity table

Claims

A component highlighting device for supporting the understanding of patent documents, comprising:
a component name extraction unit that extracts, from the text data of the claims of a patent document retrieved from a patent document database, each component name, including a noun phrase formed by connecting a noun with the one or more modifiers closest to that noun, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
a component description extraction unit (from claims) that extracts, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
an effect-reason extraction unit that extracts text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
a similarity calculation unit that calculates the similarity between the text data of the explanatory text of each component and the text data describing the reason why the effect is obtained;
a point component identifying unit that identifies, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
an output unit that outputs the component serving as the point.

A component highlighting device for supporting the understanding of patent documents, comprising:
a component name extraction unit that extracts each component name from the text data of the claims of a patent document retrieved from a patent document database, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
a component description extraction unit (from claims) that extracts, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
an information amount determination unit that determines, for each component, whether the text data of the explanatory text describing the component contains sufficient information, by obtaining morpheme data of a predetermined part of speech through morphological analysis of that text data, deleting from it the morpheme data included in the component name corresponding to the component and the morpheme data commonly included in the explanatory texts of all other components, and determining whether the number of remaining morpheme data of the predetermined part of speech is equal to or greater than a predetermined threshold;
a description extraction unit (from embodiments) that, for each component name whose explanatory text has been determined not to contain sufficient information, extracts the text data of the corresponding explanatory text from the text data of the detailed description of the invention in the patent document, using a description expression pattern dictionary for embodiments that records character string patterns linking a component name to its description;
an effect-reason extraction unit that extracts text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
a similarity calculation unit that calculates the similarity between the text data of the explanatory text of each component, as extracted by the component description extraction unit (from claims) or the description extraction unit (from embodiments), and the text data describing the reason why the effect is obtained;
a point component identifying unit that identifies, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
an output unit that outputs the component serving as the point.

The component highlighting device according to claim 1, wherein the similarity calculation unit:
extracts morpheme data of a predetermined part of speech by morphological analysis from the text data of the explanatory text corresponding to each component name;
extracts morpheme data of the predetermined part of speech by morphological analysis from the text data describing the reason why the effect is obtained; and
calculates, as the similarity between the text data of the explanatory text corresponding to each component name and the text data describing the reason why the effect is obtained, the number of morpheme data of the predetermined part of speech extracted in common from both.

The component highlighting device according to claim 3, wherein the similarity calculation unit sets, for each morpheme data of the predetermined part of speech, a weight that increases as its frequency of appearance in the patent document decreases, and calculates the number of morpheme data of the predetermined part of speech extracted in common by adding up the weights set for the respective morpheme data.

A program for supporting the understanding of patent documents, the program causing a computer to execute functions of:
extracting, from the text data of the claims of a patent document retrieved from a patent document database, each component name, including a noun phrase formed by connecting a noun with the one or more modifiers closest to that noun, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
extracting, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
extracting text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
calculating the similarity between the text data of the explanatory text of each component and the text data describing the reason why the effect is obtained;
identifying, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
outputting the component serving as the point.

A component highlighting method by which a computer supports the understanding of patent documents, wherein the computer executes:
extracting, from the text data of the claims of a patent document retrieved from a patent document database, each component name, including a noun phrase formed by connecting a noun with the one or more modifiers closest to that noun, using a component expression pattern dictionary that records character string patterns appearing before or after the component names of the components constituting a claim;
extracting, from the text data of the claims, the text data of the explanatory text describing the component corresponding to each component name, based on each component name and the character string patterns recorded in the component expression pattern dictionary;
extracting text data describing the reason why an effect is obtained from the text data describing the effects of the invention in the patent document, using an effect-reason expression pattern dictionary that records character string patterns linking a character string part describing an effect to a character string part describing the reason why the effect is obtained;
calculating the similarity between the text data of the explanatory text of each component and the text data describing the reason why the effect is obtained;
identifying, as the component serving as the point, the component corresponding to the text data of the explanatory text with the highest similarity; and
outputting the component serving as the point.