JP2011118861A

JP2011118861A - Device, program and method for checking document

Info

Publication number: JP2011118861A
Application number: JP2010096224A
Authority: JP
Inventors: Hideaki Ogawa; 秀明小川
Original assignee: HYPER TEC KK
Current assignee: HYPER TEC KK
Priority date: 2009-11-02
Filing date: 2010-04-19
Publication date: 2011-06-16
Anticipated expiration: 2030-04-19
Also published as: JP5621145B2

Abstract

PROBLEM TO BE SOLVED: To provide a document check device that can check the consistency of use of a specific character string in a document.
SOLUTION: The document check device 100 includes a memory 124 that stores document data. For the case where a specific character string is used to indicate that the term following it has already appeared in the document, the memory 124 stores information on that specific character string and information on a plurality of term suffixes commonly used at the ends of terms. A document data analysis program 131 recognizes and identifies the specific character string and the term suffixes in the document, performs morphological analysis on the document to identify its parts of speech, and, on the basis of the analysis result, joins the nouns immediately preceding each term suffix to that suffix to identify term candidates. The consistency of use of the specific character string is then checked among the identified term candidates.
COPYRIGHT: (C)2011, JPO&INPIT

Description

The present invention relates to a document check device, a document check program, and a document check method for checking the consistency of descriptions in a document.

Some documents demand particularly strict internal consistency: for example, official documents, patent documents (especially the claims), and Japanese translations of English-language contracts.

In such documents, the accuracy of the content, the correctness of the format, and the consistency of the description are especially important. For patent specifications, for example, software techniques for checking the accuracy of the description have been proposed (see, for example, Patent Document 1).

However, Patent Document 1 merely checks the correctness of the description format, checks sentences by syntactic analysis, and checks the consistency of the reference numerals used.

Another disclosed technique uses a computer to search a document such as a patent application specification for all designated reference numerals, automatically extracts, for each numeral, data indicating its position and a character string of fixed length adjacent to it, and displays the extraction results sorted by numeral (see, for example, Patent Document 2).

In the invention of Patent Document 2, each extracted string contains the word to which the reference numeral is attached, and these strings are displayed side by side, so numbering errors and variations in the wording of the same element can be inspected easily.
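The extraction described for Patent Document 2 can be sketched as follows. This is a minimal illustration under stated assumptions, not that document's implementation: a reference numeral is assumed to be any run of decimal digits, the "adjacent string" is assumed to be the `width` characters immediately before the numeral, and the function name `numeral_index` and its parameters are hypothetical.

```python
import re

def numeral_index(text, width=6, wanted=None):
    """Hypothetical sketch: list every reference numeral with its offset and
    the `width` characters just before it, sorted by numeral value.

    Assumption: a numeral is a run of decimal digits. Python's \\d also
    matches full-width digits such as "１２４", and int() accepts them.
    `wanted` optionally restricts the result to designated numerals.
    """
    hits = []
    for m in re.finditer(r"\d+", text):
        value = int(m.group())
        if wanted is not None and value not in wanted:
            continue
        context = text[max(0, m.start() - width):m.start()]
        hits.append((value, m.start(), context))
    # Sort by numeral first, then by position, so all uses of one numeral
    # appear together and wording differences stand out.
    hits.sort(key=lambda h: (h[0], h[1]))
    return hits
```

For example, `numeral_index("メモリ124と制御部12を備え、制御部12はメモリ124を読む", width=3)` groups both uses of 12 (each preceded by "制御部") ahead of both uses of 124 (each preceded by "メモリ").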

Patent Document 3 discloses a technique for efficiently entering an element name and the reference numeral that follows it without mismatches. When a suitable operation is performed while editing a document in a word processor or the like, the following processing is carried out:
(1) An element name (for example, "wheel") is extracted from the character string entered at the editing position of the document.
(2) The extracted element name is searched for in the document.
(3) A character string containing each found position is extracted from the document and analyzed, and the reference numerals (for example, "6" and "7") corresponding to the element name extracted in (1) are obtained.
(4) When a plurality of numerals is obtained, they are displayed side by side on the screen so that one can be selected.
(5) The numeral selected by the user is automatically entered at the editing position of the document.
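Steps (2) to (4) above can be approximated in a few lines. This sketch simplifies step (3) by assuming that the reference numeral is the run of digits immediately following the element name; `numerals_for_element` is a hypothetical name, and the editor integration of steps (1) and (5) is omitted.

```python
import re

def numerals_for_element(text, element):
    """Hypothetical sketch of steps (2)-(4): find each occurrence of
    `element` in `text`, take the digits immediately after it (assumed to be
    its reference numeral), and return the distinct numerals in order of
    first appearance, ready to be offered to the user for selection."""
    pattern = re.compile(re.escape(element) + r"(\d+)")
    candidates = []
    for m in pattern.finditer(text):
        numeral = m.group(1)
        if numeral not in candidates:  # keep first-appearance order
            candidates.append(numeral)
    return candidates
```

With the text "車輪6は車軸8に、車輪7は車体9に取り付けられ、車輪6は回転する", asking for "車輪" (wheel) yields the two candidates "6" and "7", matching the example in step (3).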

However, these techniques merely detect mismatches between the reference numerals in a patent application specification and the corresponding character strings, or assist the user in entering them so that they match.

Patent Document 1: JP 2002-183278 A
Patent Document 2: JP H09-259148 A
Patent Document 3: JP 2005-25265 A

However, documents of the kinds described above sometimes use distinctive wording, compared with ordinary prose, to ensure rigor. For example, when a term appearing in a document refers to the same thing as a term already described in that document, it is customary, on the second and later occurrences, to prefix the term with a word such as "前記" ("said") to make explicit that it has already appeared.

In claims in particular, the use of "前記" ("said") is held to an especially strict standard. In claims, "当該" ("this" or "the said") is sometimes used to perform the same function as "前記". In that case, "当該" is generally intended not merely to indicate that the term has already appeared, but to refer to the term that appeared immediately before.

Whether "前記" ("said" in English) precedes a term is not merely a formal matter; it can directly affect the interpretation of the scope of the claims. For example, in the U.S. case Bell Communications Research Inc. v. Vitalink Communications Corp., 55 F.3d 615, 34 U.S.P.Q.2d (BNA) 1816 (Fed. Cir. 1995), the recitation of "said packet" in the body of the claim led to the description of "packet" in the preamble being treated as limiting the technical content of "packet" in the claim body, and the scope of the claim was construed accordingly.

Also, a drafter may mistakenly use two very similar names for the same component, and both names may end up mixed in the claims. For example, after "○△□☆○□ means" has been introduced, the claims might later recite "said ○△□□○□ means". The drafter writes this believing that "○△□□○□ means" has already appeared. Strictly speaking, however, "○△□□○□ means" has not appeared before, so during examination the claim may be rejected as indefinite (a violation of Article 36(6)(ii) of the Japanese Patent Act), and after grant the discrepancy may affect the interpretation of the scope of rights.

The present invention was made to solve the problems described above. Its object is to provide a document check device, a document check program, and a document check method that can check the consistency of use of a specific character string in a document, where that string is used to indicate that the term following it has already appeared in the document.

According to one aspect of the present invention, a document check device is provided for checking the consistency of descriptions in a document to be analyzed. The document check device comprises: storage means for storing document data representing the document; information acquisition means for acquiring information on a specific character string that is used to indicate that the term following it has already appeared in the document; and part-of-speech identification means for identifying the parts of speech contained in the document. The part-of-speech identification means includes identification means for recognizing and identifying the specific character string in the document on the basis of the information acquired by the information acquisition means, and morphological analysis means for performing morphological analysis on the document to identify its parts of speech. The document check device further comprises: term candidate recognition means for identifying term candidates (candidates for terms) by joining consecutive nouns in the document on the basis of the result of the morphological analysis; consistency check means for checking the consistency of use of the specific character string among the identified term candidates; and display control means for causing a display device to display the result of the consistency check.
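The core flow of this aspect (recognize the specific strings, join consecutive nouns into term candidates, then check the "said" usage) can be sketched as below. This is a minimal sketch, not the patent's implementation: it assumes a morphological analyzer has already produced `(surface, part_of_speech)` pairs, the tag names "noun", "particle", "verb", and "marker" are hypothetical, and all function names are illustrative. Only "前記" and "当該" come from the text.

```python
from dataclasses import dataclass

# Specific character strings named in the text ("said", "the said").
SPECIFIC_STRINGS = {"前記", "当該"}

@dataclass
class Finding:
    term: str
    index: int      # token index where the term candidate starts
    severity: str   # "error" or "warning"
    message: str

def tag_markers(tokens):
    """Identification step: re-tag specific strings as "marker" (hypothetical
    tag) so they are never joined into a term, even if the analyzer
    labelled them as nouns."""
    return [(s, "marker" if s in SPECIFIC_STRINGS else p) for s, p in tokens]

def term_candidates(tokens):
    """Join each run of consecutive nouns into one term candidate.
    Returns (start_index, term, preceded_by_marker) in document order."""
    out = []
    i = 0
    while i < len(tokens):
        if tokens[i][1] != "noun":
            i += 1
            continue
        start = i
        while i < len(tokens) and tokens[i][1] == "noun":
            i += 1
        term = "".join(s for s, _ in tokens[start:i])
        marked = start > 0 and tokens[start - 1][1] == "marker"
        out.append((start, term, marked))
    return out

def check_consistency(candidates):
    """A marked term must have appeared earlier. An unmarked repeat is only
    a warning, since general concepts are repeated without a marker."""
    seen = set()
    findings = []
    for start, term, marked in candidates:
        if marked and term not in seen:
            findings.append(Finding(term, start, "error",
                                    "marked as already mentioned, but has no earlier occurrence"))
        elif not marked and term in seen:
            findings.append(Finding(term, start, "warning",
                                    "repeated without a marker"))
        seen.add(term)
    return findings

def check_document(tokens):
    return check_consistency(term_candidates(tag_markers(tokens)))
```

Given tokens for "信号処理手段と前記信号処理部を備える", the sketch reports one error for "信号処理部", which is exactly the near-duplicate naming problem described earlier.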

Preferably, the information acquisition means further acquires information on a plurality of term suffixes commonly used at the ends of terms. On the basis of the result of the morphological analysis, the term candidate recognition means identifies term candidates by joining, to each term suffix in the document, the consecutive nouns that immediately precede it.
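A sketch of suffix-anchored candidate recognition follows, again assuming pre-tokenized `(surface, part_of_speech)` input with hypothetical tag names. The suffix set uses the examples the text gives later ("means", "device", "element", "signal"); a real common dictionary would be larger. One assumption made here: because a term suffix is the ending of the term, a suffix token is only used when it closes its run of nouns, which keeps an inner "信号" in "信号処理手段" from producing a spurious candidate.

```python
# Example term suffixes named in the text (a real dictionary would be larger).
TERM_SUFFIXES = {"手段", "装置", "素子", "信号"}
SPECIFIC_STRINGS = {"前記", "当該"}

def suffix_candidates(tokens, suffixes=TERM_SUFFIXES):
    """Hypothetical sketch: for each term suffix that ends a run of nouns,
    join it with the nouns immediately before it.
    Returns (start, end, term, preceded_by_marker) tuples."""
    out = []
    for j, (surface, pos) in enumerate(tokens):
        if pos != "noun" or surface not in suffixes:
            continue
        # The suffix must be the ending of the term: skip it if a noun follows.
        if j + 1 < len(tokens) and tokens[j + 1][1] == "noun":
            continue
        start = j
        # Walk left over preceding nouns, stopping at a specific string.
        while (start > 0 and tokens[start - 1][1] == "noun"
               and tokens[start - 1][0] not in SPECIFIC_STRINGS):
            start -= 1
        term = "".join(s for s, _ in tokens[start:j + 1])
        marked = start > 0 and tokens[start - 1][0] in SPECIFIC_STRINGS
        out.append((start, j + 1, term, marked))
    return out
```

Unlike joining every noun run, this only proposes candidates whose ending is a known suffix, so generic noun runs such as "画像データ" are not reported unless another source (such as a user dictionary) supplies them.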

Preferably, the information acquisition means further acquires information from a user dictionary in which term candidates that were not identified when the term candidate recognition means processed the document have been registered as term candidates by user selection. The term candidate recognition means also refers to the user dictionary when identifying term candidates.
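One way the user dictionary could supplement suffix-based recognition is a greedy longest-match scan of the raw text for registered terms, as sketched below. This is an illustrative assumption rather than the described implementation: matching is done on characters instead of analyzer tokens, and `user_dict_candidates` is a hypothetical name. Its results would be merged with the suffix-based candidates before the consistency check.

```python
SPECIFIC_STRINGS = ("前記", "当該")

def user_dict_candidates(text, user_dict):
    """Hypothetical sketch: scan `text` left to right for user-dictionary
    terms, preferring the longest entry at each position.
    Returns (offset, term, preceded_by_marker) tuples."""
    terms = sorted(user_dict, key=len, reverse=True)  # try longer entries first
    out = []
    i = 0
    while i < len(text):
        for term in terms:
            if text.startswith(term, i):
                # text[0:i] ends with a specific string -> marked as "said".
                marked = any(text.endswith(m, 0, i) for m in SPECIFIC_STRINGS)
                out.append((i, term, marked))
                i += len(term)
                break
        else:
            i += 1  # no entry starts here
    return out
```

Longest match matters when one registered term contains another, for example "一時記憶部" and "記憶部": the longer entry wins at its position, so the shorter one is not reported inside it.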

Preferably, the storage means stores the information on the specific character string, the information on the plurality of term suffixes, and the user dictionary. The information acquisition means reads the information on the specific character string, the information on the term suffixes, and the user dictionary information from the storage means.

Preferably, documents are classified into a plurality of groups by content field. The user dictionary is divided into partial dictionaries, one per group, and the user registers term candidates that were not identified in the partial dictionary corresponding to the content field of the document.
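The partial-dictionary arrangement can be modelled as a mapping from group to a set of terms. In this sketch the class name, the group keys (for instance an applicant or a technical field), and the JSON persistence format are all hypothetical choices made for illustration; the text only requires that registration goes to the partial dictionary for the document's content field.

```python
import json
from collections import defaultdict

class UserDictionary:
    """Hypothetical sketch: a user dictionary split into per-group partial
    dictionaries (e.g. one per applicant or technical field)."""

    def __init__(self):
        self._parts = defaultdict(set)

    def register(self, group, term):
        """Register a term the automatic recognition missed."""
        self._parts[group].add(term)

    def terms_for(self, group):
        """Terms of one partial dictionary; empty if the group is unknown.
        .get() avoids creating an empty group as a side effect."""
        return frozenset(self._parts.get(group, ()))

    def to_json(self):
        # Assumed storage format: {group: [sorted terms]}.
        return json.dumps({g: sorted(t) for g, t in self._parts.items()},
                          ensure_ascii=False)

    @classmethod
    def from_json(cls, data):
        d = cls()
        for group, terms in json.loads(data).items():
            for term in terms:
                d.register(group, term)
        return d
```

When checking a document, only the partial dictionary for its content field would be consulted, so terms common in one field do not produce false candidates in another.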

Preferably, the storage means further stores in advance a specific prefix that is placed before a term. After joining the nouns that precede a term suffix to that suffix, the term candidate recognition means further joins the specific prefix when it immediately precedes the joined term, thereby identifying the term candidate.
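The prefix step runs after suffix joining and extends a candidate one token to the left. The text does not list the stored prefixes, so the set below ("副", "主") is purely hypothetical, as is the assumption that the analyzer tags such tokens "prefix" rather than "noun" (which is why the noun-joining step would not have absorbed them).

```python
# Hypothetical examples: the actual prefixes are not given in the text.
SPECIFIC_PREFIXES = {"副", "主"}

def attach_prefix(tokens, start, end):
    """Hypothetical sketch: given a candidate spanning tokens[start:end]
    (nouns already joined to their term suffix), join one specific prefix
    if it immediately precedes the candidate.
    Returns (new_start, term)."""
    if (start > 0 and tokens[start - 1][1] == "prefix"
            and tokens[start - 1][0] in SPECIFIC_PREFIXES):
        start -= 1
    return start, "".join(s for s, _ in tokens[start:end])
```

Returning the new start index lets the caller look one token further left to see whether the extended candidate is preceded by a specific string such as "前記".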

Preferably, the information on the specific character string, the information on the plurality of term suffixes, and the user dictionary are stored in an external storage device outside the document check device. The information acquisition means acquires this information from the external storage device by communication.

Preferably, the document data includes first partial document data representing a first document, in which the specific character string is used to indicate that the term following it has already appeared in the document, and second partial document data representing a second document, which explains the content defined by the first partial document data and in which explanatory terms corresponding to the terms are used with reference numerals attached. The morphological analysis means performs morphological analysis on the first and second partial document data in common. The document check device further comprises reference numeral check means for checking, in the second partial document data, the consistency between the explanatory terms and the reference numerals.
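A sketch of a reference-numeral check on the second partial document (for example, the detailed description) is shown below. It reuses the pre-tokenized input assumption, with a hypothetical "numeral" tag, and treats a run of nouns directly followed by a numeral as an (explanatory term, numeral) pair. The consistency rule, one term per numeral and one numeral per term, is an assumed interpretation of "consistency between explanatory terms and reference numerals".

```python
from collections import defaultdict

def numeral_pairs(tokens):
    """Hypothetical sketch: collect (term, numeral) pairs where a run of
    nouns is immediately followed by a token tagged "numeral"."""
    pairs = []
    i = 0
    while i < len(tokens):
        if tokens[i][1] != "noun":
            i += 1
            continue
        start = i
        while i < len(tokens) and tokens[i][1] == "noun":
            i += 1
        if i < len(tokens) and tokens[i][1] == "numeral":
            term = "".join(s for s, _ in tokens[start:i])
            pairs.append((term, tokens[i][0]))
            i += 1
    return pairs

def check_numerals(pairs):
    """Report numerals used with several terms, and terms given several
    numerals (assumed definition of an inconsistency)."""
    by_numeral = defaultdict(set)
    by_term = defaultdict(set)
    for term, numeral in pairs:
        by_numeral[numeral].add(term)
        by_term[term].add(numeral)
    problems = []
    for numeral, terms in sorted(by_numeral.items()):
        if len(terms) > 1:
            problems.append(f"numeral {numeral} used for {sorted(terms)}")
    for term, numerals in sorted(by_term.items()):
        if len(numerals) > 1:
            problems.append(f"term {term} has numerals {sorted(numerals)}")
    return problems
```

Because both partial documents go through the same morphological analysis, the same token stream format can feed both this check and the "said" consistency check on the claims.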

According to another aspect of the present invention, a document check program is provided for causing a computer, which includes an arithmetic unit and a storage device storing document data representing a document, to check the consistency of descriptions in the document to be analyzed. The document check program comprises: a step in which the arithmetic unit acquires information on a specific character string that is used to indicate that the term following it has already appeared in the document; and a step of identifying the parts of speech contained in the document. The step of identifying parts of speech includes a step in which the arithmetic unit recognizes and identifies the specific character string in the document on the basis of the acquired information, and a step in which the arithmetic unit performs morphological analysis on the document to identify its parts of speech. The document check program further comprises: a step in which the arithmetic unit identifies term candidates (candidates for terms) by joining consecutive nouns in the document on the basis of the result of the morphological analysis; a step in which the arithmetic unit checks the consistency of use of the specific character string among the identified term candidates; and a step in which the arithmetic unit causes a display device to display the result of the consistency check.

Preferably, the step of acquiring information includes acquiring information on a plurality of term suffixes commonly used at the ends of terms. The step of identifying term candidates includes identifying term candidates by joining, on the basis of the result of the morphological analysis, the consecutive nouns that immediately precede each term suffix in the document to that term suffix.

Preferably, the acquiring step further includes acquiring information from a user dictionary in which term candidates that were not identified when the term candidate identification step processed the document have been registered as term candidates by user selection. The step of identifying term candidates further includes identifying term candidates with reference to the user dictionary as well.

Preferably, the storage device stores the information on the specific character string, the information on the plurality of term suffixes, and the user dictionary. The acquiring step includes reading the information on the specific character string, the information on the term suffixes, and the user dictionary information from the storage device.

Preferably, the storage device further stores in advance a specific prefix that is placed before a term. The step of identifying term candidates includes, after joining the nouns that precede a term suffix to that suffix, further joining the specific prefix when it immediately precedes the joined term, thereby identifying the term candidate.

Preferably, the information on the specific character string, the information on the plurality of term suffixes, and the user dictionary are stored in an external storage device outside the computer on which the document check program runs. The acquiring step includes acquiring this information from the external storage device by communication.

Preferably, the document data includes first partial document data representing a first document, in which the specific character string is used to indicate that the term following it has already appeared in the document, and second partial document data representing a second document, which explains the content defined by the first partial document data and in which explanatory terms corresponding to the terms are used with reference numerals attached. The step of identifying parts of speech includes performing morphological analysis on the first and second partial document data in common. The document check program further comprises a step of checking, in the second partial document data, the consistency between the explanatory terms and the reference numerals.

According to yet another aspect of the present invention, a document check method is provided for causing a computer, which includes an arithmetic unit and a storage device storing document data representing a document, to check the consistency of descriptions in the document to be analyzed. The document check method comprises: a step in which the arithmetic unit acquires information on a specific character string that is used to indicate that the term following it has already appeared in the document; and a step of identifying the parts of speech contained in the document. The step of identifying parts of speech includes a step in which the arithmetic unit recognizes and identifies the specific character string in the document on the basis of the acquired information, and a step in which the arithmetic unit performs morphological analysis on the document to identify its parts of speech. The document check method further comprises: a step in which the arithmetic unit identifies term candidates (candidates for terms) by joining consecutive nouns in the document on the basis of the result of the morphological analysis; a step in which the arithmetic unit checks the consistency of use of the specific character string among the identified term candidates; and a step in which the arithmetic unit causes a display device to display the result of the consistency check.

Preferably, the document data includes first partial document data representing a first document, in which the specific character string is used to indicate that the term following it has already appeared in the document, and second partial document data representing a second document, which explains the content defined by the first partial document data and in which explanatory terms corresponding to the terms are used with reference numerals attached. The step of identifying parts of speech includes performing morphological analysis on the first and second partial document data in common. The document check method further comprises a step of checking, in the second partial document data, the consistency between the explanatory terms and the reference numerals.

This makes it easy to check whether a specific character string, used in the document to be analyzed to indicate that the term following it has already appeared in the document, is being used properly.

Moreover, even if terms have not been registered one by one in the user dictionary, terms are identified using information on a plurality of term suffixes commonly used at the ends of terms, so the document can be checked with far less work by the user to identify terms.

Furthermore, in documents that use "specific terms", a "specific character string" indicating that a specific term has already appeared, "explanatory terms", and "reference numerals" attached to the explanatory terms, the consistency of use of each can be checked.

In addition, whether "term suffixes" are used as clues when extracting candidates for specific terms can be selected by a user setting.

FIG. 1 is a block diagram showing the configuration of the document check device 100 according to Embodiment 1.
FIG. 2 is a block diagram showing the functional configuration of the document check device 100 according to Embodiment 1.
FIG. 3 is a flowchart explaining the operation of the document check device 100 according to Embodiment 1.
FIG. 4 is a flowchart explaining the processing of steps S106 and S108 of FIG. 3 in more detail.
FIG. 5 shows a list of special parts of speech.
FIG. 6 shows the symbols assigned in the component recognition processing.
FIG. 7 is a first diagram explaining the processing in each step of FIG. 4.
FIG. 8 is a second diagram explaining the processing in each step of FIG. 4.
FIG. 9 shows an example of a dictionary, within the user dictionary, that registers component suffixes corresponding to part of speech P0.
FIG. 10 shows an example of a dictionary, within the common dictionary, that registers component suffixes corresponding to part of speech P1.
FIG. 11 shows examples of dictionaries within the common dictionary: one registering component suffixes corresponding to parts of speech P2 and P3, one registering component prefixes corresponding to parts of speech H1, H2, and H3, and one registering terms corresponding to suffix TT.
FIG. 12 shows examples of dictionaries within the common dictionary registering terms corresponding to suffix FT, forced noun FN, check string ZZ, non-noun XN, and non-prefix XS, respectively.
FIG. 13 conceptually illustrates the correction processing and component recognition processing performed by the document data analysis unit 120.2.
FIG. 14 shows an example of the claim text displayed in step S110 of FIG. 3.
FIG. 15 shows an example of the entire screen displayed in step S110 of FIG. 3.
FIG. 16 is a flowchart explaining the operation of the document check device according to Embodiment 2, for comparison with FIG. 3 of Embodiment 1.
FIG. 17 is a flowchart explaining the processing of step 206 of FIG. 16.
FIG. 18 is a flowchart explaining in detail the flow of steps S208 and S210 of FIG. 16.
FIG. 19 compares the dictionary configurations of Embodiment 1 and Embodiment 2.
FIG. 20 explains the "assignment symbols" assigned to character strings in the document to be checked.
FIG. 21 explains the assignment symbols assigned in front-end processing.
FIG. 22 explains the assignment symbols used in morphological analysis.
FIG. 23 is a table explaining back-end processing 1.
FIG. 24 is a table explaining correction processing 1.
FIG. 25 is a table explaining correction processing 2.
FIG. 26 is a table explaining the "component recognition processing".
FIG. 27 is a table explaining back-end processing 2.
FIG. 28 is a table explaining back-end processing 3.
FIG. 29 shows an example of the display produced by the data display analysis unit 120.3 in step 212.

Embodiments of the present invention are described below with reference to the drawings. In the following description, identical parts are given identical reference numerals; their names and functions are also identical, so their detailed description is not repeated.

(Overview)
In document data supplied from outside, where a specific character string is used to indicate that the term following it has already appeared in the document, the document check device 100 according to the embodiments extracts from the document the phrases that are candidates for terms, and checks whether the use of the specific character string is consistent among the extracted candidates.

Note the following points.

1) In a document, the term preceded by such a specific character string is generally a noun or a noun phrase. However, not every noun or noun phrase used in a document is preceded by the specific character string on its second and later occurrences. For example, nouns and noun phrases that express general concepts are not preceded by the specific character string even on later occurrences. In other words, the specific character string is used on the premise that the term following it refers to a particular thing.

2) On the other hand, in documents of a particular field (hereinafter "document field"), the terms preceded by such a specific character string (hereinafter "specific terms") are highly likely to end in a noun drawn from a fixed set of strings (hereinafter "term suffixes"), for example "period" (期間) and "document" (文書).

Moreover, even within the same document field, the frequency with which nouns and noun phrases denoting components are used differs depending on the field of the subject matter described in the document (hereinafter "content field").

Taking claims as an example document field, strings such as "前記" ("said") and "当該" ("the said") serve as the specific character string. Because specific terms in claims are considered to denote either a particular thing itself or an element constituting it, this particular type of specific term is called a "component". Experience shows that the ending of a component preceded by "前記" or the like (hereinafter "component suffix", a particular type of term suffix) is often one of a set of specific strings such as "means" (手段), "device" (装置), "element" (素子), and "signal" (信号). If such term suffixes (or component suffixes) are registered in advance in a dedicated dictionary, they serve as clues for extracting candidates for specific terms (or components) from the document. A dedicated dictionary of term suffixes that are used with at least a certain frequency regardless of content field is called the "common dictionary".

In claims, the nouns and noun phrases used as components tend to differ with the technical field covered by the claims or with the business scope of the applicant filing the patent application.

3) As described above, term suffixes (more specifically, component suffixes) are often shared among terms. Such a common ending does not necessarily exist, however, for every “term that should be preceded by a specific character string”. Moreover, particularly in claims, new terms come into use as technology advances, so the terms actually used are not confined to the component suffixes that can be registered in advance. Candidates for specific terms (or components) are extracted using the common dictionary, and some candidates will be missed. It is therefore desirable to have a dictionary in which the user can register those missed candidates at any time. Such a dictionary is called a “user dictionary”.

Because the frequency of terms differs for each content field, the document check apparatus 100 may be provided with a plurality of user dictionaries, each corresponding to a group of documents in one content field. In claims in particular, as described above, the nouns and noun phrases used as components tend to differ with the applicant's business scope. The document check apparatus 100 may therefore provide, as user dictionaries, a separate “customer dictionary” for each applicant. Of course, the document check apparatus 100 may instead be configured with a “technical field dictionary” for each technical field rather than for each applicant.

4) Once candidate terms (or candidate components) have been extracted as described above, whether the specific character string is used consistently can be confirmed by checking each candidate as follows:
i) for a candidate preceded by the specific character string, check that the same candidate has already appeared earlier in the document;
ii) for a candidate not preceded by the specific character string, check that it appears in the document for the first time.
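Rules i) and ii) amount to a single left-to-right scan that remembers which candidates have already appeared. The sketch below assumes the candidates have already been extracted as (term, prefixed) pairs. check_usage and earlier_terms are hypothetical names, and exact string equality is assumed as the test for "the same term".

```python
from typing import Iterable, List, Tuple

Occurrence = Tuple[str, bool]  # (candidate term, preceded by a check string such as 前記)


def check_usage(occurrences: List[Occurrence],
                earlier_terms: Iterable[str] = ()) -> List[Tuple[str, bool, bool]]:
    """Return (term, prefixed, consistent) for each occurrence, in document order.

    earlier_terms holds terms that count as already introduced before the
    first occurrence (for example, terms from claims a dependent claim refers to).
    """
    seen = set(earlier_terms)
    results = []
    for term, prefixed in occurrences:
        appeared_before = term in seen
        # i) prefixed: the term must already have appeared;
        # ii) not prefixed: the term must be appearing for the first time.
        consistent = appeared_before if prefixed else not appeared_before
        results.append((term, prefixed, consistent))
        seen.add(term)
    return results
```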

In claims in particular, the judgment of whether a term has already appeared or appears for the first time depends on the type of claim. For an independent claim, it need only be made within the text of that claim. For a dependent claim, the judgment must cover the text of the claim itself and also the claims it refers to, traced back in order up to the independent claim that is ultimately the basis of the dependency.
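Tracing a dependent claim back to its independent claim can be sketched as a recursive walk over the claim references. The sketch assumes a hypothetical dictionary that maps each claim number to the claims it cites, and it assumes claims only cite earlier claims. A claim may cite several claims (e.g. "claim 1 or 2"), so the function returns every chain rather than a single one.

```python
from typing import Dict, List


def dependency_paths(claim: int, deps: Dict[int, List[int]]) -> List[List[int]]:
    """Return every chain of claims from an independent claim down to `claim`.

    deps maps each claim number to the claim numbers it refers to; an empty
    list marks an independent claim. Claims are assumed to refer only to
    earlier claims, so the recursion terminates.
    """
    parents = deps.get(claim, [])
    if not parents:
        return [[claim]]
    return [path + [claim] for p in parents for path in dependency_paths(p, deps)]


# dependency_paths(4, {1: [], 2: [1], 3: [1, 2], 4: [3]}) -> [[1, 3, 4], [1, 2, 3, 4]]
```

In each chain, the claims in path[:-1] form the range within which a term must already have appeared. For a multiple-dependent claim, one possible reading is that the term must appear in every chain. The text above does not spell this out.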

For these reasons, recognizing components in claim text requires information on the “word boundaries and parts of speech” of its character strings. “Morphological analysis engine” software can be used for this purpose.

Here, “morphological analysis” is natural language processing performed by a computer. It draws on two sources of information: the grammar of the target language (a set of grammar rules) and a corpus dictionary (a word list annotated with information such as parts of speech). Using these, it divides a sentence written in natural language into a sequence of morphemes (the smallest meaningful units of a language) and determines the part of speech of each.

For claim text, however, a morphological analysis engine generally only divides sentences into morphemes and identifies the part of speech of each. This is not sufficient for identifying components. A dedicated dictionary such as the one described above, storing partial words of components, must therefore be used in addition.

Several Japanese morphological analysis engines (MAE: Morphological Analysis Engine) are already available as free software, for example:

i) KAKASI (“kanji kana simple inverter”), http://kakasi.namazu.org/
ii) MeCab, http://mecab.sourceforge.net/
iii) ChaSen, http://chasen-legacy.sourceforge.jp/
[Embodiment 1]
(Hardware configuration)
The hardware configuration of the document check apparatus 100 according to Embodiment 1 will be described with reference to FIG. 1. FIG. 1 is a block diagram showing the configuration of the document check apparatus 100 according to Embodiment 1.

In the following description, as an example of document checking, the document check apparatus 100 is assumed to check the claims of a patent.

The document check apparatus 100 includes a computer main body 102, a monitor 104 serving as an output or display device, a keyboard 110 serving as an input device, and a mouse 112 serving as an auxiliary input device. The monitor 104, the keyboard 110, and the mouse 112 are connected to the computer main body 102 via a bus 105.

The computer main body 102 includes the following components:
- a flexible disk (hereinafter, “FD”) drive 106, which reads an external recording medium;
- an optical disc drive 108, which reads another external recording medium;
- a CPU (Central Processing Unit) 120, which is an arithmetic processing unit;
- a memory 122, which is a storage device;
- a direct-access storage device serving as mass storage, for example a hard disk 124;
- a communication interface 128 serving as a communication device.

These components are connected to one another by the bus 105.

The FD drive 106 reads and writes information on an FD 116. The optical disc drive 108 reads information on an optical disc such as a CD-ROM (Compact Disc Read-Only Memory) 118. The communication interface 128 exchanges data with external devices.

The CD-ROM 118 may be replaced by any other medium capable of recording information such as a program to be installed in the computer main body, for example a DVD-ROM (Digital Versatile Disc) or a memory card. In that case, the computer main body 102 is provided with a drive that can read such media.

The memory 122 includes a ROM (Read Only Memory) and a RAM (Random Access Memory).

The hard disk 124 stores the following:
- a display control program 131;
- a document data analysis program 132;
- group information 133 indicating the content-field group of the document to be analyzed;
- common dictionary data 134;
- customer dictionary data 135;
- document data 136 to be checked.

The group information 133 is not particularly limited. For example, it may be entered by the user when the software is started.

Alternatively, the common dictionary data 134 and the customer dictionary data 135 may be stored in a storage device of another computer connected via a network. In that case, the document check apparatus 100 reads and writes these data through the communication interface 128.

The display control program 131 controls the display of the screens that serve as the interface between the document check apparatus 100 and the user, namely screens prompting the user for input and screens presenting check results.

As described later, the document data analysis program 132 extracts nouns and noun phrases that are component candidates from the document data 136 to be checked, based on the common dictionary data 134 and the customer dictionary data 135. It then checks whether specific character strings such as “前記” and “当該” are used consistently with the component candidates thus extracted.

The document data analysis program 132 uses the morphological analysis engine described above when extracting component candidates. Accordingly, although not shown, the hard disk 124 also stores the “grammar information of the target language” and the “corpus dictionary” that the morphological analysis engine requires for morphological analysis.

The display control program 131 and the document data analysis program 132 may be supplied recorded on a storage medium such as the FD 116 or the CD-ROM 118, or from another computer via the communication interface 128.

As described above, the common dictionary data 134 is a dedicated dictionary of component suffixes whose frequency of use is at or above a certain level regardless of the content field. In the example of FIG. 1, the content field varies with the business scope of the applicant (customer). In this case, the common dictionary therefore holds the component suffixes used at or above a certain frequency regardless of the customer.

The customer dictionary data 135, on the other hand, is a user dictionary. Nouns and noun phrases that are component candidates are extracted using the common dictionary, and the user can register in this dictionary, at any time, the component candidates that the extraction missed.

The document data 136 is the data of the document to be checked. Although not particularly limited, the document to be checked may be a file created with particular word processor software. In that case, for example, the document data 136 may be text data extracted from that file. When the document data is displayed on the document check apparatus 100, the display control program 131 renders it in a predetermined layout from the information contained in the text data. With this configuration, the document check apparatus 100 never directly manipulates or writes to the file created with the word processor software. This prevents unexpected changes to the file data, such as garbled characters (“mojibake”) or unexpected changes to layout information.
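Extracting the text while leaving the word processor file untouched can be sketched as follows. Purely for illustration, the sketch assumes the file is a Microsoft Word .docx, which is an Office Open XML ZIP package whose body is in word/document.xml. The function only opens the package for reading and joins the w:t text runs of each w:p paragraph; fields, tables, and formatting are ignored. make_minimal_docx is a hypothetical helper that builds a bare package for trying the reader out.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{" + W_NS + "}"


def extract_docx_text(source) -> str:
    """Return the paragraph text of a .docx (path or file object); the package is only read."""
    with zipfile.ZipFile(source) as zf:  # read mode: the original file is never written
        root = ET.fromstring(zf.read("word/document.xml"))
    return "\n".join(
        "".join(t.text or "" for t in p.iter(W + "t"))  # text runs of one paragraph
        for p in root.iter(W + "p"))


def make_minimal_docx(paragraphs) -> io.BytesIO:
    """Build an in-memory package holding only word/document.xml (enough for the reader)."""
    body = "".join(
        "<w:p>" + "".join(f"<w:r><w:t>{escape(run)}</w:t></w:r>" for run in para) + "</w:p>"
        for para in paragraphs)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("word/document.xml", xml)
    buf.seek(0)
    return buf
```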

The CPU 120, functioning as the arithmetic processing unit, executes the processing for each of the programs described above, using the memory 122 as working memory.

As described above, the display control program 131 and the document data analysis program 132 are software executed by the CPU 120. Such software is generally distributed on a storage medium such as the CD-ROM 118 or the FD 116. It is read from the medium by the optical disc drive 108 or the FD drive 106 and temporarily stored on the hard disk 124. Alternatively, when the document check apparatus 100 is connected to a network, the software is first copied to the hard disk 124 from a server on the network. It is then read from the hard disk 124 into the RAM in the memory 122 and executed by the CPU 120. When the apparatus is connected to a network, the software may also be loaded directly into the RAM and executed without being stored on the hard disk 124.

The computer hardware shown in FIG. 1 and its operating principles are conventional. The essential part for realizing the functions of the present invention is therefore the software stored on a storage medium such as the FD 116, the CD-ROM 118, or the hard disk 124.

(Functional configuration)
FIG. 2 is a block diagram showing the functional configuration of the document check apparatus 100 according to Embodiment 1.

FIG. 3 is a flowchart for explaining the operation of the document check apparatus 100 according to Embodiment 1.

The functional configuration and operation of the document check apparatus 100 according to Embodiment 1 will be described with reference to FIGS. 2 and 3.

The document check apparatus 100 includes the hard disk 124 as a storage device, the monitor 104 as a display device, and the keyboard 110 as an input device. It also includes the following functional blocks of processing executed by the CPU 120:
i) blocks executed by the document data analysis program 132:
i-1) a document fetch unit 120.1 that acquires document data from a storage device such as the hard disk 124;
i-2) a document data analysis unit 120.2 that analyzes and checks the document data;
ii) a data display output control unit 120.3, executed by the display control program 131, that causes the monitor 104 to display the check results of the document data analysis unit 120.2 and the interface screens between the document data analysis unit 120.2 and the user.

FIG. 2 explicitly shows that the customer dictionary data 135, which is the user dictionary, is divided into partial dictionaries 135.1 to 135.n. There is one partial dictionary for each customer (each customer's business scope), corresponding to a content field of the document data.

In other words, the dedicated dictionaries are classified into two types: customer dictionaries and the common dictionary. For example, the P0 dictionary described later is assigned to the customer dictionaries, and the dictionaries other than P0 are assigned to the common dictionary. As a further example, multiple users may be organized as one large group made up of several small groups. A customer dictionary is then provided for each small group, and the common dictionary is provided for the large group.

With this configuration, when the words that need to be registered differ from one small group to another, the number of words registered in each dictionary can be kept to a minimum, which also speeds up word lookup. Furthermore, even if one small group registers a wrong word, the large group as a whole is not affected.

Referring to FIG. 3, when the document check apparatus 100 starts operating, the user first enters group information using the keyboard 110, the mouse 112, or the like (S100). This information specifies the content field (customer) of the document data to be checked. In response, the document data analysis unit 120.2 selects the partial dictionary 135.i (1 ≤ i ≤ n) to be used.
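Selecting the dictionaries from the group information (S100) and keeping user registrations local to one group can be sketched as below. The dictionary contents, group keys, and function names are hypothetical. Real dictionaries would also record the part of speech (P0, P1, ...) of each entry.

```python
from typing import Dict, Set

# Hypothetical contents: the common dictionary is shared by every group, while
# each per-customer partial dictionary holds only that group's P0 registrations.
COMMON: Set[str] = {"装置", "素子", "手段", "信号"}
CUSTOMER: Dict[str, Set[str]] = {
    "customer_a": {"遊技機"},
    "customer_b": {"燃料電池"},
}


def select_suffixes(group: str) -> Set[str]:
    """Combine the common dictionary with the partial dictionary chosen for `group`."""
    return COMMON | CUSTOMER.get(group, set())


def register(group: str, word: str) -> None:
    """A user registration touches only the selected group's partial dictionary."""
    CUSTOMER.setdefault(group, set()).add(word)
```

Because register never modifies COMMON, a wrong word registered by one small group does not leak into the other groups. This matches the effect described above.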

Next, the document fetch unit 120.1 fetches the document data 136 and loads it into the RAM serving as working memory (S102).

The document data analysis unit 120.2 then selects the “claims” section to be checked from the document data 136 (S104). Next, it analyzes the document data to extract nouns and noun phrases that are component candidates (S106). This analysis uses the common dictionary data 134, the selected partial dictionary 135.i, and the morphological analysis engine. Based on the analysis result, the document data analysis unit 120.2 identifies the component candidates. It then checks whether the specific character strings “前記” and “当該” are used consistently among these candidates within the claims (S108).

According to the check result, the data display output control unit 120.3 presents the result on the displayed document data (S110), for example by highlighting the component candidates in different colors. The display modes distinguish four categories of candidate:
i) candidates preceded by “前記” or “当該” whose use is correct (the same component candidate has already appeared in the claims);
ii) candidates not preceded by “前記” or “当該” whose use is correct (the component candidate appears in the claims for the first time);
iii) candidates preceded by “前記” or “当該” whose use is incorrect;
iv) candidates not preceded by “前記” or “当該” whose use is incorrect.

Since i) and ii) are both correct, they may be displayed in a common display mode.
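The four display categories follow directly from two facts about each candidate: whether a check string precedes it, and whether it has appeared before. The following is a minimal sketch. The enum names and the highlight colors are hypothetical choices, with i) and ii) sharing one color as permitted above.

```python
from enum import Enum


class Usage(Enum):
    PREFIXED_OK = "i"      # check string present, term already appeared
    UNPREFIXED_OK = "ii"   # no check string, term appears for the first time
    PREFIXED_NG = "iii"    # check string present, but the term has not appeared yet
    UNPREFIXED_NG = "iv"   # no check string, but the term already appeared


def classify(prefixed: bool, appeared_before: bool) -> Usage:
    """Map one component candidate to its display category."""
    if prefixed:
        return Usage.PREFIXED_OK if appeared_before else Usage.PREFIXED_NG
    return Usage.UNPREFIXED_NG if appeared_before else Usage.UNPREFIXED_OK


# Hypothetical highlight colors; the two correct categories share one style.
HIGHLIGHT = {
    Usage.PREFIXED_OK: "green",
    Usage.UNPREFIXED_OK: "green",
    Usage.PREFIXED_NG: "red",
    Usage.UNPREFIXED_NG: "orange",
}
```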

Next, if the user instructs re-analysis (S112), the process returns to step S102; otherwise, the process ends.

The user instructs re-analysis after correcting the document data 136 based on the results displayed in step S110, when the corrected document needs to be analyzed again. As described above, what the user corrects in this case is the file created with the particular word processor software. The document data 136 to be checked can be configured to take in only the text data from that file.
[Details of document check processing]
The document check process described with reference to FIG. 3 is explained in more detail below.
(States in which components occur)
First, the conditions under which components occur are described.
Components are assumed to occur in the text in the following states.
1. The component itself
1-1 Consisting of nouns (e.g., 半導体レーザ素子 “semiconductor laser element”, 制御装置 “control device”)
1-2 Consisting of a modifier + noun (e.g., 特定の半導体レーザ素子 “a specific semiconductor laser element”, 所定の制御装置 “a predetermined control device”)
1-3 Consisting of technical terms or proper nouns (e.g., Pn接合 “Pn junction”)
Therefore, identifying nouns by morphological analysis alone is not sufficient to extract components.
2. Character strings following a component
2-1 Component + particle (e.g., 記憶装置に “to the storage device”)
2-2 Component + auxiliary verb (e.g., 記憶装置であって “being a storage device, ...”)
2-3 Component + trailing word (e.g., 記憶装置ごとに “for each storage device”, 設定温度以上に “at or above the set temperature”, 水素電極間に “between the hydrogen electrodes”, 記憶装置（３）に “to the storage device (3)”)
2-4 Component + punctuation (e.g., 記憶装置、 and 記憶装置。, followed by a comma or a period)
2-5 Component + anything else (e.g., 遊技状態終了後に “after the end of the game state”)
Conversely, if these “character strings following a component” are used as pointers into the document data, they help locate candidates for the end point of a component.
3. Character strings preceding a component
3-1 Non-noun + component (e.g., ・・を有する記憶装置に・・ “... to a storage device having ...”)
3-2 Noun + component (e.g., ・・のうち記憶装置に・・ “... to the storage device among ...”)
3-3 Prefix + component (e.g., ・・の各記憶装置に・・ “... to each storage device of ...”)
3-4 “前記” + component (e.g., ・・を前記記憶装置に・・ “... to said storage device”)
The start point of a component therefore cannot always be found simply by assuming it lies where the run of nouns begins.
4. Substrings of a component (special cases)
4-1 Noun + verb + specific word (e.g., 選択するステップ “a step of selecting”)
4-2 Noun + verb + particle + specific word (e.g., 選択するためのステップ “a step for selecting”)
The “specific words” above are words registered in the dictionary P2 of the dedicated dictionaries described later. That is, a component is not always simply a concatenation of nouns; it may also be a compound of a verb or the like and a noun.
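The end-point and start-point rules in items 2 and 3 can be sketched together. The character strings following a component act as end-point pointers, and the start point is found by walking back over nouns and prefixes. The tag names, the FOLLOWERS and INSIDE sets, and the token input are assumptions made for illustration. The sketch does not handle cases 4-1 and 4-2 (verb + specific word). It also does not handle an H2 prefix followed by a check string, such as 複数の前記○○.

```python
from typing import List, Tuple

Token = Tuple[str, str]  # (surface, tag), assumed output of tagging and analysis

# Tags assumed to mark "character strings following a component" (item 2).
FOLLOWERS = {"particle", "aux", "punct", "conj", "TT"}
# Tags assumed to be allowed inside a component: nouns, noun prefixes, H1-H3.
INSIDE = {"noun", "prefix", "H1", "H2", "H3"}


def find_components(tokens: List[Token]) -> List[Tuple[str, bool]]:
    """Return (component, preceded_by_check_string) pairs in document order."""
    results = []
    for i, (_, tag) in enumerate(tokens):
        next_tag = tokens[i + 1][1] if i + 1 < len(tokens) else "punct"
        if tag != "noun" or next_tag not in FOLLOWERS:
            continue  # not an end-point candidate
        start = i
        while start > 0 and tokens[start - 1][1] in INSIDE:
            start -= 1  # walk back to the start point
        prefixed = start > 0 and tokens[start - 1][1] == "ZZ"  # ZZ = check string
        results.append(("".join(s for s, _ in tokens[start:i + 1]), prefixed))
    return results
```

For 前記記憶装置を有する特定の制御装置。 the sketch finds 記憶装置, which is preceded by 前記, and 特定の制御装置, where the verb 有する stops the backward walk (case 3-1).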

(Document check processing flow)
FIG. 4 is a flowchart for explaining in more detail the processing of steps S106 and S108 in FIG. 3.

FIGS. 7 and 8 are diagrams for explaining the processing in each step of FIG. 4.
First, referring to FIG. 4, when the selection of the data range is completed in step S104, the document data analysis unit 120.2 separates and extracts the text claim by claim and extracts the dependency relationships between the claims (S106.1.1). The claims can be separated using tags in the original text data. The dependency relationships can be identified by matching the text against templates written as regular expressions, which extracts expressions such as “請求項○または△に記載の” (“according to claim ○ or △”).
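Claim separation and dependency extraction with regular expressions might look like the following. The sketch makes two hypothetical assumptions. Each claim in the extracted text is assumed to begin with a tag such as 【請求項１】. Dependencies are assumed to be phrased as 請求項１に記載の or 請求項１または２に記載の; range expressions such as 請求項１～３ are not handled. In Python, \d in a str pattern also matches full-width digits, and int() converts them.

```python
import re
from typing import Dict, List

# Assumption: each claim starts with a tag such as 【請求項１】.
CLAIM_TAG = re.compile(r"【請求項(\d+)】")
# Assumption: dependencies read like 請求項１に記載の or 請求項１、２又は３に記載の.
DEP_PHRASE = re.compile(r"請求項((?:\d|、|,|または|又は)+)に記載の")


def split_claims(text: str) -> Dict[int, str]:
    """Split claim text into {claim number: claim body}."""
    parts = CLAIM_TAG.split(text)  # [preamble, num1, body1, num2, body2, ...]
    return {int(num): body.strip() for num, body in zip(parts[1::2], parts[2::2])}


def extract_dependencies(claims: Dict[int, str]) -> Dict[int, List[int]]:
    """Map each claim to the claims it refers to ([] for an independent claim)."""
    deps = {}
    for num, body in claims.items():
        m = DEP_PHRASE.search(body)
        deps[num] = [int(d) for d in re.findall(r"\d+", m.group(1))] if m else []
    return deps
```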

Next, the document data analysis unit 120.2 checks whether the claims are numbered consecutively (S106.1.2). It also checks whether the title of the invention in each claim matches that of the claim it depends on. Although not particularly limited, the “title of the invention” may be obtained in either of two ways:
i) it is specified by the user;
ii) it is taken as the component extracted last (closest to the end) in the claim.

In case ii), the consistency check of the title of the invention is performed together with the consistency check of “前記” and the like described later.
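The two structural checks of S106.1.2 can be sketched as below. The sketch assumes the dependency relationships extracted in S106.1.1 are available as a claim-to-references map in document order, together with a title for each claim. The function name and message wording are hypothetical.

```python
from typing import Dict, List


def check_claim_structure(deps: Dict[int, List[int]], titles: Dict[int, str]) -> List[str]:
    """Report numbering gaps and title mismatches with referenced claims.

    deps lists claims in document order ({claim: [referenced claims]});
    titles maps each claim number to its title of the invention.
    """
    problems = []
    numbers = list(deps)
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append(f"claims are not numbered consecutively: {numbers}")
    for num, parents in deps.items():
        for p in parents:
            if titles.get(num) != titles.get(p):
                problems.append(f"claim {num}: title {titles.get(num)!r} "
                                f"differs from claim {p}: {titles.get(p)!r}")
    return problems
```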

Next, referring to FIGS. 4 and 7, the document data analysis unit 120.2 performs morphological analysis. First, it performs preprocessing called “front-end processing” (S106.2.1). In this step, specific words registered in the dedicated dictionary are forcibly assigned special parts of speech before the morphological analysis is run.

FIG. 5 is a diagram showing a list of these special parts of speech.
Parts of speech are assigned in the front-end processing as follows.

1) Words registered by the user in the partial dictionary 135.i (part of speech P0). These are character strings regarded as components. However, when a noun precedes such a string, the noun is included and the whole is regarded as the component, so these strings are classified as “component suffixes”. Only part of speech P0 is registered in the partial dictionary 135.i, which is a user dictionary; the other parts of speech are registered in the common dictionary data 134.

2) Component prefixes (parts of speech H1, H2, H3)
2-1) Part of speech H1: a word that comes at the start of a component and is never followed by a check string (a specific character string such as “前記” or “当該”). An example is “特定の” (“specific”).

2-2) Part of speech H2: a word that comes at the start of a component and may be followed by a check string, for example “複数の” (“a plurality of”) or “所定の” (“predetermined”). Suppose “複数の○○装置” (“a plurality of ○○ devices”) has already appeared. A later reference to it may then be written either as “前記複数の○○装置” (“said plurality of ○○ devices”) or as “複数の前記○○装置” (“a plurality of said ○○ devices”).

2-3) Part of speech H3: a word that comes at the start of a component and may occur more than once, joined by a conjunction. Examples are “第” + numeral and “第” + numeral + “の” (“first”, “second”, ...). H3 differs from H1 and H2 in that it is used in forms such as “第１および第２の○○” (“the first and second ○○”).

3) Suffix (part of speech TT) (stored in the dictionary FT among the suffix dictionaries)
Whenever a word of this type is used in the claims, it follows a component, regardless of the part of speech assigned by morphological analysis. An example is “（”.

4) Forced noun (part of speech N) (registered in the dictionary FN)
The noun tag N is always assigned to these characters, regardless of the part of speech given by morphological analysis, because they may be used inside components in claims. Examples are “〜” and “／”.

5) Check string (part of speech ZZ)
A check string is a word that precedes a component and whose consistency of use is checked as the “specific character string”. Such words include “前記”, “該”, “当該”, “上記”, “各前記”, and “前記各”. “各前記” and “前記各” are treated as check strings for the following reason. Morphological analysis may classify “各” (“each”) as a “prefix attached to a noun”. In general, for component recognition, a prefix attached to a noun is best treated as one unit with the following noun. However, “各” may be used as “各前記○○” (“each said ○○”) or as “前記各○○” (“said each ○○”), where ○○ is a noun. In the latter case, the general rule would make “各○○” the component. It is therefore more appropriate to make an exception and check “各前記” and “前記各” as check strings.
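Front-end tagging can be sketched as a longest-match scan over a dictionary of forced tags. This is one way to make 前記各 and 各前記 take precedence over 前記 alone. The dictionary below is a hypothetical excerpt using the tags of FIG. 5. 該 is left out because naive substring matching would also fire inside ordinary words such as 該当, so a real implementation would need word-boundary information. Untagged segments are passed on unchanged for morphological analysis.

```python
import re
from typing import List, Tuple

# Hypothetical excerpt of the dedicated dictionary used in front-end processing
# (ZZ check string, H1/H2 prefixes, TT suffix, N forced noun, as in FIG. 5).
FORCED_TAGS = {
    "前記": "ZZ", "当該": "ZZ", "上記": "ZZ", "各前記": "ZZ", "前記各": "ZZ",
    "特定の": "H1", "複数の": "H2", "所定の": "H2",
    "（": "TT", "〜": "N", "／": "N",
}
ORDINAL = re.compile(r"第\d+の?")  # H3: 第 + numeral, optionally followed by の
MAX_LEN = max(len(word) for word in FORCED_TAGS)


def front_end(text: str) -> List[Tuple[str, str]]:
    """Split text into (segment, tag) pairs; tag "" means left for the analyzer."""
    out: List[Tuple[str, str]] = []
    pending = ""
    i = 0
    while i < len(text):
        m = ORDINAL.match(text, i)
        match, tag = (m.group(), "H3") if m else ("", "")
        if not match:
            # Try the longest dictionary entry first, so 前記各 wins over 前記.
            for n in range(min(MAX_LEN, len(text) - i), 0, -1):
                if text[i:i + n] in FORCED_TAGS:
                    match, tag = text[i:i + n], FORCED_TAGS[text[i:i + n]]
                    break
        if match:
            if pending:
                out.append((pending, ""))
                pending = ""
            out.append((match, tag))
            i += len(match)
        else:
            pending += text[i]
            i += 1
    if pending:
        out.append((pending, ""))
    return out
```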

Returning to FIG. 4 and FIG. 7, when the front-end processing is completed, the document data analysis unit 120.2 performs morphological analysis and assigns parts of speech to the words other than those already forcibly tagged in the front-end processing: noun N, noun-connecting prefix N0, conjunction O, particle J, full stop K1, comma K2, verb V, auxiliary verb G, and so on.

Next, as post-processing of the morphological analysis (referred to as "back-end processing"), the document data analysis unit 120.2 forcibly assigns special parts of speech to specific words registered in dedicated dictionaries, using the result of the morphological analysis as a condition (S106.2.3).

Referring again to FIG. 5, parts of speech are assigned in the back-end processing as follows.

1) Component suffix
If a word has been judged to be a noun by morphological analysis, is followed by a particle, auxiliary verb, punctuation mark, suffix or conjunction, and is registered in the common dictionary data 134 under one of the following component-suffix parts of speech, that part of speech (one of P1 to P3) is assigned to the word.

1-1) Part of speech P1: a word that ends a component and cannot be preceded by a verb. Examples are "装置" (device) and "素子" (element).

1-2) Part of speech P2: a word that ends a component and may be preceded by a verb. Examples are "ステップ" (step), "工程" (process) and "手段" (means).

That is, a word of part of speech P2 admits phrasings such as "〜するステップ" (a step of doing ...) and "〜する手段" (means for doing ...).

1-3) Part of speech P3: a character string regarded as a component only when it is preceded by a noun, for example "条件" (condition). Such a noun is expected to be used as a general term; when it refers to a specific thing, it forms a noun phrase (compound noun) with the noun in front of it.

2) Suffix (part of speech TT) (stored in dictionary TT, one of the suffix dictionaries)
A word that has been judged to be a noun by morphological analysis but that follows a component. Examples are "毎" (every, per), "以上" (or more) and "以下" (or less).

3) Non-noun (part of speech XN)
Even if morphological analysis judges the word to be a noun, that judgment is cancelled and part of speech XN is assigned. Examples are "うち" (among) and "よう" (so as to). Even though these words are nouns, it is not appropriate to include them in a component; they appear in phrasings such as "複数の○○のうち特定の○○" (a specific ○○ among a plurality of ○○) and "〜するよう処理を切り換える" (switch processing so as to ...).

4) Non-prefix (part of speech XS)
Even if morphological analysis recognizes the word as a noun-connecting prefix, that recognition is cancelled and part of speech XS is assigned. An example is "各" (each). Once "複数の○○" (a plurality of ○○) has appeared, a later reference to "各○○" (each ○○) should properly be written "各前記○○" or "前記各○○". If "各" were included in the component, however, "各○○" would count as a first appearance, and its use might wrongly be judged appropriate.
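The four back-end rules can be sketched as two passes over (word, tag) pairs. This assumes the morphological analyser has already produced tags using the letters above (N, N0, J, G, K1, K2, O, V) and that the dictionaries hold only the example words given in the text; all names are hypothetical. Running the neighbour-independent overrides (TT, XN, XS) first lets a newly tagged suffix such as "以上" count as the follower that licenses a component suffix.

```python
# Example dictionary contents from the text (illustrative only).
COMPONENT_SUFFIX = {"装置": "P1", "素子": "P1",
                    "ステップ": "P2", "工程": "P2", "手段": "P2",
                    "条件": "P3"}
SUFFIX_TT = {"毎", "以上", "以下"}
NON_NOUN_XN = {"うち", "よう"}
NON_PREFIX_XS = {"各"}
# Tags that may follow a component suffix: particle, auxiliary verb,
# full stop, comma, conjunction, suffix.
FOLLOWERS = {"J", "G", "K1", "K2", "O", "TT"}


def back_end(tokens):
    """Re-tag (word, tag) pairs produced by morphological analysis."""
    tags = [tag for _, tag in tokens]
    # Pass 1: overrides that do not depend on neighbouring tokens.
    for i, (word, tag) in enumerate(tokens):
        if tag == "N" and word in SUFFIX_TT:
            tags[i] = "TT"
        elif tag == "N" and word in NON_NOUN_XN:
            tags[i] = "XN"
        elif tag == "N0" and word in NON_PREFIX_XS:
            tags[i] = "XS"
    # Pass 2: component suffixes, which need a suitable following token.
    for i, (word, _) in enumerate(tokens):
        nxt = tags[i + 1] if i + 1 < len(tokens) else None
        if tags[i] == "N" and word in COMPONENT_SUFFIX and nxt in FOLLOWERS:
            tags[i] = COMPONENT_SUFFIX[word]
    return [(word, tag) for (word, _), tag in zip(tokens, tags)]
```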

Referring to FIG. 4 and FIG. 8, when the back-end processing is completed, the document data analysis unit 120.2 performs correction processing (S108.1) as part of the data recognition processing (S108). In the correction processing, when a sequence of assigned symbols (the symbols representing the assigned parts of speech) satisfies the condition in the "Other" item of the list in FIG. 8, the document data analysis unit 120.2 reassigns an appropriate symbol. In other words, when a term that should be recognized as one component has been split into several words, those words are concatenated and a symbol is assigned to the concatenated term.

For example, when words tagged as noun N are consecutive, they are concatenated and the symbol of noun N is assigned to the result. Likewise, when words recognized as noun-connecting prefixes N0 immediately precede a word tagged as noun N, the words are concatenated and the symbol of noun N is assigned to the result.

When component prefixes H1 and H1 are consecutive, the words are concatenated and part of speech H1 is assigned to the result.

When a word recognized as verb V immediately precedes a word recognized as component suffix P2, the words are concatenated and part of speech P is assigned to the result (component suffixes P0, P1, P2 and P3 are collectively called part of speech P).

Further, when words recognized as noun N immediately precede a word of component suffix P, the document data analysis unit 120.2 joins them and assigns part of speech P to the joined term.

Expressed with part-of-speech symbols, the concatenation rules are as follows.

1) N + N → N
2) N0 + N → N
3) H1 + H1 → H1
4) H3 + H1 → H1
5) H3 + H2 → H1
6) V + V → V
7) N + V → V
8) V + P2 → P
9) V + N + J + P2 → P
10) N0 + P1 → P
11) P0 → P
12) P1 → P
13) P2 → P
14) N + P3 → P
15) N + P → P
16) P + P → P

As a result, the distinction among component suffixes P0, P1, P2 and P3 disappears, and the correction processing assigns the single symbol P to every term that should be recognized as one component.
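One way to apply the sixteen rules is to rewrite the tag sequence repeatedly until nothing changes, scanning left to right and trying longer patterns before shorter ones at each position, so that, for example, V + P2 is merged before P2 on its own is relabelled P. The ordering and the fixed-point loop are assumptions of this sketch, not something the text specifies; `RULES` and `correct` are hypothetical names.

```python
# (pattern, result) pairs transcribed from rules 1) to 16).
RULES = [
    (("V", "N", "J", "P2"), "P"),
    (("N", "N"), "N"), (("N0", "N"), "N"),
    (("H1", "H1"), "H1"), (("H3", "H1"), "H1"), (("H3", "H2"), "H1"),
    (("V", "V"), "V"), (("N", "V"), "V"),
    (("V", "P2"), "P"), (("N0", "P1"), "P"),
    (("N", "P3"), "P"), (("N", "P"), "P"), (("P", "P"), "P"),
    (("P0",), "P"), (("P1",), "P"), (("P2",), "P"),
]
# Longest patterns first; sort is stable, so listed order is kept per length.
RULES.sort(key=lambda rule: -len(rule[0]))


def correct(tokens):
    """Merge (word, tag) pairs until no rule applies."""
    tokens = list(tokens)
    changed = True
    while changed:
        changed = False
        for i in range(len(tokens)):
            for pattern, result in RULES:
                window = tokens[i:i + len(pattern)]
                if tuple(tag for _, tag in window) == pattern:
                    merged = "".join(word for word, _ in window)
                    tokens[i:i + len(pattern)] = [(merged, result)]
                    changed = True
                    break
            if changed:
                break  # restart the scan from the beginning
    return tokens
```

Note that P3 alone matches no rule, so "条件" becomes a component only when N + P3 fires.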

When the correction processing is completed, the document data analysis unit 120.2 performs component recognition processing (S108.2) as part of the data recognition processing (S108).

In the component recognition processing, a component is recognized when a sequence of assigned symbols satisfies the condition in the "Other" item of the list in FIG. 8: the [○ + ○ + (…)] part of the "Other" item is recognized as the component, and a symbol indicating whether a check character string is present is assigned to it.

FIG. 6 shows the symbols assigned in the component recognition processing.

Symbol ZC denotes a component preceded by a check character string (前記, 当該 and so on), and symbol C denotes a component not preceded by one.

For example, when a term of part of speech P follows check character string ZZ, that term (the word or word group assigned [P]) is taken as a component and assigned symbol ZC. When a term of part of speech P does not directly follow a check character string, it is taken as a component and assigned symbol C.

For example, when the sequence of assigned symbols is ZZ + [H2 + H3 + P], the component (the term made up of the words assigned [H2 + H3 + P]) is assigned symbol ZC, whereas for [H2 + H3 + P] without ZZ the component is assigned symbol C.

The component recognition cases are as follows.

1) ZZ + [H3 + O + H3 + P]: ZC
2) ZZ + V + G + [H3 + O + H3 + P]: ZC
3) [H3 + O + H3 + P]: C
4) ZZ + [H2 + H3 + P]: ZC
5) ZZ + V + G + [H2 + H3 + P]: ZC
6) [H2 + H3 + P]: C
7) ZZ + [H3 + P]: ZC
8) ZZ + V + G + [H3 + P]: ZC
9) [H3 + P]: C
10) ZZ + [H2 + P]: ZC
11) ZZ + V + G + [H2 + P]: ZC
12) [H2 + P]: C
13) ZZ + [H1 + P]: ZC
14) ZZ + V + G + [H1 + P]: ZC
15) [H1 + P]: C
16) ZZ + [P]: ZC
17) ZZ + V + G + [P]: ZC
18) [P]: C

When the sequence is ZZ + [H3 + O + H3 + P], for example "前記第１および第２の○○" (said first and second ○○), the document data analysis unit 120.2 treats the text as though "前記第１の○○" and "前記第２の○○" had both been written. Similarly, for [H3 + O + H3 + P], for example "第１および第２の○○" (first and second ○○), it treats the text as though "第１の○○" and "第２の○○" had both been written.
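A sketch of the recognition table: for each P token, find the longest listed prefix pattern ending just before it, then look at what precedes that prefix (ZZ, or ZZ + V + G) to choose ZC or C. The H3 + O + H3 case is expanded into two components as described above. The pattern order, the token representation and the names are assumptions; in particular, the expansion simply concatenates each H3 word with the P word and does not normalise particles such as "の".

```python
# Prefix patterns that may sit between the check string and P, longest first.
PREFIX_PATTERNS = [("H3", "O", "H3"), ("H2", "H3"), ("H3",), ("H2",), ("H1",), ()]


def recognize(tokens):
    """Return (component, symbol) pairs; symbol is "ZC" or "C"."""
    words = [word for word, _ in tokens]
    tags = [tag for _, tag in tokens]
    found = []
    for p, tag in enumerate(tags):
        if tag != "P":
            continue
        # The empty pattern always matches, so this loop always breaks.
        for pattern in PREFIX_PATTERNS:
            start = p - len(pattern)
            if start >= 0 and tuple(tags[start:p]) == pattern:
                break
        # Up to three tags in front of the prefix: ZZ, or ZZ + V + G.
        before = tuple(tags[max(0, start - 3):start])
        has_check = before[-1:] == ("ZZ",) or before == ("ZZ", "V", "G")
        symbol = "ZC" if has_check else "C"
        if pattern == ("H3", "O", "H3"):
            # "第１および第２電極" counts as "第１電極" and "第２電極".
            found.append((words[start] + words[p], symbol))
            found.append((words[start + 2] + words[p], symbol))
        else:
            found.append(("".join(words[start:p + 1]), symbol))
    return found
```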

Through the above processing, it has been determined, for each term extracted as a component candidate, whether a check character string ZZ such as "前記" precedes it. The document data analysis unit 120.2 then checks whether this use of check character strings is consistent with whether the component is appearing in the claims for the first time or has already appeared (S108.3).

In doing so, the range over which first or prior appearance is judged is set for each claim according to whether it is an independent claim or a dependent claim.

For example, if claim 3 depends on claim 2 and claim 2 depends on claim 1, first or prior appearance of a component in claim 3 is judged over the range covering claims 1, 2 and 3, not claim 3 alone.
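The scope rule and the resulting consistency check could look like the following sketch. It assumes each dependent claim depends on exactly one parent (multiple dependency is not handled) and that components have already been reduced to (term, symbol) pairs; `claim_scope`, `check_claim` and the status strings are hypothetical. A ZC component with no earlier occurrence in scope is reported as an unnecessary 前記, and a C component that has already appeared as a missing one.

```python
def claim_scope(claim_no, parent_of):
    """Claims searched for prior appearance, independent claim first.

    parent_of maps each claim number to the single claim it depends on,
    or None for an independent claim.
    """
    chain = [claim_no]
    while parent_of.get(chain[-1]) is not None:
        chain.append(parent_of[chain[-1]])
    return list(reversed(chain))


def check_claim(claim_no, parent_of, components):
    """Return (term, symbol, status) for each component of claim_no.

    components maps a claim number to its (term, symbol) list in text order,
    where symbol is "ZC" (check string present) or "C" (absent).
    """
    seen = set()
    for ancestor in claim_scope(claim_no, parent_of)[:-1]:
        seen.update(term for term, _ in components.get(ancestor, []))
    results = []
    for term, symbol in components.get(claim_no, []):
        prior = term in seen
        if symbol == "ZC":
            status = "ok" if prior else "unneeded"  # 前記 with no antecedent
        else:
            status = "missing" if prior else "ok"   # antecedent exists, 前記 left out
        results.append((term, symbol, status))
        seen.add(term)
    return results
```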

FIG. 9 shows an example of a dictionary registering terms (not limited to single words) corresponding to part of speech P0, the component suffixes held in the user dictionary.

Whereas the common dictionary is maintained by the system administrator, specific terms (words or word groups) are registered in dictionary P0 (the user dictionary) by the user, according to the business scope of a given customer.

FIG. 10 shows an example of a dictionary registering terms corresponding to part of speech P1, the component suffixes held in the common dictionary.

Dictionary P1 registers words commonly used at the end of components, regardless of the customer (or the customer's business scope or technical field).

As described above, dictionary P1, being a common dictionary, is maintained by the system administrator. When common dictionaries such as dictionary P1 and the user dictionary are stored on an external storage device such as a server and shared by a plurality of terminals, the system administrator can register words in the common dictionary centrally on the server side, while users can register words in the customer dictionary (user dictionary) from client terminals.

FIG. 11 shows examples of common dictionaries: a dictionary registering terms corresponding to component-suffix parts of speech P2 and P3, a dictionary registering terms corresponding to component-prefix parts of speech H1, H2 and H3, and a dictionary registering terms corresponding to suffix TT.

In the example of FIG. 11, "ステップ" (step), "工程" (process) and "手段" (means) are registered as part of speech P2, and "条件" (condition) as part of speech P3. Parts of speech H1, H2 and H3 are as described above.

FIG. 12 shows an example of common dictionaries registering terms corresponding to suffix FT, compulsory noun FN, check character string ZZ, non-noun XN and non-prefix XS.

The contents of these words have been described above and are not repeated here.

FIG. 13 conceptually illustrates the correction processing and component recognition processing performed by the document data analysis unit 120.2.

For example, because "手段" (means) is registered as part of speech P2, the nouns "読取" (reading) and "制御" (control) immediately preceding it are joined with "手段" and recognized as the component "読取制御手段" (reading control means). Because no check character string precedes it, it is assigned symbol C.

By contrast, "バランサー" (balancer) is not registered as part of speech P1, P2 or P3, so "読取バランサー" (reading balancer) is not recognized as a component when the first component recognition pass ends. However, if in step S110 of FIG. 3 the user registers, for example, "バランサー" in partial dictionary 135.i, then "バランサー" and the noun "読取" immediately preceding it are joined, and "読取バランサー" is recognized as one component. Because a check character string precedes "読取バランサー", it is assigned symbol ZC.

FIG. 14 shows an example of claim text displayed in step S110 of FIG. 3. Terms recognized as components are underlined, for example; they may instead be highlighted in a specific color. The term "中間コード" (intermediate code) has not been recognized as a component, so when the user selects the extent of the term "中間コード", for example by operating mouse 112, it is registered in partial dictionary 135.i.

FIG. 15 shows an example of the entire screen displayed in step S110 of FIG. 3.

The upper left shows the dependency relationships among the detected claims as a claim tree, and the upper right shows a list of components.

Detected components are underlined or highlighted in each claim.

At the same time, the component names detected in each claim are displayed as a list with check boxes. For a dependent claim, only the components newly detected in the tree are listed. The check boxes are the interface for deleting unwanted entries from the component candidates.

As described above, changing the highlight color according to whether a check character string is used properly, and whether a word such as "前記" is unnecessary or missing, tells the user how consistently check character strings are used.

As explained with reference to FIG. 14, when at this display stage the user finds a term that has not been recognized as a component, selecting the extent of that term registers it in partial dictionary 135.i. If the user then clicks the "Analyze" button, component recognition and the consistency check of "前記" and similar words are performed again based on the updated partial dictionary 135.i.

With the configuration described above, it can easily be checked whether a specific character string is used appropriately in a document under analysis, where that string serves to indicate that the term following it has already appeared in the document.

In addition, specific terms are identified using information on a plurality of term suffixes commonly used at the end of terms, even without registering terms one by one in the user dictionary. This greatly reduces the user's work of identifying specific terms when checking a document.

[Embodiment 2]

Document check apparatus 100 of Embodiment 1 was described as follows: when a specific character string in externally supplied document data is used to indicate that the term following it has already appeared in the document, the apparatus extracts phrases that are candidates for such terms and checks whether the use of the specific character string is consistent among the extracted candidates. Document check apparatus 100 of Embodiment 1 was also described as checking the "claims".

In that description, the fact that such a term (a "specific term") is very likely to end in a fixed character string (a "term suffix") was used as a clue for extracting specific-term candidates from the text. When the text checked is the claims, the "specific term" is a "component" and the "term suffix" is a "component suffix".

However, the following cases may also arise in document checking.

1) Specific-term candidates need not be extracted using term suffixes as clues. The same processing can be achieved by concatenating consecutive nouns identified by the morphological analysis engine into specific-term candidates and, where necessary, excluding inappropriate candidates on individual instruction from the user.
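Alternative 1) can be sketched as collecting maximal runs of N tokens and letting the user veto individual candidates. The (word, tag) token format, the `excluded` set standing in for the user's instructions, and the function name are assumptions of this sketch.

```python
def noun_candidates(tokens, excluded=frozenset()):
    """Join maximal runs of noun (N) tokens into candidate terms.

    tokens: list of (word, tag) pairs from morphological analysis.
    excluded: candidates the user has marked as inappropriate.
    Returns unique candidates in order of first appearance.
    """
    candidates, run = [], []
    for word, tag in tokens + [("", None)]:  # sentinel flushes the last run
        if tag == "N":
            run.append(word)
            continue
        if run:
            term = "".join(run)
            if term not in excluded:
                candidates.append(term)
            run = []
    return list(dict.fromkeys(candidates))
```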

In particular, when the document checked is the "specification", a character string such as a component suffix (for example, "手段" (means)) is not always used, unlike in the "claims". In that case, concatenating nouns may be better suited to extracting candidates for the "explanatory terms" described below.

2) Depending on the document checked, one part of it may define concepts while ensuring strict use of terms, using a specific character string as described above to indicate that the following term has already appeared, and another part may describe those concepts concretely. In that case, the concrete examples are described with terms that give more specific examples corresponding to the "specific terms" used in defining the concepts (hereinafter "explanatory terms"). When such a description refers to drawings, each explanatory term is immediately followed in the text by a reference numeral indicating its correspondence to the drawings, and the same numeral is attached to the corresponding part of the drawing.

For example, when the concepts are defined in the "claims", the concrete description is given in the "specification" (in particular, in the "Mode for Carrying Out the Invention"). Because an explanatory term describes a specific term more concretely, the two are either explicitly identical or formally similar words or word combinations, or the explanatory term is one that the reader's common sense would obviously associate with the specific term.

Because each explanatory term is immediately followed by a reference numeral as described above, it is also necessary to check whether the correspondence between explanatory terms and their reference numerals is consistent throughout the document.
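Because a morphological analyser will usually tag digits as nouns, a concatenated candidate such as "読取制御部１２" may carry its reference numeral at the end. The hypothetical helper below peels off a trailing run of ASCII or full-width digits and normalises it with NFKC so that "１２" and "12" compare equal. It is a sketch under that assumption, not the method described in the text.

```python
import re
import unicodedata

# A trailing run of ASCII or full-width digits.
TRAILING_NUMERAL = re.compile(r"^(.+?)([0-9０-９]+)$")


def split_numeral(candidate):
    """Split a candidate into (term, numeral); numeral is None if absent."""
    match = TRAILING_NUMERAL.match(candidate)
    if match is None:
        return candidate, None
    term, digits = match.groups()
    # NFKC maps full-width digits to ASCII, so "１２" and "12" compare equal.
    return term, unicodedata.normalize("NFKC", digits)
```

Ordinals inside a term, such as the "１" in "第１電極", are left alone because only digits at the very end are split off.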

Part-of-speech identification by the morphological analysis engine, as described above, can be applied in common to the part of the text that uses specific terms and to the part that uses explanatory terms.

In patent application documents, for example, the "claims" and the "specification" are formally separate texts and may be created as separate data files. In this specification, however, when the two are closely related as the "definition of a concept" and the "concrete description of that concept", they are together called the "document to be checked".

On this premise, the document check apparatus, document check method and document check program of Embodiment 2 are described below. The hardware configuration and functional configuration of the document check apparatus of Embodiment 2 are, in principle, the same as those of document check apparatus 100 of Embodiment 1 described with reference to FIGS. 1 and 2.

The operation of the document check apparatus of Embodiment 2 is described below, using as an example the case in which the "claims" and the "specification" are the documents to be checked.

In the following description of Embodiment 2, "component" is used as a generic term covering both "specific terms" in the claims and "explanatory terms" in the specification.

FIG. 16 is a flowchart of the operation of the document check apparatus of Embodiment 2, to be compared with FIG. 3 of Embodiment 1.

Referring to FIG. 16, when document check apparatus 100 starts operating, the user first enters, using keyboard 110, mouse 112 and the like, group information identifying the subject field (customer) of the document data to be checked (S200). In response, the document data analysis unit 120.2 selects the partial dictionary 135.i (1 ≤ i ≤ n) to be used.

At the same time, the user can choose by input whether or not the common dictionary is used.

Here it is assumed that the common dictionary and the customer dictionary each hold separate contents for the "claims" and for the "specification". For example, the common dictionary data and customer dictionary data for the claims may be stored in a storage device of another computer connected over a network, which document check apparatus 100 accesses through communication interface 128 to read or write, while the common dictionary data and customer dictionary data for the specification are stored on hard disk 124 in document check apparatus 100. The reason is that documents that define concepts, such as claims, tend to use relatively uncommon terms; by sharing among multiple users the data registered in, for example, the customer dictionary data, the registrations made by one user (a kind of learning result) become available to the others. The specification, by contrast, tends to use common terms, so the need to share such learning results is lower than for the claims.

Next, document capture unit 120.1 reads document data 136 and loads it into RAM, the working memory (S202).

The document data analysis unit 120.2 selects and identifies the "claims" and "specification" sections to be checked in document data 136. To extract the nouns and noun phrases that are component candidates, it analyzes the document data using, as required, common dictionary data 134, the selected partial dictionary 135.i and the morphological analysis engine, and assigns symbols to parts of the character strings according to the dictionary entries, as preprocessing for part-of-speech identification (S206). It then assigns to each character string a symbol identifying its part of speech, based on the part of speech identified by morphological analysis (S206).

Next, among the character strings that have been assigned symbols, the document data analysis unit 120.2 reassigns the symbols of strings registered in the dictionaries as excluded from recognition as nouns, as following components, as nouns that end components (component suffixes), and so on (S208). It then concatenates and merges the assigned symbols according to predetermined rules and reassigns them (S208). Further, for each component it assigns a symbol for consistency recognition (ZC or C, as described for Embodiment 1) according to whether a "specific character string" indicating prior appearance is present, which in the claims means "前記". Component candidates are thereby identified. When "前記" is not used in the specification, however, these consistency-recognition symbols need not be distinguished in the specification's consistency check; they merely indicate that a term is a component candidate.

Next, the document data analysis unit 120.2 checks whether the use of the specific character strings "前記" ("said") and "当該" ("the relevant") is consistent among the component candidates in the claims (S210). It also checks whether the correspondence between components and reference signs is consistent in the specification (S210).

According to the check results, the data display output control unit 120.3 shows the results for the claims on the displayed document data, for example by highlighting the component candidates in different colors, so that the following cases can be distinguished (S212): i) candidates marked with "前記"/"当該" whose use is proper (the same component candidate has already appeared earlier in the claims); ii) candidates not marked with "前記"/"当該" whose use is proper (the component candidate appears for the first time in the claims); iii) candidates marked with "前記"/"当該" whose use is improper; and iv) candidates not marked with "前記"/"当該" whose use is improper. Because cases i) and ii) are both proper, they may share a common display mode.
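The four cases above can be illustrated with a short Python sketch. It is not the device's implementation: it assumes the component candidates of the claims are already available, in document order, as hypothetical `(term, marked)` pairs, where `marked` is true when the candidate is preceded by 前記 or 当該, and it treats the claims as one continuous sequence rather than limiting antecedents to the claim itself and the claims it depends on.

```python
from typing import List, Tuple


def classify_claim_terms(occurrences: List[Tuple[str, bool]]) -> List[str]:
    """Label each component-candidate occurrence with case "i", "ii", "iii" or "iv".

    occurrences: (term, marked) pairs in document order; marked is True when the
    term is preceded by 前記 or 当該 (an assumed input format).
    """
    seen = set()  # terms that have already appeared earlier in the claims
    labels = []
    for term, marked in occurrences:
        appeared_before = term in seen
        if marked and appeared_before:
            labels.append("i")    # marker with an antecedent: proper
        elif not marked and not appeared_before:
            labels.append("ii")   # first appearance without marker: proper
        elif marked:
            labels.append("iii")  # marker but no antecedent: improper
        else:
            labels.append("iv")   # repeated term without marker: improper
        seen.add(term)
    return labels


print(classify_claim_terms([("回転軸", False), ("回転軸", True), ("歯車", True), ("回転軸", False)]))
# prints ['ii', 'i', 'iii', 'iv']
```

A check that respects claim dependencies would seed `seen` for each claim from the terms introduced in the claims it refers to.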

Also according to the check results, for the specification the data display output control unit 120.3 displays the following, each distinguished by a display mode such as color: i) components in a one-to-one relationship with their reference signs (use is likely correct); ii) components for which a plurality of reference signs are used (possible misuse); and iii) reference signs used for a plurality of components (clear misuse). This lets the user recognize, in the specification, components that are clearly misused and components that may be misused. A one-to-one relationship is described only as "likely correct" because the checks above do not verify consistency between the use of components and reference signs in the text and the use of reference signs in the drawings. If the drawings are also available as electronic data, however, this consistency may be checked as well; for example, for the passage describing FIG. X, it may be checked whether the reference signs of the components used there also appear in the drawing data of FIG. X.
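The optional drawing cross-check mentioned above reduces to a set difference once the data has been extracted. In this sketch both inputs are assumptions: `cited` maps each figure to the reference signs used in the passage describing it, and `drawn` maps each figure to the labels found in its electronic drawing data; how those are extracted is not specified here.

```python
from typing import Dict, List, Set


def signs_missing_from_drawings(cited: Dict[str, List[str]],
                                drawn: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per figure, the reference signs cited in its description but absent from its drawing data."""
    missing = {}
    for figure, signs in cited.items():
        # A figure with no drawing data at all is treated as containing no labels.
        absent = sorted(set(signs) - drawn.get(figure, set()))
        if absent:
            missing[figure] = absent
    return missing
```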

Next, if the user requests reanalysis (S214), the process returns to step S202; otherwise, the process ends.

The user requests reanalysis when, based on the results displayed in step S212, the user has corrected the document data 136 and the data therefore needs to be analyzed again. As described above, what the user corrects in this case is a file created with specific word-processor software, and the device can be configured to import only the text from that file as the document data 136 to be checked.
[Details of document check processing]
The document check process described with reference to FIG. 16 is explained in more detail below.

First, FIG. 19 compares the dictionary configurations of Embodiment 1 and Embodiment 2.
As shown in FIG. 19, Embodiment 2 changes some dictionary names and dictionary contents relative to Embodiment 1.

Most of the dictionaries, however, are the same as in Embodiment 1, and the default word registrations of all dictionaries can be identical in Embodiments 1 and 2.

The H3 dictionary can be the same as in Embodiment 1, but it may instead be generated internally using regular expressions rather than provided as a dictionary. The FT and XS dictionaries can likewise be the same as in Embodiment 1; alternatively, the symbol assignment in the morphological analysis may be changed so that the same processing result is obtained. Further, the P1 dictionary is configured so that the user can choose whether or not to use it. The TNC recognition method has also been changed, and the TNC dictionary type has been changed to "customer dictionary".

FIG. 20 explains the "assignment symbols" assigned to character strings in the document to be checked.

The assignment symbols have the following meanings:
ZC: component preceded by 前記 ("said")
C: component not preceded by 前記
P: component candidate
H: candidate prefix string of a component
XX: inappropriate component
In the component recognition processing described below, the target character string is divided using dictionary or morphological analysis information, and a symbol is assigned to each divided string. Below, the assigned symbol is called an "assignment symbol", and the condition for assigning it is called an "assignment condition".

(Document check processing flow)
FIG. 17 is a flowchart explaining the processing of step S206 in FIG. 16.

Referring to FIG. 17, in step S206.1.1 it is determined whether processing has been completed for all acquired documents. If so, the process proceeds to step S210. If not, the process proceeds to step S206.1.2, where the document to be processed is identified. For example, if the text data to be processed contains the claims data followed by the specification data, the claims are processed first.

Next, if the processing target is the claims (step S206.1.3), the document data analysis unit 120.2 separates and extracts each claim and extracts the dependency relationships between the claims (S206.1.4). The claims can be separated using tags in the original text data. The dependency relationships can be identified by matching against templates written as regular expressions, for example to extract expressions such as "according to claim ○ or △" (請求項○または△に記載の).
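The claim separation and dependency extraction described above might look like the following Python sketch. The 【請求項N】 heading tag and the dependency template are assumptions chosen for illustration; a real template set would also need range forms such as 請求項1〜3 and full-width digits, which this version does not handle.

```python
import re
from typing import Dict, List

# Assumed claim-heading tag, e.g. "【請求項1】".
CLAIM_TAG = re.compile(r"【請求項([0-9]+)】")
# Assumed dependency template: "請求項1に記載の", "請求項1または2に記載の", "請求項1、2に記載の".
DEPENDENCY = re.compile(r"請求項([0-9、]+(?:(?:または|若しくは)[0-9]+)*)(?:のいずれか一項)?に記載の")


def split_claims(text: str) -> Dict[int, str]:
    """Split the claims section into {claim number: claim body}."""
    parts = CLAIM_TAG.split(text)
    # With one capture group, split() returns [preamble, no1, body1, no2, body2, ...].
    return {int(no): body.strip() for no, body in zip(parts[1::2], parts[2::2])}


def claim_dependencies(claims: Dict[int, str]) -> Dict[int, List[int]]:
    """Map each claim to the claim numbers it refers back to (empty for independent claims)."""
    deps = {}
    for no, body in claims.items():
        match = DEPENDENCY.search(body)
        deps[no] = [int(d) for d in re.findall(r"[0-9]+", match.group(1))] if match else []
    return deps
```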

Next, the document data analysis unit 120.2 checks whether the claims are numbered consecutively and whether the title of the invention in each claim matches that of the claim it depends on (S206.1.5). Although not particularly limited, the "title of the invention" may be, for example, i) specified by the user, or ii) the component extracted furthest toward the end of the claim. In case ii), the title-consistency check is performed together with the consistency check of "前記" and similar strings, described later.
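The two structural checks above (serial numbering, and matching invention titles between a claim and the claims it depends on) can be sketched as follows. The `titles` mapping is a hypothetical input standing for whichever title source is used, either a user-specified title or the last component extracted from each claim.

```python
from typing import Dict, List


def check_claim_structure(numbers: List[int],
                          deps: Dict[int, List[int]],
                          titles: Dict[int, str]) -> List[str]:
    """Report non-serial claim numbering and title mismatches with referenced claims."""
    problems = []
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append(f"claims are not numbered serially: {numbers}")
    for no, parents in deps.items():
        for parent in parents:
            if parent not in titles:
                problems.append(f"claim {no} refers to missing claim {parent}")
            elif titles[parent] != titles.get(no):
                problems.append(
                    f"claim {no} title {titles.get(no)!r} differs from claim {parent} title {titles[parent]!r}"
                )
    return problems
```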

Next, the document data analysis unit 120.2 performs front-end processing (S206.2.1).

FIG. 21 explains the assignment symbols assigned in the front-end processing.

As shown in FIG. 21, symbols are assigned to the target character string based on the dictionaries shown in FIG. 21 and according to the rules in FIG. 21. The assignments are performed in the order given in the table; the same applies to the other processing steps whenever a table specifies an order.
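The ordered, dictionary-driven front-end pass can be sketched in Python as follows. The actual dictionaries and rules of FIG. 21 are not reproduced in this section, so the rule list is hypothetical; the sketch only shows rules being tried in table order, with the longest matching entry winning within one dictionary and unmatched text left unlabelled (`None`) for morphological analysis. Giving 前記 and 当該 the symbol ZZ is an assumption based on the role ZZ plays in the component recognition rules.

```python
from typing import List, Optional, Sequence, Set, Tuple

# (assignment symbol, dictionary entries); rules are tried in table order.
Rule = Tuple[str, Set[str]]


def front_end(text: str, rules: Sequence[Rule]) -> List[Tuple[str, Optional[str]]]:
    """Split text into (segment, symbol) pairs; None marks text left for morphological analysis."""
    segments: List[Tuple[str, Optional[str]]] = []
    pending = ""  # unmatched characters accumulated since the last dictionary hit
    i = 0
    while i < len(text):
        hit: Optional[Tuple[str, str]] = None
        for symbol, words in rules:
            # Within one dictionary, prefer the longest entry matching at position i.
            for word in sorted(words, key=len, reverse=True):
                if word and text.startswith(word, i):
                    hit = (word, symbol)
                    break
            if hit:
                break
        if hit:
            if pending:
                segments.append((pending, None))
                pending = ""
            segments.append(hit)
            i += len(hit[0])
        else:
            pending += text[i]
            i += 1
    if pending:
        segments.append((pending, None))
    return segments
```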

Returning to FIG. 17, the document data analysis unit 120.2 then performs morphological analysis (S206.2.2).

FIG. 22 explains the assignment symbols used in this morphological analysis.
As shown in FIG. 22, morphological analysis is performed on the character strings to which no symbol was assigned in the front-end processing. The morphological analysis divides the input string into substrings and yields the part of speech of each. According to the table of FIG. 22, a symbol is then assigned to each substring based on its part of speech (noun: N, conjunction: O, verb: V, auxiliary verb: G, particle: J).

FIG. 18 is a flowchart explaining in detail the flow of steps S208 and S210 in FIG. 16.

In step S208, the document data analysis unit 120.2 first performs back-end processing 1 (S208.1).

FIG. 23 is a table explaining back-end processing 1. As shown in FIG. 23, in back-end processing 1, symbols are assigned, based on the dictionaries and according to the table of FIG. 23, to character strings that already received symbols in the preceding processing.

Returning to FIG. 18, the document data analysis unit 120.2 next performs correction processing 1, which replaces and concatenates symbols (S208.2).

FIG. 24 is a table explaining correction processing 1. As shown in FIG. 24, in correction processing 1, symbols are assigned, based on the dictionaries and according to the table of FIG. 24, to character strings that already received symbols in the preceding processing.

For example, the assignment symbol P is assigned to character strings registered in the TOC dictionary, and the assignment symbol N to character strings registered in the TON dictionary. As in Embodiment 1, consecutive nouns are also concatenated (N + N → N).
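Correction processing 1 can be sketched as a relabelling pass followed by noun concatenation. The contents of the TOC and TON dictionaries are not given in this section, so they are passed in as plain sets; applying the dictionary relabelling before the N + N → N merge is an assumption about ordering.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (surface string, assignment symbol)


def correction_1(tokens: List[Token], toc: Set[str], ton: Set[str]) -> List[Token]:
    """Relabel TOC entries as P and TON entries as N, then merge runs of nouns (N + N -> N)."""
    relabeled = []
    for surface, symbol in tokens:
        if surface in toc:
            symbol = "P"
        elif surface in ton:
            symbol = "N"
        relabeled.append((surface, symbol))

    merged: List[Token] = []
    for surface, symbol in relabeled:
        if symbol == "N" and merged and merged[-1][1] == "N":
            merged[-1] = (merged[-1][0] + surface, "N")  # extend the preceding noun
        else:
            merged.append((surface, symbol))
    return merged
```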

Returning to FIG. 18, if the device is set not to use the P1 dictionary (S208.3), the document data analysis unit 120.2 next performs correction processing 2, which registers nouns or concatenated nouns as component candidates (S208.4).
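The difference between using and not using suffix-based recognition can be sketched as a single relabelling function. This sketch assumes, as an interpretation rather than something this section states, that the P1 dictionary holds the component suffixes (term suffixes) referred to in the claims: with the option off, every noun or concatenated noun becomes a candidate P; with it on, only nouns ending in a registered suffix do.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (surface string, assignment symbol)


def mark_component_candidates(tokens: List[Token], use_suffixes: bool, suffixes: Set[str]) -> List[Token]:
    """Relabel nouns (N) as component candidates (P).

    use_suffixes=False: every noun or concatenated noun becomes a candidate.
    use_suffixes=True: only nouns ending in a registered suffix (assumed P1 contents) do.
    """
    result = []
    for surface, symbol in tokens:
        if symbol == "N" and (not use_suffixes or any(surface.endswith(s) for s in suffixes)):
            symbol = "P"
        result.append((surface, symbol))
    return result
```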

FIG. 25 is a table explaining correction processing 2.
Returning to FIG. 18, the document data analysis unit 120.2 next performs component recognition processing (S208.5).

FIG. 26 is a table explaining this "component recognition processing".
As shown in FIG. 26, symbols are assigned, according to the rules in the table of FIG. 26, to the character strings that already received symbols in the processing above. The symbol is assigned only to the character string matching the part enclosed in square brackets. For example, under rule 1 of the table in FIG. 26, the string matching "H + P" is assigned "ZC" because a string ZZ immediately precedes it; under rule 3, the string matching "H + P" is assigned "C" because no string ZZ immediately precedes it.

Returning to FIG. 18, if the processing target is the specification (S208.6), the document data analysis unit 120.2 next performs back-end processing 2 (S208.7).

FIG. 27 is a table explaining back-end processing 2. A character string assigned the symbol "C" or "ZC" is reassigned the symbol XX if it is not followed by an alphanumeric string, that is, by a reference sign.
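Back-end processing 2 amounts to checking the token that follows each component. The reference-sign pattern below (ASCII or full-width alphanumerics, optionally with dotted parts such as 120.2) is an assumption; the section only says that the following string must be alphanumeric.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (surface string, assignment symbol)

# Assumed shape of a reference sign, e.g. "10", "12a", "120.2".
SIGN = re.compile(r"[0-9A-Za-z０-９Ａ-Ｚａ-ｚ]+(?:[.．][0-9０-９]+)*")


def backend_2(tokens: List[Token]) -> List[Token]:
    """Relabel components (C/ZC) as XX when no reference sign follows them."""
    result = []
    for i, (surface, symbol) in enumerate(tokens):
        following = tokens[i + 1][0] if i + 1 < len(tokens) else ""
        if symbol in ("C", "ZC") and not SIGN.fullmatch(following):
            symbol = "XX"
        result.append((surface, symbol))
    return result
```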

Returning to FIG. 18, the document data analysis unit 120.2 next performs back-end processing 3 according to the entries registered in the TNC dictionary (S208.8).

FIG. 28 is a table explaining back-end processing 3. A character string assigned the symbol "C" or "ZC" is reassigned the symbol XX if the corresponding word is registered in the TNC dictionary.
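Back-end processing 3 is a dictionary lookup. In this sketch the `tnc` set is a hypothetical stand-in for the TNC dictionary, whose contents are not given in this section.

```python
from typing import List, Set, Tuple

Token = Tuple[str, str]  # (surface string, assignment symbol)


def backend_3(tokens: List[Token], tnc: Set[str]) -> List[Token]:
    """Relabel components (C/ZC) as XX when the word is registered in the TNC dictionary."""
    return [(s, "XX" if sym in ("C", "ZC") and s in tnc else sym) for s, sym in tokens]
```

Tokens with other symbols are left untouched even if their text appears in the dictionary.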

Returning to FIG. 18, once processing has been completed for all acquired documents (S206.1.1), the document data analysis unit 120.2 checks the consistency of "前記" and similar strings in the claims in the same way as in Embodiment 1 (S210.1).

Next, the document data analysis unit 120.2 checks the consistency between components and reference signs in the specification (S210.2).

In this case, as described above, each component in the specification is classified into one of the following states:
State i) the component and its reference sign are in a one-to-one relationship (use is likely correct);
State ii) a plurality of reference signs are used for one component (possible misuse);
State iii) a plurality of components are used for one reference sign (clear misuse).

FIG. 29 shows an example of the display produced by the data display output control unit 120.3 in step S212.

In FIG. 29, components corresponding to state ii are underlined, and components corresponding to state iii are enclosed in a frame.

Of course, states i to iii may instead be distinguished by highlighting them in different colors.

With the configuration described above, the consistency of use can be checked in documents that use "specific terms", a "specific character string" indicating that a specific term has already appeared, "descriptive terms", and "reference signs" attached to the descriptive terms.

Moreover, whether the "term suffix" is used as a clue when extracting candidates for specific terms can be selected through the user's settings.

The embodiments disclosed herein should be considered illustrative in all respects and not restrictive. The scope of the present invention is defined by the claims rather than by the description above, and is intended to include all modifications within the meaning and scope equivalent to the claims.

100 document check device, 102 computer main body, 104 monitor, 105 bus, 106 FD drive, 108 optical disk drive, 110 keyboard, 112 mouse, 122 memory, 124 hard disk, 128 communication interface, 131 display control program, 132 document data analysis program, 133 group information, 134 common dictionary data, 135 customer dictionary data, 136 document data.

Claims

A document check device for checking consistency of descriptions in a document to be analyzed,
Storage means for storing document data representing the document;
Information acquisition means for acquiring information on the specific character string when a specific character string is used to indicate that a term following the specific character string has already appeared in the document;
Part-of-speech specifying means for specifying the parts of speech included in the document,
The part-of-speech specifying means includes:
Specifying means for recognizing and specifying the specific character string in the document based on the information acquired by the information acquisition means;
Morphological analysis means for performing morphological analysis on the document and specifying the parts of speech in the document;
Term candidate recognition means for identifying, based on the result of the morphological analysis, term candidates, which are candidates for the term, by concatenating consecutive nouns in the document;
Consistency checking means for checking the consistency of use of the specific character string among the identified term candidates;
The document check device further comprising display control means for causing a display device to display the result of the consistency check.

The document check device according to claim 1, wherein the information acquisition means further acquires information on a plurality of term suffixes commonly used at the end of the term,
and the term candidate recognition means identifies the term candidates, which are candidates for the term, by concatenating, based on the result of the morphological analysis, the nouns that consecutively precede a term suffix in the document to that term suffix.

The document check device according to claim 2, wherein the information acquisition means further acquires information on a user dictionary in which terms in the document that were not identified as term candidates when the term candidate recognition means performed the identification process are registered as term candidates by user selection,
and the term candidate recognition means further identifies the term candidates with reference to the user dictionary.

The storage means stores information on the specific character string, information on the plurality of term suffixes, and the user dictionary,
The document check apparatus according to claim 3, wherein the information acquisition unit reads information on the specific character string, information on the plurality of term suffixes, and information on the user dictionary from the storage unit.

The documents are classified into a plurality of groups by content field,
The user dictionary is divided into partial dictionaries, one for each group, and the user registers the term candidates that were not identified in the partial dictionary corresponding to the content field of the document. The document check device described.

The document check device according to claim 4, wherein the storage means further stores in advance a specific prefix that is prefixed to the term,
and the term candidate recognition means, after concatenating the nouns consecutively preceding a term suffix to that term suffix, identifies the term candidates by further concatenating the specific prefix when it immediately precedes the concatenated term.

The document check device according to claim 3, wherein the information on the specific character string, the information on the plurality of term suffixes, and the user dictionary are stored in an external storage device outside the document check device,
and the information acquisition means acquires the information on the specific character string, the information on the plurality of term suffixes, and the information on the user dictionary from the external storage device by communication.

The document check device according to claim 1, wherein the document data includes:
First partial document data representing a first document in which the specific character string is used to indicate that a term following the specific character string has already appeared in the document; and
Second partial document data representing a second document that explains the content defined by the first partial document data and in which descriptive terms corresponding to the terms are used with reference signs attached,
The morphological analysis means performs the morphological analysis on both the first and the second partial document data,
and the document check device further comprises sign checking means for checking the consistency between the descriptive terms and the reference signs in the second partial document data.

A document check program for causing a computer, which includes an arithmetic device and a storage device storing document data representing a document to be analyzed, to execute a consistency check of the descriptions in the document, the document check process comprising:
A step in which the arithmetic device acquires information on a specific character string when the specific character string is used to indicate that a term following it has already appeared in the document;
A step of specifying the parts of speech included in the document,
The step of specifying the parts of speech includes:
A step in which the arithmetic device recognizes and specifies the specific character string in the document based on the acquired information;
A step in which the arithmetic device performs morphological analysis on the document and specifies the parts of speech in the document;
A step in which the arithmetic device identifies, based on the result of the morphological analysis, term candidates, which are candidates for the term, by concatenating consecutive nouns in the document; and
A step in which the arithmetic device checks the consistency of use of the specific character string among the identified term candidates;
The document check process further comprising a step in which the arithmetic device causes a display device to display the result of the consistency check.

The document check program according to claim 9, wherein the step of obtaining the information includes obtaining information on a plurality of term suffixes commonly used at the end of the term,
and the step of identifying the term candidates includes a step of identifying the term candidates by concatenating, based on the result of the morphological analysis, the nouns that consecutively precede a term suffix in the document to that term suffix.

The document check program according to claim 10, wherein the obtaining step further includes a step of obtaining information on a user dictionary in which terms in the document that were not identified as term candidates in the step of identifying the term candidates are registered as term candidates by user selection,
and the step of identifying the term candidates further includes a step of identifying the term candidates with reference to the user dictionary.

The document check program according to claim 11, wherein the storage device stores the information on the specific character string, the information on the plurality of term suffixes, and the user dictionary,
and the obtaining step includes a step of reading out the information on the specific character string, the information on the plurality of term suffixes, and the information on the user dictionary from the storage device.

The documents are classified into a plurality of groups for each content field,
The user dictionary is divided into partial dictionaries for each group, and the user registers the term candidates that are not specified in the partial dictionary corresponding to the content field of the document. The document check program described.

The document check program according to claim 12, wherein the storage device further stores in advance a specific prefix that is prefixed to the term,
and the step of identifying the term candidates includes a step of, after concatenating the nouns consecutively preceding a term suffix to that term suffix, identifying the term candidates by further concatenating the specific prefix when it immediately precedes the concatenated term.

The specific character string information, the plurality of term suffix information, and the user dictionary are stored in an external storage device outside the computer on which the document check program is executed,
The obtaining step includes a step of obtaining from the external storage device information on the specific character string, information on the plurality of term suffixes, and information on the user dictionary by communication. The document check program described.

The document data includes:
First partial document data representing a first document in which the specific character string is used to indicate that a term following the specific character string has already appeared in the document; and
Second partial document data representing a second document that explains the content defined by the first partial document data and in which descriptive terms corresponding to the terms are used with reference signs attached,
The step of specifying the parts of speech includes a step of performing the morphological analysis on both the first and the second partial document data;
The document check process includes:
11. The document check program according to claim 9, further comprising a step of checking consistency between the descriptive term and the code for the second partial document data.

A document check method for causing a computer, which includes an arithmetic device and a storage device storing document data representing a document to be analyzed, to execute a consistency check of the descriptions in the document, the method comprising:
A step in which the arithmetic device acquires information on a specific character string when the specific character string is used to indicate that a term following it has already appeared in the document;
A step of specifying the parts of speech included in the document,
The step of specifying the parts of speech includes:
A step in which the arithmetic device recognizes and specifies the specific character string in the document based on the acquired information;
A step in which the arithmetic device performs morphological analysis on the document and specifies the parts of speech in the document;
A step in which the arithmetic device identifies, based on the result of the morphological analysis, term candidates, which are candidates for the term, by concatenating consecutive nouns in the document; and
A step in which the arithmetic device checks the consistency of use of the specific character string among the identified term candidates;
The method further comprising a step in which the arithmetic device causes a display device to display the result of the consistency check.

The document check method according to claim 17, wherein the step of obtaining the information includes obtaining information on a plurality of term suffixes commonly used at the end of the term,
and the step of identifying the term candidates includes a step of identifying the term candidates by concatenating, based on the result of the morphological analysis, the nouns that consecutively precede a term suffix in the document to that term suffix.

The document check method according to claim 17 or 18, wherein the obtaining step further includes a step of obtaining information on a user dictionary in which terms in the document that were not identified as term candidates in the step of identifying the term candidates are registered as term candidates by user selection,
and the step of identifying the term candidates further includes a step of identifying the term candidates with reference to the user dictionary.

The document check method according to claim 17 or 18, wherein the document data includes:
First partial document data representing a first document in which the specific character string is used to indicate that a term following the specific character string has already appeared in the document; and
Second partial document data representing a second document that explains the content defined by the first partial document data and in which descriptive terms corresponding to the terms are used with reference signs attached,
The step of specifying the parts of speech includes a step of performing the morphological analysis on both the first and the second partial document data,
and the method further comprises a step of checking the consistency between the descriptive terms and the reference signs in the second partial document data.