JP2001092830A - Device and method for collating character string

Info

Publication number: JP2001092830A
Application number: JP26644899A
Authority: JP
Inventors: Tokuji Ota; 徳二太田
Original assignee: SAGAMA JOHO KAGAKU KENKYUSHO K; SAGAMA JOHO KAGAKU KENKYUSHO KK
Current assignee: SAGAMA JOHO KAGAKU KENKYUSHO K; SAGAMA JOHO KAGAKU KENKYUSHO KK
Priority date: 1999-09-21
Filing date: 1999-09-21
Publication date: 2001-04-06

Abstract

PROBLEM TO BE SOLVED: To collate character strings in units larger than a character or a term, by enabling comparison of sentences composed of terms and of documents composed of sentences. SOLUTION: The character string groups constituting a document are arranged in a table, the table is sorted by character string so that matching or similar character strings become adjacent, and detailed verification is then performed. More detailed results are obtained by repeating the procedure while changing the division size or the order of the character strings.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an apparatus and a method for collating a plurality of character strings in a document that has been converted into an electronic file by means of character codes.

[0002]

[Description of the Related Art] When documents or sub-elements of documents are collated, search conditions based on keywords or key sentences are set, an automatic search is run, and the character strings obtained are then collated. Natural language analysis has been applied to the setting of keys and to the search.

[0003]

[Problems to Be Solved by the Invention] However, the selection of keys and the setting of search conditions cannot always be carried out appropriately. In such cases the objects to be collated become too numerous and collation becomes impossible.

[0004] It is an object of the present invention to provide a collation method that does not require a search, in order to solve the problems of selecting keywords and setting search conditions.

[0005]

[Means for Solving the Problems] To solve the above problem, the collating apparatus of claim 1 determines the relationships of match or difference among the character strings that are the constituent elements of a single document composed of character strings and character string groups. The document is input from an input document storage device. Character string characteristic table generating means extracts the character string from each constituent element and generates an input document character string characteristic table that expresses the characteristics of the character string by including, together with a position identifier of the constituent element within the document, a document identifier such as the document name and a relation expression code between the constituent element and the document. Character string sorting means sorts the input document character string characteristic table by character string to create an output document character string characteristic table. Document restoring means either generates, from the output document character string characteristic table, a document in the format of the original document with expressions of the matching relationships added and outputs it to an output document storage device, or outputs the output document character string characteristic table itself to the output document storage device.

[0006] Because a document stored in an information processing apparatus is digitized as the character codes of its character strings and character string groups, the above means applies a sorting method to those character codes and determines the matching relationships between the character strings and character string groups that end up adjacent in the result. The character strings and character string groups are held in a table in which each is accompanied by information such as its position identifier within the document, a document identifier such as the document name, and a relation expression code between the constituent element and the document. After an ordinary table sort is executed, the means either outputs the document in its original format with expressions of the matching relationships added on the basis of the accompanying information, or outputs the sort result itself.
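The table-then-sort idea of this paragraph can be illustrated with a minimal Python sketch. The names `Row`, `build_table`, `sort_table`, and `adjacent_matches` and the sample data are assumptions made for illustration; the specification does not prescribe any particular implementation.

```python
from typing import NamedTuple


class Row(NamedTuple):
    position: int   # position identifier of the string within the document
    text: str       # the character string used as the sort key
    document: str   # document identifier, for example a file name


def build_table(document: str, strings: list[str], start: int = 101) -> list[Row]:
    """Attach a position identifier and a document identifier to each string."""
    return [Row(start + i, s, document) for i, s in enumerate(strings)]


def sort_table(table: list[Row]) -> list[Row]:
    """Sort rows by text; each row keeps its accompanying information."""
    return sorted(table, key=lambda row: row.text)


def adjacent_matches(sorted_table: list[Row]) -> list[tuple[Row, Row]]:
    """Return pairs of neighbouring rows whose text is identical."""
    return [(a, b) for a, b in zip(sorted_table, sorted_table[1:]) if a.text == b.text]


table = build_table("doc.txt", ["beta", "alpha", "beta", "gamma"])
pairs = adjacent_matches(sort_table(table))
print([(a.position, b.position) for a, b in pairs])
```

Because every row travels with its position identifier, a match found between neighbours in the sorted table can be traced back to its places in the original document without any search step.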

[0007] The collating apparatus of claim 2 adds, to the document restoring means of claim 1, processing means that converts the output into a format conforming to a matching-relationship expression format specification rule, thereby providing a way to improve the output format of the collation result. As for setting the rules of the present invention and using the rules so set, the rule-conforming processing may be built into the implementing means, or the rule processing means may be made independent so that setting, storage, and use are generalized. Any widely used general method may be employed.

[0008] The collating apparatus of claim 3, in order to determine not only matching relationships but also similar or contradictory relationships, adds similar/contradictory relation setting means that converts the input document character string characteristic table into a similar character string characteristic table in accordance with a similar/contradictory term database. The document processing/restoring means then generates and outputs an output document from the sorted output document character string characteristic table, referring to the similar/contradictory term database and conforming to a similarity-relationship expression format specification rule.

[0009] Building on the collation method for matching relationships, the above means replaces character strings with reference character strings in accordance with the similar/contradictory term database, and then applies the table sorting method with the original character strings and the like carried as accompanying information, thereby providing a collation method for similarity relationships.
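A minimal Python sketch of this substitution step follows. The dictionary `SIMILAR_TERMS`, the helper names, and the English sample strings are hypothetical; a real similar/contradictory term database would be far richer, and plain substring replacement is only a stand-in for term-level substitution.

```python
# Hypothetical similar-term database: variant term -> reference term.
SIMILAR_TERMS = {"hot water": "warm water", "regulate": "control"}


def normalize(text: str, terms: dict[str, str]) -> str:
    """Replace every variant term with its reference term."""
    for variant, reference in terms.items():
        text = text.replace(variant, reference)
    return text


def to_similarity_table(rows: list[tuple[int, str]]) -> list[tuple[int, str, str]]:
    """Turn (position, text) rows into (position, sort key, original text) rows."""
    return [(pos, normalize(text, SIMILAR_TERMS), text) for pos, text in rows]


rows = [(101, "regulate hot water"), (102, "drain the tank"), (103, "control warm water")]
sim = sorted(to_similarity_table(rows), key=lambda r: r[1])
for pos, key, original in sim:
    print(pos, key, "<-", original)
```

Rows 101 and 103 become neighbours because their normalized keys coincide, while the original wording stays available in the third column for output.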

[0010] The collating apparatus of claim 4 applies the above means to collate a plurality of documents. It inputs the documents, generates one input document character string characteristic table per document, and converts these into one similar character string characteristic table per document. It adds characteristic table combining means that merges those similar character string characteristic tables into a single combined characteristic table. The character string sorting means sorts the single combined characteristic table by character string to create a single output document character string characteristic table, and the document processing/restoring means generates from it either a single output document or one output document per input document and outputs the result to the output document storage device.

[0011] To collate a plurality of documents, the above means uses a single table in which each row carries the identifier of the document it came from, thereby providing a collation method for a plurality of documents.
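As a rough illustration of the combined table, the following Python sketch merges per-document tables and looks for neighbours that match across documents. The function names and the two short sample "claims" are assumptions, not taken from the specification.

```python
def build_table(doc_id: str, strings: list[str], start: int = 101):
    """One row per string: (document identifier, position identifier, text)."""
    return [(doc_id, start + i, s) for i, s in enumerate(strings)]


def combine(tables):
    """Merge per-document tables into a single combined table."""
    return [row for table in tables for row in table]


def cross_document_matches(combined):
    """Sort by text and keep neighbouring rows that match across documents."""
    ordered = sorted(combined, key=lambda r: r[2])
    return [
        (a, b)
        for a, b in zip(ordered, ordered[1:])
        if a[2] == b[2] and a[0] != b[0]
    ]


t1 = build_table("claim1", ["input the document", "sort the table"])
t2 = build_table("claim2", ["sort the table", "format the output"])
matches = cross_document_matches(combine([t1, t2]))
print(matches)
```

The document identifier carried in each row is what lets a single sort serve any number of input documents.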

[0012] The collating apparatus of claim 5 adds, to the character string characteristic table generating means, means for modifying character strings in accordance with a document character string conversion rule. This provides an improvement in which character strings are not only converted for similarity collation but their structure, such as their order, is also changed. That is, when a document character string conversion rule that changes the arrangement order based on the terms making up a character string is set, the string is reordered term by term, so terms moved toward the front during conversion take priority in the sort result and become adjacent. When a rule that reverses the character order is set, the characters of each string are reversed before sorting, and the apparatus outputs an output document character string characteristic table in which strings that are similar from their ends are adjacent.
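The effect of the character-reversal rule can be seen in a short Python sketch. The function name and the sample strings are illustrative assumptions.

```python
def sort_by_reversed(strings: list[str]) -> list[str]:
    """Sort strings by their reversed characters so shared endings become adjacent."""
    return sorted(strings, key=lambda s: s[::-1])


items = ["water heater", "data table", "gas heater", "lookup table"]
plain = sorted(items)
by_ending = sort_by_reversed(items)
print(plain)
print(by_ending)
```

With the plain sort the two "heater" strings are separated by "lookup table"; with the reversed key the strings ending in "table" and those ending in "heater" each sit next to one another.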

[0013] The collating apparatus of claim 6, in order to further improve the result of collation by the above means, adds means for extracting adjacent character string correlation rules. This means compares adjacent character strings in the output document character string characteristic table created by the character string sorting means and extracts adjacent character string correlation rules in accordance with an extraction rule. The apparatus also adds character string conversion rule setting means which, when an adjacent character string correlation rule has been extracted, changes the setting of the document character string conversion rule and continues the collation procedure, up to a limit on the number of iterations.

[0014] To improve collation performance, the above means compares the groups of adjacent character strings obtained by sorting, extracts rules describing the correlation between adjacent character strings in accordance with the extraction rule for adjacent character string correlation rules, adds those correlation rules to the document character string conversion rule, and repeats the collation procedure.

[0015]

[Embodiments of the Invention] Embodiments of the invention are described with reference to the principle diagrams. In FIG. 1, which corresponds to claim 1, an ordinary tabular sorting method is used. To that end, the character string characteristic table generating means 12 reads an input document from the input document storage device 11, which stores the file of a single document digitized as the character codes of its character strings and character string groups. It extracts character strings in a manner suited to the purpose of the collation and stores each one in the predetermined sort-key position of the input document character string characteristic table 13. At the same time, following the format of the characteristic table, it stores alongside each string the information that expresses its characteristics, such as an identifier of its position within the document, a document identifier such as the document name, and a relation expression code between the string, as a constituent element of the document, and the document as a whole. The input document character string characteristic table is thus generated. The character string sorting means 14 performs a tabular sort of the input document character string characteristic table keyed on the character strings to be collated, obtains a result in which the association between each key string and its accompanying information is preserved, and stores that result as the output document character string characteristic table 15. When the matching relationships are to be expressed in the format of the input document, the document restoring means 16 returns the sort result to the format of the input document, generates a document with expressions of the matching relationships added, and outputs it to the output document storage device 17. When a table in which matching strings are adjacent, exactly as in the sort result, is wanted, it outputs the output document character string characteristic table to the output document storage device.

[0016] In FIG. 2, which corresponds to claim 2, when there is a request regarding the form in which matching relationships are expressed, the request is expressed as a matching-relationship expression format specification rule 18. The document restoring means 16 is extended into document processing/restoring means 19 by adding means that, exploiting the fact that matching strings are adjacent after sorting, determines the matching relationships, transforms the sorted output document character string characteristic table 15 as requested, generates a document with the requested expressions of the matching relationships added, and outputs it to the output document storage device 17. The output format is thereby improved. It is also possible to add this capability to the document processing/restoring means without expressing the request as a matching-relationship expression format specification rule.

[0017] In FIG. 3, which corresponds to claim 3, the starting point is that character strings such as the sentences and paragraphs that make up a document rarely match completely, so similar or contradictory relationships must also be determined. Accordingly, the character string sorting means 14 does not use the input document character string characteristic table 13 directly as in FIG. 1. Instead, so that the means for collating matching relationships can also determine similar or contradictory relationships, the similar/contradictory relation setting means 21 converts the input document character string characteristic table into the similar character string characteristic table 23 in accordance with the similar/contradictory term database 22, and the similar character string characteristic table is sorted to create the output document character string characteristic table 15. The document processing/restoring means of FIG. 2, which generates an output document from the sorted output document character string characteristic table, is replaced by document processing/restoring means 25, which adds the function of generating and outputting the output document by referring to the similar/contradictory term database and conforming to the similarity-relationship expression format specification rule 24. By having the similarity-relationship expression format specification rule also encompass the matching-relationship and contradiction-relationship expression format specification rules, collation of matches, similarities, and contradictions is achieved. Further, by including in the similar character string characteristic table the input document character strings as they were before conversion by the similar/contradictory relation setting means, it also becomes possible to hide the similar/contradictory relationships and output only the input document character strings.

[0018] FIG. 4, which corresponds to claim 4, shows the principle of means for collating a plurality of documents using the single-document collating means. Unlike the character string characteristic table generating means 12, the character string characteristic table generating means 32 processes a plurality of documents: from the input document storage device 31, which stores those documents, it generates one input document character string characteristic table 33 per document. Unlike the similar/contradictory relation setting means 21, the similar/contradictory relation setting means 34 converts those tables into one similar character string characteristic table 36 per document in accordance with a similar/contradictory term database 35 applicable to all of the documents. The characteristic table combining means 37 merges the similar character string characteristic tables into a single combined characteristic table 38. The character string sorting means 39 sorts the whole of the combined characteristic table by character string, in the same way as the character string sorting means 14, to create a single output document character string characteristic table 40. The document processing/restoring means 42 then generates, from the single output document character string characteristic table and in accordance with a similarity-relationship expression format specification rule 41 applicable to all of the documents, either a single output document or one output document per input document, using the identifiers of the respective documents that accompany the rows of the characteristic table. It outputs the result to an output document storage device 43 capable of storing that number of documents.

[0019] In FIG. 5, which corresponds to claim 5, the aim is to collate character strings that differ in, for example, the order of their elements without depending on that difference in order. To this end, the character string characteristic table generating means 52 adds means that modifies the character strings of the input document read from the input document storage device 31 in accordance with the document character string conversion rule 51 when generating the input document character string characteristic table 33.

[0020] In FIG. 6, which corresponds to claim 6, the result of a collation that has already been executed is improved further. The adjacent character string correlation rule extracting means 53 compares adjacent character strings in the output document character string characteristic table 40 created by the character string sorting means 39 and extracts adjacent character string correlation rules 54 in accordance with the extraction rule for such rules. When an adjacent character string correlation rule has been extracted, the character string conversion rule setting means 55 changes the setting of the document character string conversion rule 51 and lets the collation continue, up to the limit on the number of iterations. This changes the input document character string characteristic table 33 generated by the character string characteristic table generating means 52, and the change is reflected in the output document character string characteristic table of the next sort. The procedure is repeated in this way.

[0021] An embodiment of the document processing of the invention is described with reference to the drawings, using the specification of this application as the working example. When the input document is a single document, the character string characteristic table generating means 12 of FIG. 1 must extract the character strings that are the constituent elements of the document in a way that fits the format of the document and suits the purpose of the collation. It therefore reads the input document from the input document storage device 11 and generates character strings according to the file format of the document.

[0022] For example, suppose the twelve claims of this specification are to be collated to clarify how they relate to one another, by distinguishing their identical parts from their differing parts. Each claim is written as a single paragraph, so collating paragraph by paragraph could never produce a match. Claim 1 is therefore split at commas (読点), as in FIG. 7, generating character strings that divide claim 1 into a total of seven document constituent elements. In the example input document character string characteristic table 13 of FIG. 8, the number of rows equals the total number of pieces, and the predetermined sort-key position is the second column. Each comma-delimited string is given a number, which is stored in the first column as a numeric identifier of its position in the document; the numbering scheme is arbitrary, and here serial numbers starting from 101 are used. For collation over a wider scope, it is also useful to attach to each string the section name "Claims" (特許請求の範囲), a framework broader than the claim number, so as to express the relationship between the text element the string represents and the document as a whole. Ordinary documents have a heading structure of chapters, sections, and so on, which is present as word processor control codes, and these can be used. The document name could be, for example, the title of the invention, "Character string collating apparatus and method" (文字列の照合装置およびその方法), but using the file name in the input document storage device is usually simpler.
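A table of the kind described for FIG. 8 could be produced roughly as follows. This is a sketch under stated assumptions: the function names, the dictionary keys, the file name `spec.txt`, and the English sample claim are all hypothetical, and the figures themselves are not reproduced here.

```python
import re


def split_components(text: str) -> list[str]:
    """Split on ASCII or Japanese commas and drop empty pieces."""
    return [part.strip() for part in re.split(r"[,、]", text) if part.strip()]


def characteristic_table(text: str, document: str, section: str, start: int = 101):
    """Build rows with position identifier, sort-key text, document name, and section."""
    return [
        {"position": start + i, "text": part, "document": document, "section": section}
        for i, part in enumerate(split_components(text))
    ]


claim = "A collating apparatus, which reads a document, sorts the table, and outputs the result."
table = characteristic_table(claim, "spec.txt", "claims")
for row in table:
    print(row["position"], row["text"], row["document"], row["section"])
```

The `section` field plays the role of the relation expression code mentioned above: it records where the string sits in the larger document structure.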

[0023] Even when the input document is a drawing such as FIG. 1 of this application, character strings can be extracted from its digitized file and a characteristic table such as that of FIG. 9 can be generated, using the reference numerals of the figure as the numbers. When the input document is itself a table, the means of the present invention can be used by copying the character string portions of that table into the input document character string characteristic table. Multiple sort criteria can also be set, by specifying multiple keys in the tabular sorting means.

[0024] FIG. 10 is a flowchart of an example procedure for the character string characteristic table generating means 12 of FIG. 1. By setting a document character string group division rule, such as splitting at commas (S1), the document can be divided into character string groups and the total number of character strings determined (S2). Next, a rule for extracting terms from the character string groups is set as the character string group term extraction rule (S3). This may be a natural language parsing rule or a simple rule such as extracting terms at the boundaries between kana and kanji, and it allows the character string characteristic table generating means 12 to be used for similarity searches and the like. The step may be omitted when the aim is simply to collate matching relationships between character strings.

[0025] From S5 of FIG. 10 onward, the procedure generates the input document character string characteristic table 13 from the extraction result. The character string number is initialized (S4) and string conversion is started (S5). Each divided constituent element of the document is converted into a character string (S6). The character string, the document identifier, and the character string position identifier, together with any supplementary data from the conversion if needed, are stored in the row of the characteristic table for that character string number (S7). When the character string number reaches the total number of character strings the procedure ends (S8); until then the number is updated (S9) and steps S6 and S7 are repeated. If the environment in which the procedure runs is suited to dividing the document and converting it into character strings in parallel, then, rather than setting the total number of character strings and repeating the processing that many times as in FIG. 10, the procedure may start from the beginning of the document, divide it into document elements while repeating steps S4 to S9, and end when the document ends.
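The streaming variant described at the end of this paragraph, in which division and table generation proceed together until the document ends, might look like the following Python sketch. The generator `iter_rows`, its parameters, and the mapping of code lines to step labels are assumptions for illustration.

```python
import io
from typing import Iterator, TextIO


def iter_rows(stream: TextIO, doc_id: str, delimiter: str = ",",
              start: int = 101) -> Iterator[tuple[int, str, str]]:
    """Yield (string number, text, document identifier) rows while reading the stream."""
    number = start                  # S4: initialise the string number
    buffer: list[str] = []
    while True:
        ch = stream.read(1)
        if ch == "" or ch == delimiter:
            text = "".join(buffer).strip()
            if text:
                yield (number, text, doc_id)   # S6 and S7: convert and store the row
                number += 1                    # S9: update the number
            buffer = []
            if ch == "":
                return              # S8: stop when the document ends
        else:
            buffer.append(ch)


rows = list(iter_rows(io.StringIO("a, b,c"), "doc.txt"))
print(rows)
```

No total count is needed in advance; the loop simply ends when the input is exhausted.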

[0026] In FIG. 1, once the character string characteristic table generating means 12 has generated the input document character string characteristic table 13 as described above, sorting methods used for numeric data can be applied, because the character strings are digitized numerically as character codes. Sorting methods that operate on tables are, moreover, in widespread use. The character string sorting means 14 therefore treats the whole input document character string characteristic table as the object of collation and, to key on the character strings being collated, uses the column that holds those strings as the key of an ordinary tabular sort. Because the accompanying information and the key string are stored in the same row, as in FIG. 8 and FIG. 9, the tabular sort yields a result in which the association between the key string and its accompanying information is preserved. That result is taken as the output document character string characteristic table 15.

[0027] FIG. 11 is a flowchart of an example procedure for the document restoring means 16 of FIG. 1. The text of claim 2 is processed in the same way as the text of claim 1 was processed to generate a characteristic table like that of FIG. 8, and the means of this application is applied to the characteristic tables of the two claims together. Then a character string group document restoration rule, for example "express the matching relationships in the format of the original document", is set (S11), and an output document like that of FIG. 12 is generated. This case corresponds to an adjacent character string group collation rule such as "determine exact matches only" having been set (S12), and the procedure continues as follows.

[0028] The number of rows in the output document character string characteristic table 15 of FIG. 1 is the same as the number of rows in the input document character string characteristic table 13. Since the example here is the case in which the matching relationships are to be expressed in the format of the input document, the output document is generated from the sorted output document character string characteristic table by steps S13, S14, S17, and S18 of FIG. 11, which parallel S4, S5, S8, and S9 of FIG. 10. The character string number is initialized (S13) and document restoration is started (S14). In step S15 the current character string is collated against its adjacent character string group in accordance with the adjacent character string group collation rule; for example, the third element of claim 1 and the second element of claim 2 are judged to match. In step S16 the collation result is added and the character string is restored to the document in accordance with the character string group document restoration rule, on the basis of the character string position identifier and the supplementary data, giving the individual lines of FIG. 12. When the character string number reaches the total number of character strings the procedure ends (S17); until then the number is updated (S18) and steps S15 and S16 are repeated.
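The restoration steps can be sketched in Python as follows. The function `restore`, the `=` marker, and the sample rows are illustrative assumptions; the actual layout of FIG. 12 is not reproduced here.

```python
def restore(sorted_rows):
    """Mark exact matches between neighbours, then list rows in document order.

    sorted_rows: list of (document, position, text) tuples sorted by text.
    """
    matched = set()
    for a, b in zip(sorted_rows, sorted_rows[1:]):
        if a[2] == b[2]:            # S15: exact-match rule applied to neighbours
            matched.add(a[:2])
            matched.add(b[:2])
    lines = []
    for doc, pos, text in sorted(sorted_rows):   # S16: back to (document, position) order
        mark = "=" if (doc, pos) in matched else " "
        lines.append(f"{mark} {doc}:{pos} {text}")
    return lines


rows = [
    ("claim1", 101, "input the document"),
    ("claim1", 102, "sort the table"),
    ("claim2", 101, "sort the table"),
    ("claim2", 102, "format the output"),
]
restored = restore(sorted(rows, key=lambda r: r[2]))
print("\n".join(restored))
```

Sorting by text brings matches together for detection, and sorting again by document and position identifier puts every string back where it came from, now carrying its match marker.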

【００２９】ソート結果の通りに一致する文字列が隣接
した表を要望する場合には、図１３のような出力文書文
字列特性表を出力文書格納装置１７へ出力する。When a table in which matching character strings are adjacent, exactly as in the sort result, is desired, an output document character string characteristic table such as that of FIG. 13 is output to the output document storage device 17.

【００３０】一致関係の表現形式に対して例えば「請求
項間で同一文字列は同じ行に相違文字列は異なる行に並
べる」という要望がある場合には、その要望を図２の一
致関係表現形式指定規則１８として使用し、図１４の相
関表形式の文書のような出力例を生成することができ
る。When there is a request regarding the expression format of the matching relationship, for example "place character strings that are identical across claims on the same row and differing character strings on different rows," that request is used as the matching relationship expression format specification rule 18 of FIG. 2, and an output such as the correlation-table-format document of FIG. 14 can be generated.
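One possible reading of the rule "identical strings on the same row, differing strings on different rows" is sketched below. The function name `correlation_table`, the dictionary input, and the choice to order rows by sorted string are assumptions for illustration; the actual layout of FIG. 14 may differ.

```python
def correlation_table(claims):
    """claims: dict mapping a claim label to its list of character strings.
    Returns (labels, rows) with one row per distinct string; a cell holds
    the string if that claim contains it and is empty otherwise."""
    labels = list(claims)
    # Distinct strings in sorted order, mirroring the sorted characteristic table.
    distinct = sorted({s for strings in claims.values() for s in strings})
    rows = [[s if s in claims[label] else "" for label in labels]
            for s in distinct]
    return labels, rows
```

A row with more than one filled cell marks a string shared between claims, while a row with a single filled cell marks a string unique to one claim.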

【００３１】本明細書のすべての請求項に対して上記の
請求項１と請求項２との照合と同様の照合を行う場合に
は、一致する文字列が複数あるので、ソートによって一
致する文字列の行が複数行続いて隣接することになる。
この結果から一致する行数をカウントすることは容易で
あり、それを出力文書に表現することにより一致関係を
明確にできるので、一致個数の表現が要望された場合に
はその要望に応じて一致文字列の個数を例えば相関形式
の表の左端に付加するなどした文書を生成して出力する
ことにより出力文書を改良する。When all the claims of this specification are collated in the same way as claims 1 and 2 above, several character strings match, so after sorting the rows of matching character strings appear as runs of consecutive adjacent rows. Counting the number of matching rows from this result is easy, and expressing that count in the output document makes the matching relationship clearer. Therefore, when an expression of the match count is requested, the output document is improved by generating and outputting a document in which the number of matching character strings is added, for example, at the left end of the correlation-format table.
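Because matching strings form consecutive runs after sorting, the count can be obtained in a single pass. The following sketch uses Python's standard `itertools.groupby`; the function name `count_matches` and the `(count, string)` output shape are assumptions chosen for the example.

```python
from itertools import groupby

def count_matches(sorted_strings):
    """Return (count, string) for each run of identical neighbours
    in a list that has already been sorted."""
    return [(len(list(run)), text) for text, run in groupby(sorted_strings)]
```

The count comes first in each pair, which corresponds to placing the number of matching strings at the left end of the table.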

【００３２】図３の類似背反の処理の説明のためには本
明細書は例にならないので、「お湯の温度を制御する装
置」と「温水の温度調節器」の類似関係を判別する場合
を例として考え、類似背反用語データベース２２の温水
関係の類似用語欄で、「お湯」と「温水」は類似であり
「温水」を基準にすると定義し、「制御」と「調節」は
類似であり「制御」を基準にすると定義し、「装置」と
「器」は類似であり「装置」を基準にすると定義し、例
えば「すべての仮名は類似関係の照合では無視する」こ
とにして、類似背反関係設定手段２１は「お湯の温度を
制御する装置」と「温水の温度調節器」に対して基準文
字列の「温水温度制御装置」を生成して、入力文書文字
列特性表１３の「お湯の温度を制御する装置」と「温水
の温度調節器」などの列を保存し、それぞれの行の別の
列に「温水温度制御装置」を格納して基準の列とし、補
足情報としてまた別の列に類似変換を行ったことを示す
コードを格納して類似文字列特性表２３を生成して、基
準の列をキーとして設定して表形式のソートを実行し、
「お湯の温度を制御する装置」と「温水の温度調節器」
の二つの行が隣接する結果を得る。Because this specification itself does not serve as an example for explaining the similar/opposite processing of FIG. 3, consider instead determining the similarity between 「お湯の温度を制御する装置」 ("a device that controls the temperature of hot water") and 「温水の温度調節器」 ("a warm-water temperature regulator"). In the similar-term column for warm-water-related terms of the similar/opposite term database 22, 「お湯」 ("hot water") and 「温水」 ("warm water") are defined as similar with 「温水」 as the reference; 「制御」 ("control") and 「調節」 ("regulation") are defined as similar with 「制御」 as the reference; and 「装置」 ("device") and 「器」 ("instrument") are defined as similar with 「装置」 as the reference. With a rule such as "all kana are ignored in similarity collation," the similar/opposite relation setting means 21 generates the reference character string 「温水温度制御装置」 ("warm-water temperature control device") for both 「お湯の温度を制御する装置」 and 「温水の温度調節器」. The columns of the input document character string characteristic table 13 that hold these original strings are preserved, 「温水温度制御装置」 is stored in another column of each row as the reference column, and a code indicating that a similarity conversion was performed is stored in yet another column as supplementary information, which produces the similar character string characteristic table 23. A tabular sort with the reference column as the key then yields a result in which the two rows 「お湯の温度を制御する装置」 and 「温水の温度調節器」 are adjacent.
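The conversion to a reference character string can be sketched as follows. This is an illustrative Python sketch only: the dictionary `SIMILAR_TERMS` stands in for the similar-term column of the database, plain substring replacement stands in for term extraction, and a boolean flag stands in for the conversion code. None of these details come from the specification.

```python
import re

# Hypothetical excerpt of the similar-term column: variant -> reference term.
SIMILAR_TERMS = {"お湯": "温水", "調節": "制御", "器": "装置"}
KANA = re.compile(r"[\u3040-\u30ff]")  # hiragana and katakana code points

def reference_string(text, terms=SIMILAR_TERMS):
    """Build the reference string used as the sort key."""
    # Replace longer variants first so a short variant cannot split them.
    for variant in sorted(terms, key=len, reverse=True):
        text = text.replace(variant, terms[variant])
    # Rule "all kana are ignored in similarity collation".
    return KANA.sub("", text)

def similar_table(strings):
    """Return rows (original, reference, converted?) sorted by reference."""
    rows = []
    for original in strings:
        reference = reference_string(original)
        rows.append((original, reference, reference != original))
    return sorted(rows, key=lambda row: row[1])
```

Term replacement runs before kana removal because 「お湯」 itself contains a kana character. A single-character variant such as 「器」 would also match inside unrelated words, so a real implementation would need proper term segmentation.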

【００３３】すなわち、図３の類似背反関係設定手段２
１の手順の例のフローチャートは図１５となり、上記の
「すべての仮名は類似関係の照合では無視する」とか例
えば「温水関係の類似用語欄を適用する」などの類似背
反適合規則を設定（Ｓ２１）し、新しく作成する類似文
字列特性表を初期化（Ｓ２２）して、入力文書文字列特
性表と類似文字列特性表の処理対象の文字列番号として
特性表の行番号を採用してそれを初期化（Ｓ２３）し、
入力文書文字列特性表から類似文字列特性表への文字列
ごとの変換処理を開始（Ｓ２４）する。この手順によっ
て入力文書文字列特性表の変換対象の文字列が選定され
るのでその文字列に対応した入力文書文字列特性表の行
を類似文字列特性表に転送（Ｓ２５）し、当該文字列を
類似背反用語データベースの構造と共通の用語抽出方法
によって用語に分解して、当該文字列の用語処理のため
に用語番号を付与し、その用語番号を初期化（Ｓ２６）
して文字列内の用語ごとの変換処理を開始（Ｓ２７）す
る。Ｓ２８のステップで当該用語は類似背反用語データ
ベースに含まれていることが判定された場合には、類似
背反適合規則に準拠して新しい文字列を生成して類似文
字列特性表の当該文字列の列とは別の列に追加（Ｓ２
９）格納し、当該文字列内の用語は終了（Ｓ３０）かど
うかを判別して、未終了であれば用語番号を更新（Ｓ３
１）してＳ２８のステップに戻る。Ｓ３０のステップで
当該文字列内の用語は終了したと判別されたならば、特
性表の文字列は終了（Ｓ３２）かどうかを判別して、未
終了であれば文字列番号を更新（Ｓ３３）してＳ２８の
ステップを戻る。That is, FIG. 15 is a flowchart of an example of the procedure of the similar/opposite relation setting means 21 of FIG. 3. Similar/opposite matching rules, such as the above "all kana are ignored in similarity collation" or, for example, "apply the similar-term column for warm-water-related terms," are set (S21). The similar character string characteristic table to be newly created is initialized (S22). The row number of the characteristic table is adopted as the number of the character string being processed in both the input document character string characteristic table and the similar character string characteristic table, and is initialized (S23). Per-string conversion from the input document character string characteristic table to the similar character string characteristic table is then started (S24). This procedure selects the character string to be converted, so the row of the input document character string characteristic table corresponding to that string is transferred to the similar character string characteristic table (S25). The string is decomposed into terms by a term extraction method shared with the structure of the similar/opposite term database, term numbers are assigned for term processing, the term number is initialized (S26), and per-term conversion within the string is started (S27). If step S28 determines that the term is contained in the similar/opposite term database, a new character string is generated in accordance with the similar/opposite matching rules and is additionally stored (S29) in a column of the similar character string characteristic table other than the column of the original string. It is then determined whether the terms in the string are exhausted (S30); if not, the term number is updated (S31) and processing returns to step S28. If step S30 determines that the terms in the string are exhausted, it is determined whether the character strings of the characteristic table are exhausted (S32); if not, the character string number is updated (S33) and processing returns to step S28.

【００３４】同様に、例えば「電源を投入」と「電源を
切断」の既述がされている文書において、例えば「装置
Ａの電源を投入する」の反対の既述の文書要素を照合す
るために、類似背反用語データベースの背反用語欄で
「投入」と「切断」は背反であり「投入」を基準にする
と定義しておいて、類似背反関係設定手段は例えば「装
置Ａの電源を切断する」という文字列を「装置Ａの電源
を投入する」に変換する。この処理において、入力文書
文字列特性表に当初から含まれている「装置Ａの電源を
投入する」と「装置Ａの電源を切断する」などの列を保
存し、それぞれの行の別の列に「装置Ａの電源を投入す
る」を格納して基準の列とし、補足情報として「装置Ａ
の電源を切断する」の行のまた別の列に背反変換を行っ
たことを示すコードを格納して、基準の列をキーとして
設定して表形式のソートを実行することにより、「装置
Ａの電源を投入する」と「装置Ａの電源を切断する」の
二つの行が隣接する結果を得る。Similarly, consider a document containing descriptions such as 「電源を投入」 ("turn the power on") and 「電源を切断」 ("turn the power off"). To collate document elements whose description is the opposite of, for example, 「装置Ａの電源を投入する」 ("turn on the power of device A"), 「投入」 ("turn on") and 「切断」 ("turn off") are defined in the opposite-term column of the similar/opposite term database as opposites with 「投入」 as the reference. The similar/opposite relation setting means then converts, for example, the character string 「装置Ａの電源を切断する」 ("turn off the power of device A") into 「装置Ａの電源を投入する」. In this processing, the columns holding the strings originally contained in the input document character string characteristic table, such as 「装置Ａの電源を投入する」 and 「装置Ａの電源を切断する」, are preserved; 「装置Ａの電源を投入する」 is stored in another column of each row as the reference column; and, as supplementary information, a code indicating that an opposition conversion was performed is stored in yet another column of the 「装置Ａの電源を切断する」 row. Executing a tabular sort with the reference column as the key yields a result in which the two rows 「装置Ａの電源を投入する」 and 「装置Ａの電源を切断する」 are adjacent.

【００３５】このようにして生成された類似文字列特性
表を表形式でソートして出力文書文字列特性表１５を作
成し、ソートして得られた出力文書文字列特性表から類
似関係をも表現する出力文書を生成するために、図３の
文書加工復元手段２５は、例えば「類似背反関係で変換
された文字列をも入力文書の文字列とともに出力する」
かどうかなどの類似関係表現形式指定規則２４に準拠し
て出力文書を生成する。なお、類似背反用語データベー
スで類似用語や背反用語の関係定義とともに、類似関係
の優先順位のような補足情報を格納しておき、それを出
力文書に反映させることもできる。The similar character string characteristic table generated in this way is sorted in tabular form to create the output document character string characteristic table 15. To generate from the sorted table an output document that also expresses similarity relationships, the document processing/restoring means 25 of FIG. 3 generates the output document in accordance with the similarity relation expression format specification rule 24, which specifies, for example, whether "character strings converted through a similar/opposite relation are also output together with the character strings of the input document." Supplementary information such as the priority order of similarity relations may also be stored in the similar/opposite term database together with the relation definitions of similar and opposite terms, and reflected in the output document.

【００３６】上記の単一文書の照合手段を利用して、図
４の原理によって複数の文書を照合することができる。
文字列特性表生成手段３２の処理機能が文字列特性表生
成手段１２の処理機能と異なるのは、文字列特性表生成
手段１２が単一の入力文書を処理するのに対して文字列
特性表生成手段３２が複数の入力文書を処理することで
ある。単一の入力文書の中にも一致する文字列や類似す
る文字列が存在するから、複数の入力文書も構成要素の
多い単一の入力の処理と同様であるという原理である。
したがって、図３の入力文書文字列特性表１３と類似文
字列特性表２３が図４では入力文書の個数に増える差異
があるだけであるから、図４の文字列ソート手段３９の
ソート方法を図３の文字列ソート手段１４のソート方法
と同等にするために、図４の特性表結合手段３７が入力
文書数の類似文字列特性表を単一の結合特性表３８に変
える。ただし、入力文書数が多いために類似文字列特性
表ならびに結合特性表の処理を簡略にすることが必要に
なる場合には、入力文書文字列特性表３３を保存してお
いて文書加工復元手段４２で使用する。Using the single-document collation means described above, a plurality of documents can be collated according to the principle of FIG. 4. The character string characteristic table generating means 32 differs from the character string characteristic table generating means 12 in that means 12 processes a single input document whereas means 32 processes a plurality of input documents. The principle is that, since matching and similar character strings exist even within a single input document, processing a plurality of input documents is equivalent to processing a single input with many constituent elements. The only difference is therefore that the input document character string characteristic table 13 and the similar character string characteristic table 23 of FIG. 3 increase in FIG. 4 to as many tables as there are input documents. To make the sorting method of the character string sorting means 39 of FIG. 4 equivalent to that of the character string sorting means 14 of FIG. 3, the characteristic table combining means 37 of FIG. 4 converts the per-document similar character string characteristic tables into a single combined characteristic table 38. However, when the number of input documents is so large that processing of the similar character string characteristic tables and the combined characteristic table must be simplified, the input document character string characteristic tables 33 are saved and used by the document processing/restoring means 42.

【００３７】図４の特性表結合手段３７の手順の例のフ
ローチャートは図１６となり、新しく作成する結合特性
表を初期化（Ｓ４１）し、処理する入力文書の文書番号
を初期化（Ｓ４２）し、入力文書数の類似文字列特性表
を結合して単一の決同特性表を生成する処理を開始（Ｓ
４３）する。この手順によって結合対象の入力文書の類
似文字列特性表が選定されるので当該の類似文字列特性
表を読み込み（Ｓ４４）、当該類似文字列特性表と入力
文書の関係を示す位置識別子などを付加（Ｓ４５）し、
結合特性表に書き込み（Ｓ４６）、入力文書の総数の処
理は終了（Ｓ４７）したかどうかを判別して、未終了で
あれば文書番号を更新（Ｓ４８）してＳ４４のステップ
に戻る。Ｓ４７のステップですべての類似文字列特性の
処理は終了したと判別された時点で結合処理を終了す
る。FIG. 16 is a flowchart of an example of the procedure of the characteristic table combining means 37 of FIG. 4. The combined characteristic table to be newly created is initialized (S41), the document number of the input document to be processed is initialized (S42), and the process of combining the per-document similar character string characteristic tables into a single combined characteristic table is started (S43). This procedure selects the similar character string characteristic table of the input document to be combined, so that table is read (S44), a position identifier or the like indicating the relationship between the table and its input document is added (S45), and the result is written to the combined characteristic table (S46). It is then determined whether all input documents have been processed (S47); if not, the document number is updated (S48) and processing returns to step S44. The combining process ends when step S47 determines that all similar character string characteristic tables have been processed.
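The combining steps S41 to S48 amount to concatenating the per-document tables while tagging each row with its origin. The sketch below is a hedged Python illustration; the row layout `(reference, original)`, the `(document number, row number)` position identifier, and the name `combine_tables` are assumptions, not details taken from the specification.

```python
def combine_tables(tables):
    """tables: list of per-document tables, each a list of
    (reference, original) rows. Returns a single combined table whose
    rows carry a (document number, row number) position identifier."""
    combined = []                                       # S41: initialise the combined table
    for doc_no, table in enumerate(tables, start=1):    # S42, S47, S48: loop over documents
        for row_no, (reference, original) in enumerate(table, start=1):  # S44: read rows
            # S45 and S46: add the position identifier and write the row.
            combined.append((reference, original, (doc_no, row_no)))
    return combined
```

Sorting the combined table by its first field then brings matching strings from different documents next to each other, and the position identifier lets each row be traced back to its document.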

【００３８】図４の文字列ソート手段３９は単一となっ
た結合特性表の全体を図３の文字列ソート手段１４と同
様に基準の文字列でソートして単一の出力文書文字列特
性表４０を作成し、文書加工復元手段４２は、入力文書
数の入力文書に共通に要望される上記の「類似背反関係
で変換された文字列をも入力文書の文字列とともに出力
する」かどうかなどを類似関係表現形式指定規則４１と
し、それに準拠して特性表に随伴している複数の当該文
書それぞれの識別子を使用して単一の出力文書文字列特
性表から単一ないし入力文書数の出力文書を生成して出
力文書格納装置４３へ出力する。なお、入力文書が複数
であっても単一文書の図１４のような相関表形式の出力
は有効であるから、図４の文書加工復元手段４２の機能
が図３の文書加工復元手段２５と同様でもよい。The character string sorting means 39 of FIG. 4 sorts the whole of the now single combined characteristic table by the reference character string, in the same manner as the character string sorting means 14 of FIG. 3, to create a single output document character string characteristic table 40. The document processing/restoring means 42 takes as the similarity relation expression format specification rule 41 the requirements common to all the input documents, such as whether, as above, "character strings converted through a similar/opposite relation are also output together with the character strings of the input document." In accordance with that rule, and using the identifiers of the individual documents that accompany the characteristic table, it generates from the single output document character string characteristic table either a single output document or as many output documents as there are input documents, and outputs them to the output document storage device 43. Even when there are multiple input documents, output in a single-document correlation-table format such as that of FIG. 14 is still useful, so the function of the document processing/restoring means 42 of FIG. 4 may be the same as that of the document processing/restoring means 25 of FIG. 3.

【００３９】種々の文書にはそれぞれに固有の微妙な特
徴があるから、同一あるいは類似の内容であるにもかか
わらず文章内の用語の順序が相違するというような場合
があり、本質的には差別する必要がないからそのような
多様な表現を照合では無視することが要望される状況が
起こる。この状況において基準とする文字列を生成する
際に、例えば「文字列の中の用語の順序を規定する」と
いうような文書文字列化規則５１を図５のように文字列
特性表生成手段５２に対して設定して、入力文書格納装
置３１から入力した入力文書を文書文字列化規則に準拠
して文字列を変更して入力文書文字列特性表３３を生成
することにより、無用に多様な表現になっていてもソー
トした結果を隣接させることができる。なお、「文字列
の中の用語の順序を規定する」という設定が困難な場合
には、用語も文字コードによってソートが可能であるか
ら、例えば「文字列を用語でソートして文字列内の用語
の順序をコード体系で標準化する」という規則に替える
ことにより、本件で利用するソート方法を単一の文字列
に適用するだけでこの目的は達成できる。また、文字列
の全ての文字の順序を逆にする文書文字列化規則を設定
することにより、終端部分が類似する文字列を抽出する
ことができる。Because every document has its own subtle characteristics, the order of terms within a sentence may differ even though the content is identical or similar. Since such variants need not be distinguished in substance, situations arise in which it is desirable for collation to ignore them. In such a situation, when the reference character string is generated, a document character string conversion rule 51 such as "prescribe the order of the terms within a character string" is set for the character string characteristic table generating means 52 as shown in FIG. 5. The character strings of the input document read from the input document storage device 31 are modified in accordance with the document character string conversion rule, and the input document character string characteristic table 33 is generated, so that needlessly varied expressions still end up adjacent after sorting. When it is difficult to set the rule "prescribe the order of the terms within a character string," note that terms too can be sorted by character code. Replacing the rule with, for example, "sort each character string by term so that the order of terms within it is standardized by the code system" achieves the purpose simply by applying the sorting method used in this invention to a single character string. In addition, setting a document character string conversion rule that reverses the order of all the characters in a character string makes it possible to extract character strings whose end portions are similar.
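Both conversion rules mentioned here, standardizing term order by code and reversing the characters, are simple key transformations. The Python sketch below assumes the string has already been split into terms (the splitting method is not specified here), and the function names `canonical_term_order` and `reversed_key` are hypothetical.

```python
def canonical_term_order(terms):
    """Standardise term order by sorting the terms by character code,
    then join them into one reference string."""
    return "".join(sorted(terms))

def reversed_key(text):
    """Reverse all characters so that strings with similar endings
    become neighbours when sorted by this key."""
    return text[::-1]
```

For example, `sorted(strings, key=reversed_key)` groups strings by their final characters in the same way that an ordinary sort groups them by their leading characters.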

【００４０】上記のように文書要素の文字列を照合した
結果を得た後に、その結果を再判定して、文書の文書要
素への分割や文字列化の処理を変更することにより、照
合結果を照合目的にさらに適合させることが可能な場合
がある。請求項１と請求項２と請求項３の照合を例とす
れば、請求項１と請求項２の最後の文字列「ソートに基
づく単一文書の構成要素文字列の一致関係の照合装置」
と請求項３の最後の文字列「ソートに基づく単一文書の
構成要素文字列の類似関係の照合装置」とは、読点によ
る分割では相違すると判別されるが、文字列の前の部分
「ソートに基づく単一文書の構成要素文字列の」までが
一致するからソートされた結果で隣接するので、例えば
「隣接文字列において前から２０文字以上が一致するな
らばその文字列を新しい文書構成要素として抽出する」
ことにより、請求項１と請求項２と請求項３で「ソート
に基づく単一文書の構成要素文字列の」が一致すると判
定でき、請求項１と請求項２は「一致関係の照合装置」
が一致するが請求項３のみは「類似関係の照合装置」と
相違しているという判別結果を得ることができるので、
より照合結果が詳細になる。After a result of collating the character strings of document elements has been obtained as described above, it is sometimes possible to fit the collation result still better to the collation purpose by re-evaluating that result and changing how the document is divided into document elements and converted into character strings. Take the collation of claims 1, 2, and 3 as an example. The last character string of claims 1 and 2, 「ソートに基づく単一文書の構成要素文字列の一致関係の照合装置」 ("a collation device for matching relationships among the constituent character strings of a single document, based on sorting"), and the last character string of claim 3, 「ソートに基づく単一文書の構成要素文字列の類似関係の照合装置」 (the same, but for "similarity relationships"), are judged to differ when the text is divided at commas. However, the two strings agree up to the leading portion 「ソートに基づく単一文書の構成要素文字列の」, so they are adjacent in the sorted result. By applying, for example, the rule "if adjacent character strings agree in their first 20 or more characters, extract that portion as a new document element," it can be determined that 「ソートに基づく単一文書の構成要素文字列の」 matches across claims 1, 2, and 3, and that claims 1 and 2 match in 「一致関係の照合装置」 ("collation device for matching relationships") while only claim 3 differs, with 「類似関係の照合装置」 ("collation device for similarity relationships"). The collation result thus becomes more detailed.
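The "first 20 or more characters agree" rule can be sketched as a scan over neighbouring rows of the sorted table. This is a minimal Python illustration; the function names and the decision to skip exactly identical neighbours (which the exact-match collation already handles) are assumptions made for the example.

```python
def common_prefix(a, b):
    """Return the longest leading portion shared by a and b."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return a[:n]

def extract_shared_prefixes(sorted_strings, min_len=20):
    """Scan neighbouring strings in sorted order and collect each shared
    leading portion of at least min_len characters."""
    found = []
    for prev, cur in zip(sorted_strings, sorted_strings[1:]):
        if prev == cur:
            continue  # exact matches need no further splitting
        prefix = common_prefix(prev, cur)
        if len(prefix) >= min_len and prefix not in found:
            found.append(prefix)
    return found
```

Applied to the sorted final strings of claims 1 to 3, the shared leading portion is exactly 20 characters long, so it satisfies the threshold and is extracted once.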

【００４１】このように照合結果を詳細にするために、
図６では文字列ソート手段３９が作成した出力文書文字
列特性表４０の隣接する文字列を対照させて隣接文字列
相互関係規則の抽出規則に準拠して隣接文字列相互関係
規則５４を抽出する隣接文字列相互関係規則の抽出手段
５３を装備し、上記の「隣接文字列において前から２０
文字以上が一致するならばその文字列を新しい文書構成
要素として抽出する」という隣接文字列相互関係規則の
抽出規則を設定して上記の「ソートに基づく単一文書の
構成要素文字列の」という一致文字列を抽出する。To make the collation result more detailed in this way, the configuration of FIG. 6 is equipped with an adjacent character string correlation rule extracting means 53, which compares adjacent character strings in the output document character string characteristic table 40 created by the character string sorting means 39 and extracts adjacent character string correlation rules 54 in accordance with an extraction rule for such rules. The extraction rule described above, "if adjacent character strings agree in their first 20 or more characters, extract that portion as a new document element," is set, and the matching character string 「ソートに基づく単一文書の構成要素文字列の」 described above is extracted.

【００４２】このようにして抽出された文字列を使用し
て前の照合結果より詳細な照合を行うために、図４の出
力文書文字列特性表３２を図５の設定された文書文字列
化規則５１に準拠して文字列を変更する文字列特性表生
成手段５２に替えることにより、例えば「登録文字列が
あればそれを文書から分割して抽出する」という規則を
文書文字列化規則として設定し、上記の抽出された文字
列「ソートに基づく単一文書の構成要素文字列の」をそ
の登録文字列として設定することにより、次の照合手順
において、請求項１と請求項２の最後の文字列「ソート
に基づく単一文書の構成要素文字列の一致関係の照合装
置」は「ソートに基づく単一文書の構成要素文字列の」
と「一致関係の照合装置」に分割され、請求項３の最後
の文字列「ソートに基づく単一文書の構成要素文字列の
類似関係の照合装置」は「ソートに基づく単一文書の構
成要素文字列の」と「類似関係の照合装置」に分割され
て入力文書文字列特性表に格納されるので、ソートされ
た結果では請求項１と請求項２と請求項３に一致する文
字列「ソートに基づく単一文書の構成要素文字列の」が
あるという照合結果を出力することができる。To perform a more detailed collation than the previous one using the character string extracted in this way, the character string characteristic table generating means 32 of FIG. 4 is replaced by the character string characteristic table generating means 52 of FIG. 5, which modifies character strings in accordance with the document character string conversion rule 51 that has been set. For example, the rule "if a registered character string is present, split it off from the document and extract it" is set as the document character string conversion rule, and the extracted character string 「ソートに基づく単一文書の構成要素文字列の」 is set as the registered character string. In the next collation pass, the last character string of claims 1 and 2, 「ソートに基づく単一文書の構成要素文字列の一致関係の照合装置」, is then split into 「ソートに基づく単一文書の構成要素文字列の」 and 「一致関係の照合装置」, and the last character string of claim 3, 「ソートに基づく単一文書の構成要素文字列の類似関係の照合装置」, is split into 「ソートに基づく単一文書の構成要素文字列の」 and 「類似関係の照合装置」, and these are stored in the input document character string characteristic table. The sorted result can therefore report that claims 1, 2, and 3 share the matching character string 「ソートに基づく単一文書の構成要素文字列の」.
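Splitting a string at registered character strings can be sketched as below. The function name `split_by_registered` and the policy of trying longer registered strings first are assumptions for illustration; the sketch also assumes that no registered string is empty.

```python
def split_by_registered(text, registered):
    """Split text so that every occurrence of a registered string
    becomes its own element; the remaining parts stay as elements."""
    pieces = [text]
    for reg in sorted(registered, key=len, reverse=True):
        next_pieces = []
        for piece in pieces:
            if piece in registered:        # already an extracted element
                next_pieces.append(piece)
                continue
            head, sep, tail = piece.partition(reg)
            while sep:                     # reg was found in this piece
                if head:
                    next_pieces.append(head)
                next_pieces.append(reg)
                head, sep, tail = tail.partition(reg)
            if head:                       # whatever follows the last match
                next_pieces.append(head)
        pieces = next_pieces
    return pieces
```

After this split, the shared portion appears as an identical element in every claim, so an ordinary exact-match sort is enough to report it as common.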

【００４３】この照合手順の反復の中で、文字列化規則
設定手段５５の文字列化の変更に際して、反復を継続す
るのは隣接文字列相互関係規則が新たに抽出されてかつ
反復回数の制限数内に限定するという制約条件を設定す
ることによって、意図しない過剰な繰り返しを回避す
る。Within the repetition of this collation procedure, when the character string conversion rule setting means 55 changes the character string conversion, unintended excessive repetition is avoided by imposing the constraint that iteration continues only while a new adjacent character string correlation rule has been extracted and the number of iterations is within a set limit.

【００４４】また、請求項１と請求項２を照合する場合
に、請求項１の読点で分割した第二の文字列「文書の構
成要素である文字列の相互の一致あるいは相違の関係を
判別する際に」は、請求項２の読点で分割した最初の文
字列「単一の文書の構成要素である文字列の相互の一致
あるいは相違の関係を判別する際に」の「単一の」を除
いた部分と一致しているから、「分割して抽出された文
字列はすべて登録文字列として設定する」という規則を
採用して上記の「登録文字列があればそれを文書から分
割して抽出する」という規則を文書文字列化規則として
設定することにより、請求項１の第１文字列は「文字列
および文字列群により構成される単一の文書において」
と第２文字列は「文書の構成要素である文字列の相互の
一致あるいは相違の関係を判別する際に」となり、請求
項２の第１文字列は「単一の」と第２文字列は「文書の
構成要素である文字列の相互の一致あるいは相違の関係
を判別する際に」となるので、請求項１と請求項２の第
２文字列は一致すると判定し、出力文書の中の第１文字
列を比較して差異を認識することになる。Likewise, when claims 1 and 2 are collated, the second comma-delimited character string of claim 1, 「文書の構成要素である文字列の相互の一致あるいは相違の関係を判別する際に」 ("when determining whether character strings that are constituent elements of a document match or differ from one another"), is identical to the first comma-delimited character string of claim 2, 「単一の文書の構成要素である文字列の相互の一致あるいは相違の関係を判別する際に」, apart from the leading 「単一の」 ("single"). Accordingly, the rule "every character string obtained by division and extraction is set as a registered character string" is adopted, and the above rule "if a registered character string is present, split it off from the document and extract it" is set as the document character string conversion rule. The first character string of claim 1 then becomes 「文字列および文字列群により構成される単一の文書において」 ("in a single document composed of character strings and character string groups") and its second character string becomes 「文書の構成要素である文字列の相互の一致あるいは相違の関係を判別する際に」, while the first character string of claim 2 becomes 「単一の」 and its second character string becomes 「文書の構成要素である文字列の相互の一致あるいは相違の関係を判別する際に」. The second character strings of claims 1 and 2 are therefore judged to match, and the difference is recognized by comparing the first character strings in the output document.

【００４５】詳細な照合を行うための図６の隣接文字列
相互関係規則の抽出手段５３の手順の例のフローチャー
トは図１７となる。この例は隣接文字列相互関係規則の
抽出規則が単一の場合に相当しているので、隣接文字列
相互関係規則の抽出規則をソート結果の出力文書文字列
特性表４０の処理に使用する設定（Ｓ５１）を行ない、
処理対象の文字列番号として特性表の行番号を採用して
それを初期化（Ｓ５２）し、生成する隣接文字列関係規
則番号を初期化（Ｓ５３）して、隣接文字列相互関係規
則の抽出を開始（Ｓ５４）して、文字列が終了（Ｓ５
５）かどうかを判定し、未終了であれば隣接文字列を相
互に比較して関係規則の抽出処理へ進む。FIG. 17 is a flowchart of an example of the procedure of the adjacent character string correlation rule extracting means 53 of FIG. 6, used for detailed collation. This example corresponds to the case of a single extraction rule. The extraction rule is set for use in processing the sorted output document character string characteristic table 40 (S51); the row number of the characteristic table is adopted as the number of the character string being processed and is initialized (S52); the number of the adjacent character string relation rule to be generated is initialized (S53); and extraction of adjacent character string correlation rules is started (S54). It is determined whether the character strings are exhausted (S55); if not, adjacent character strings are compared with each other and processing proceeds to relation rule extraction.

【００４６】隣接する文字列を相互に比較するので、対
象となる文字列の行番号を基準として、その次の行番号
を隣接文字列番号に設定（Ｓ５６）し、当該文字列番号
の文字列と隣接文字列番号の文字列とは共通用語を持つ
か（Ｓ５７）を判別して、共通用語を持たなければＳ６
１のステップへ進み持つならば隣接文字列関係規則の抽
出規則が成立するか（Ｓ５８）を判別し、抽出規則が成
立しなければＳ６１のステップへ進み成立するならば隣
接文字列関係規則の抽出と格納の処理を実行する。Because adjacent character strings are compared with each other, the row number of the current character string is taken as the base and the next row number is set as the adjacent character string number (S56). It is determined whether the character string with the current number and the character string with the adjacent number share a common term (S57). If they do not, processing proceeds to step S61; if they do, it is determined whether the extraction rule for adjacent character string relation rules is satisfied (S58). If the extraction rule is not satisfied, processing proceeds to step S61; if it is satisfied, the adjacent character string relation rule is extracted and stored.

【００４７】ステップＳ５９では隣接文字列関係規則番
号を更新して、ステップＳ６０では成立した隣接文字列
関係規則を隣接文字列関係規則番号に基づき格納すると
いう規則成立の処理をして、ステップＳ６１は不成立の
隣接文字列を新しい基準の文字列に設定してステップＳ
５６へ戻る。As the processing for a satisfied rule, step S59 updates the adjacent character string relation rule number and step S60 stores the satisfied adjacent character string relation rule under that number. Step S61 sets the adjacent character string for which the rule was not satisfied as the new base character string, and processing returns to step S56.

【００４８】図１７は隣接文字列相互関係規則の抽出規
則が単一であり共通用語を持つ隣接文字列が２個以下の
場合のフローチャートの例であるが、抽出規則が複数の
場合には抽出規則に番号を付与し、相互関係のある隣接
文字列が３個以上の場合には隣接文字列に第２番目が初
期値になるように番号を付与して、それらの番号をステ
ップＳ５７の後で初期化し、ステップＳ５８の後で更新
し、すべての抽出規則とすべての隣接文字列に対して成
立するかどうかを判別して、いずれも成立しない場合に
ステップＳ６１へ進むのが通常である。しかしながら、
ステップＳ５７の隣接文字列の相互間で共通用語を持つ
かどうかの判別と抽出規則は密接な関係を持つので、番
号の初期化および更新および終了の判定とステップＳ５
７およびＳ５７の処理の前後関係とは、抽出規則の内容
に依存して多様である。FIG. 17 is an example flowchart for the case in which there is a single extraction rule and at most two adjacent character strings share a common term. When there are multiple extraction rules, the extraction rules are numbered; when three or more adjacent character strings are interrelated, the adjacent character strings are numbered so that the second one is the initial value. These numbers are normally initialized after step S57 and updated after step S58, the rule is tested against every extraction rule and every adjacent character string, and processing proceeds to step S61 only when none is satisfied. However, because the step S57 determination of whether adjacent character strings share a common term is closely tied to the extraction rule, the ordering of the initialization, update, and termination tests for these numbers relative to the processing of steps S57 and S58 varies depending on the content of the extraction rule.

【００４９】図１８は図６の文字列化規則設定手段５５
の手順の例のフローチャートで、照合反復回数は過多
（Ｓ７１）かどうかを判別して過多であれば反復を終了
し、過多でなければ隣接文字列関係規則番号は更新され
ている（Ｓ７２）かどうかを判別して更新されてなけれ
ば抽出されたものがないのであるから反復を終了し、更
新されていれば隣接文字列関係規則を文書文字列化規則
に追加設定（Ｓ７３）して、図６の手順を文字列特性表
生成手段５２から繰り返す。図１８のフローによると、
隣接文字列関係規則番号は更新されている場合にも反復
回数が過多になると反復を打ち切ることになるという問
題が残るが、反復回数の制限数の設定を配慮することで
も対策になるし、隣接文字列関係規則番号は更新されて
いるにもかかわらず打ち切る場合には未処理の隣接文字
列関係規則を出力するなどの対策を取ればよい。FIG. 18 is a flowchart of an example of the procedure of the character string conversion rule setting means 55 of FIG. 6. It is determined whether the number of collation iterations is excessive (S71); if so, iteration ends. If not, it is determined whether the adjacent character string relation rule number has been updated (S72); if it has not, nothing was extracted, so iteration ends. If it has been updated, the adjacent character string relation rule is added to the document character string conversion rules (S73), and the procedure of FIG. 6 is repeated from the character string characteristic table generating means 52. Under the flow of FIG. 18, the problem remains that iteration is cut off when the iteration count becomes excessive even though the adjacent character string relation rule number has been updated. This can be mitigated by choosing the iteration limit with care, or, when iteration is cut off despite an update, by measures such as outputting the unprocessed adjacent character string relation rules.
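The termination logic of steps S71 to S73 can be sketched as a bounded loop. In this Python illustration the callback `collate_once`, the default limit of 5, and the returned `pending` list are all assumptions; the sketch also checks the limit at the top of each pass rather than in a separate step, which is a simplification of the flowchart.

```python
def iterate_collation(collate_once, max_iterations=5):
    """collate_once(rules) is a hypothetical callback that runs one
    collation pass with the current rules and returns the rules it
    extracted. Returns (rules, pending), where pending lists rules that
    were added but never used because the limit ended iteration."""
    rules, pending = [], []
    for _ in range(max_iterations):        # S71: stop once the count is excessive
        new_rules = [r for r in collate_once(rules) if r not in rules]
        if not new_rules:                  # S72: nothing newly extracted, so stop
            return rules, []
        rules.extend(new_rules)            # S73: add to the conversion rules
        pending = new_rules
    return rules, pending                  # limit reached with rules still unused
```

Returning `pending` corresponds to the suggested countermeasure of outputting the unprocessed rules when iteration is cut off despite new rules having been found.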

【００５０】上記の例では、「隣接文字列において前か
ら２０文字以上が一致するならばその文字列を新しい文
書構成要素として抽出する」という規則を隣接文字列相
互関係規則の抽出規則として設定して、上記の「登録文
字列があればそれを文書から分割して抽出する」という
規則を文書文字列化規則として設定した文字列特性表生
成手段５２で抽出した文字列を登録文字列として使用し
て、入力文書文字列特性表の生成を前の処理より詳細に
する手段を実現した。この例を拡張した形態として、複
数の登録文字列を格納表の形態で格納して多数の登録文
字列を文書文字列化規則で使用する手段が実現できる。
さらに、文字を逆順にする文書文字列化規則を設定した
照合を実行して一致する文字列を抽出して登録文字列に
設定する形態にして入力文書と同順と逆順で照合を繰り
返し、一定文字数以上の一致文字列を抽出することも実
現する。また、照合結果で一致した文字列のすべてを文
書文字列化規則の登録文字列に採用する形態にして文字
列の構成要素として特性表の文字列化を実施して図６の
反復手順を繰り返すことにより、一定の文字数以上の複
数箇所の同一あるいは類似の文字列を抽出する手段が実
現する。In the above example, the rule "if adjacent character strings agree in their first 20 or more characters, extract that portion as a new document element" was set as the extraction rule for adjacent character string correlation rules, and the extracted character string was used as a registered character string in the character string characteristic table generating means 52, for which the rule "if a registered character string is present, split it off from the document and extract it" was set as the document character string conversion rule. This realizes a means of generating the input document character string characteristic table in finer detail than in the preceding pass. As an extension of this example, a means can be realized that stores a plurality of registered character strings in a storage table so that many registered character strings are used by the document character string conversion rule. Furthermore, by running a collation with a document character string conversion rule that reverses character order, extracting the matching character strings and setting them as registered character strings, and repeating the collation in both the same order as the input document and the reverse order, matching character strings of at least a given number of characters can also be extracted. In addition, by adopting every character string that matched in the collation result as a registered character string of the document character string conversion rule, converting the characteristic table into character strings with those as constituent elements, and repeating the iterative procedure of FIG. 6, a means is realized for extracting identical or similar character strings of at least a given number of characters that occur at multiple locations.

【００５１】以上では文書構成要素として用語より大き
い文字列を要素とした場合について本発明に基づいた文
字列の照合手段を説明したが、分割要素を小さくして用
語を要素としても本発明は使用できる。その場合の例と
して請求項１の最後の上記の文字列「ソートに基づく単
一文書の構成要素文字列の一致関係の照合装置」と請求
項３の最後の上記の文字列「ソートに基づく単一文書の
構成要素文字列の類似関係の照合装置」を照合して図１
４の相関表形式の文書のような出力例を要望するなら図
１９になり、用語は短いので横に並べることも可能であ
るから図２０のようにも出力できる。
The collation means of the present invention has been described above for the case where character strings larger than terms serve as the document constituent elements, but the invention can also be used with smaller divisions in which terms are the elements. As an example, when the final character string of claim 1 quoted above, "device for collating matching relations among constituent-element character strings of a single document based on sorting", is collated with the final character string of claim 3 quoted above, "device for collating similarity relations among constituent-element character strings of a single document based on sorting", and output like the correlation-table document of FIG. 14 is desired, the result is FIG. 19. Because terms are short they can also be placed side by side, so the output can also take the form of FIG. 20.

【００５２】[0052]

【発明の効果】本発明は、以上説明したように文字列そ
のものの照合ができるので、以下に記載されるような効
果を奏する。According to the present invention, since the character string itself can be collated as described above, the following effects can be obtained.

【００５３】単一文書の中でも、記述の進行とともに内
容が微妙に変化する場合は多いので、記述者本人であっ
ても内容を確認する必要性が生まれるし、他人の読者に
あっては内容の変化の解釈に入る前に記述されている文
章そのものや使用されている用語そのものの同一部分と
相違部分を明確にすることが必要でその後に内容の差異
を検討し理解することに移行する。しかしながら、長い
文書に出てくる大量の用語あるいは文章を詳細に照合す
るための作業負担は大きく長い時間を必要とし、また人
手で行うと種々の状況に影響されて過失をも伴うので、
本発明の手段によって、自動的に照合されて一致と不一
致さらには類似と非類似が切り分けられた出力を得るこ
とにより、大きな作業負担を軽減し長い時間を短縮し人
手作業に伴う過失を解消することができる。
Even within a single document, the content often shifts subtly as the description proceeds, so even the author needs to check it. Other readers, before interpreting such changes, must first identify which parts of the written sentences and of the terms used are identical and which differ, and only then move on to examining and understanding the differences in content. However, collating in detail the large number of terms or sentences that appear in a long document is a heavy burden that takes a long time, and manual work is affected by circumstances and prone to error. With the means of the present invention, collation is performed automatically and the output separates matches from mismatches and similar from dissimilar items, which reduces the workload, shortens the time required, and eliminates the errors that accompany manual work.

【００５４】文書を照合するためには照合対象部分を抽
出する必要があり、抽出するためにキーワードを設定す
るとか文章を構文解析して特徴を抽出する従来技術があ
るが、本発明によれば入力文書を分割するだけで文字列
特性表を生成して照合できるので、そのような作業が不
要となる。
To collate documents, the portions to be collated must first be extracted, and conventional techniques do this by setting keywords or by parsing sentences to extract features. According to the present invention, the character string characteristic table can be generated and collated simply by dividing the input document, so such work becomes unnecessary.
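
The divide, tabulate, sort and compare sequence described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the `Row` fields, the delimiter set, and the function names `build_table` and `collate` are hypothetical and do not come from the specification.

```python
import re
from itertools import groupby
from typing import NamedTuple


class Row(NamedTuple):
    text: str      # the character string (one document component)
    doc_id: str    # document identifier, e.g. the document name
    position: int  # index of the piece within its document


def build_table(doc_id: str, document: str, delimiters: str = "、。,.") -> list[Row]:
    """Divide a document at delimiter characters and record each piece."""
    pieces = re.split("[" + re.escape(delimiters) + "]", document)
    return [Row(p.strip(), doc_id, i) for i, p in enumerate(pieces) if p.strip()]


def collate(table: list[Row]) -> list[list[Row]]:
    """Sort by string so equal strings become adjacent, then group them."""
    ordered = sorted(table, key=lambda r: (r.text, r.doc_id, r.position))
    return [list(g) for _, g in groupby(ordered, key=lambda r: r.text)]


table = build_table("doc1", "sort the table, compare neighbours, sort the table.")
groups = collate(table)
# Groups with more than one row are strings that occur at several positions.
matches = [g for g in groups if len(g) > 1]
```

No keyword selection or parsing is involved: sorting alone brings identical strings next to each other, and the stored positions show where each one came from.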

【００５５】キーワードやキーワードの特性値を設定し
て文書を検索する従来の手段に相当して、文書要素や文
章に対応する文字列を設定して照合し、利用しやすい形
式の出力文書を生成して、設定した文書要素や文章と一
致あるいは類似の文書要素や文書を抽出することができ
るので、文書要素や文章をキーとした検索手段としても
利用できる。
In the same way that conventional means search documents by setting keywords or keyword characteristic values, a character string corresponding to a document element or sentence can be set and collated, an output document generated in an easy-to-use format, and the document elements or sentences that match or resemble the set one extracted. The invention can therefore also be used as a search means keyed on document elements or sentences.

【００５６】目的を設定して照合した結果が大量な場合
には結果を整理する作業負担が大きくなるが、文字列特
性表の形式に従って文字列の文書内の位置の識別子や文
書名のような文書識別子や文書の構成要素としての文字
列と文書全体との関係などの文字列の特性を表現する情
報を随伴させて格納しており、それらを反映させて出力
文書を生成するので、整理作業の負担を軽減できて時間
も短縮でき、その作業で生じる過失を回避できる。ま
た、出力文書の表現形式に要望がある場合にはその要望
を一致関係表現形式指定規則で既述して文書加工復元手
段の出力文書の生成に反映することができるので、馴染
みの好い形式で結果を出力できる。
When collation for a given purpose yields a large number of results, organizing them becomes a heavy burden. In the present invention, however, information expressing the characteristics of each character string (an identifier of its position in the document, a document identifier such as the document name, and the relation between the character string as a constituent element and the document as a whole) is stored alongside the string in the format of the character string characteristic table, and the output document is generated to reflect that information. This reduces the burden and time of organizing the results and avoids the errors that such work introduces. In addition, when a particular presentation of the output document is desired, that requirement can be described in the matching relation expression format designation rule and reflected in the output document generated by the document processing/restoring means, so the result can be output in a familiar format.
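
As a rough illustration of how the accompanying information lets an output document be regenerated in input order, the sketch below rebuilds one document from a table and marks each string that also occurs elsewhere. The row layout, the `restore` function, and the `[= doc#position]` marker are assumptions made for this example, not the format defined in the specification.

```python
from collections import defaultdict

# Hypothetical characteristic table: (string, document id, position).
rows = [
    ("the table is sorted", "claim1", 0),
    ("the document is restored", "claim1", 1),
    ("the table is sorted", "claim2", 0),
    ("the rule is applied", "claim2", 1),
]


def restore(rows, doc_id):
    """Rebuild one document in input order, marking strings found elsewhere."""
    where = defaultdict(list)
    for text, doc, pos in sorted(rows):          # sorting groups equal strings
        where[text].append((doc, pos))
    lines = []
    for text, doc, pos in sorted(rows, key=lambda r: (r[1], r[2])):
        if doc != doc_id:
            continue
        others = [f"{d}#{p}" for d, p in where[text] if (d, p) != (doc, pos)]
        mark = f"  [= {', '.join(others)}]" if others else ""
        lines.append(text + mark)
    return lines


restored = restore(rows, "claim1")
```

Changing the way `mark` is built is the only step needed to present the same result differently, which is the role the text assigns to the expression format designation rule.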

【００５７】類似関係の照合においては、汎用的な辞書
で類似と定義されている場合にも差別して用語を使用し
ている特定領域の文書とか、また逆に文書利用者の領域
では文書作成者の領域で差別している用語を類似と判定
して情報収集を目指す文書もあるから、照合対象の入力
文書に対して適用可能な類似背反用語データベースに準
拠して類似背反関係設定手段が類似文字列特性表２３に
変換するので、領域ごとの特殊事情を活かして類似関係
を照合することができる。
In collating similarity relations, some documents in specific fields use terms distinctly even when a general-purpose dictionary defines them as similar; conversely, some documents are intended for information gathering in which terms that are distinguished in the author's field are judged similar in the user's field. Because the similar conflicting relationship setting means performs the conversion into the similar character string characteristic table 23 on the basis of a similar conflicting term database applicable to the input documents being collated, similarity relations can be collated in a way that reflects the special circumstances of each field.

【００５８】意味の反対の用語を格納した類似背反用語
データベースを使用することにより、肯定的文章を設定
してその反対の否定的文章を抽出することができる。
By using a similar conflicting term database that stores terms of opposite meaning, an affirmative sentence can be set and the opposite, negative sentence extracted.
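
One way to picture the conversion step is to replace every known term by a reference term before sorting, while recording whether a term of opposite meaning was involved. The sketch below assumes a tiny dictionary `TERM_DB` standing in for the similar conflicting term database; its structure, its entries, and the `normalise` function are hypothetical.

```python
# Hypothetical stand-in for the similar conflicting term database:
# each term maps to (reference term, relation to the reference term).
TERM_DB = {
    "matches": ("agree", "similar"),
    "coincides": ("agree", "similar"),
    "differs": ("agree", "opposite"),
}


def normalise(sentence: str) -> tuple[str, bool]:
    """Replace known terms by their reference term.

    Returns the normalised sentence and a flag that is True when an odd
    number of 'opposite' terms was replaced, i.e. the meaning is reversed.
    """
    flipped = False
    words = []
    for word in sentence.split():
        ref, relation = TERM_DB.get(word, (word, "same"))
        if relation == "opposite":
            flipped = not flipped
        words.append(ref)
    return " ".join(words), flipped


sentences = ["the text matches", "the text coincides", "the text differs"]
# Each row: (normalised string, reversed-meaning flag, original string).
rows = sorted(normalise(s) + (s,) for s in sentences)
```

After normalisation all three sentences sort next to each other, and the flag separates the sentence with the opposite meaning from the two similar ones.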

【００５９】複数の入力文書を照合する場合には、特性
表結合手段が単一の結合特性表に変えて単一となった結
合特性表の全体をソートするので、文書の個数によって
照合処理が制限される問題を解消できる。
When a plurality of input documents are collated, the characteristic table combining means merges their tables into a single combined characteristic table, and the whole of that table is sorted. This removes the problem of the collation process being limited by the number of documents.
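
The effect of combining tables before sorting can be sketched as below: any number of per-document tables are concatenated and sorted once, so no pairwise comparison of documents is needed. The sample tables and the function names `combine_and_sort` and `shared_strings` are illustrative assumptions.

```python
from itertools import chain, groupby

# One hypothetical (string, document id, position) table per input document.
table_a = [("sorting is stable", "A", 0), ("tables are merged", "A", 1)]
table_b = [("tables are merged", "B", 0), ("output is restored", "B", 1)]
table_c = [("sorting is stable", "C", 0)]


def combine_and_sort(*tables):
    """Concatenate any number of tables and sort the result once."""
    return sorted(chain.from_iterable(tables))


def shared_strings(combined):
    """Map each string to the documents containing it, if more than one."""
    shared = {}
    for text, rows in groupby(combined, key=lambda row: row[0]):
        docs = sorted({doc for _, doc, _ in rows})
        if len(docs) > 1:
            shared[text] = docs
    return shared


combined = combine_and_sort(table_a, table_b, table_c)
shared = shared_strings(combined)
```

Adding a fourth or fifth document only means passing another table to `combine_and_sort`; the rest of the procedure is unchanged.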

【００６０】文書文字列化規則に準拠して文字列を変更
して入力文書文字列特性表を生成することができるの
で、文字列要素の順序などが相違しているにもかかわら
ず類似の意味を持つ文字列を順序の相違に依存しないで
照合することができる。また、文字列の全ての文字の順
序を逆にすることにより、終端部分が類似する文字列を
抽出するなどの、日本語特有の文章の終りを重視した照
合を行うことができる。
Because the input document character string characteristic table can be generated with character strings altered in accordance with the document character string conversion rule, character strings that have similar meanings but differ in, for example, the order of their elements can be collated independently of that order. In addition, by reversing the order of all characters in each character string, collation that emphasizes the end of a sentence, which is characteristic of Japanese, becomes possible; for example, character strings with similar endings can be extracted.
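
The character-reversal rule can be illustrated with a short sketch: sorting on the reversed string places sentences with the same ending next to each other, after which the shared ending of each adjacent pair can be read off. The sample sentences and the helper names `reverse_key` and `shared_suffix` are assumptions chosen for this example.

```python
def reverse_key(text: str) -> str:
    """Conversion rule assumed here: reverse the order of all characters."""
    return text[::-1]


def shared_suffix(a: str, b: str) -> str:
    """Return the longest ending that two strings have in common."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[len(a) - n:]


sentences = [
    "照合を実行する",
    "結果を出力できる",
    "文書を分割する",
    "表を生成できる",
]

# Sorting on the reversed string groups sentences by how they end.
by_ending = sorted(sentences, key=reverse_key)
endings = [shared_suffix(x, y) for x, y in zip(by_ending, by_ending[1:])]
```

In this sample the two sentences ending in 「できる」 become neighbours, as do the two ending in 「する」, although none of them share a beginning.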

【００６１】実行済みの照合結果の長い文字列の中に多
くの共通部分があるにもかかわらず一部の相違部分のた
めに相違と判定される場合に、隣接文字列相互関係規則
を抽出して文書文字列化規則の設定を変更しながら照合
手順を反復するので、共通部分を抽出することができ
る。
When long character strings in a completed collation result share many common parts but are judged different because of a few differing parts, the adjacent character string correlation rule is extracted and the collation procedure is repeated while the setting of the document character string conversion rule is changed, so the common parts can be extracted.

【００６２】隣接文字列相互関係規則を抽出して照合を
反復する際の回数が制限数内に制約されるので、不要に
反復を継続することを回避できる。
Because the number of repetitions of extracting the adjacent character string correlation rule and repeating collation is kept within a set limit, needless continuation of the repetition is avoided.
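
The repeated refinement with a bounded number of passes might look like the following sketch. It assumes one particular extraction rule (a shared leading part of at least `MIN_SHARED` characters between neighbours in sorted order becomes a registered string) and one particular conversion rule (a string beginning with a registered string is divided at that point). The threshold, the limit `MAX_ROUNDS`, and all function names are hypothetical.

```python
import os

MIN_SHARED = 8   # assumed threshold; the example in the description uses 20
MAX_ROUNDS = 5   # assumed upper limit on the number of repetitions


def extract_rules(strings):
    """Compare neighbours in sorted order and collect long shared prefixes."""
    ordered = sorted(set(strings))
    found = set()
    for a, b in zip(ordered, ordered[1:]):
        prefix = os.path.commonprefix([a, b])
        if len(prefix) >= MIN_SHARED:
            found.add(prefix)
    return found


def split_on(strings, registered):
    """Divide each string that begins with a registered string."""
    out = []
    for s in strings:
        for r in sorted(registered, key=len, reverse=True):
            if s.startswith(r) and s != r:
                out.extend([r, s[len(r):]])
                break
        else:
            out.append(s)
    return out


def refine(strings):
    """Repeat extraction and division until nothing new is found or the limit is hit."""
    for round_no in range(1, MAX_ROUNDS + 1):
        rules = extract_rules(strings)
        if not rules:
            return strings, round_no - 1
        strings = split_on(strings, rules)
    return strings, MAX_ROUNDS


parts, rounds = refine([
    "the sorting means sorts the combined table",
    "the sorting means sorts each table separately",
])
```

The two input strings differ as wholes, but after one pass their common leading part appears twice as a separate component and would match in the next sort. The loop stops as soon as no rule is found, and never runs more than `MAX_ROUNDS` times.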

【００６３】目次の付いた入力文書を照合する際には、
文書内の位置の識別子あるいは文章要素と文書全体との
関係の表現の識別子としてワープロ文書ファイルの見出
しの制御コードなどを使うことができるので、入力文書
の構成を活かした形式の出力文書を生成することができ
る。
When an input document with a table of contents is collated, control codes such as the heading codes of a word-processor document file can be used as identifiers of position in the document or as identifiers expressing the relation between a text element and the document as a whole, so an output document can be generated in a format that makes use of the structure of the input document.

【００６４】図面を主体とした文書ファイルに対して
も、ファイルの中から文字列を抽出して特性表を生成す
ることができるので、図面と図面を文字列あるいは文字
列群で照合するとか図面形式の文書と文章形式の文書と
の照合ができ、文書ファイルの照合対象を広げることが
できる。
Even for a document file consisting mainly of drawings, character strings can be extracted from the file and a characteristic table generated. Drawings can therefore be collated with one another by character string or character string group, and a drawing-format document can be collated with a text-format document, which widens the range of document files that can be collated.

【００６５】文字列変換での特殊性とかソート結果の一
致する文字列個数などを、随伴情報として特性表に付加
することができるので、出力文書での照合結果の表現を
明確にすることができる。
Special handling applied during character string conversion, the number of matching character strings found in the sort result, and similar details can be added to the characteristic table as accompanying information, so the collation result can be expressed more clearly in the output document.

【００６６】文書を分割して文書構成要素を抽出する処
理は任意の文字列を単位として実行できるので、用語を
単位とした詳細な照合ができ、また、照合対象を文書単
位から文章単位に変えて、文章間の構成や用語の差異を
比較することができる。
The process of dividing a document and extracting document components can be executed with any character string as the unit, so detailed collation at the level of individual terms is possible, and the collation target can be changed from whole documents to individual sentences in order to compare differences in structure and terminology between sentences.

[Brief description of the drawings]

【図１】単一文書の一致関係の照合装置の原理図であ
る。
FIG. 1 is a principle diagram of the device for collating matching relations within a single document.

【図２】一致関係表現形式指定規則による文書復元手段
の原理図である。FIG. 2 is a principle diagram of a document restoring unit based on a matching expression format designation rule.

【図３】単一文書の類似関係の照合装置の原理図であ
る。FIG. 3 is a diagram illustrating the principle of a similar document collation apparatus for a single document.

【図４】複数文書の類似関係の照合装置の原理図であ
る。FIG. 4 is a principle diagram of a collation device for a similarity relationship between a plurality of documents.

【図５】文書文字列化規則による文字列特性表生成手段
の原理図である。FIG. 5 is a principle diagram of a character string characteristic table generating means based on a document character string conversion rule.

【図６】複数文書の類似関係の照合を反復して詳細化す
る装置の原理図である。FIG. 6 is a principle diagram of an apparatus for repetitively refining collation of similarity between a plurality of documents.

【図７】請求項１を処理する場合に読点によって分割し
た文字列の例を示す図である。
FIG. 7 is a diagram showing an example of the character strings obtained by dividing claim 1 at commas.

【図８】請求項１を処理する場合の入力文書文字列特性
表の例を示す図である。FIG. 8 is a diagram showing an example of an input document character string characteristic table when processing claim 1;

【図９】図１を処理する場合の入力文書文字列特性表の
例を示す図である。FIG. 9 is a diagram showing an example of an input document character string characteristic table when processing FIG. 1;

【図１０】文字列特性表生成手段（１２）の処理手順の
例を示すフローチャートである。FIG. 10 is a flowchart illustrating an example of a processing procedure of a character string characteristic table generation unit (12).

【図１１】文書復元手段（１６）の処理手順の例を示す
フローチャートである。FIG. 11 is a flowchart illustrating an example of a processing procedure of a document restoration unit (16).

【図１２】請求項１と請求項２を処理する場合の照合結
果を入力文書の形式で出力した例を示す図である。FIG. 12 is a diagram showing an example in which a collation result in processing claims 1 and 2 is output in the form of an input document.

【図１３】請求項１と請求項２を処理する場合の照合結
果をソート結果の通りに出力した例を示す図である。FIG. 13 is a diagram showing an example in which a collation result in processing claims 1 and 2 is output as a sort result.

【図１４】請求項１と請求項２を処理する場合の照合結
果を相関表の形式で出力した例を示す図である。FIG. 14 is a diagram showing an example in which the result of collation in processing claims 1 and 2 is output in the form of a correlation table.

【図１５】類似背反関係設定手段（２１）の処理手順の
例を示すフローチャートである。FIG. 15 is a flowchart illustrating an example of a processing procedure of a similar conflicting relationship setting unit (21).

【図１６】特性表結合手段（３７）の処理手順の例を示
すフローチャートである。FIG. 16 is a flowchart illustrating an example of a processing procedure of a characteristic table combining unit (37).

【図１７】隣接文字列相互関係規則の抽出手段（５３）
の処理手順の例を示すフローチャートである。
FIG. 17 is a flowchart showing an example of the processing procedure of the adjacent character string correlation rule extracting means (53).

【図１８】文字列化規則設定手段（５５）の処理手順の
例を示すフローチャートである。FIG. 18 is a flowchart illustrating an example of a processing procedure of a character string conversion rule setting unit (55).

【図１９】用語を構成要素とした照合結果を縦配置の相
関表の形式で出力した例を示す図である。FIG. 19 is a diagram showing an example in which a result of collation using terms as constituent elements is output in the form of a vertically arranged correlation table.

【図２０】用語を構成要素とした照合結果を横配置の文
章の形式で出力した例を示す図である。FIG. 20 is a diagram illustrating an example in which a collation result having terms as constituent elements is output in the form of a horizontally arranged sentence.

[Explanation of symbols]

１１、３１入力文書格納装置１２、３２、５２文字列特性表生成手段１３、３３入力文書文字列特性表１４、３９文字列ソート手段１５、４０出力文書文字列特性表１６文書復元手段１７、４３出力文書格納装置１８一致関係表現形式指定規則１９、２５、４２文書加工復元手段２１、３４類似背反関係設定手段２２、３５類似背反用語データベース２３、３６類似文字列特性表２４、４１類似関係表現形式指定規則３７特性表結合手段３８結合特性表５１文書文字列化規則５３隣接文字列相互関係規則の抽出手段５４隣接文字列相互関係規則５５文字列化規則設定手段Ｓ１〜Ｓ７３処理手順を示すフロー内のステップＹ条件が成立した場合の分岐（Ｙｅｓ）Ｎ条件が成立しなかった場合の分岐（Ｎｏ）
11, 31: Input document storage device
12, 32, 52: Character string characteristic table generating means
13, 33: Input document character string characteristic table
14, 39: Character string sorting means
15, 40: Output document character string characteristic table
16: Document restoring means
17, 43: Output document storage device
18: Matching relation expression format designation rule
19, 25, 42: Document processing/restoring means
21, 34: Similar conflicting relationship setting means
22, 35: Similar conflicting term database
23, 36: Similar character string characteristic table
24, 41: Similarity relation expression format designation rule
37: Characteristic table combining means
38: Combined characteristic table
51: Document character string conversion rule
53: Adjacent character string correlation rule extracting means
54: Adjacent character string correlation rule
55: Character string conversion rule setting means
S1 to S73: Steps in the flows showing the processing procedures
Y: Branch taken when the condition is satisfied (Yes)
N: Branch taken when the condition is not satisfied (No)

Claims

[Claims]

1. A device for collating matching relations among constituent-element character strings of a single document based on sorting, wherein, in a single document composed of character strings and character string groups, when judging mutual matches or differences between the character strings that are constituent elements of the document: the document is input from an input document storage device; character string characteristic table generating means extracts the character strings from the constituent elements and generates an input document character string characteristic table that expresses the characteristics of each character string by attaching, together with an identifier of the position of the constituent element in the document, a document identifier such as the document name, a code expressing the relation between the constituent element and the document, and the like; character string sorting means sorts the input document character string characteristic table by character string to create an output document character string characteristic table; and document restoring means generates, from the output document character string characteristic table, a document in which an expression of the matching relations is added to the format of the document and outputs it to an output document storage device, or outputs the output document character string characteristic table itself to the output document storage device.

2. A device for collating matching relations among constituent-element character strings of a single document based on sorting, wherein, when judging mutual matches or differences between the character strings that are constituent elements of a single document: the document is input from an input document storage device; character string characteristic table generating means extracts the character strings from the constituent elements and generates an input document character string characteristic table that expresses the characteristics of each character string by attaching, together with an identifier of the position of the constituent element in the document, a document identifier such as the document name and a code expressing the relation between the constituent element and the document; character string sorting means sorts the input document character string characteristic table by character string to create an output document character string characteristic table; and document processing/restoring means generates an output document from the output document character string characteristic table in accordance with a matching relation expression format designation rule and outputs it to an output document storage device.

3. A device for collating similarity relations among constituent-element character strings of a single document based on sorting, wherein, when judging mutual matches or differences between the character strings that are constituent elements of a single document: the document is input from an input document storage device; character string characteristic table generating means extracts the character strings from the constituent elements and generates an input document character string characteristic table that expresses the characteristics of each character string by attaching, together with an identifier of the position of the constituent element in the document, a document identifier such as the document name and a code expressing the relation between the constituent element and the document; similar conflicting relationship setting means converts the input document character string characteristic table into a similar character string characteristic table on the basis of a similar conflicting term database; character string sorting means sorts the similar character string characteristic table by character string to create an output document character string characteristic table; and document processing/restoring means generates from the output document character string characteristic table, with reference to the similar conflicting term database and the source document and in accordance with a similarity relation expression format designation rule, an output document in a format that is easily related to the document, and outputs it to an output document storage device.

4. A device for collating similarity relations among constituent-element character strings of a plurality of documents based on sorting, wherein, in order to collate a plurality of documents, when judging mutual matches or differences between the character strings that are their constituent elements: the documents are input from input document storage devices; character string characteristic table generating means extracts the character strings from the constituent elements and generates, one for each document, input document character string characteristic tables that express the characteristics of each character string by attaching, together with an identifier of the position of the constituent element in the document, a document identifier such as the document name and a code expressing the relation between the constituent element and the document; similar conflicting relationship setting means converts the input document character string characteristic tables into the same number of similar character string characteristic tables in accordance with a similar conflicting term database; characteristic table combining means converts the similar character string characteristic tables into a single combined characteristic table; character string sorting means sorts the single combined characteristic table by character string to create a single output document character string characteristic table; and document processing/restoring means, when generating from the single output document character string characteristic table a single output document, or as many output documents as there are input documents, in a format in which the documents are easily related to one another, refers to the similar conflicting term database and the input document character string characteristic tables, performs the generation in accordance with a similarity relation expression format designation rule, and outputs the result to an output document storage device.

5. A device for collating similarity relations among constituent-element character strings of a plurality of documents based on sorting, wherein, in order to collate a plurality of documents, when judging mutual matches or differences between the character strings that are their constituent elements: the documents are input from input document storage devices; character string characteristic table generating means extracts the character strings from the constituent elements, alters the character strings in accordance with a document character string conversion rule, and generates, one for each document, input document character string characteristic tables that express the characteristics of each character string by attaching, together with an identifier of the position of the constituent element in the document, a document identifier such as the document name and a code expressing the relation between the constituent element and the document; similar conflicting relationship setting means converts the input document character string characteristic tables into the same number of similar character string characteristic tables in accordance with a similar conflicting term database; characteristic table combining means converts the similar character string characteristic tables into a single combined characteristic table; character string sorting means sorts the single combined characteristic table by character string to create a single output document character string characteristic table; and document processing/restoring means, when generating from the single output document character string characteristic table a single output document, or as many output documents as there are input documents, in a format in which the documents are easily related to one another, refers to the similar conflicting term database and the input document character string characteristic tables, performs the generation in accordance with a similarity relation expression format designation rule, and outputs the result to an output document storage device.

6. A device for collating similarity relations among constituent-element character strings of a plurality of documents based on sorting, wherein, in order to collate a plurality of documents, when judging mutual matches or differences between the character strings that are their constituent elements: the documents are input from input document storage devices; character string characteristic table generating means extracts the character strings from the constituent elements, alters the character strings in accordance with a document character string conversion rule, and generates, one for each document, input document character string characteristic tables that express the characteristics of each character string by attaching, together with an identifier of the position of the constituent element in the document, a document identifier such as the document name and a code expressing the relation between the constituent element and the document; similar conflicting relationship setting means converts the input document character string characteristic tables into the same number of similar character string characteristic tables in accordance with a similar conflicting term database; characteristic table combining means converts the similar character string characteristic tables into a single combined characteristic table; character string sorting means sorts the single combined characteristic table by character string to create a single output document character string characteristic table; adjacent character string correlation rule extracting means compares adjacent character strings in the output document character string characteristic table and extracts an adjacent character string correlation rule in accordance with an extraction rule for adjacent character string correlation rules; document processing/restoring means, when generating from the single output document character string characteristic table a single output document, or as many output documents as there are input documents, in a format in which the documents are easily related to one another, refers to the similar conflicting term database and the input document character string characteristic tables and performs the generation in accordance with a similarity relation expression format designation rule; and, when an adjacent character string correlation rule has been extracted, character string conversion rule setting means changes the setting of the document character string conversion rule and the collation procedure is repeated, the repetition being continued only within a limited number of times.

7. A method for collating matching relations among constituent-element character strings of a single document, wherein, in a single document file in which character strings and character string groups are digitized by character codes, when judging mutual matches or differences between the character strings that are constituent elements of the document: the character strings and character string groups are sorted in their original form, and the matching relations between adjacent character strings and character string groups are determined on the basis of the result; a document identifier such as the document name and a code expressing the relation between the constituent element and the document are attached to the character strings and character string groups together with an identifier of the position in the document; and the determination result is output with an expression of the matching relations added to the format of the document, or the characteristic table of the sorting result itself is output.

8. A method for collating matching relations among constituent-element character strings of a single document, wherein the result of the above method for collating matching relations is converted into a format conforming to a matching relation expression format designation rule and output.

9. A method for collating similarity relations, based on the above method for collating matching relations, wherein character strings are replaced with reference character strings in accordance with a similar conflicting term database, then sorted, and output in accordance with a similarity relation expression format designation rule.

10. A method for collating a plurality of documents, based on the above method for collating similarity relations within a single document, wherein all character strings of the plurality of documents are sorted together and an output document is output.

11. A collation method based on the above collation method, wherein the arrangement order of character strings is changed in accordance with a document character string conversion rule.

12. A collation method based on the above collation method, wherein adjacent character string groups obtained by sorting are compared, an adjacent character string correlation rule is extracted in accordance with an extraction rule for adjacent character string correlation rules, the extracted rule is set in the document character string conversion rule, and the collation procedure is repeated.