JP6664201B2

JP6664201B2 - Matching processing device, matching processing method, and matching processing program

Info

Publication number: JP6664201B2
Application number: JP2015230918A
Authority: JP
Inventors: 貫一郎望月
Original assignee: Pasco Corp
Current assignee: Pasco Corp
Priority date: 2015-11-26
Filing date: 2015-11-26
Publication date: 2020-03-13
Anticipated expiration: 2035-11-26
Also published as: JP2017097719A

Description

The present invention relates to a matching processing device, a matching processing method, and a matching processing program for matching a plurality of databases, each of which stores a plurality of records.

Conventionally, when the records contained in a plurality of databases (for example, database A and database B) are matched against each other, the matching result is obtained by assigning each record a code that classifies its matching state. In this specification, a record is a unit of data stored in a database that represents a distinct piece of information, such as address information. Matching, in this specification, means collating databases with each other (or the records of the databases with each other) in order to check whether part or all of the data contained in their respective records agree.

In conventional matching, records are classified as matched or unmatched using simple codes. For example, when a record in database A matches a record in database B, or when some association between them is found, the code "1" is assigned to the record to indicate a match; when the records do not match, the code "-1" is assigned to indicate a mismatch.
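The conventional coding described above can be illustrated with a minimal Python sketch. The function name and the use of exact string equality as the matching test are assumptions made for illustration; the text states only that "1" marks a matched record and "-1" an unmatched one.

```python
def conventional_match(records_a, records_b):
    """Assign 1 to records that also appear in the other database, -1 otherwise.

    Exact string equality stands in for "match" here (an assumption).
    """
    set_a, set_b = set(records_a), set(records_b)
    codes_a = {r: (1 if r in set_b else -1) for r in records_a}
    codes_b = {r: (1 if r in set_a else -1) for r in records_b}
    return codes_a, codes_b


# Hypothetical 10-digit records
codes_a, codes_b = conventional_match(
    ["1234567890", "1111222333"],
    ["1234567890"],
)
```

A code of this kind says whether a record matched, but not which part of a mismatched record differs.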

As prior art, there are the techniques disclosed in Patent Documents 1 to 3 below. Patent Document 1 discloses a technique for combining a plurality of databases that store hierarchical character string data (such as address names) and coded character string data (such as postal codes) to create a combined database with a tree structure. According to Patent Document 1, hierarchical character string data is data with a hierarchical structure, such as an address name, and a match or mismatch is determined by comparing the hierarchical character string data level by level (node by node).

Patent Document 2 discloses a technique for merging an address code database (containing address information expressed as sequences of digits) with an address database (containing place name information) to create a merged database. According to Patent Document 2, when the databases are merged, an address (the town or village name portion) is searched while one character at a time is deleted from its right end, and the address is judged to be coded up to the character string at which a hit occurs, which is then stored. At this point, a symbol such as "/" is inserted to identify the coded portion.

Patent Document 3 discloses a technique for retrieving the corresponding address data from an address database based on a search address string entered by a user. According to Patent Document 3, when no address corresponds to the search address string entered by the user, the search is repeated while a predetermined amount (for example, one address notation unit) is deleted from the end of the search address string each time, until one or more addresses are hit.
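The backward search attributed to Patent Document 3 can be sketched roughly as follows. This is an illustration only: the function name, the prefix test used as the "hit" condition, and trimming by a fixed number of characters rather than by address notation unit are all assumptions, not details taken from that document.

```python
def search_with_truncation(query, addresses, step=1):
    """Trim `step` characters from the end of `query` until some address is hit.

    A "hit" is assumed to mean that an address starts with the current query.
    Returns the query that produced the hits together with the hits.
    """
    current = query
    while current:
        hits = [a for a in addresses if a.startswith(current)]
        if hits:
            return current, hits
        current = current[:-step]
    return "", []


# Hypothetical address list
addresses = ["Tokyo Chiyoda 1", "Tokyo Minato 2"]
used, hits = search_with_truncation("Tokyo Chiyoda 9", addresses)
```

Because the search only shortens the string from one end, a difference near the front of the string forces almost the whole string to be discarded before anything is hit.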

Patent Document 1: JP 2010-134828 A (abstract, paragraph [0116], FIG. 1, FIG. 30, FIG. 31, FIG. 34)
Patent Document 2: JP 2003-167912 A (abstract, paragraph [0028], FIG. 6)
Patent Document 3: JP 2003-186880 A (abstract)

The prior art attempts to identify the match or mismatch status of each record (for example, by assigning the codes "1" and "-1" described above) and to locate and correct the portions that need correction in each mismatched record (for example, the techniques disclosed in Patent Documents 1 and 2 above). However, even when the portion to be corrected is located for each record, finding an appropriate corrected value is not easy. Moreover, even when the mismatched records are identified, it is difficult to correct them efficiently and appropriately unless one can grasp where the cause of the mismatch lies and how many records across the database are mismatched for the same reason.

The techniques disclosed in Patent Documents 1 to 3 mainly use a forward search method, which searches character string information such as address information from the front, or a backward search method, which searches it from the back. However, when there are multiple differences (multiple mismatched portions) between the records of the databases being compared, such forward or backward search methods may be unable to detect the multiple locations of those differences efficiently and accurately.

In view of the above problems, an object of the present invention is to provide a matching processing device, a matching processing method, and a matching processing program that match the records contained in a plurality of databases and output a matching result that includes the portions where the records agree or disagree, thereby clearly showing the match or mismatch status of each record between the databases.

To achieve the above object, the matching processing device of the present invention is a matching processing device that performs matching on a plurality of databases, each storing a plurality of records that include character string information, the device comprising:
a segment setting unit that sets a plurality of segments for the character string information according to a predetermined rule;
a record comparison unit that compares the character string information contained in a record in a first database of the plurality of databases with the character string information contained in a record in a second database of the plurality of databases, and detects a match or mismatch of the character string information in mutually corresponding segments; and
a matching result generation unit that, when records containing common character string information in all of the compared segments exist in both the first and second databases, associates information indicating that common character string information is contained in all of the compared segments with each of those records in the first and second databases.
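The three units recited above can be pictured with a minimal Python sketch. Everything named here (the function names, fixed-length segments as the "predetermined rule", a Boolean flag as the associated information, and the sample records) is an assumption made for illustration, not the claimed implementation.

```python
def set_segments(text, lengths):
    """Segment setting: cut `text` into consecutive parts of the given lengths."""
    parts, pos = [], 0
    for n in lengths:
        parts.append(text[pos:pos + n])
        pos += n
    return parts


def compare_records(text_a, text_b, lengths):
    """Record comparison: True/False per pair of corresponding segments."""
    seg_a = set_segments(text_a, lengths)
    seg_b = set_segments(text_b, lengths)
    return [a == b for a, b in zip(seg_a, seg_b)]


def generate_results(db_a, db_b, lengths):
    """Result generation: flag records that agree with a record of the other
    database in every compared segment."""
    flags_a = {r: False for r in db_a}
    flags_b = {r: False for r in db_b}
    for ra in db_a:
        for rb in db_b:
            if all(compare_records(ra, rb, lengths)):
                flags_a[ra] = True
                flags_b[rb] = True
    return flags_a, flags_b


# Hypothetical 10-digit records split into segments of 4, 3 and 3 digits
db_a = ["1234567890", "1234567999"]
db_b = ["1234567890", "0000567890"]
flags_a, flags_b = generate_results(db_a, db_b, (4, 3, 3))
```

The nested loop compares every pair of records, which keeps the sketch short; a practical implementation would likely index the records by segment values instead.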

To achieve the above object, the matching processing method of the present invention is a matching processing method for performing matching on a plurality of databases, each storing a plurality of records that include character string information, the method comprising:
a segment setting step of setting a plurality of segments for the character string information according to a predetermined rule;
a record comparison step of comparing the character string information contained in a record in a first database of the plurality of databases with the character string information contained in a record in a second database of the plurality of databases, and detecting a match or mismatch of the character string information in mutually corresponding segments; and
a matching result generation step of, when records containing common character string information in all of the compared segments exist in both the first and second databases, associating information indicating that common character string information is contained in all of the compared segments with each of those records in the first and second databases.

To achieve the above object, the matching processing program of the present invention is a matching processing program for causing a computer to execute a matching processing method that performs matching on a plurality of databases, each storing a plurality of records that include character string information, the method comprising:
a segment setting step of setting a plurality of segments for the character string information according to a predetermined rule;
a record comparison step of comparing the character string information contained in a record in a first database of the plurality of databases with the character string information contained in a record in a second database of the plurality of databases, and detecting a match or mismatch of the character string information in mutually corresponding segments; and
a matching result generation step of, when records containing common character string information in all of the compared segments exist in both the first and second databases, associating information indicating that common character string information is contained in all of the compared segments with each of those records in the first and second databases.

With the above configuration or processing, the present invention matches the records contained in a plurality of databases and outputs a matching result that includes the portions where the records agree or disagree, and thus has the effect of clearly showing the match or mismatch status of each record between the databases. The present invention also shows this match or mismatch status quantitatively, so that its breakdown can be checked. Furthermore, for records that do not match, the present invention allows the work of resolving inconsistencies between the databases to proceed quickly.

The present invention is particularly effective when applied to matching a ledger database containing address information against a map database that likewise contains address information and is associated with image information representing maps and the like. In multiple databases containing address information, inconsistencies arise in particular in land lot numbers and the like, and resolving such inconsistencies accurately and quickly has been a challenge for public administration and land development. When land ledgers are matched against addresses, the present invention uses a code system that clearly indicates the match or mismatch status, as described in the embodiments, which makes it possible to quickly find appropriate key item values and to learn the cause of a mismatch.

A block diagram showing an example of the configuration of a matching processing device common to the embodiments of the present invention.
A flowchart showing an outline of the matching process common to the embodiments of the present invention.
A flowchart showing an example of the pre-setting process in the matching process common to the embodiments of the present invention.
A flowchart showing an example of the match code setting process in the matching process common to the embodiments of the present invention.
A diagram showing an example of databases A and B used to explain the matching process in the embodiment of the present invention.
A diagram showing an example of a first state of databases A and B in the course of the matching process in the embodiment of the present invention.
A diagram schematically showing, for the embodiment of the present invention, an example of the filters when three segments are set for the character string information and an example of the match codes set for those filters.
A diagram schematically showing, for the embodiment of the present invention, an example of the filters when four segments are set for the character string information and an example of the match codes set for those filters.
A diagram schematically showing, for the embodiment of the present invention, an example of the filters when five segments are set for the character string information and an example of the match codes set for those filters.
A diagram showing an example of a second state of databases A and B in the course of the matching process in the embodiment of the present invention.
A diagram showing an example of a third state of databases A and B in the course of the matching process in the embodiment of the present invention.
A diagram showing an example of the state of databases A and B including the final matching result obtained by the matching process in the embodiment of the present invention.
A diagram showing an example in which, in the state shown in FIG. 8, the match code is further extended to carry information indicating the number of records.
A diagram showing an example of a state in which, in the embodiment of the present invention, only the records assigned the specific match code "11_" are extracted from the final matching result.
A diagram showing an example of a state in which, in an example according to the embodiment of the present invention, only the records assigned the specific match code "11_11" are extracted from the final matching result.
A diagram showing an example of a state in which, in an example according to the embodiment of the present invention, only the records assigned the specific match code "11_1_" are extracted from the final matching result.
A diagram showing an example of a state in which, in an example according to the embodiment of the present invention, only the records assigned the specific match code "111_1" are extracted from the final matching result.
A diagram showing an example of a state in which the address information of the fourth record in the table of FIG. 11 is superimposed on a map based on the map database and displayed on a display screen.
A diagram showing an example of a state in which, in an example according to the embodiment of the present invention, only the records assigned the specific match code "11111:11" are extracted from the final matching result and the "area" field is additionally displayed.
A diagram showing an example of a scatter plot created, for records judged by the matching process of the present invention to correspond one-to-one between two databases (a ledger database and a map database), from the values of an item common to those records (the "area" field).

Embodiments of the present invention are described below with reference to the drawings.

FIG. 1 is a block diagram showing an example of the configuration of a matching processing device common to the embodiments of the present invention. The matching processing device 10 shown in FIG. 1 includes a database reading unit 11 and a matching processing unit 12. A storage medium 20 storing a plurality of databases, an operation input unit 30, and a display unit 40 are connected to the matching processing device 10.

The database reading unit 11 has a function of reading the plurality of databases (databases A and B in FIG. 1) stored in the storage medium 20. The databases read by the matching processing device 10 are temporarily stored, for example, in a main storage device (main memory, not shown in FIG. 1) in the matching processing device 10 so that the matching processing unit 12 can process them.

The matching processing unit 12 has a function of matching the records in the plurality of databases (databases A and B in FIG. 1) read by the database reading unit 11, and includes a record comparison unit 13, a matching result generation unit 14, and a matching result output unit 15.

The record comparison unit 13 has a function of performing matching by comparing the character string information contained in the records of the plurality of databases (databases A and B in FIG. 1) with each other and detecting a match or mismatch of the character string information. The record comparison unit 13 includes a segment setting unit 16. The segment setting unit 16 has a function of segmenting the character string information contained in each record according to a rule on a predetermined segment structure, thereby setting a plurality of segments for the character string information. Segmenting character string information means dividing it into a plurality of parts (each containing at least one character); in this specification, each such part is called a segment. Segmentation sets a plurality of segments for the character string information, for example as shown in FIG. 4, described later.

The matching result generation unit 14 has a function of generating a matching result for each record of the plurality of databases (databases A and B in FIG. 1) based on the comparison result from the record comparison unit 13. The matching result generation unit 14 can generate the matching result for each record as, for example, a match code (code information that integrates the matching results), described later. The matching result output unit 15 has a function of outputting the matching result generated by the matching result generation unit 14 so that it is recorded as data in a database or displayed on the display unit 40.

The storage medium 20 has a function of storing databases and can be realized by, for example, a hard disk, an optical recording medium, a magnetic recording medium, or a nonvolatile memory. The storage medium 20 may be located anywhere the matching processing device 10 can access it, for example over a network. Although FIG. 1 shows two databases (databases A and B) stored in the same storage medium 20, the databases may each be stored in a different storage medium 20. The following description mainly covers matching between two databases, but matching may also be performed among three or more databases.

A database stored in the storage medium 20 is a collection of data and holds a plurality of data items that include character string information. The data in a database is stored record by record, for example in a predetermined storage format (such as a table format). The character string information contained in each record may be any kind of information; for example, it may include address information with a hierarchical structure such as municipality, oaza (larger section), koaza (smaller section), lot number, and branch number. The number of records in each of the databases may be the same or may differ.

The operation input unit 30 has a function of allowing the user to operate the matching processing device 10 and can be realized by an input interface such as a mouse or keyboard. Using the operation input unit 30, the user can enter various settings and instructions into the matching processing device 10, including specifying the databases to be read by the matching processing device 10, configuring the matching process, instructing the matching process to start, and instructing the display unit 40 to display information.

The display unit 40 has a function of presenting the data stored in each of the databases, the matching results generated by the matching processing unit 12, and the like as information the user can view, and can be realized by, for example, a monitor or display. The operation input unit 30 and the display unit 40 may also be realized by a touch panel or similar device that integrates the operation input function and the display function.

In FIG. 1, the functions of the matching processing device 10 are shown schematically as blocks. Each function represented by these blocks can be realized, for example, by hardware or by a CPU (Central Processing Unit) executing software (a program). The matching processing device 10 can be realized by, for example, a general-purpose PC (Personal Computer), and each process in the matching processing device 10 can be realized by the CPU executing a program that describes the execution instructions for the processes in the embodiments of the present invention (for example, the processes in the flowcharts of FIGS. 2A to 2C, described later).

Although not shown in FIG. 1, the matching processing device 10 realized by a general-purpose PC includes a main storage device (main memory) that temporarily stores data, programs, and the like. The processing in the embodiments of the present invention can be performed while, for example, the data in the databases read by the database reading unit 11, data changed or generated during processing, and programs to be executed by the CPU are temporarily stored in the main storage device.

Next, the matching process in the embodiments of the present invention is described with reference to FIGS. 2A to 2C. FIG. 2A is a flowchart showing an outline of the matching process common to the embodiments of the present invention. FIG. 2B is a flowchart showing an example of the pre-setting process in that matching process (the pre-setting process of step S100 shown in FIG. 2A). FIG. 2C is a flowchart showing an example of the match code setting process in that matching process (the match code setting process of step S200 shown in FIG. 2A). Each process in the flowcharts of FIGS. 2A to 2C is executed by the matching processing device 10 shown in FIG. 1. To make the present invention easier to understand, the following description uses as an example the case where the records of the two databases A and B to be matched contain the character string information shown in FIG. 3.

As shown in FIG. 2A, the matching process in the embodiments of the present invention is broadly divided into a pre-setting process (step S100) and a match code setting process (step S200). In the pre-setting process of step S100, the parameters and other settings used in the subsequent match code setting process are configured. In the match code setting process of step S200, the records in the databases that are actually to be matched (for example, databases A and B) are compared, and information indicating the matching result (for example, a match code) is assigned to each record.

In the pre-setting process, as shown in FIG. 2B, the matching processing device 10 reads the two databases A and B stored in the storage medium 20, initializes the match code field of each record contained in databases A and B (for example, by setting "0" as the initial value), sets a common segment structure for the records, and sets filters corresponding to that segment structure (step S101). The information on the segment structure and filters that have been set is retained by the matching processing device 10 for use in subsequent processing.
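As a rough illustration of step S101, the following Python sketch resets the match code field of every record and returns the retained settings. The record layout (dictionaries with "text" and "match_code" keys), the function name, and the representation of a filter as a tuple of segment indices are assumptions introduced here; only the initial value "0" comes from the text.

```python
def presetting(db_a, db_b, segment_lengths, filters):
    """Sketch of step S101: initialize match codes and keep the settings.

    Records are assumed to be dicts; each filter is assumed to be a tuple
    of segment indices.
    """
    for record in db_a + db_b:
        record["match_code"] = "0"  # initial value given as an example in the text
    # Settings retained for the subsequent match code setting process
    return {"segment_lengths": tuple(segment_lengths), "filters": list(filters)}


# Hypothetical records
db_a = [{"text": "1234567890"}]
db_b = [{"text": "1234567890", "match_code": "stale"}]
settings = presetting(db_a, db_b, [4, 3, 3], [(0, 1, 2)])
```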

The segment structure set in step S101 divides the character string information contained in each record into segments of predetermined numbers of characters. Setting a segment structure for the character string information allows it to be treated as being composed of a plurality of segments. For example, as shown in FIG. 4, three segments can be set for each record of databases A and B (each containing 10-digit character string information): segment S1 containing the upper four digits (the 1st to 4th digits from the left), segment S2 containing the middle three digits (the 5th to 7th digits from the left), and segment S3 containing the lower three digits (the 8th to 10th digits from the left).
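The 4-3-3 split into S1, S2, and S3 can be written down directly with Python slices. The sample value "1234567890" and the names `SEGMENTS` and `segment_record` are hypothetical; only the digit positions come from the description above.

```python
# Segment structure for 10-digit character string information:
# S1 = digits 1-4, S2 = digits 5-7, S3 = digits 8-10 (counted from the left)
SEGMENTS = {
    "S1": slice(0, 4),
    "S2": slice(4, 7),
    "S3": slice(7, 10),
}


def segment_record(text):
    """Return the S1, S2 and S3 parts of a 10-digit string."""
    return {name: text[span] for name, span in SEGMENTS.items()}


parts = segment_record("1234567890")  # hypothetical record value
```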

In step S101, any segment structure can be set according to, for example, the length (number of characters) of the character string information in each record of databases A and B and the positions of the character string information to be compared in the subsequent match code setting process. In the example of FIG. 4, three segments are set with lengths of four, three, and three digits from the left, but the number and lengths of the segments can be chosen freely. Also, in the example of FIG. 4 the segments cover the entire character string information (that is, the segment lengths sum to the length of the whole string), but segments may instead be set for only part of the character string information (a part that excludes an end or a middle portion). A segment structure already defined for the character string information may also be used in step S101. For example, when the character string information is address information, hierarchy levels such as municipality, oaza (larger district), koaza (smaller district), lot number, and branch number can each be set as a segment.

The filters set in step S101 specify which combination of the segments in the segment structure is to be compared in the subsequent match code setting process. For example, when three segments are set for the character string information as in FIG. 4, filters corresponding to that segment structure can be set so that combinations of the three segments are compared.

FIG. 5A schematically shows an example of the filters used when three segments are set for the character string information as in FIG. 4, together with an example of the match code assigned to each filter. The rows of FIG. 5A represent the identification information of each filter (LVL = 1 to 7), and the columns represent the segments (S1 to S3) that are compared when each filter is used.

The filters shown schematically in FIG. 5A are used when records are searched in the subsequent match code setting process (step S207 in FIG. 2C, described later). In each filter, a segment marked "compare" is one for which it is determined whether the records of databases A and B share common character string information. A hatched segment is one for which no such determination is made. Specifically, the LVL = 1 filter has all of segments S1 to S3 marked "compare". When the LVL = 1 filter is used, it is therefore determined whether common character string information exists across all of segments S1 to S3 (in this example, the entire 10-digit character string information). The LVL = 2 filter has segment S3 hatched and segments S1 and S2 marked "compare". When the LVL = 2 filter is used, it is therefore determined whether common character string information exists in both segments S1 and S2, while segment S3 is not examined.
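One way to picture a filter is as a function that keeps only the compared segments of a record, so that two records agree under a filter exactly when these reduced keys are equal. The sketch below assumes the 4-3-3 segment layout; `SPANS`, `filter_key`, and the second sample string are hypothetical and used only for illustration.

```python
# Start/end offsets of S1, S2, S3 for a 10-digit string (4-3-3 layout).
SPANS = [(0, 4), (4, 7), (7, 10)]


def filter_key(text, compared, spans=SPANS):
    """Return the tuple of segment values that a filter compares.

    `compared` lists the indices of the segments marked "compare";
    segments left out correspond to the hatched cells of FIG. 5A.
    """
    return tuple(text[spans[i][0]:spans[i][1]] for i in compared)


LVL1 = (0, 1, 2)  # compare S1, S2 and S3
LVL2 = (0, 1)     # compare S1 and S2, ignore S3

a = "1235792384"
b = "1235792999"  # hypothetical record that differs only in S3

print(filter_key(a, LVL1) == filter_key(b, LVL1))  # False
print(filter_key(a, LVL2) == filter_key(b, LVL2))  # True
```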

FIG. 5A also shows the match code that is set for a record when common character string information is detected with each filter. The match code has as many digits as there are segments, and each digit of the match code corresponds to the position of a segment. In the example of FIG. 5A, the character string information is divided into three segments, so the match code likewise has three digits, one for each segment.

Each filter has its own unique match code. The digits of the match code correspond to segments S1 to S3. A digit corresponding to a segment that is examined for common character string information (a segment marked "compare" in FIG. 5A) is set to the value "1". A digit corresponding to a segment that is not examined (a hatched segment in FIG. 5A) is set to "_" (underscore).

A match code is set for records when the determination made with the corresponding filter finds that records sharing common character string information in all segments specified by the filter exist in both databases A and B. For example, a record with match code "111" is one for which records sharing common character string information in all of segments S1 to S3 were found in both databases A and B. Similarly, a record with match code "_11" is one for which records sharing common character string information in segments S2 and S3 were found in both databases. By referring to the match code set for a record, one can therefore tell that the record is common to both databases A and B and in which segments it shares character string information.

FIGS. 5B and 5C schematically show examples of the filters, and of the match codes set on a match, when four segments and five segments respectively are set for the character string information. When the segment structure has N segments, the number of filters (that is, the number of LVL values) is at most 2^N − 1 (at most 7 for N = 3, 15 for N = 4, and 31 for N = 5). Not all filters need to be used, however. In the match code setting process described later, the LVL = 1 filter is used first, and then the LVL = 2 filter, the LVL = 3 filter, and so on are used in turn as the value of LVL is incremented. Which combination of segments is assigned to which LVL can be changed freely, but it is desirable that the match code setting process run first with the filters that compare more segments (that is, the filters with more cells marked "compare" in FIGS. 5A to 5C).
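The filter set and its match codes can be enumerated mechanically as every non-empty subset of the N segments, largest subsets first. The sketch below is an assumption about one workable ordering, not a reproduction of FIGS. 5A to 5C: for N = 3 it agrees with the levels the description mentions explicitly (LVL = 1 "111", LVL = 2 "11_", LVL = 3 "1_1", LVL = 5 "1__", LVL = 6 "_1_"), while the positions of the remaining levels are inferred. The name `build_filters` is hypothetical.

```python
from itertools import combinations


def build_filters(n):
    """Return [(compared_segment_indices, match_code), ...] for n segments.

    Filters that compare more segments come first, so LVL = 1 is the
    filter that compares every segment.
    """
    result = []
    for size in range(n, 0, -1):
        for compared in combinations(range(n), size):
            # "1" marks a compared segment, "_" a segment that is skipped.
            code = "".join("1" if i in compared else "_" for i in range(n))
            result.append((compared, code))
    return result


for lvl, (compared, code) in enumerate(build_filters(3), start=1):
    print(lvl, code)

# The filter count is 2**n - 1 for n segments.
print(len(build_filters(4)), len(build_filters(5)))  # 15 31
```

Generating the filters on demand also reflects the remark in the description that the filter set need not be stored as actual data.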

FIGS. 5A to 5C are schematic illustrations intended to explain the concept of the filter set according to the present invention. Those skilled in the art will appreciate that such a filter set need not be prepared as actual data; it can be realized by processing that includes the appropriate segments of the character string information in the comparison or excludes them from it as needed.

Further, in the pre-setting process, the matching processing device 10 sets, as the variable LVL, the filter to be used first in the subsequent match code setting process (the initial filter) (step S103). Any filter can be set in step S103, but it is desirable to set as the initial filter the one that compares the most segments, for example the LVL = 1 filter in the filter set of FIG. 5A. Along with the initial filter, the user may also be allowed to set which filters are used in the match code setting process, the final filter used in the last pass, and so on. Alternatively, without the user explicitly setting a final filter, the device may be configured to run the process automatically with all filters (that is, to compare every combination of segments).

After the parameters and other values have been set in the pre-setting process, in the match code setting process, as shown in FIG. 2C, the matching processing device 10 refers, in each of databases A and B, to the character string information of each record within the segments specified by the currently set filter (here, the initial filter), counts the records that contain identical character string information, and extracts the maximum of those counts (step S201). For example, when the LVL = 1 filter of FIG. 5A (the filter covering segments S1 to S3) is set as the initial filter, in database A of FIG. 4 the records with identical character string information in segments S1 to S3 are records No. 5 and 6 (identical-record count = 2), so the maximum count is 2. Likewise, in database B of FIG. 4 the records with identical character string information in segments S1 to S3 are records No. 4 to 6 (identical-record count = 3), so the maximum count is 3. The maximum count for database A is stored as the value vAMAX, and the maximum count for database B is stored as the value vBMAX. These values vAMAX and vBMAX are used as the loop counts of the loop processing described later. The loop processing would also work correctly if the loop counts were set to the total number of records in databases A and B respectively, but setting the loop counts in advance to the maximum counts vAMAX and vBMAX reduces the time the loop processing takes.
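Step S201 amounts to grouping the records of one database by the segments the current filter compares and taking the size of the largest group. A minimal sketch follows, assuming the 4-3-3 layout; the function name and the sample records are hypothetical and are not the contents of FIG. 4.

```python
from collections import Counter

SPANS = [(0, 4), (4, 7), (7, 10)]  # S1, S2, S3 (assumed 4-3-3 layout)


def max_identical_count(records, compared, spans=SPANS):
    """Largest number of records sharing the same compared segments.

    Corresponds to vAMAX or vBMAX for one database under one filter.
    Returns 0 for an empty database.
    """
    keys = [tuple(r[spans[i][0]:spans[i][1]] for i in compared) for r in records]
    return max(Counter(keys).values(), default=0)


# Hypothetical database: two records are identical in every segment.
db = ["1111222333", "1111222333", "1111222444", "9999000000"]
print(max_identical_count(db, (0, 1, 2)))  # 2
print(max_identical_count(db, (0, 1)))     # 3
```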

Once the initial and final filters and the maximum counts for databases A and B (the loop counts vAMAX and vBMAX) have been set as described above, the matching processing device 10 can match the records of databases A and B. For example, the matching processing device 10 first sets the identical-record count for database A (variable A1) to 1 (step S203) and the identical-record count for database B (variable B1) to 1 (step S205), extracts from each of databases A and B the records that satisfy these conditions, and searches for common records (that is, records made up of the same character string information) (step S207). If no common record with common character string information is found in step S207, the device determines that there is no record for which a match code should be set (step S209), and steps S211 and S213 are not executed.

Referring to the example of databases A and B in FIG. 4, the records in database A whose identical-record count is 1 are records No. 1 to 4, 7, and 8, and the records in database B whose identical-record count is 1 are records No. 1 to 3, 7, and 8. By comparing these records, the matching processing device 10 detects that record No. 2 in database A and record No. 2 in database B are a common record with the common character string information "1235792384" in segments S1 to S3.

The matching processing device 10 then sets, for these records common to databases A and B, the match code corresponding to the currently set filter (here, the initial filter) (step S211). It is desirable to use a match code that indicates the positions of the compared segments (that is, the positions of the segments in which common character string information was detected), such as the match codes shown in FIG. 5A, but the match code can be expressed in various other ways. The match code obtained in this way is written into the match code field of the corresponding record (step S213). As a result, when the match codes of FIG. 5A are used, match code "111" is set for record No. 2 in database A and record No. 2 in database B.

When the search for A1 = 1 and B1 = 1 is finished, the matching processing device 10 determines whether the variable B1 equals the value vBMAX (step S215). If B1 has not reached vBMAX, the device increments B1 by 1 (step S217) and repeats the processing from step S207. If the device determines in step S215 that B1 has reached vBMAX, it determines whether the variable A1 equals the value vAMAX (step S219). If A1 has not reached vAMAX, the device increments A1 (step S221) and repeats the processing from step S205. Through steps S203 to S221, for the filter that is set (here, the LVL = 1 filter set as the initial filter), every combination of A1 in the range 1 to vAMAX and B1 in the range 1 to vBMAX is searched for records common to databases A and B, and a match code is set in the match code field of each common record.

In the example of databases A and B in FIG. 4, this series of steps sets the match code for record No. 2 in database A and record No. 2 in database B and, in addition, sets match code "111" for records No. 5 and 6 in database A and records No. 4 to 6 in database B (obtained with A1 = 2 and B1 = 3). As a result, databases A and B are, for example, in the state shown in FIG. 6.

When A1 has reached vAMAX and B1 has reached vBMAX, so that all identical-record counts have been processed, the matching processing device 10 determines whether the filter identification information (variable LVL) equals the final filter (step S223). If LVL has not reached the final filter, the device changes the filter (for example, by incrementing LVL) (step S225) and repeats the processing from step S201. For example, when LVL = 1 is incremented to LVL = 2, in the filter set of FIG. 5A the processing from step S201 is repeated with the LVL = 2 filter, which corresponds to match code "11_".

In the processing from step S201 that is executed after LVL is incremented to level 2 (LVL = 2), and in the processing from step S201 that is executed at still higher values of LVL, records whose match code field already holds a match code are excluded from the search in step S207. In other words, once a match code has been set for a record during processing, that record's match code is controlled so that it does not change.

By setting match codes while changing filters in this way, further match codes are added to the state in which the match codes obtained with the LVL = 1 filter have been set (the state of databases A and B shown in FIG. 6). With the LVL = 2 filter, match code "11_" is set for record No. 7 in database A and records No. 7 and 8 in database B (obtained with LVL = 2, A1 = 1, B1 = 2). With the LVL = 3 filter, match code "1_1" is set for record No. 4 in database A and record No. 3 in database B (obtained with LVL = 3, A1 = 1, B1 = 1). With the LVL = 5 filter, match code "1__" is set for record No. 1 in database A and record No. 1 in database B (obtained with LVL = 5, A1 = 1, B1 = 1). With the LVL = 6 filter, match code "_1_" is set for record No. 3 in database A and record No. 9 in database B (obtained with LVL = 6, A1 = 1, B1 = 1). As a result, databases A and B are, for example, in the state shown in FIG. 7.

When the matching process has finished for every filter from the initial filter (for example, the LVL = 1 filter of FIG. 5A) to the final filter (for example, the LVL = 7 filter of FIG. 5A), the matching processing device 10 sets, for each record whose match code field still holds the initial value "0", a match code indicating that no matching character string information was found at any level (called the complete mismatch code) (step S227). The complete mismatch code can be, for example, "999", but any other character string or symbol may be used, or the initial value "0" may simply be left in place (in which case the complete mismatch code is "0"). As a result, databases A and B are, for example, in the state shown in FIG. 8, and the matching process is complete.
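The flow of steps S201 to S227 can be condensed into the following sketch. It is an illustration under several assumptions rather than the claimed implementation: the 4-3-3 segment layout is hard-coded, the filters are generated largest-subset-first, records that already hold a match code are left out of both the counting in step S201 and the search in step S207 (the description states the exclusion only for step S207), and the sample databases are hypothetical rather than the contents of FIG. 4. All names are hypothetical.

```python
from collections import Counter
from itertools import combinations

SPANS = [(0, 4), (4, 7), (7, 10)]  # S1, S2, S3 (assumed 4-3-3 layout)
UNSET = "0"                        # initial value of the match code field


def segment_key(text, compared):
    return tuple(text[SPANS[i][0]:SPANS[i][1]] for i in compared)


def match_databases(db_a, db_b, mismatch="999"):
    n = len(SPANS)
    codes_a = [UNSET] * len(db_a)  # S101: initialise match code fields
    codes_b = [UNSET] * len(db_b)
    for size in range(n, 0, -1):   # S223/S225: step through the filters
        for compared in combinations(range(n), size):
            code = "".join("1" if i in compared else "_" for i in range(n))
            # Only records without a match code take part (assumption).
            keys_a = {i: segment_key(t, compared)
                      for i, t in enumerate(db_a) if codes_a[i] == UNSET}
            keys_b = {i: segment_key(t, compared)
                      for i, t in enumerate(db_b) if codes_b[i] == UNSET}
            count_a = Counter(keys_a.values())
            count_b = Counter(keys_b.values())
            v_a_max = max(count_a.values(), default=0)  # S201
            v_b_max = max(count_b.values(), default=0)
            for a1 in range(1, v_a_max + 1):            # S203, S219, S221
                with_a1 = {k for k, c in count_a.items() if c == a1}
                for b1 in range(1, v_b_max + 1):        # S205, S215, S217
                    with_b1 = {k for k, c in count_b.items() if c == b1}
                    common = with_a1 & with_b1          # S207
                    for i, k in keys_a.items():         # S211, S213
                        if k in common:
                            codes_a[i] = code
                    for i, k in keys_b.items():
                        if k in common:
                            codes_b[i] = code
    # S227: records never matched receive the complete mismatch code.
    codes_a = [mismatch if c == UNSET else c for c in codes_a]
    codes_b = [mismatch if c == UNSET else c for c in codes_b]
    return codes_a, codes_b


# Hypothetical sample data (not the contents of FIG. 4).
db_a = ["1235792384", "1234000111", "1234000111", "5555123999", "9999000000"]
db_b = ["1235792384", "1234000111", "5555123000", "5555123001", "8888777001"]
print(match_databases(db_a, db_b))
# (['111', '111', '111', '11_', '999'], ['111', '111', '11_', '11_', '999'])
```

Since every group of identical records has exactly one size, the nested loops over A1 and B1 visit each shared key once; a simpler variant could intersect the two key sets directly, at the cost of no longer mirroring the flowchart.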

The match codes for the records of databases A and B may be set in the original databases A and B that were read at the start, or in new databases that are copies of the original databases A and B. Further, when new databases are created by copying the original databases A and B, a single integrated database combining the contents of databases A and B may be created, and a match code may be set for each record in it.

As shown in FIG. 8, the match code set for each record of databases A and B indicates, for the character string information contained in databases A and B, in which segments common character string information exists (the positions of the value "1" in the match code), so the match or mismatch status can be read easily from the match code.

The match code may also be extended so that it shows how many records with the common character string information exist in database A and how many exist in database B. In the example above, the two records No. 5 and 6 in database A and the three records No. 4 to 6 in database B share common character string information in segments S1 to S3, as indicated by match code "111". The number of records in each of databases A and B may be appended to this match code "111" (for example, the number of records in database A followed by the number of records in database B), giving a match code such as "111:23". With a match code extended in this way, the user can easily see how many records in databases A and B have the common character string information.

When a database contains multiple records (two or more) with the common character string information, the match code may simply indicate that multiple records exist instead of giving the number of records as described above. For example, a record for which only one record with the common character string information exists in the database may be represented by the value "1", and a record for which two or more such records exist may be represented by the value "2", even when three or more records exist. The relationship between the numbers of records in the two databases can then be expressed as "11" for one-to-one, "12" or "21" for one-to-many, and "22" for many-to-many. Simply being able to tell whether the relationship between the numbers of records in the two databases is one-to-one, one-to-many, or many-to-many is useful enough.

For example, the match code "111:23" in the example above is written as "111:22" (the trailing "22" indicates that two or more records exist in database A and two or more records exist in database B). FIG. 9 shows an example in which match codes that use the value "2" for records with two or more records sharing the common character string information in a database have been set as described above. In FIG. 9, the complete mismatch code is set to "999:99" so that all match codes have the same number of digits.
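Both variants of the extended match code, the exact counts ("111:23") and the capped one-or-many form ("111:22"), can be produced by one small formatter. The function name `extended_code` and its `cap` parameter are hypothetical, chosen only for this illustration.

```python
def extended_code(code, counts, cap=None):
    """Append per-database record counts to a match code.

    counts: number of records sharing the common string in each database,
            in database order (A, B, ...).
    cap:    if given, counts above it are written as `cap`; cap=2 yields
            the one-versus-many notation ("11", "12", "21", "22").
    """
    digits = [min(c, cap) if cap is not None else c for c in counts]
    return code + ":" + "".join(str(d) for d in digits)


print(extended_code("111", (2, 3)))         # 111:23
print(extended_code("111", (2, 3), cap=2))  # 111:22
print(extended_code("11_", (1, 2), cap=2))  # 11_:12
```

Note that without a cap, a count of 10 or more would occupy two characters and make the suffix ambiguous, which is one practical argument for the capped form.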

The matching processing device 10 may also extract, from databases A and B in which a match code has been set for each record, only the records with a specific match code, and then display only those records on a display screen or create and save a new database containing only those records. For example, with databases A and B in which match codes carrying single-or-multiple record count information have been set as shown in FIG. 9, when the matching processing device 10 receives, from user input or from another device or program, an instruction to output the specific match code "11_", it may extract record No. 7 in database A and records No. 7 and 8 in database B and display them with the identifier of the database to which each record belongs, as shown in FIG. 10. Records in different databases A and B that carry the same match code are likely to be related to each other. Displaying them together on one screen in this way lets the user grasp the match or mismatch status of the records easily and improves the efficiency of correcting mismatched records.
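Extracting the records that carry a requested match code is a simple filter over the coded databases. In the sketch below the data layout (a dictionary of database identifier to rows of record number, character string, and match code), the function name, and the sample rows are all hypothetical; the comparison ignores any record-count suffix after the colon so that a request for "11_" also finds "11_:12".

```python
def extract_by_code(databases, wanted):
    """Return (db_id, record_no, text) for records whose match code is `wanted`.

    databases: {db_id: [(record_no, text, match_code), ...]}
    """
    hits = []
    for db_id, rows in databases.items():
        for record_no, text, code in rows:
            # Drop an optional ":<counts>" suffix before comparing.
            if code.split(":")[0] == wanted:
                hits.append((db_id, record_no, text))
    return hits


# Hypothetical coded databases (not the contents of FIG. 9).
coded = {
    "A": [(1, "1235792384", "111:11"), (2, "5555123999", "11_:12")],
    "B": [(1, "1235792384", "111:11"), (2, "5555123000", "11_:12"),
          (3, "5555123001", "11_:12")],
}
for hit in extract_by_code(coded, "11_"):
    print(hit)
```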

It is clear that the flowcharts of FIGS. 2A to 2C can be extended to handle matching across three or more databases. For example, when matching three databases that include database C in addition to databases A and B, it is desirable to extract in step S201 the maximum count vCMAX for database C and to execute loop processing that, while incrementing the variable C1 for the identical-record count in database C from 1 to vCMAX, searches in step S207 for records common to databases A, B, and C, so that every combination of records across the databases is compared without omission. The match codes described above can be used as they are when three or more databases are matched. When information indicating the number of records is appended to the match code, the number of digits is increased according to the number of databases, for example as in match code "111:121" (where the last three digits correspond to the number of records in each of the three databases).

＜Example＞
The matching processing device 10 according to the present invention can find differences between two databases. Such differences arise, for example, from typing errors or OCR (optical character recognition) reading errors in one of two databases that should hold the same information, or from differences in database administrators or data update timing.

The character string information contained in each record of the databases processed by the matching processing device 10 according to the present invention may consist of digits only, or may consist of kanji, kana, alphabetic characters, symbols, and so on. The present invention presupposes that a segment structure is set for the character string information. This segment structure may use a hierarchy that the information already has, as address information does, or the user of the matching processing device 10 may segment the character string information in a way of their own.

The type of database processed by the matching processing device 10 according to the present invention is not particularly limited. A wide variety of databases can be processed, including, for example, databases containing address information, network address or domain information, personal information such as My Number, customer management information, financial accounting information, or product inventory information.

The following describes an example in which a ledger database containing address information and a map database, which also contains address information and is associated with image information representing maps and the like, were prepared and actually matched by the matching processing device 10 according to the embodiment of the present invention.

The ledger database and the map database prepared in this example store address information for the same area in the same storage format. However, the two databases differ in administrator, update timing, and so on, and their records do not match completely. The number of records may also differ as a result of records having been added to or deleted from each database.

Each record stores hierarchical address information as character string information. The character string information is a sequence of 23 digits: the first 3 digits correspond to the municipality, the next 8 digits to the oaza (larger district), the next 4 digits to the koaza (smaller district), the next 5 digits to the lot number, and the last 3 digits to the branch number. For example, in the character string information "32200000004005800084005", the leading "322" represents the municipality, the next "00000004" the oaza, the next "0058" the koaza, the next "00084" the lot number, and the final "005" the branch number. In the matching processing of this example, five segments (municipality, oaza, koaza, lot number, and branch number) are set on the basis of the hierarchical structure of this address information, and the set of filters shown schematically in FIG. 5C is used.
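The fixed-width layout described above can be illustrated with a short sketch that splits a 23-digit string into its five segments. The segment widths come from the example; the function name and the dictionary keys are assumptions introduced for illustration.

```python
# Segment names and widths taken from the example: 3 + 8 + 4 + 5 + 3 = 23 digits.
SEGMENT_WIDTHS = (
    ("municipality", 3),
    ("oaza", 8),
    ("koaza", 4),
    ("lot_number", 5),
    ("branch_number", 3),
)

def split_address(code: str) -> dict[str, str]:
    """Split a fixed-width address string into its named segments."""
    total = sum(width for _, width in SEGMENT_WIDTHS)
    if len(code) != total or not code.isdigit():
        raise ValueError(f"expected {total} digits, got {code!r}")
    segments = {}
    pos = 0
    for name, width in SEGMENT_WIDTHS:
        segments[name] = code[pos:pos + width]
        pos += width
    return segments

parts = split_address("32200000004005800084005")
```

Here `parts["koaza"]` is "0058" and `parts["branch_number"]` is "005", as in the example above.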

FIGS. 11 to 13 show the results of matching processing actually performed under the above conditions by the matching processing device 10 according to the embodiment of the present invention. Each of FIGS. 11 to 13 shows a state in which only the records assigned a matching code specified by the user have been extracted from the ledger database and the map database and displayed in table form on the display screen. These display states correspond to the display state of FIG. 10 described above.

In FIGS. 11 to 13, the value "1" in the "type" field indicates a record in the ledger database, and the value "2" indicates a record in the map database. The "municipality", "oaza", "koaza", "lot number", and "branch number" fields contain the segments (the levels of the address information) set for the character string information of each record, and the "CODE" field contains the matching code. Because five segments are set for the address information, which is the character string information, the matching code likewise contains five digits corresponding to the segments (with the number of records in the ledger database and the number of records in the map database also appended).

FIG. 11 shows a state in which the records assigned the matching code "11_11" have been extracted and displayed on the display screen. For example, the first and second records in the table of FIG. 11 are records in the ledger database and the map database, respectively, and correspond to each other. As the matching code "11_11" indicates, these two records contain common character string information in "municipality", "oaza", "lot number", and "branch number", whereas "koaza" differs. As the "11" appended to the end of the matching code indicates, the first and second records in the table of FIG. 11 correspond one-to-one between the ledger database and the map database.

FIG. 12 shows a state in which the records assigned the matching code "11_1_" have been extracted and displayed on the display screen. For example, the first and second records in the table of FIG. 12 are records in the map database and the ledger database, respectively, and correspond to each other. As the matching code "11_1_" indicates, these two records contain common character string information in "municipality", "oaza", and "lot number", whereas "koaza" and "branch number" differ. As the "11" appended to the end of the matching code indicates, the first and second records in the table of FIG. 12 correspond one-to-one between the ledger database and the map database.

FIG. 13 shows a state in which the records assigned the matching code "111_1" have been extracted and displayed on the display screen. For example, the first and third records in the table of FIG. 13 are records in the map database and the ledger database, respectively, and correspond to each other. As the matching code "111_1" indicates, these two records contain common character string information in "municipality", "oaza", "koaza", and "branch number", whereas "lot number" differs. As the "11" appended to the end of the matching code indicates, the first and third records in the table of FIG. 13 correspond one-to-one between the ledger database and the map database.
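How a matching code such as "11_11:11" relates to two records can be sketched as follows. This is a simplified illustration, not the filter-based procedure of the embodiment: it compares all five segments directly, writes "1" for an agreeing segment and "_" for a differing one, and appends the two record counts. The function names and the sample record values are hypothetical.

```python
SEGMENTS = ("municipality", "oaza", "koaza", "lot_number", "branch_number")

def match_code(rec_a: dict, rec_b: dict, count_a: int = 1, count_b: int = 1) -> str:
    """Build a code with one flag per segment plus the record counts of both databases."""
    flags = "".join("1" if rec_a[name] == rec_b[name] else "_" for name in SEGMENTS)
    return f"{flags}:{count_a}{count_b}"

def differing_segments(code: str) -> list[str]:
    """Return the names of the segments marked '_' in a matching code."""
    flags = code.split(":")[0]
    return [name for name, flag in zip(SEGMENTS, flags) if flag == "_"]

# Hypothetical records that differ only in the koaza segment.
ledger = {"municipality": "322", "oaza": "00000004", "koaza": "0172",
          "lot_number": "00032", "branch_number": "000"}
map_record = dict(ledger, koaza="0170")
code = match_code(ledger, map_record)
```

For these sample records the result is "11_11:11". Reading codes back, `differing_segments("11_1_:11")` names the koaza and the branch number, and `differing_segments("111_1:11")` names the lot number, in line with FIGS. 12 and 13.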

Because the address information in the map database is associated with image information for displaying a map, the matching result according to the present invention may be checked and corrected by referring to a map created and displayed from the information stored in the map database.

For example, the third and fourth records in the table of FIG. 11 are records in the map database and the ledger database, respectively, and correspond to each other. Specifically, as the matching code "11_11:11" indicates, the third and fourth records in the table of FIG. 11 correspond one-to-one and contain common character string information in "municipality", "oaza", "lot number", and "branch number", whereas only "koaza" differs. More precisely, the two records differ only in that the third record (the record in the map database) has the koaza value "0170" while the fourth record (the record in the ledger database) has the koaza value "0172". From this matching result it can be inferred that the koaza value is wrong in one of the two records, but it is not easy to determine which of "0170" and "0172" is the incorrect value.

In such a case, a map containing the location whose matching result is to be checked is displayed on the display screen of the display unit 40 using the information in the map database. A user who refers to this map can then easily infer whether the ledger database record or the map database record holds the incorrect value.

FIG. 14 shows a state in which a map of the area around the address of the third record in the table of FIG. 11 (the record in the map database) is displayed on the display screen on the basis of the information stored in the map database. Each record in the map database (including its address information) represents a parcel containing a specific address on the map, and FIG. 14 displays the corresponding address (character string) together with an image of each parcel. For example, the third record in the table of FIG. 11 corresponds to parcel A in the map of FIG. 14, and the character string "017200032-000" representing the address contained in that record is superimposed on parcel A.

The map of FIG. 14 shows both the parcel whose koaza value is "0170" and the parcels whose koaza value is "0172", and the "0170" parcel is colored so as to be visually emphasized. In the example of FIG. 14 the "0170" parcel is hatched, and this hatching represents coloring in a specific color. The "0172" parcels may likewise be colored (preferably in a color different from that of the "0170" parcel).

A user who refers to this map can see, from the difference in coloring from the surrounding parcels, that parcel A, whose koaza is set to "0170", is isolated within the region whose koaza is "0172" and forms an enclave. As a result, the user can infer that the koaza value "0170" set for parcel A (the value in the map database record) is probably an error and that the koaza value "0172" contained in the one-to-one corresponding ledger database record obtained by the matching processing is probably correct, and can judge that the koaza value of the third record in the table of FIG. 11 should be changed from "0170" to "0172".

Here, a mode has been described in which records that are inconsistent between the map database and the ledger database are first obtained from the result of the matching processing, and the inconsistency is then checked using a map displayed from the information in the map database. As a different mode, a parcel that looks visually out of place (for example, an enclave) may first be identified on a map displayed from the information in the map database, and the ledger database record corresponding to that parcel may then be checked by referring to the result of the matching processing. Information obtained by the matching processing (for example, the matching code, or the character string information contained in the one-to-one corresponding ledger database record) may also be superimposed on the map displayed from the information in the map database.

The tables shown in FIGS. 11 to 13 also include features that make it easier for the user to compare records when the matching results obtained by the matching processing are displayed. The features that improve visibility and convenience for the user in a table displaying the results of matching processing are described below.

The tables of FIGS. 11 to 13 are sorted for display on, for example, the "municipality", "oaza", and "koaza" fields. With this feature, records having common character string information are displayed on adjacent rows, which improves visibility for the user when comparing such records with one another. Because records from different databases are displayed on adjacent rows, the table as a whole shows records in the ledger database ("type" = "1") and records in the map database ("type" = "2") interleaved.
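The sorted display can be sketched as below. The sort keys follow the fields named above; using the "type" value as a final tie-breaker, the function name, and the sample rows are assumptions made for illustration.

```python
def sort_for_display(rows: list[dict]) -> list[dict]:
    """Order rows so that records sharing the leading address segments sit on adjacent rows."""
    return sorted(
        rows,
        key=lambda r: (r["municipality"], r["oaza"], r["koaza"], r["type"]),
    )

# Hypothetical rows: type 1 = ledger database, type 2 = map database.
rows = [
    {"type": 1, "municipality": "322", "oaza": "00000004", "koaza": "0060"},
    {"type": 1, "municipality": "322", "oaza": "00000004", "koaza": "0058"},
    {"type": 2, "municipality": "322", "oaza": "00000004", "koaza": "0060"},
    {"type": 2, "municipality": "322", "oaza": "00000004", "koaza": "0058"},
]
ordered = sort_for_display(rows)
types = [r["type"] for r in ordered]
```

After sorting, the ledger and map records for each koaza value appear on consecutive rows, so the "type" column alternates between 1 and 2.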

The tables of FIGS. 11 to 13 also display the character string information in the records of the two compared databases (the ledger database and the map database) in different fields. Specifically, the lot number and branch number of each record are displayed in the "lot number" and "branch number" fields for the ledger database when the record belongs to the ledger database, and in the "lot number" and "branch number" fields for the map database when the record belongs to the map database. Separating the fields that display character string information from ledger database records from the fields that display character string information from map database records lets the user determine at a glance which of the two databases each record belongs to.

Even when the matching processing according to the present invention yields records that match completely between database A and database B and correspond one-to-one (that is, records assigned the matching code "111:11" or "11111:11" in the examples above), the matching result is not necessarily correct. For records judged to correspond one-to-one in this way, it may therefore be made possible to verify quantitatively whether the matching result is correct.

To check the matching result for records judged to correspond one-to-one, a common item contained in the records of both compared databases (the ledger database and the map database) can be used, for example. This is described below with reference to FIGS. 15 and 16.

FIG. 15 shows a state in which six records assigned the matching code "11111:11" (hereinafter called the first to sixth records, in order from the top) have been extracted and displayed on the display screen. The table of FIG. 15 additionally shows the "area" field contained in each record. This "area" field is set in each record of both the ledger database and the map database and stores the area of the land (map area) associated with the address information stored in the record. If records correspond one-to-one between the ledger database and the map database, the values (land areas) in their "area" fields, which are a common item, should also be equal. Using this relationship, the matching result can be verified: for records judged to correspond one-to-one between the ledger database and the map database, the values in the "area" fields are additionally compared, and the one-to-one matching result is judged to be correct when the two values agree.

When judging whether the information in a common item is equal, the information may be judged to be substantially equal (that is, the one-to-one matching result is correct) if, for example, the difference between the numerical values falls within a predetermined range (for example, a few percent). The user can set this predetermined range freely.

Of the first to sixth records shown in FIG. 15, the first and second records contain the common character string information "20600021000000356001" across all of "municipality", "oaza", "koaza", "lot number", and "branch number"; the third and fourth records contain the common character string information "20600021000000356002" across all of those fields; and the fifth and sixth records contain the common character string information "20600021000000356003" across all of those fields. The first, third, and sixth records are records in the ledger database, and the second, fourth, and fifth records are records in the map database. In the table of FIG. 15, records having common character string information in all of "municipality", "oaza", "koaza", "lot number", and "branch number" are displayed on adjacent rows.

As described above, the table of FIG. 15 displays the "area" field of each record. For example, the first and second records were judged to correspond one-to-one in the matching processing, and because the "area" values of both records are equal at "946", the matching result that the first and second records correspond one-to-one can be judged to be correct. The "area" value of the third record is "1470" and that of the fourth record is "1452"; because the two values are approximately equal, the matching result that the third and fourth records correspond one-to-one can also be judged to be correct. The "area" value of the fifth record is "1603" and that of the sixth record is "810"; because the two values differ greatly, the matching result that the fifth and sixth records correspond one-to-one can be judged to be incorrect.

In the above, the value "1470" in the "area" field of the third record and the value "1452" in the "area" field of the fourth record are judged to be approximately equal, but this judgment depends on a criterion that the user can set freely. For example, under a criterion that treats two values as substantially equal when their difference (here, 18, or about 1.2 percent of either value) is within 5 percent of either value or of the average of the two, the matching result that the third and fourth records correspond one-to-one is judged to be correct. Under a criterion that requires the difference to be within 1 percent of either value or of the average of the two, on the other hand, the same matching result is judged to be incorrect.
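The effect of the user-set criterion can be checked with a small sketch. It assumes the relative difference is measured against the average of the two values, which is one of the options mentioned above; the function name and parameter names are hypothetical.

```python
def substantially_equal(a: float, b: float, tolerance: float) -> bool:
    """Return True when |a - b| is within `tolerance` (a fraction) of the mean of a and b."""
    mean = (a + b) / 2
    if mean == 0:
        return a == b
    return abs(a - b) / mean <= tolerance

# Area values from the FIG. 15 example.
within_5_percent = substantially_equal(1470, 1452, 0.05)  # third and fourth records
within_1_percent = substantially_equal(1470, 1452, 0.01)
identical = substantially_equal(946, 946, 0.01)           # first and second records
far_apart = substantially_equal(1603, 810, 0.05)          # fifth and sixth records
```

The difference of 18 is about 1.2 percent of the average 1461, so the pair passes a 5 percent criterion and fails a 1 percent criterion, while the 1603 and 810 pair fails either way.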

The above verification may be performed on records judged to correspond one-to-one, and the verification result may be set in each record. For example, a flag can be set on records whose one-to-one matching result has been judged to be incorrect, or such records can be highlighted in a different color when displayed in a table.

As shown in FIG. 16, a scatter plot of the "area" field values may also be created so that the user can judge visually whether the matching results are correct. FIG. 16 plots many points, each taking as its X coordinate (horizontal axis) and Y coordinate (vertical axis) the "area" field values of two records judged to correspond one-to-one. When the "area" values of the two records are approximately equal, the plotted point lies on or near the line Y = X. A point plotted far from the line Y = X can therefore be regarded as representing an incorrect matching result.

In a scatter plot such as that of FIG. 16, a straight line indicating the criterion set by the user may be displayed, for example, and points that exceed the criterion and are judged to represent incorrect matching results may be highlighted in a different color.

The example above uses the "area" field as the item common to records in the ledger database and records in the map database, but any other item may be used as long as it is a common item set in the records of the databases being compared. In the example of the ledger database and the map database, various items can be used in addition to the land area, such as the crops grown on the land, the land category indicating the use of the land, and the perimeter of the land. Although the ledger database and the map database are used as the example here, a common item set in the records of the databases being compared can likewise be selected as appropriate for databases of any other type.

Industrial Applicability
The present invention is applicable to techniques for performing matching processing on a plurality of databases each storing a plurality of records. The present invention can process databases storing any type of data and is applicable to database management technology in general.

Reference Signs List
10 Matching processing device
11 Database reading unit
12 Matching processing unit
13 Record comparison unit
14 Matching result generation unit
15 Matching result output unit
16 Segment setting unit
20 Storage medium
30 Operation input unit
40 Display unit

Claims

A matching processing device that performs matching processing on a plurality of databases each storing a plurality of records containing character string information, the device comprising:
a segment setting unit that sets a plurality of segments for the character string information according to a predetermined rule;
a record comparison unit that compares the character string information contained in records in a first database of the plurality of databases with the character string information contained in records in a second database of the plurality of databases, and detects a match or mismatch of the character string information in the segments; and
a matching result generation unit that, when records containing common character string information in all of the compared segments are present in both the first and second databases, associates information indicating that common character string information is contained in all of the compared segments with each of the records in the first and second databases that contain the common character string information in all of the compared segments.

The matching processing device according to claim 1, wherein the record comparison unit is configured to be able to select, from the plurality of segments, one or more segments to be compared for a match or mismatch of the character string information.

The matching processing device according to claim 2, wherein the record comparison unit is configured to first select a predetermined number of the plurality of segments and detect a match or mismatch of the character string information in the predetermined number of segments, and then to detect a match or mismatch of the character string information while gradually decreasing the number of segments to be compared from the predetermined number.

The matching processing device according to claim 2, wherein the record comparison unit is configured to exclude from subsequent comparison the records in the first and second databases for which a match of the character string information in the corresponding segments has already been detected.

The matching processing device according to any one of claims 1 to 4, wherein the matching result generation unit is configured to generate code information indicating which of the plurality of segments the compared segments containing the common character string information correspond to.

The matching processing device according to claim 5, wherein the code information includes information based on the number of records in the first database that contain common character string information in all of the compared segments and information based on the number of records in the second database that contain common character string information in all of the compared segments.

The matching processing device according to any one of claims 1 to 6, wherein the information based on the number of records in the first and second databases is the number of records in the first and second databases itself, or information indicating whether the number of records in the first and second databases is singular or plural.

The matching processing device according to any one of the preceding claims, wherein the character string information is address information, and the segment setting unit is configured to set the plurality of segments on the basis of a hierarchical structure defined in advance in the address information.

A matching processing method for performing matching processing on a plurality of databases each storing a plurality of records including character string information, the method comprising:
a segment setting step of setting a plurality of segments for the character string information according to a predetermined rule;
a record comparison step of comparing the character string information included in a record in a first database of the plurality of databases with the character string information included in a record in a second database of the plurality of databases, and detecting a match or mismatch of the character string information in the corresponding segments; and
a matching result generation step of, when records including common character string information in all of the compared segments are present in both the first and second databases, generating a matching result in which information indicating that the common character string information is included in all of the compared segments is associated with each of those records in the first and second databases.
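As a non-limiting illustration, the three steps of the method might be chained as sketched below. All function names, the `/` delimiter used as the predetermined segmentation rule, the dictionary form of the databases (record ID mapped to a character string), and the `matched_segments` field in the result are assumptions introduced for this sketch only.

```python
def segment_setting_step(text, delimiter="/"):
    """Set segments by splitting on a delimiter (hypothetical rule)."""
    return tuple(part.strip() for part in text.split(delimiter))


def record_comparison_step(segs_a, segs_b, count):
    """Return (id_a, id_b) pairs whose leading `count` segments all match."""
    pairs = []
    for id_a, a in segs_a.items():
        for id_b, b in segs_b.items():
            if len(a) >= count and len(b) >= count and a[:count] == b[:count]:
                pairs.append((id_a, id_b))
    return pairs


def matching_result_generation_step(pairs, count):
    """Associate match information with each matched record in both databases."""
    result = {"A": {}, "B": {}}
    for id_a, id_b in pairs:
        result["A"][id_a] = {"matched_segments": count}
        result["B"][id_b] = {"matched_segments": count}
    return result


def run_matching(db_a, db_b, count, delimiter="/"):
    """Run the three steps in order on two {record_id: string} databases."""
    segs_a = {rid: segment_setting_step(t, delimiter) for rid, t in db_a.items()}
    segs_b = {rid: segment_setting_step(t, delimiter) for rid, t in db_b.items()}
    pairs = record_comparison_step(segs_a, segs_b, count)
    return matching_result_generation_step(pairs, count)
```

The result is keyed by database and record ID, so the match information is attached to every participating record on both sides rather than only to the pair.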

A matching processing program for causing a computer to execute a matching processing method of performing matching processing on a plurality of databases each storing a plurality of records including character string information, the matching processing method comprising:
a segment setting step of setting a plurality of segments for the character string information according to a predetermined rule;
a record comparison step of comparing the character string information included in a record in a first database of the plurality of databases with the character string information included in a record in a second database of the plurality of databases, and detecting a match or mismatch of the character string information in the corresponding segments; and
a matching result generation step of, when records including common character string information in all of the compared segments are present in both the first and second databases, generating a matching result in which information indicating that the common character string information is included in all of the compared segments is associated with each of those records in the first and second databases.