JP5836423B2

JP5836423B2 - Data matching confirmation method and system

Info

Publication number: JP5836423B2
Application number: JP2014083184A
Authority: JP
Inventors: 飯塚　真玄; 真玄飯塚; 拓橋本; 伸介岩元
Original assignee: 株式会社TKC (TKC Corporation)
Priority date: 2014-04-14
Filing date: 2014-04-14
Publication date: 2015-12-24
Anticipated expiration: 2034-04-14
Also published as: JP2015203986A

Description

The present invention relates generally to a data matching confirmation method, and more particularly to a method and a system for checking, quickly and at low cost, whether data stored in a source database matches data stored in a duplicate database.

Database systems comprising a source database and a duplicate database are widely used for data backup.

In this type of database system, the data stored in the source database is periodically copied to the duplicate database.

Recently, utilities for high-speed copying, such as the bulk-copy facilities described in Non-Patent Documents 1 to 3, have become available, allowing data to be copied in less time and with greater reliability.

Non-Patent Document 1: Microsoft Developer Network home page, http://msdn.microsoft.com/ja-jp/library/7ek5da1a(v=vs.110).aspx
Non-Patent Document 2: Oracle Data Provider for .NET Developer's Guide home page, http://docs.oracle.com/cd/E16635_01/win.111/e06104/OracleBulkCopyClass.htm
Non-Patent Document 3: IBM DB2 Database for Linux, UNIX, and Windows Information Center home page, http://pic.dhe.ibm.com/infocenter/db2luw/v9r7/index.jsp?topic=%2Fcom.ibm.swg.im.dbclient.adonet.ref.doc%2Fdoc%2FDB2BulkCopyClass.html

When data has been copied from the source database to the duplicate database in this way, it is necessary, once the copy is complete, to confirm that the data stored in the source database matches the data copied to the duplicate database.

In general, this kind of confirmation takes a great deal of time, and naturally, the more data the database holds, the longer it takes. In such cases, the confirmation may interfere with the normal operation of the database.

However, recent advances in copy technology have made copying more reliable. In practice, copies are performed correctly, and it is highly likely that, after copying, the data stored in the source database matches the data stored in the duplicate database.

In view of the above, it is preferable that the post-copy data matching confirmation be quick, even if simplified, rather than detailed and time-consuming. If it can be confirmed, at least provisionally, that there is no data mismatch, the copy can be judged to have been executed correctly, and the database can be promptly returned to normal use.

On the other hand, if a mismatch is detected, a well-known tool can be used to identify in detail which data is mismatched, even if this takes some time.

The present invention has been made in view of these circumstances. Its object is to provide a data matching confirmation method and system that, when data records stored in a source database have been copied to a duplicate database, can determine simply and quickly whether the data stored in the source database matches the data copied to the duplicate database.

To achieve the above object, the present invention takes the following measures.

Specifically, the invention of claim 1 is a system for confirming whether data stored in a source database matches data stored in a duplicate database, the source database storing a large number of data records each consisting of a plurality of data items that each hold data, and the duplicate database storing a copy of the data stored in the source database.

This system comprises a data acquisition means, a memory, a hash calculation means, and a comparison means.

The data acquisition means executes a first process of acquiring an arbitrary one of the data records stored in the source database from the source database and storing it in the memory.

The hash calculation means executes a second process: for the data record stored in the memory, it obtains a hash value by applying a predetermined algorithm to the data of each data item of that record, stores each obtained hash value in the memory in association with the corresponding data item, and then deletes the data record from the memory to reduce memory usage.

The data acquisition means executes a third process of acquiring an arbitrary one of the not-yet-acquired data records from the source database and storing it in the memory.

The hash calculation means executes a fourth process: for the data record stored in the memory, it obtains a hash value by applying the same algorithm to the data of each data item, adds each obtained hash value to the hash value already stored in association with the corresponding data item, and stores the result in the memory, so that the stored hash values are updated cumulatively, one data record at a time. It then deletes the data record from the memory to reduce memory usage.

By repeating the third process of the data acquisition means and the fourth process of the hash calculation means up to the last data record stored in the source database, the hash calculation means executes a fifth process of obtaining the sum of hash values for each data item in the source database.

The data acquisition means and the hash calculation means execute a sixth process of obtaining the sum of hash values for each data item in the duplicate database by performing the first to fifth processes on the duplicate database instead of the source database.

The comparison means executes a seventh process: it compares, data item by data item, the sums of hash values obtained in the fifth process with those obtained in the sixth process. If any data item has sums that do not match, it determines that the data stored in the source database does not match the data stored in the duplicate database; if no data item has sums that do not match, it determines that the data stored in the source database matches the data stored in the duplicate database. This makes it possible to confirm that the data records stored in the source database match those stored in the duplicate database while the data acquisition means acquires data records from the source database and from the duplicate database, each in an arbitrary order.
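
The seven processes above can be sketched as a short streaming computation. This is a minimal sketch, not the patent's implementation: the patent does not name its hash algorithm, so `field_hash` (SHA-256 truncated to 64 bits) is an assumed stand-in, and `column_hash_sums` and `databases_match` are hypothetical names. Records are modeled as sequences of strings, and the sample records are illustrative.

```python
import hashlib
from typing import Iterable, Sequence


def field_hash(value: str) -> int:
    # Assumed stand-in for the unspecified "predetermined algorithm":
    # the first 8 bytes of SHA-256 over the UTF-8 text, as an unsigned integer.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


def column_hash_sums(records: Iterable[Sequence[str]], n_items: int) -> list[int]:
    # Processes 1-5: keep one running sum per data item. Each record is hashed
    # and not retained afterwards, so memory holds n_items integers, not records.
    sums = [0] * n_items
    for record in records:
        for i, value in enumerate(record):
            sums[i] += field_hash(value)
    return sums


def databases_match(source: Iterable[Sequence[str]],
                    duplicate: Iterable[Sequence[str]],
                    n_items: int) -> bool:
    # Process 6 repeats processes 1-5 on the duplicate; process 7 reports a
    # match only if the sums agree for every data item.
    return column_hash_sums(source, n_items) == column_hash_sums(duplicate, n_items)


source = [("00010", "Taro Nihon", "Tokyo"), ("00200", "Hanako Utsunomiya", "Tochigi")]
print(databases_match(source, list(reversed(source)), 3))  # True
```

Because the source and duplicate are each reduced to a handful of integers, the two passes never need to hold both databases in memory at once.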

The invention of claim 2 is a method applied to the system of claim 1.

With the data matching confirmation method and system according to the present invention, when data records stored in a source database are copied to a duplicate database, it is possible to determine simply and quickly whether the data stored in the source database matches the data copied to the duplicate database.

FIG. 1 is a conceptual diagram showing a configuration example of a system to which the data matching confirmation method according to an embodiment of the present invention is applied.
FIG. 2 is a diagram showing an example of the data structure of the data stored in the source database.
FIG. 3 is a flowchart showing an operation example of the system to which the data matching confirmation method according to the embodiment of the present invention is applied.
FIG. 4 is a diagram showing an example of the hash values corresponding to the data shown in FIG. 2.
FIG. 5 is a conceptual diagram showing a configuration example of a system to which a data matching confirmation method according to another embodiment of the present invention is applied.

The best mode for carrying out the present invention is described below with reference to the drawings.

FIG. 1 is a conceptual diagram showing a configuration example of a system to which the data matching confirmation method according to an embodiment of the present invention is applied.

This system 10 confirms whether the data in the source database 12 and the duplicate database 14 match. It is placed between the source database 12 and the duplicate database 14 and comprises a data acquisition unit 16, a memory 18, a hash calculation unit 20, and a comparison unit 22.

The source database 12 and the system 10, and/or the duplicate database 14 and the system 10, may be connected via a communication network (not shown) such as the Internet. In this case, the communication network consists of a LAN such as Ethernet (registered trademark), or of a WAN in which multiple LANs are connected via public or dedicated lines. A LAN may be composed of many subnets connected via routers as needed. A WAN is provided with firewalls and the like for connecting to public lines as appropriate, but their illustration and detailed description are omitted here.

FIG. 2 shows an example of the data structure 24 of the data stored in the source database 12. The data structure 24 is two-dimensional, with data items along one axis and data records along the other.

In FIG. 2, (1) personal number, (2) name, and (3) address are illustrated as data items. Of course, the number of data items is not limited to three; there may be more or fewer, and their contents are not limited to these examples. A data record corresponds to the number assigned to one set of data, consisting of one value for each data item.

Accordingly, the source database 12 stores a large number of data records, each consisting of a plurality of data items (in the example of FIG. 2, (1) personal number, (2) name, and (3) address) that each hold data (for data record 1 in FIG. 2, "00010", "Taro Nihon", and "1-2-3 Toranomon, Minato-ku, Tokyo").

The duplicate database 14 stores a copy of the data stored in the source database 12. Like the source database 12, it stores its data in the data structure 24 shown in FIG. 2.

The processing executed by the system 10 to determine whether the source database 12 and the duplicate database 14 store the same data is described below with reference to the flowchart of FIG. 3.

First, the data acquisition unit 16 acquires the data of one data record (for example, data record 1) from the source database 12 and stores it in the memory 18 (step S1). The data acquisition unit 16 may acquire data records from the source database 12 in any order; data record 1 need not be acquired first.

Following step S1, the hash calculation unit 20 obtains a hash value for each piece of data (for example, "00010", "Taro Nihon", and "1-2-3 Toranomon, Minato-ku, Tokyo") corresponding to each data item (for example, (1) personal number, (2) name, and (3) address) of the data record stored in the memory 18 in step S1 (for example, data record 1), by applying a predetermined algorithm (step S2a). This algorithm outputs a predetermined numerical value as the hash value for a given numeral (for example, "0010") or a given character (for example, the kanji "日").
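
The patent does not disclose the algorithm beyond saying that it maps a given numeral or character to a predetermined number. Purely as an illustration, the sketch below assumes a polynomial rolling hash in which each character's Unicode code point is its predetermined value; the multiplier 31 and the modulus 1,000,003 are arbitrary assumptions, and `field_hash` and `record_hashes` are hypothetical names. What the method actually requires is that the same deterministic function be applied to both databases.

```python
MULTIPLIER = 31        # assumed multiplier
MODULUS = 1_000_003    # assumed prime modulus that keeps each hash small


def field_hash(value: str) -> int:
    # Fold each character's code point into the hash in order, so that
    # "AB" and "BA" produce different values.
    h = 0
    for ch in value:
        h = (h * MULTIPLIER + ord(ch)) % MODULUS
    return h


def record_hashes(record: tuple[str, ...]) -> list[int]:
    # Step S2a: one hash value per data item of the record.
    return [field_hash(value) for value in record]


print(field_hash("AB"))  # 2081
print(field_hash("BA"))  # 2111
print(len(record_hashes(("00010", "日本太郎", "東京都港区虎ノ門1-2-3"))))  # 3
```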

Using such an algorithm, the hash calculation unit 20 obtains a hash value for each piece of data of the data record (for example, data record 1): the data of data item (1) personal number ("00010"), of data item (2) name ("Taro Nihon"), and of data item (3) address ("1-2-3 Toranomon, Minato-ku, Tokyo").

FIG. 4 shows that the hash values obtained for data items (1), (2), and (3) of data record 1 are, by way of example, "23" (for "00010"), "121" (for "Taro Nihon"), and "323" (for "1-2-3 Toranomon, Minato-ku, Tokyo"), respectively.

The hash calculation unit 20 then stores the hash values thus obtained for data items (1), (2), and (3) of the data record (for example, data record 1) in the memory 18, each associated with its data item. Once they have been stored, the data record used for the hash calculation (for example, data record 1) is deleted from the memory 18 (step S2b).

The data acquisition unit 16 then acquires the data of the next data record (for example, data record 2) from the source database 12 and stores it in the memory 18 (step S3). As noted above, the data acquisition unit 16 may acquire data records from the source database 12 in any order; for example, data record 2 need not be acquired immediately after data record 1.

Next, the hash calculation unit 20 obtains a hash value for each piece of data (for example, "00200", "Hanako Utsunomiya", and "1-2-3 Yamada-cho, Utsunomiya City, Tochigi Prefecture") corresponding to each data item (for example, (1) personal number, (2) name, and (3) address) of this data record stored in the memory 18 (for example, data record 2), by applying the same algorithm that was applied to the previous data record (for example, data record 1) (step S4a). FIG. 4 shows that the hash values obtained for the data of data items (1), (2), and (3) of data record 2 are "31", "225", and "198", respectively.

The hash calculation unit 20 then updates each hash value stored in the memory 18 by adding the hash values obtained for data items (1), (2), and (3) of data record 2 ("31", "225", "198") to the hash values already stored in the memory 18 in association with the same data items (for example, those of data items (1), (2), and (3) of data record 1). For example, if the memory 18 holds "23" for data item (1), "121" for data item (2), and "323" for data item (3) from data record 1, and "31", "225", and "198" have now been calculated for the corresponding data items of data record 2, the hash value of data item (1) is updated to 23 + 31 = 54, that of data item (2) to 121 + 225 = 346, and that of data item (3) to 323 + 198 = 521, and these values are stored in the memory 18. Once the hash values have been updated and stored, the hash calculation unit 20 deletes the data record on which the hash calculation was performed (for example, data record 2) from the memory 18 (step S4b).
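
The update in steps S4a and S4b is an element-wise addition, one data item at a time. The sketch below replays the FIG. 4 values for data records 1 and 2; `add_record_hashes` is a hypothetical helper name, not something defined in the patent.

```python
def add_record_hashes(stored: list[int], new: list[int]) -> list[int]:
    # Steps S4a/S4b: add each new per-item hash to the running value
    # stored for the same data item.
    if len(stored) != len(new):
        raise ValueError("records must have the same number of data items")
    return [s + n for s, n in zip(stored, new)]


record1 = [23, 121, 323]  # FIG. 4 hash values for data record 1
record2 = [31, 225, 198]  # FIG. 4 hash values for data record 2

running = add_record_hashes([0, 0, 0], record1)
running = add_record_hashes(running, record2)
print(running)  # [54, 346, 521]
```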

By repeating step S3 (performed by the data acquisition unit 16) and steps S4a and S4b (performed by the hash calculation unit 20) up to the last data record stored in the source database 12, the hash calculation unit 20 obtains the sum of hash values for each data item in the source database 12 (step S5). FIG. 4 shows that the sums of the hash values of data items (1), (2), and (3) are "584", "2324", and "432", respectively.

By executing the same processing of steps S1 to S5 on the duplicate database 14, the sum of hash values for each data item is likewise obtained over all data stored in the duplicate database 14. In this way, the system 10 of this embodiment executes the processing shown in the flowchart of FIG. 3 on the source database 12 and on the duplicate database 14, obtaining the per-item sums of hash values for each. This processing may be performed on either database first. Alternatively, as shown in FIG. 5, two sets of the data acquisition unit 16, the memory 18, and the hash calculation unit 20 may be provided so that the processing is performed on both databases in parallel.
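
The parallel arrangement of FIG. 5 can be approximated with two worker threads, one per database, each running its own acquisition-and-hashing loop. This is a sketch under assumptions: the SHA-256 stand-in hash is not from the patent, the database reads are modeled as plain iterables, and `sums_in_parallel` is a hypothetical name. Threads mainly pay off when reading records is I/O-bound; for CPU-bound hashing in CPython, a process pool may be a better fit.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, Sequence


def field_hash(value: str) -> int:
    # Assumed stand-in for the patent's unspecified hash algorithm.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


def column_hash_sums(records: Iterable[Sequence[str]], n_items: int) -> list[int]:
    # One pipeline (steps S1 to S5): per-item running sums over a record stream.
    sums = [0] * n_items
    for record in records:
        for i, value in enumerate(record):
            sums[i] += field_hash(value)
    return sums


def sums_in_parallel(source: Iterable[Sequence[str]],
                     duplicate: Iterable[Sequence[str]],
                     n_items: int) -> tuple[list[int], list[int]]:
    # Two independent pipelines, one per database, as in FIG. 5.
    with ThreadPoolExecutor(max_workers=2) as pool:
        src = pool.submit(column_hash_sums, source, n_items)
        dup = pool.submit(column_hash_sums, duplicate, n_items)
        return src.result(), dup.result()


records = [("00010", "Taro Nihon", "Tokyo"), ("00200", "Hanako Utsunomiya", "Tochigi")]
src_sums, dup_sums = sums_in_parallel(records, records[::-1], 3)
print(src_sums == dup_sums)  # True
```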

Once the sum of hash values for each data item has been obtained for both the source database 12 and the duplicate database 14, the comparison unit 22 compares, for each corresponding data item, the sum obtained for the source database 12 with the sum obtained for the duplicate database 14. That is, for each of data items (1), (2), and (3), the comparison unit 22 compares the sum of hash values obtained for the data stored in the source database 12 with the sum obtained for the data stored in the duplicate database 14. If the sums do not match for any of data items (1), (2), and (3), it determines that the data stored in the source database 12 and the data stored in the duplicate database 14 include mismatched data.

On the other hand, if there is no data item whose sums do not match, that is, if for every one of data items (1), (2), and (3) the sum obtained for the source database 12 matches the sum obtained for the duplicate database 14, it determines that all data stored in the source database 12 matches the data stored in the duplicate database 14. This confirms that the data stored in the source database 12 has been correctly copied to the duplicate database 14.
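
The per-item comparison performed by the comparison unit 22 can be sketched as below. It returns the names of the data items whose sums differ, so an empty list corresponds to the "all data match" determination. The item names follow FIG. 2 and the source sums are the FIG. 4 totals; the altered duplicate sum is invented purely for illustration, and `mismatched_items` is a hypothetical name.

```python
from typing import Sequence


def mismatched_items(item_names: Sequence[str],
                     source_sums: Sequence[int],
                     duplicate_sums: Sequence[int]) -> list[str]:
    # Compare the two sums for each data item; an empty result means every
    # item agrees, which the system treats as "source and duplicate match".
    if not (len(item_names) == len(source_sums) == len(duplicate_sums)):
        raise ValueError("all sequences must cover the same data items")
    return [name for name, s, d in zip(item_names, source_sums, duplicate_sums) if s != d]


items = ["personal number", "name", "address"]
print(mismatched_items(items, [584, 2324, 432], [584, 2324, 432]))  # []
print(mismatched_items(items, [584, 2324, 432], [584, 2325, 432]))  # ['name']
```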

As described above, the system 10 to which the data matching confirmation method of this embodiment is applied determines whether the data stored in the source database 12 matches the data stored in the duplicate database 14 without comparing every piece of data against its counterpart, so the determination can be made in a short time.

In addition, the system 10 calculates, for one data record at a time, a hash value for the data of each data item. When the hash calculation is complete, the data of that record is deleted from the memory 18, which then holds only the hash value for each data item.

For the next data record as well, a hash value is calculated for the data of each data item, and when the calculation is complete, the data of that record is deleted from the memory 18. The calculated hash values are added, item by item, to the hash values already stored in the memory 18, and only the updated hash values are retained in the memory 18. This processing is repeated up to the last data record.

Thus, in the system 10, each data record used to update the hash values is deleted from the memory 18, which retains only the updated hash value for each data item. For example, when the database stores data records consisting of three data items as in FIG. 2, the memory 18 holds only three hash values. The memory 18 can therefore have a small capacity: however many data records are added, no additional capacity is needed as long as the number of data items stays the same, so no extra cost is incurred.
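
The constant-memory property can be illustrated against a real database API. The sketch below uses Python's built-in `sqlite3` module as an assumed stand-in for the source or duplicate database: iterating the cursor returns rows one at a time, and only one running sum per column is kept, however many rows the table holds. The table name, column names, sample rows, and SHA-256 stand-in hash are all assumptions; the table name is interpolated into the SQL text and must come from a trusted source.

```python
import hashlib
import sqlite3


def field_hash(value) -> int:
    # Assumed stand-in hash; every value (including NULL -> "None") is hashed as text.
    return int.from_bytes(hashlib.sha256(str(value).encode("utf-8")).digest()[:8], "big")


def column_hash_sums(conn: sqlite3.Connection, table: str) -> list[int]:
    cur = conn.execute(f"SELECT * FROM {table}")  # table name assumed trusted
    sums = [0] * len(cur.description)             # one running sum per column
    for row in cur:                               # rows arrive one at a time
        for i, value in enumerate(row):
            sums[i] += field_hash(value)
    return sums


rows = [("00010", "Taro Nihon", "Tokyo"), ("00200", "Hanako Utsunomiya", "Tochigi")]
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE residents (personal_number TEXT, name TEXT, address TEXT)")
conn.executemany("INSERT INTO residents VALUES (?, ?, ?)", rows)
print(len(column_hash_sums(conn, "residents")))  # 3
```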

Furthermore, because the match is confirmed over all data stored in the source database 12 and all data stored in the duplicate database 14, the data acquisition unit 16 may acquire data records from the source database 12 and from the duplicate database 14 in any order, and a difference in order between the two databases does not affect the result of the match confirmation.
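
Order independence follows from the commutativity of addition: the per-item sums are the same whatever order the records are read in. The sketch below checks this with a shuffled copy of some hypothetical sample records, using an assumed SHA-256 stand-in hash. It also shows the flip side of the same property: if two records exchange values within one data item, the per-item sums are unchanged, so that kind of discrepancy is not visible to this check.

```python
import hashlib
import random


def field_hash(value: str) -> int:
    # Assumed stand-in for the patent's unspecified hash algorithm.
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


def column_hash_sums(records, n_items):
    # Per-item running sums; addition is commutative, so read order is irrelevant.
    sums = [0] * n_items
    for record in records:
        for i, value in enumerate(record):
            sums[i] += field_hash(value)
    return sums


records = [("00010", "Taro Nihon", "Tokyo"),
           ("00200", "Hanako Utsunomiya", "Tochigi"),
           ("00300", "Jiro Yamada", "Osaka")]  # hypothetical sample data

shuffled = random.sample(records, k=len(records))  # any read order
print(column_hash_sums(records, 3) == column_hash_sums(shuffled, 3))  # True

# Names exchanged between the first two records: the per-item sums do not change.
swapped = [("00010", "Hanako Utsunomiya", "Tokyo"),
           ("00200", "Taro Nihon", "Tochigi"),
           ("00300", "Jiro Yamada", "Osaka")]
print(column_hash_sums(records, 3) == column_hash_sums(swapped, 3))  # True
```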

The best mode for carrying out the present invention has been described above with reference to the accompanying drawings, but the present invention is not limited to this configuration. A person skilled in the art can conceive of various changes and modifications within the scope of the technical idea set forth in the claims, and these changes and modifications are understood to belong to the technical scope of the present invention.

The data matching confirmation method and system according to the present invention can be suitably applied to database systems comprising a source database and a duplicate database, for example, database systems for managing the resident data of local governments, and the database systems of financial institutions, securities companies, insurance companies, and the like.

DESCRIPTION OF SYMBOLS
10 System
12 Source database
14 Duplicate database
16 Data acquisition unit
18 Memory
20 Hash calculation unit
22 Comparison unit
24 Data structure

Claims

A system for confirming whether data stored in a source database matches data stored in a duplicate database, the source database storing a large number of data records each consisting of a plurality of data items each having data, and the duplicate database storing a copy of the data stored in the source database, the system comprising:
data acquisition means, a memory, hash calculation means, and comparison means, wherein:
the data acquisition means executes a first process of acquiring any one of the data records stored in the source database from the source database and storing it in the memory;
the hash calculation means executes a second process of obtaining, for the data record stored in the memory, a hash value by applying a predetermined algorithm to the data corresponding to each data item of the data record, storing each obtained hash value in the memory in association with the corresponding data item, and then deleting the data record from the memory in order to reduce memory usage;
the data acquisition means executes a third process of acquiring any one not-yet-acquired data record from the source database and storing it in the memory;
the hash calculation means executes a fourth process of obtaining, for the data record stored in the memory, a hash value by applying the algorithm to the data corresponding to each data item of the data record, updating the stored hash values record by record in an additive manner, that is, by adding each obtained hash value to the stored hash value associated with the corresponding data item, storing the updated hash values in the memory, and then deleting the data record from the memory in order to reduce memory usage;
by repeating the third process by the data acquisition means and the fourth process by the hash calculation means up to the last data record stored in the source database, the hash calculation means executes a fifth process of obtaining a sum of hash values for each data item in the source database;
the data acquisition means and the hash calculation means execute a sixth process of obtaining a sum of hash values for each data item in the duplicate database by performing the first to fifth processes on the duplicate database instead of the source database;
the comparison means executes a seventh process of comparing, for each identical data item, the sum of hash values acquired in the fifth process with the sum of hash values acquired in the sixth process, and determining that the data stored in the source database and the data stored in the duplicate database do not match if there is any data item whose sums of hash values do not match, and that they match if there is no such data item; and
the system is capable of confirming consistency between the data records stored in the source database and the data records stored in the duplicate database while the data acquisition means acquires data records from the source database and from the duplicate database in an arbitrary order.
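The per-item accumulation performed by the first to fifth processes can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patentee's implementation: the claims leave the "predetermined algorithm" open, so SHA-256 from the standard `hashlib` module is assumed; records are assumed to arrive as mappings from data-item name to value; `str(value)` is an assumed canonical encoding; and the names `item_hash` and `column_hash_sums` are hypothetical.

```python
import hashlib
from typing import Dict, Iterable, Mapping


def item_hash(value: object) -> int:
    """Hash one data item's value.

    SHA-256 stands in for the claims' unspecified 'predetermined
    algorithm' (an assumption), and str() is an assumed canonical
    encoding that both databases would have to render identically.
    """
    digest = hashlib.sha256(str(value).encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def column_hash_sums(records: Iterable[Mapping[str, object]]) -> Dict[str, int]:
    """Return the sum of hash values for each data item (fifth process).

    `records` may be any iterator, such as a database cursor, so only the
    record currently being hashed needs to be held in memory.
    """
    sums: Dict[str, int] = {}
    for record in records:  # first/third process: acquire one record
        for item, value in record.items():
            # second/fourth process: add this record's hash value to the
            # running sum stored for the same data item
            sums[item] = sums.get(item, 0) + item_hash(value)
        # Nothing from `record` is kept after this iteration; only the
        # per-item sums persist, mirroring the deletion step.
    return sums


source = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
print(column_hash_sums(source)["id"] == item_hash(1) + item_hash(2))  # True
```

Only the dictionary of running sums grows with the number of data items; it does not grow with the number of records, which is the memory saving the claims attribute to deleting each record after hashing.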

A method for confirming whether data stored in a source database matches data stored in a duplicate database, the source database storing a large number of data records each consisting of a plurality of data items each having data, and the duplicate database storing a copy of the data stored in the source database, the method comprising:
a first process of acquiring any one of the data records stored in the source database from the source database and storing it in a memory;
a second process of obtaining, for the data record stored in the memory, a hash value by applying a predetermined algorithm to the data corresponding to each data item of the data record, storing each obtained hash value in the memory in association with the corresponding data item, and then deleting the data record from the memory in order to reduce memory usage;
a third process of acquiring any one not-yet-acquired data record from the source database and storing it in the memory;
a fourth process of obtaining, for the data record stored in the memory, a hash value by applying the algorithm to the data corresponding to each data item of the data record, updating the stored hash values record by record in an additive manner, that is, by adding each obtained hash value to the stored hash value associated with the corresponding data item, storing the updated hash values in the memory, and then deleting the data record from the memory in order to reduce memory usage;
a fifth process of obtaining a sum of hash values for each data item in the source database by repeating the third process and the fourth process up to the last data record stored in the source database;
a sixth process of obtaining a sum of hash values for each data item in the duplicate database by performing the first to fifth processes on the duplicate database instead of the source database; and
a seventh process of comparing, for each identical data item, the sum of hash values acquired in the fifth process with the sum of hash values acquired in the sixth process, and determining that the data stored in the source database and the data stored in the duplicate database do not match if there is any data item whose sums of hash values do not match, and that they match if there is no such data item,
wherein the consistency between the data records stored in the source database and the data records stored in the duplicate database can be confirmed while data records are acquired from the source database and from the duplicate database in an arbitrary order.
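The seventh process, together with the claimed freedom to read the two databases in an arbitrary order, can be illustrated with two in-memory SQLite databases from Python's standard `sqlite3` module. This is again a hedged sketch: the table `t`, its columns, the use of SHA-256, and the helper names `column_hash_sums` and `mismatched_items` are hypothetical, and plain integer addition is used because the claims speak simply of a sum of hash values.

```python
import hashlib
import sqlite3
from typing import Dict, List


def column_hash_sums(conn: sqlite3.Connection, table: str) -> Dict[str, int]:
    """Per-data-item sum of SHA-256 hashes, reading one row at a time."""
    # The table name is interpolated directly, so it must be trusted.
    cur = conn.execute(f"SELECT * FROM {table}")
    names = [col[0] for col in cur.description]
    sums = {name: 0 for name in names}
    for row in cur:  # rows may come back in any order; addition commutes
        for name, value in zip(names, row):
            digest = hashlib.sha256(str(value).encode("utf-8")).digest()
            sums[name] += int.from_bytes(digest, "big")
    return sums


def mismatched_items(source: Dict[str, int], duplicate: Dict[str, int]) -> List[str]:
    """Seventh process: list the data items whose hash sums differ.

    An empty list means the two databases are judged to match.
    """
    return sorted(item for item in source.keys() | duplicate.keys()
                  if source.get(item) != duplicate.get(item))


src = sqlite3.connect(":memory:")
dup = sqlite3.connect(":memory:")
for conn in (src, dup):
    conn.execute("CREATE TABLE t (id INTEGER, name TEXT)")
src.executemany("INSERT INTO t VALUES (?, ?)", [(1, "a"), (2, "b")])
# Same records, inserted (and typically read back) in a different order.
dup.executemany("INSERT INTO t VALUES (?, ?)", [(2, "b"), (1, "a")])
print(mismatched_items(column_hash_sums(src, "t"), column_hash_sums(dup, "t")))  # []

dup.execute("UPDATE t SET name = 'x' WHERE id = 2")
print(mismatched_items(column_hash_sums(src, "t"), column_hash_sums(dup, "t")))  # ['name']
```

Because each data item is summed independently, exchanging values between records within one column (for example, storing (1, 'b') and (2, 'a') instead of (1, 'a') and (2, 'b')) leaves every per-item sum unchanged, so such a change would not be reported by this comparison.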