JP3797143B2

JP3797143B2 - Bulk loading system, bulk loading method, and bulk loading program

Info

Publication number: JP3797143B2
Application number: JP2001182399A
Authority: JP
Inventors: 義孝安村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2001-06-15
Filing date: 2001-06-15
Publication date: 2006-07-12
Anticipated expiration: 2021-06-15
Also published as: JP2002373094A

Description

【０００１】
【発明の属する技術分野】
本発明は、バルクロードシステム，バルクロード方法及びバルクロードプログラムに関し、特に、ディレクトリサービスにおいて入力ファイルに記述されたエントリの識別情報や属性情報を、ディレクトリサーバに一括ロードすることを可能とするバルクロードシステム，バルクロード方法及びバルクロードプログラムに関する。
【０００２】
【従来の技術】
従来、個人情報管理，ネットワーク管理，ファイル管理などのディレクトリサービスでは、人やコンピュータなどの資源情報をエントリで表現し、各エントリは、その名称などの付随する複数の情報を属性情報（アトリビュート）として保持している。
また、各エントリは、一般的に、それが属する組織や地域などにより階層構造に分類され、ディレクトリサーバを利用して一元管理される。
【０００３】
このようなディレクトリサーバを用いた技術の一例として、たとえば、特開２０００−２４２５３８号公報において、エントリの先祖関係を利用することにより、検索速度を向上させるとともに、記憶容量を削減できるディレクトリ検索システムの技術が提案されている。
このディレクトリ検索システムに使用されるディレクトリサーバは、エントリと属性のデータを記憶装置に登録する際、ディレクトリ検索の高速化のために利用するインデックス（先祖関係の表）を生成し、このインデックスをも一緒に登録する。
【０００４】
つまり、通常、ディレクトリサーバは、エントリデータ，属性データ及びインデックスデータを、ディレクトリデータとして管理しており、様々なインデックス付与方法が開発されており、このインデックスデータを利用して、ディレクトリの検索を高速で行う構成としてある。
【０００５】
また、上記ディレクトリサービスの関連する技術は、大規模なディレクトリに対して、より高速かつ効率良く対応できるように、検索などの処理の高速化や、メモリ効率の向上などを目的として、様々な技術が開示されている。
【０００６】
ところで、一般的なディレクトリデータの生成方法としては、ユーティリティを利用して一つのエントリごとにディレクトリデータを生成する方法と、全エントリを任意のフォーマットに従ったファイルに記述しておいて、それを読み込んで一括してディレクトリデータをロードするツールを利用する方法とがあり、後者の方が、一括してディレクトリデータをロードできることから、大規模なディレクトリデータに対して、迅速かつ効率良く対応できる。
【０００７】
たとえば、インターネット上の標準ディレクトリサービスであるＬＤＡＰ（ＬｉｇｈｔｗｅｉｇｈｔＤｉｒｅｃｔｏｒｙＡｃｃｅｓｓＰｒｏｔｏｃｏｌ）には、ＬＤＩＦ（ＬＤＡＰＤａｔａＩｎｔｅｒｃｈａｎｇｅＦｏｒｍａｔ）と呼ばれるデータ交換のためのファイル形式がＲＦＣ２８４９で規定されており、ＬＤＡＰに準拠しているディレクトリサーバは、上記ＬＤＩＦのファイルを一括して読み込んだり書き込んだりする機能を備えているものが多い。
【０００８】
また、ＬＤＩＦのファイルを読み込んで一括してディレクトリデータをロードするには、通常、二つの方法が採用されており、一つは、ＬＤＡＰのプロトコルに従ってデータをロードする方法と、もう一つは、バックエンドにデータベースを利用している場合に、そのデータベースのアプリケーションとしてデータをロードする方法である。
ただし、どちらの方法も一つのエントリごとに、入力ファイルに記述されたデータをディレクトリサーバにロードして、ディレクトリデータを生成していた。
【０００９】
（第一従来例）
前者の方法として、たとえば、特開平１１−３４５２３４号公報において、共通フォーマットに従ったファイルから、一括してデータを読み込んで任意の処理を行うデータ処理装置の技術が提案されている。
このデータ処理装置は、ＣＳＶ（ＣｏｍｍａＳｅｐａｒａｔｅｄＶａｌｕｅ）と呼ばれる形式のファイルを利用して、候補となる社員を予め登録しておき、これら社員のデータを一括して読み込むことができる。
【００１０】
（第二従来例）
また、同様に前者の方法として、たとえば、特開平１０−３２６２８５号公報において、同じくＣＳＶファイルを用いてドキュメントを一括登録する、ドキュメント管理システムの技術が提案されている。
このドキュメント管理システムは、全ドキュメントの属性やファイル名などを格納しているデータベースシステムと、データベースシステムからのデータ読み出しによりドキュメント管理に必要なデータをＣＳＶ形式で作成する管理データファイルと、作成した管理データファイルのデータ読み込みと編集で作成されるＣＳＶ形式の一括登録ファイルと、一括登録ファイルを読み込んでドキュメントの登録を行う一括登録インタフェースと、一括登録インタフェースが実際に登録を行うオブジェクト指向データベースシステムと、登録処理でエラーが発生して登録できなかった場合に出力するエラーログファイルとで構成してある。
【００１１】
上記構成を有するドキュメント管理システムの一括登録処理は、次のように動作する。
まず、一括登録ファイルの１行目を読み込んで解析し、登録項目とフィールド順番の対応付けを行う。
続いて、一括登録ファイルの２行目以降を１行ずつ読み込み、前処理で得られた対応付けを基に登録用のデータ配列を生成する。
そして、エラーが無い場合、生成した登録用のデータ配列でデータベースに登録を行う。
また、エラーが有るときは、エラー内容とエラーレコードをログファイルにＣＳＶ形式で追加する。
【００１２】
（第三従来例）
一方、後者の方法として、データベースに格納するデータをファイルから一括してロードする方法としてバルクロード方式がある。
バルクロード方式をオブジェクト指向データベースに適用した一例として、１９９４年、プロシーディングズ・オブ・ブイ・エル・ディー・ビー・コンファレンス、１２０〜１３１頁（ＰｒｏｃｅｅｄｉｎｇｓｏｆＶＬＤＢＣｏｎｆｅｒｅｎｃｅ，１９９４，Ｐａｇｅｓ１２０−１３１）に掲載されたＢｕｌｋＬｏａｄｉｎｇｉｎｔｏａｎＯＯＤＢ：ＡＰｅｒｆｏｒｍａｎｃｅＳｔｕｄｙと題するＪａｎｅｔＬ．Ｗｉｅｎｅｒらによる論文がある。
【００１３】
また、この論文に記載されたバルクロード方式をさらに拡張した一例として、１９９５年、プロシーディングズ・オブ・ブイ・エル・ディー・ビー・コンファレンス、３０〜４１頁（ＰｒｏｃｅｅｄｉｎｇｓｏｆＶＬＤＢＣｏｎｆｅｒｅｎｃｅ，１９９５，Ｐａｇｅｓ３０−４１）に掲載されたＯＯＤＢＢｕｌｋＬｏａｄｉｎｇＲｅｖｉｓｉｔｅｄ：ＴｈｅＰａｒｔｉｔｉｏｎｅｄ−ＬｉｓｔＡｐｐｒｏａｃｈと題するＪａｎｅｔＬ．Ｗｉｅｎｅｒらによる論文がある。
【００１４】
これらの論文に記載されたバルクロード方式は、基本的に、次のような方式である。
新たに読み込まれるデータ（オブジェクト）は、あらかじめ、テキストファイルとして用意され、各オブジェクトには、一意な番号が割り当てられており、既に読み込まれたオブジェクトが記述されたテキストファイルは、全オブジェクトについて、ＩＤ（識別符号）が付与してあり、このＩＤを用いたＩＤマップで管理される。
【００１５】
つまり、既に読み込まれた各オブジェクトは、ＩＤマップで管理され、また、新たに読み込まれるオブジェクト（参照するオブジェクト）は、ＩＤマップおよび仮ＩＤを付与したＴｏＤｏリスト（逆参照があるときは、ＩｎｖＴｏＤｏリスト）で管理される。
【００１６】
また、新たに読み込まれるオブジェクトは、ＩＤマップとＴｏＤｏリスト（あるいは、ＩｎｖＴｏｄｏリスト）を利用して、更新しなければならない情報を含んだＵｐｄａｔｅリストが生成される。
そして、再び、新たに読み込まれるオブジェクトのテキストファイルを読み込んで、Ｕｐｄａｔｅリストを参照しながら、新たに読み込まれるオブジェクトを含むオブジェクトが生成される。
【００１７】
【発明が解決しようとする課題】
ところが、上記従来技術は、ディレクトリサービスで提供しているユーティリティを利用する場合（前者の方法）はもとより、入力ファイルを用いてディレクトリデータを一括ロードする場合（後者の方法）であっても、一つのエントリごとにデータをロードしなければならず、処理速度を向上させることができないといった問題があった。
【００１９】
また、データをロードした後のディレクトリサービスの検索性能がよくなるように、ディレクトリデータが効率よく配置されないといった問題があった。
【００２０】
本発明は、上記の問題を解決すべくなされたものであり、ディレクトリサービスにおけるディレクトリデータを、高速に、かつ、ディレクトリサービスの検索性能を向上させるように、クラスタリングして一括ロードできるバルクロードシステム，バルクロード方法及びバルクロードプログラムの提供を目的としている。
【００２１】
【課題を解決するための手段】
この目的を達成するため、本発明の請求項１記載のバルクロードシステムは、データ処理装置が、バルクロード制御手段と、当該バルクロード制御手段により制御される、入力ファイル解析手段，ハッシュテーブル管理手段，参照リスト管理手段，及び，データページ生成手段とを備えたバルクロードシステムであって、前記入力ファイル解析手段が、入力ファイルを解析して、一つのエントリごとにデータを読み込む第一回目の読み込みを行い、前記ハッシュテーブル管理手段が、前記エントリの識別情報をハッシュテーブルで管理し、前記参照リスト管理手段が、前記入力ファイル解析手段が読み込んだ前記エントリに対する実際の識別符号と異なる、前記入力ファイルのエントリに対する仮の識別符号を有する、仮の識別符号マップ及び／又は仮参照リストにもとづいて、参照リストを生成し、前記エントリの参照関係を前記参照リストで管理し、前記バルクロード制御手段が、前記参照リストにもとづいて、更新リストを生成し、前記入力ファイル解析手段が、前記入力ファイルのエントリのデータに対し、第二回目の読み込みを行い、前記データページ生成手段が、前記更新リストにもとづいて、第二回目の読み込みで取得した前記入力ファイルのエントリのデータを、データベースの格納形式で生成する構成としてある。
【００２２】
このようにすると、参照リスト管理手段が、エントリの参照関係を参照リストで管理し、すなわち、エントリの参照関係を予め取得するので、入力ファイルに含まれるエントリのデータを効率良くかつ高速でロードできる。また、エントリの参照関係を、一つのエントリを読み込むごとに、仮の識別符号マップ及び／又は仮参照リストに記憶させることができる。さらに、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを一括ロードできる。
【００２３】
また、請求項２記載の発明は、上記請求項１記載のバルクロードシステムにおいて、前記参照リストが、前記入力ファイル解析手段が読み込んだ前記エントリに対する実際の識別符号と、上位概念を表す前記エントリに対する親の識別符号と、下位概念を表す前記エントリに対する子の識別符号とを有する構成としてある。
【００２４】
このようにすると、エントリの参照関係を、エントリのデータにもとづいた親子関係で表すことができ、参照リストを容易に利用することができる。
【００２５】
また、請求項３記載の発明は、上記請求項２記載のバルクロードシステムにおいて、前記参照リストが、同位レベルの前記エントリに対して、前記エントリどうしの兄弟関係を示す、次の識別符号を有する構成としてある。
【００２６】
このようにすると、エントリの参照関係を、エントリのデータにもとづいた兄弟関係で表すことができ、参照リストを容易に利用することができる。
【００３１】
また、請求項４記載の発明は、上記請求項１〜３のいずれか一項に記載のバルクロードシステムにおいて、前記入力ファイル解析手段が、前記入力ファイルのすべてのエントリのデータに対し、第一回目及び第二回目の読み込みを行う構成としてある。
【００３２】
このようにすると、入力ファイルが複数のエントリからなる場合であっても、入力ファイルのすべてのエントリのデータを一括ロードできる。
【００３３】
また、本発明の請求項５記載のバルクロードシステムは、データ処理装置が、バルクロード制御手段と、当該バルクロード制御手段により制御される、入力ファイル解析手段，ハッシュテーブル管理手段，エントリソート実行手段，及び，データページ生成手段を備えたバルクロードシステムであって、前記入力ファイル解析手段が、入力ファイルを解析して、一つのエントリごとにデータを読み込み、前記ハッシュテーブル管理手段が、前記エントリの識別情報をハッシュテーブルで管理し、前記エントリソート実行手段が、前記エントリをエントリリストで管理し、前記データページ生成手段が、前記エントリのデータをデータベースの格納形式で生成する構成としてある。
【００３４】
このようにすると、エントリソート実行手段が、エントリをエントリリストで管理し、すなわち、エントリのデータ内容を予め取得するので、入力ファイルに含まれるエントリのデータを効率良くかつ高速でロードできる。
【００３５】
また、請求項６記載の発明は、上記請求項５記載のバルクロードシステムにおいて、前記入力ファイル解析手段が、前記入力ファイルのエントリのデータに対し、第一回目の読み込みを行い、前記エントリソート実行手段が、前記エントリリストにもとづいて、ソート処理を行い、前記入力ファイル解析手段が、前記入力ファイルのエントリのデータに対し、第二回目の読み込みを行い、前記ハッシュテーブル管理手段が、前記入力ファイル解析手段が第二回目に読み込んだ前記エントリに、実際の識別符号を割り当てし、この実際の識別符号で、前記エントリの識別情報をハッシュテーブルに登録し、さらに、上位概念を表す前記エントリに対して、親の識別符号を取得し、前記データページ生成手段が、前記ソート処理にもとづいて、第二回目の読み込みで取得した前記入力ファイルのエントリのデータを、データベースの格納形式で生成する構成としてある。
【００３６】
このようにすると、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【００３７】
また、請求項７記載の発明は、上記請求項６記載のバルクロードシステムにおいて、前記入力ファイル解析手段が、前記入力ファイルのすべてのエントリのデータに対し、第一回目及び第二回目の読み込みを行う構成としてある。
【００３８】
このようにすると、入力ファイルが複数のエントリからなる場合であっても、入力ファイルのすべてのエントリのデータを一括ロードできる。
【００３９】
また、本発明の請求項８記載のバルクロード方法は、データ処理装置が、バルクロード制御手段と、当該バルクロード制御手段により制御される、入力ファイル解析手段，ハッシュテーブル管理手段，参照リスト管理手段，及び，データページ生成手段を備えたバルクロード方法であって、前記入力ファイル解析手段が、入力ファイルを解析して、一つのエントリごとにデータを読み込む第一回目の読み込みを行い、前記ハッシュテーブル管理手段が、前記エントリの識別情報をハッシュテーブルで管理し、前記参照リスト管理手段が、前記入力ファイル解析手段が読み込んだ前記エントリに対する実際の識別符号と異なる、前記入力ファイルのエントリに対する仮の識別符号を有する、仮の識別符号マップ及び／又は仮参照リストにもとづいて、参照リストを生成し、前記エントリの参照関係を前記参照リストで管理し、前記バルクロード制御手段が、前記参照リストにもとづいて、更新リストを生成し、前記入力ファイル解析手段が、前記入力ファイルのエントリのデータに対し、第二回目の読み込みを行い、前記データページ生成手段が、前記更新リストにもとづいて、第二回目の読み込みで取得した前記入力ファイルのエントリのデータを、データベースの格納形式で生成する方法としてある。
【００４０】
このように、本発明は、バルクロード方法の発明としても有効であり、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを一括ロードでき、エントリのデータを効率良くかつ高速でロードできる。
【００４１】
また、本発明の請求項９記載のバルクロード方法は、データ処理装置が、バルクロード制御手段と、当該バルクロード制御手段により制御される、入力ファイル解析手段，ハッシュテーブル管理手段，エントリソート実行手段，及び，データページ生成手段を備えたバルクロード方法であって、前記入力ファイル解析手段が、入力ファイルを解析して、一つのエントリごとにデータを読み込み、前記ハッシュテーブル管理手段が、前記エントリの識別情報をハッシュテーブルで管理し、前記エントリソート実行手段が、前記エントリをエントリリストで管理し、さらに、前記エントリを前記エントリの識別情報によりソートし、前記データページ生成手段が、ソートした前記エントリの順番で、入力ファイルから再び読み込まれたエントリのデータを、データベースの格納形式で生成する方法としてある。
【００４２】
このように、本発明は、バルクロード方法の発明としても有効であり、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【００４３】
また、本発明の請求項１０記載のバルクロードプログラムは、コンピュータに、入力ファイルを解析して、一つのエントリごとにデータを読み込む第一回目の読み込みを行わせる処理、前記エントリの識別情報をハッシュテーブルで管理する処理、前記入力ファイル解析手段が読み込んだ前記エントリに対する実際の識別符号と異なる、前記入力ファイルのエントリに対する仮の識別符号を有する、仮の識別符号マップ及び／又は仮参照リストにもとづいて、参照リストを生成し、前記エントリの参照関係を前記参照リストで管理する処理、前記参照リストにもとづいて、更新リストを生成する処理、前記入力ファイルのエントリのデータに対し、第二回目の読み込みを行わせる処理、前記更新リストにもとづいて、第二回目の読み込みで取得した前記入力ファイルのエントリのデータを、データベースの格納形式で生成する処理、を実行させる構成としてある。
【００４４】
このように、本発明は、バルクロードプログラムの発明としても有効であり、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを一括ロードでき、エントリのデータを効率良くかつ高速でロードできる。
【００４５】
また、本発明の請求項１１記載のバルクロードプログラムは、コンピュータに、入力ファイルを解析して、一つのエントリごとにデータを読み込む処理、前記エントリの識別情報をハッシュテーブルで管理する処理、前記エントリをエントリリストで管理し、さらに、前記エントリを前記エントリの識別情報によりソートする処理、ソートした前記エントリの順番で、入力ファイルから再び読み込まれたエントリのデータを、データベースの格納形式で生成する処理、を実行させる構成としてある。
【００４６】
このように、本発明は、バルクロードプログラムの発明としても有効であり、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【００４７】
また、本発明の請求項１２記載のバルクロードシステムは、キーボード等の入力装置，データ処理装置，情報を記憶する記憶装置，情報を格納して管理するデータベース管理装置，記憶媒体，及び，ディスプレイ装置や印刷装置等の出力装置を備えたバルクロードシステムであって、上記請求項１０又は請求項１１に記載のバルクロードプログラムを搭載した構成としてある。
【００４８】
このように、バルクロードプログラムを搭載したバルクロードシステムとすることによって、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを一括ロードでき、又は、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【００４９】
【発明の実施の形態】
以下、本発明の実施形態について、図面を参照して説明する。
まず、本発明のバルクロードシステムの第一実施形態について、図面を参照して説明する。
【００５０】
「バルクロードシステムの第一実施形態」
図１は、本発明に係るバルクロードシステムの第一実施形態の基本構成を説明するための概略ブロック図を示している。
同図において、バルクロードシステムは、キーボード等の入力装置１，プログラム制御により動作するデータ処理装置２，情報を記憶する記憶装置３，情報を格納して管理するデータベース管理装置４，及び，ディスプレイ装置や印刷装置等の出力装置５とで構成してある。
【００５１】
記憶装置３は、入力ファイル記憶部３１と、ハッシュテーブル記憶部３２と、仮ＩＤマップ記憶部３３と、参照リスト記憶部３４とを備えた構成としてある。
【００５２】
上記入力ファイル記憶部３１は、入力となるＬＤＩＦファイルなどのファイルデータを記憶する。
また、ハッシュテーブル記憶部３２は、データ格納技法の一つであるハッシュ法で用いられるテーブルであり、文字列や数字などをキーワードとして、任意の情報を格納したハッシュテーブルを記憶する。
【００５３】
また、仮の識別符号マップ（適宜、仮ＩＤマップ２３０と略称する。図６参照。）を記憶する、仮ＩＤマップ記憶部３３は、エントリに付与された、仮の識別符号（適宜、仮ＩＤと略称する。）と実際の識別符号（適宜、実ＩＤと略称する。）の関係を記憶する。
なお、仮の識別符号は、入力ファイルのエントリに対する識別符号であり、かつ、実際の識別符号と番地の衝突などが発生しないように設定される、実際の識別符号と異なる識別符号をいい、たとえば、図６に示すように、実ＩＤが５，６のとき、仮ＩＤとして、−２，−３が設定される。
また、仮の識別符号マップとは、仮の識別符号と該仮の識別符号に対応する実際の識別符号とからなるマップをいう。
このように、実ＩＤの他に、仮ＩＤを用いることにより、ハッシュテーブルにおいて、番地の衝突などが発生しないように、ハッシュ法を適切に利用することができる。
【００５４】
さらにまた、参照リスト記憶部３４は、参照リスト２２０（図６参照）、すなわち、エントリの親子関係を、各エントリに付与した実ＩＤや仮ＩＤを利用して表したリストを記憶する。
【００５５】
データベース管理装置４は、データベースに格納される情報をページ形式で保持するデータページ記憶部４１を備えている。
このように、情報をページ形式で保持することにより、情報を出力する際に、あらためて、情報を出力形式にあわせて、変換する必要がないので、結果的に、処理速度を速めることができる。
【００５６】
データ処理装置２は、バルクロード制御手段２１と、バルクロード制御手段２１により制御される、入力ファイル解析手段２２，ハッシュテーブル管理手段２３，参照リスト管理手段２４，及び，データページ生成手段２５とを備えた構成としてある。
【００５７】
バルクロード制御手段２１は、入力装置１から、利用者からのＬＤＩＦファイルを利用したエントリロードの要求を受けると、その要求にしたがって入力ファイル解析手段２２と、ハッシュテーブル管理手段２３と、参照リスト管理手段２４と、データページ生成手段２５とを制御して、バルクロード処理を行う。
また、バルクロード制御手段２１は、バルクロード処理が終了すると、終了した旨を出力装置５に表示する。
【００５８】
ここで、入力ファイル解析手段２２は、バルクロード制御手段２１からのエントリロードの指示により、入力ファイル記憶部３１からＬＤＩＦファイルを読み込んで解析し、ＬＤＩＦファイルに含まれるエントリの情報を取得する。
【００５９】
ハッシュテーブル管理手段２３は、同様に、バルクロード制御手段２１からの指示により、ハッシュテーブル記憶部３２を利用して必要なハッシュテーブルを作成し、キーワードの値からハッシュ値を算出して、上記ＬＤＩＦファイルに含まれるエントリの識別情報を管理する。
【００６０】
また、参照リスト管理手段２４は、仮ＩＤマップ記憶部３３および参照リスト記憶部３４を利用して、エントリの親子関係を表す参照リスト２２０を作成し、バルクロード制御手段２１からの指示により、実ＩＤまたは仮ＩＤの付与された各エントリを管理する。
【００６１】
ここで、参照リスト２２０が、入力ファイル解析手段２２が読み込んだエントリに対する実ＩＤと、上位概念を表すエントリに対する親の識別符号（適宜、親ＩＤと略称する。）と、下位概念を表すエントリに対する子の識別符号（適宜、子ＩＤと略称する。）とを有する構成とするとよく、このようにすると、エントリの参照関係を、エントリのデータにもとづいた親子関係（上位概念および下位概念の組み合わせ）で表すことができ、参照リストを容易に利用することができる。
【００６２】
また、参照リスト２２０が、同位レベルのエントリに対して、エントリどうしの兄弟関係を示す、次の識別符号（適宜、次ＩＤと略称する。）を有する構成とするとよく、このようにすると、エントリの参照関係を、エントリのデータにもとづいた兄弟関係で表すことができ、参照リストを容易に利用することができる。
【００６３】
また、参照リスト２２０が、入力ファイルのエントリに対する仮ＩＤを有する、仮ＩＤマップ２３０及び／又は仮参照リスト２１０にもとづいて生成される構成とするとよく、このようにすると、エントリの参照関係を、一つのエントリを読み込むごとに、仮ＩＤマップ２３０及び／又は仮参照リスト２１０に記憶させることができ、システムの動作を単純化することができる。
なお、仮参照リスト２１０は、図６に示すように、仮ＩＤ、親ＩＤ、子ＩＤ及び次ＩＤからなるリストであり、親子関係などを仮参照リストから参照リストに登録し直すための、参照リスト２２０を補足する（あるいは、参照リストを構成する一部ともいえる）リストである。
【００６４】
また、データページ生成手段２５は、バルクロード制御手段２１からの指示により、参照リスト記憶部３４を利用して、各エントリのデータをデータベースのページ形式に構成し、データページ記憶部４１に格納する。
【００６５】
次に、上記構成のバルクロードシステムの動作について、図面を参照して説明する。
図２は、第一実施形態にかかるバルクロードシステムの動作を説明するための、概略フローチャート図を示している。
【００６６】
図２において、先ず、バルクロード制御手段２１は、入力装置１から入力されたＬＤＩＦファイルのファイル名を入力し、入力ファイル解析手段２２により、入力ファイル記憶部３１から該当するファイルを読み込んで、一つのエントリ（適宜、１エントリと略称する。）のデータを取得する（ステップＡ１）。
つまり、入力ファイル解析手段２２が、入力ファイルから、第一回目のエントリのデータを読み込む。
【００６７】
続いて、ハッシュテーブル管理手段２３が、ハッシュテーブル記憶部３２に、上記１エントリの識別情報と実ＩＤからなる、ハッシュテーブルを登録し、かつ、参照リスト管理手段２４が、仮ＩＤマップ記憶部３３および参照リスト記憶部３４に、上記１エントリの参照関係（親子関係および兄弟関係）を表す参照リストを登録する（ステップＡ２）。
【００６８】
そして、本実施形態におけるバルクロードシステムは、この１エントリが、入力ファイルの最終エントリかどうかを調べて（ステップＡ３）、もしそうでなければステップＡ１に戻り、次のエントリに対して、同様の処理を実施する。
【００６９】
また、ステップＡ３で、最終エントリまで処理が終了したら、参照リスト管理手段２４は、仮ＩＤマップ記憶部３３と参照リスト記憶部３４を利用して、更新リスト（図７参照）を生成する（ステップＡ４）。
更新リストは、実ＩＤ、親ＩＤ、子ＩＤ及び次ＩＤからなるリストであり、この更新リストにもとづいて、データページ生成手段２５が、第二回目の読み込みで取得した入力ファイルのエントリのデータを、データベースの格納形式で生成する。
【００７０】
続いて、入力ファイル解析手段２２は、入力ファイル記憶部３１から該当する入力ファイルを再び読み込んで、１エントリのデータを取得する（ステップＡ５）。つまり、入力ファイル解析手段２２が、入力ファイルから、第二回目のエントリのデータを読み込む。
【００７１】
次に、データページ生成手段２５は、取得した１エントリからエントリデータを生成し（ステップＡ６）、参照リスト記憶部３４の更新リストを参照してエントリデータに親エントリ（親の関係となるエントリ）を登録し（ステップＡ７）、また、子エントリ（子の関係となるエントリ）の集合を生成してエントリデータ（エントリデータの集合体）に登録する（ステップＡ８）。
つまり、データページ生成手段２５が、更新リストにもとづいて、第二回目の読み込みで取得した入力ファイルのエントリのデータを、データベースの格納形式で生成する。
【００７２】
このエントリデータは、ページ形式に構成してあり、ページの空きスペースがなくなればデータページ記憶部４１に格納する、すなわち、本実施形態におけるバルクロードシステムによれば、データ処理装置２によってまとめられた、エントリデータの集合体を一括でロードすることができる。
【００７３】
次に、このエントリが入力ファイルの最終エントリかどうかを調べて（ステップＡ９）、もしそうでなければステップＡ５に戻り、次のエントリに対して、同様の処理を実施する。
そして、ステップＡ９で、最終エントリまで処理が終了したら、ページ形式に構成したエントリデータを、データページ記憶部４１に格納し、処理終了を通知するために出力装置５に表示する。
【００７４】
このように、参照リスト管理手段２４が、エントリの参照関係を参照リスト２２０で管理し、すなわち、エントリの参照関係を予め取得するので、入力ファイルに含まれるエントリのデータを、効率良くかつ高速でロードできる。
また、更新リストを利用することにより、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを一括ロードできる。
【００７５】
次に、ハッシュテーブルおよび参照リストの登録の動作を、図面を参照して説明する。
図３は、第一実施形態にかかるバルクロードシステムにおける、ハッシュテーブルおよび参照リストの登録の動作を説明するための、概略フローチャート図を示している。
図３において、エントリに実ＩＤを割り当て（ステップＢ１）、自エントリ（当該実ＩＤを割り当てたエントリ）がハッシュテーブルに登録済かどうかを調べて（ステップＢ２）、もし登録済でなければ、続いて、親エントリ（自エントリの上位概念のエントリ）がハッシュテーブルに登録済かどうかを調べる（ステップＢ６）。
【００７６】
ここで、もし、登録済でなければ親エントリに、仮ＩＤを割り当て（ステップＢ１０）、親子関係を仮参照リストに登録し（ステップＢ１１）、ハッシュテーブルに識別情報と仮ＩＤを登録する（ステップＢ１２）。
そして、再び、ステップＢ６にもどる。
【００７７】
次に、ステップＢ６において、親エントリが仮ＩＤで登録されたので、ハッシュテーブルから親エントリのＩＤを取得し（ステップＢ７）、参照リストに親子関係を登録し（ステップＢ８）、ハッシュテーブルに識別情報と実ＩＤを登録する（ステップＢ９）。
【００７８】
なお、ステップＢ６で、親エントリが登録済であれば、ステップＢ１０〜Ｂ１２の処理が省かれる。
また、ステップＢ２で、自エントリが登録済であれば、仮ＩＤと実ＩＤを仮ＩＤマップに登録し（ステップＢ３）、親子関係を仮参照リストから参照リストに登録し直し（ステップＢ４）、ハッシュテーブルに識別情報と実ＩＤを再登録する（ステップＢ５）。
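The registration flow of steps B1 to B12 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `parent_dn`, `first_pass`, and `resolve` are hypothetical, the reference relation is reduced to a child-to-parent mapping (the child ID and next ID columns are omitted), the re-registration of step B4 is deferred to a single `resolve` call after the whole file has been read, and the flow is assumed to continue with the parent check after steps B3 to B5. DNs are split on the first comma, so escaped commas and duplicate DNs are not handled.

```python
def parent_dn(dn):
    """Return the DN of the parent entry, or None for a single-RDN root."""
    _, sep, rest = dn.partition(",")
    return rest if sep else None


def first_pass(dns):
    """Assign IDs in file order, using temporary IDs for parents not yet read."""
    table = {}       # hash table: DN -> real ID (> 0) or temporary ID (< 0)
    parent_of = {}   # reference relation: entry ID -> parent ID (may be temporary)
    temp_map = {}    # temporary ID map: temporary ID -> real ID
    next_real, next_temp = 1, -1
    for dn in dns:
        real_id = next_real                  # B1: assign a real ID
        next_real += 1
        if dn in table:                      # B2: already registered as a parent
            temp_map[table[dn]] = real_id    # B3: record temporary -> real
        table[dn] = real_id                  # B5 / B9: register DN with the real ID
        parent = parent_dn(dn)
        if parent is None:
            continue                         # root entry: nothing to link
        if parent not in table:              # B6: parent not registered yet
            table[parent] = next_temp        # B10, B12: give it a temporary ID
            next_temp -= 1
        parent_of[real_id] = table[parent]   # B7, B8 (B11 when the ID is temporary)
    return table, parent_of, temp_map


def resolve(parent_of, temp_map):
    """Replace temporary parent IDs with real IDs after all entries are read."""
    return {child: temp_map.get(parent, parent)
            for child, parent in parent_of.items()}
```

Calling `first_pass` on DNs listed child-first, for example `["cn=a,ou=x,c=JP", "ou=x,c=JP", "c=JP"]`, shows why temporary IDs are needed: each parent is referenced before it has been read, so it is held under a negative ID until its own entry arrives.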
【００７９】
このように、バルクロード制御手段２１は、ハッシュテーブル管理手段２３や参照リスト管理手段２４が、実ＩＤの他に、仮ＩＤを用い、さらに、仮ＩＤマップ，仮参照リスト，参照リスト，及び，更新リストを用いることにより、ハッシュテーブルにおいて、番地の衝突などが発生しないように、ハッシュ法を適切に利用することができる。
【００８０】
上述したように、バルクロードシステムの第一実施形態によれば、入力ファイルのエントリの読み込みを、図２に示すように、ステップＡ１とステップＡ５において２回行い、かつ、ステップＡ１の読み込みにおいて、図３に示すように、エントリの識別情報とエントリ間の参照関係を取得しておくことで、ステップＡ６のエントリデータを生成するときに、他のエントリデータを参照する必要がなくなる。
このため、一つずつエントリのデータをロードしなくて済み、すなわち、複数のエントリのデータを一括してロードすることが可能なバルクロードシステムを実現することができる。
【００８１】
また、入力ファイル解析手段２２が、入力ファイルのすべてのエントリのデータに対し、第一回目及び第二回目の読み込みを行う構成としてもよく、このようにすると、入力ファイルが複数のエントリからなる場合であっても、入力ファイルのすべてのエントリのデータを一括ロードできる。
【００８２】
＜実施例＞
次に、具体的な実施例を用いて第一実施形態の動作を説明する。
図４は、ＬＤＩＦファイルの一例を示す、概略図を示している。
同図において、「ｃ＝ＪＰ」の識別子を持つエントリがディレクトリのルート（最上位）であり、本実施例は、このエントリの配下に、下位のエントリを追加した実施例である。
【００８３】
最初の行の「ｖｅｒｓｉｏｎ：１」というのは、ＬＤＩＦの規格の版を表している。
次の行から空行で区切られた範囲を、「１エントリのデータ」と定義しており、このＬＤＩＦファイルには、「１エントリのデータ」が１０個含まれている（３０１〜３１０）。
また、各「１エントリのデータ」は、「ｄｎ：」で始まる行が識別情報を示し、その他の行が属性値（属性データ）を示している。
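The LDIF layout described here (a `version: 1` line, entries separated by blank lines, a `dn:` line giving the identification information, and the remaining lines giving attribute values) can be read with a few lines of Python. This is a minimal sketch: `parse_ldif` and the sample text are hypothetical and are not the file of Fig. 4, and the sketch assumes well-formed input in which each entry starts with its `dn:` line. Folded lines, comments, and base64 values (`::`) defined by RFC 2849 are not handled.

```python
def parse_ldif(text):
    """Split LDIF text into a list of (dn, [(attribute, value), ...]) tuples."""
    entries, block = [], []
    for line in text.splitlines() + [""]:      # the extra "" flushes the last entry
        if line.strip():
            if not line.startswith("version:"):
                block.append(line)
            continue
        if block:                              # a blank line closes the entry
            pairs = []
            for item in block:
                name, _, value = item.partition(":")
                pairs.append((name.strip(), value.strip()))
            entries.append((pairs[0][1], pairs[1:]))   # first pair is the dn line
            block = []
    return entries


SAMPLE = "\n".join([
    "version: 1",
    "dn: c=JP",
    "objectClass: country",
    "c: JP",
    "",
    "dn: ou=Sales,c=JP",
    "objectClass: organizationalUnit",
    "ou: Sales",
])
ENTRIES = parse_ldif(SAMPLE)
```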
【００８４】
このＬＤＩＦファイルによりディレクトリをロードすると図５のようなディレクトリ階層が構築される。
ここで、エントリ１００は、ディレクトリ階層のルートとなり、予めディレクトリサーバに登録してある。これに対し、エントリ１０１〜１１０は、新規にロードされるエントリである。
【００８５】
各エントリ１０１〜１１０に記述してある名前は、相対識別名（ＲＤＮ：ＲｅｌａｔｉｖｅＤｉｓｔｉｎｇｕｉｓｈｅｄＮａｍｅ）であり、ルートまで遡る全てのエントリの相対識別名をカンマ（，）でつなげれば、そのエントリの識別名となる。
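The rule that a distinguished name is the comma-joined chain of relative distinguished names up to the root can be illustrated as follows. This is a small sketch with hypothetical helper names (`build_dn`, `ancestors`) and hypothetical RDN values; it assumes RDNs contain no escaped commas.

```python
def build_dn(rdns_leaf_to_root):
    """Join RDNs, ordered from the entry itself up to the root, into a DN."""
    return ",".join(rdns_leaf_to_root)


def ancestors(dn):
    """Return the DNs of all ancestors, nearest parent first, root last."""
    parts = dn.split(",")
    return [",".join(parts[i:]) for i in range(1, len(parts))]
```

For example, `build_dn(["cn=Taro", "ou=Sales", "c=JP"])` gives `cn=Taro,ou=Sales,c=JP`, and the parent of an entry is simply its DN with the leading RDN removed.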
【００８６】
上記ＬＤＩＦファイルを読み込んで、エントリ間の親子関係を取得すると、たとえば、図６のような仮参照リスト２１０と参照リスト２２０、仮ＩＤマップ２３０が得られる。
【００８７】
図６において、仮ＩＤマップ２３０は、ルートとなるエントリ１００の実ＩＤを１とし、エントリに実ＩＤを付与するときは、１から順に増やしていき、また、エントリに仮ＩＤを付与するときは、−１から順に減らしていった。
なお、本実施例では、仮ＩＤとして、−１，−２，−３を使用し、それぞれ実ＩＤの１（図４の３０１），５（図４の３０５），６（図４の３０６）に対応している。
【００８８】
仮参照リスト２１０は、仮ＩＤ，親ＩＤ，子ＩＤ及び次ＩＤからなり、負の整数である仮ＩＤを除いて、正の整数であるＩＤ（１〜１０）は、上記「１エントリのデータ」（図４の３０１〜３１０）に対応している。
【００８９】
また、仮参照リスト２１０と参照リスト２２０は、実ＩＤに対して、子ＩＤと次ＩＤを各一つ有する構成としてあり、このようにすると、子ＩＤが二以上ある場合であっても、次ＩＤで兄弟のエントリをつなげることができるので、子ＩＤのエントリから次ＩＤを順々にたどれば、全ての子エントリを取得することができる。
【００９０】
仮ＩＤマップ２３０は、図６に示すような形式で、仮ＩＤマップ記憶部３３に格納され、また、仮参照リスト２１０と参照リスト２２０は、同様に図６に示すような形式で、参照リスト記憶部３４に格納される。
【００９１】
参照リスト管理手段２４は、仮ＩＤマップ記憶部３３と参照リスト記憶部３４に格納されているこれらの情報を利用して、上記ステップＡ４において、図７に示す更新リストを生成する。
この更新リストは、図６の参照リスト２２０をベースにして、仮ＩＤマップ２３０を用いて、仮ＩＤを実ＩＤに変更するだけで生成することができる。
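Generating the update list by substituting real IDs for temporary IDs can be sketched as a single mapping pass. The temporary ID map below uses the correspondence stated in this example (-1, -2, -3 to 1, 5, 6); the two reference list rows are hypothetical stand-ins, since the contents of Fig. 6 are not reproduced here, and `to_update_list` is an assumed helper name.

```python
def to_update_list(reference_rows, temp_map):
    """Return a copy of the reference list with temporary IDs replaced by real IDs."""
    def real(entry_id):
        # IDs that are not in the map (real IDs, or None) pass through unchanged.
        return temp_map.get(entry_id, entry_id)

    return {
        real(entry_id): {field: real(value) for field, value in row.items()}
        for entry_id, row in reference_rows.items()
    }


TEMP_ID_MAP = {-1: 1, -2: 5, -3: 6}
REFERENCE_ROWS = {          # hypothetical rows that still contain temporary IDs
    -1: {"parent": None, "child": -2, "next": None},
    2: {"parent": -2, "child": None, "next": 3},
}
UPDATE_LIST = to_update_list(REFERENCE_ROWS, TEMP_ID_MAP)
```

Because the substitution is a pure lookup, no entry data has to be consulted at this stage.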
【００９２】
次に、更新リストの読み方について説明する。
実ＩＤ１（図４の「１エントリのデータ」３０１）は、親ＩＤがなく（ルートのエントリ）、子ＩＤが５（図４の「１エントリのデータ」３０５）である。
また、子ＩＤにおいて５と表示された実ＩＤ５（図４の「１エントリのデータ」３０５）は、親ＩＤが１（図４の「１エントリのデータ」３０１）で、子ＩＤが２（図４の「１エントリのデータ」３０２）で、次ＩＤが６（図４の「１エントリのデータ」３０６）である。
【００９３】
また、次ＩＤにおいて６と表示された実ＩＤ６は、親ＩＤが１で、子ＩＤが４（図４の「１エントリのデータ」３０４）で、次ＩＤが７（図４の「１エントリのデータ」３０７）である。
また、次ＩＤにおいて７と表示された実ＩＤ７は、親ＩＤが１で、子ＩＤが９（図４の「１エントリのデータ」３０９）で、次ＩＤがない。
【００９４】
また、子ＩＤにおいて２と表示された実ＩＤ２は、親ＩＤが５で、子ＩＤがなく、次ＩＤが３（図４の「１エントリのデータ」３０３）である。
また、次ＩＤにおいて３と表示された実ＩＤ３は、親ＩＤが５で、子ＩＤおよび次ＩＤがない。
【００９５】
なお、更新リストは、他の実ＩＤ４，８，９，１０についても、同様の読み方で親子関係を表しており、任意の実ＩＤに対する親ＩＤ、子ＩＤ、次ＩＤを容易にかつ簡便に表すことができる。
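Reading the update list as described above amounts to following the child ID once and then the next IDs until none remains. The sketch below encodes only the rows spelled out in the text (real IDs 1, 5, 6, 7, 2, and 3); rows 4, 8, 9, and 10 are omitted because their values are not given, and the missing next ID of the root row is assumed to be empty. The function name `children` is hypothetical.

```python
UPDATE_LIST = {
    1: {"parent": None, "child": 5, "next": None},
    5: {"parent": 1, "child": 2, "next": 6},
    6: {"parent": 1, "child": 4, "next": 7},
    7: {"parent": 1, "child": 9, "next": None},
    2: {"parent": 5, "child": None, "next": 3},
    3: {"parent": 5, "child": None, "next": None},
}


def children(rows, entry_id):
    """Collect all children of entry_id via its child ID and the sibling chain."""
    result = []
    current = rows[entry_id]["child"]
    while current is not None:
        result.append(current)
        current = rows[current]["next"]    # move to the next sibling
    return result
```

With these rows, the children of real ID 1 are 5, 6, and 7, and the children of real ID 5 are 2 and 3, so one child column and one next column are enough to represent any number of children.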
【００９６】
また、データページ生成手段２５は、参照リスト記憶部３４内に生成された更新リストを参照し、ステップＡ７で親エントリの登録を行い、ステップＡ８で子エントリ集合を生成して、エントリデータに登録する。
【００９７】
このように、本発明のバルクロードシステムによれば、更新リストを参照して、子エントリの集合を生成して、エントリデータに登録するので、入力ファイルに含まれる複数のエントリを一括でロードすることができ、読み込み処理の高速化を図ることができる。
【００９８】
「バルクロードシステムの第二実施形態」
次に、本発明のバルクロードシステムの第二実施形態について図面を参照して詳細に説明する。
図８は、本発明に係るバルクロードシステムの第二実施形態の基本構成を説明するための概略ブロック図を示している。
同図において、本発明の第二実施形態のバルクロードシステムは、データ処理装置６が、バルクロード制御手段２１と、入力ファイル解析手段２２と、ハッシュテーブル管理手段２３と、エントリソート実行手段２６と、データページ生成手段２５とを備え、記憶装置７が、入力ファイル記憶部３１と、ハッシュテーブル記憶部３２と、エントリソート記憶部３５とを備えた構成としてある。
【００９９】
エントリソート記憶部３５は、図１０に示すように、各エントリの識別名とＬＤＩＦファイル内の行数をエントリリストとして記憶する。
また、エントリソート実行手段２６は、バルクロード制御手段２１からの指示により、各エントリの識別名と行数をエントリソート記憶部３５のエントリリストに登録し、それらのエントリを、識別名をキーとしてソートする。
その他の構成は、第一実施形態におけるバルクロードシステムと同様としてある。
【０１００】
次に、上記構成のバルクロードシステムの動作について、図面を参照して説明する。
図９は、第二実施形態にかかるバルクロードシステムの動作を説明するための、概略フローチャート図を示している。
また、図１０は、ＬＤＩＦファイルを読み込んで構築した、エントリリストを説明するための表を示している。
【０１０１】
図９において、まず、バルクロード制御手段２１が、入力装置１から与えられたＬＤＩＦファイルのファイル名を入力し、入力ファイル解析手段２２により、入力ファイル記憶部３１から該当するファイルを読み込んで、１エントリのデータを取得（ステップＡ１）する。
続いて、バルクロード制御手段２１が、エントリソート実行手段２６により、エントリソート記憶部３５にエントリの識別名とファイル上の行数を登録する（ステップＣ１）。
【０１０２】
次に、このエントリが入力ファイルの最終エントリかどうかを調べて（ステップＡ３）、最終エントリでないときは、ステップＡ１に戻る。
そして、ステップＡ３で、最終エントリまで処理が終了したら、エントリソート実行手段２６により、エントリソート記憶部３５に登録されたエントリを識別名でソートする（ステップＣ２）。
【０１０３】
エントリソート実行手段２６は、エントリソート記憶部３５内のソートされたエントリリストから１エントリを取得し（ステップＣ３）、そのエントリに実ＩＤを割り当て（ステップＢ１）、続いて、ハッシュテーブル管理手段２３は、ハッシュテーブル記憶部３２のハッシュテーブルに識別子と実ＩＤを登録し（ステップＢ９）、ハッシュテーブルから親エントリのＩＤを取得する（ステップＢ７）。
【０１０４】
そして、データページ生成手段２５は、エントリデータを生成し（ステップＡ６）、生成したエントリデータに親エントリを登録し（ステップＡ７）、親のエントリデータに子エントリとして登録する（ステップＣ４）。
【０１０５】
次に、このエントリがエントリリストの最終エントリかどうかを調べて（ステップＡ９）、最終エントリでなければステップＣ３に戻る。
ステップＡ９で、最終エントリまで処理が終了したら、処理終了を通知するために、出力装置５に表示する。
【０１０６】
このように、本実施形態におけるバルクロードシステムは、エントリソート実行手段が、エントリをエントリリストで管理し、すなわち、入力ファイルに記述された全エントリを識別名でソートし、ディレクトリ階層上で近傍にあるエントリをまとめて登録することができるので、登録済のエントリを再び読み込むといった無駄な処理を行う可能性を低減することができる。
また、エントリのデータ内容を予め取得するので、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【０１０７】
＜実施例＞
次に、具体的な実施例を用いて第二実施形態の動作を説明する。
第一実施形態の実施例と同様に、図４に示すＬＤＩＦファイルを入力ファイルとし、図５のようなディレクトリ階層が構築されるとする。
【０１０８】
このＬＤＩＦファイルを読み込んで、各エントリについて識別名とＬＤＩＦファイルでの行数をエントリリストに登録し、識別名でソートした結果のエントリリストは図１０のようになる。
【０１０９】
エントリソート実行手段２６は、ステップＣ１でエントリソート記憶部３５内にあるエントリリストにエントリを登録しておき、ステップＣ２で登録したエントリを識別名でソートする。
このソートを行うにあたり、エントリソート実行手段２６は、ディレクトリ階層を考慮して、エントリデータのクラスタリングを行うことができるので、ディレクトリサービスの検索性能を向上させることができる。
【０１１０】
また、ステップＣ３ではエントリリストから一つのエントリを取り出し、そのエントリの行数を利用して入力ファイル解析手段２２により入力ファイル記憶部３１から該当するエントリの情報を読み込むことができる。
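The entry list of the second embodiment (DN plus line number, sorted by DN, then used to seek back into the file) can be sketched as follows. The sort key is an assumption: the text only says that entries are sorted with the DN as the key, so this sketch compares RDNs from the root downward, which places a parent before its children and keeps siblings adjacent. `sort_key`, `build_entry_list`, and the sample text are hypothetical.

```python
def sort_key(dn):
    """Order DNs root-first so that each parent sorts before its descendants."""
    return tuple(reversed([part.strip().lower() for part in dn.split(",")]))


def build_entry_list(text):
    """First read: record (dn, line_number) per entry, then sort by DN."""
    entry_list = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("dn:"):
            entry_list.append((line[3:].strip(), number))
    return sorted(entry_list, key=lambda item: sort_key(item[0]))


SAMPLE = "\n".join([
    "version: 1",
    "dn: cn=Taro,ou=Sales,c=JP",
    "cn: Taro",
    "",
    "dn: c=JP",
    "c: JP",
    "",
    "dn: ou=Sales,c=JP",
    "ou: Sales",
])
ENTRY_LIST = build_entry_list(SAMPLE)
```

In the second read, each stored line number tells the parser where to start reading the corresponding entry, so entries can be generated in sorted order regardless of their order in the file.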
【０１１１】
このようにすると、本実施例におけるバルクロードシステムは、入力ファイルに記述された全エントリを識別名でソートし、ディレクトリ階層上で近傍にあるエントリをまとめて登録することで、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【０１１２】
「バルクロード方法の第一実施形態」
また、本発明は、バルクロード方法としても有効であり、上述した図２，３に示す概略フローチャート図を用いて、第一実施形態にかかるバルクロード方法を説明する。
本実施形態におけるバルクロード方法は、データ処理装置が、バルクロード制御手段と、バルクロード制御手段により制御される、入力ファイル解析手段，ハッシュテーブル管理手段，参照リスト管理手段，及び，データページ生成手段を備えたバルクロード方法であって、入力ファイル解析手段が、入力ファイルを解析して、一つのエントリごとにデータを読み込む第一回目の読み込みを行い、すなわち、１エントリのデータを取得する（ステップＡ１）。
【０１１３】
次に、ハッシュテーブル管理手段が、エントリの識別情報をハッシュテーブルで管理し、参照リスト管理手段が、図３に示すように、入力ファイル解析手段が読み込んだエントリに対する実際の識別符号と異なる、入力ファイルのエントリに対する仮の識別符号を有する、仮の識別符号マップ及び／又は仮参照リストにもとづいて、参照リストを生成し、エントリの参照関係を参照リストで管理する。つまり、ハッシュテーブルを登録し、かつ、参照リストを登録する（ステップＡ２）。
【０１１４】
次に、バルクロード制御手段が、参照リストにもとづいて、更新リストを生成し（ステップＡ４）、続いて、入力ファイル解析手段が、前記入力ファイルのエントリのデータに対し、第二回目の読み込みを行い（ステップＡ５）、さらに、データページ生成手段が、参照リストから生成された更新リストにもとづいて、入力ファイルから再び読み込まれたエントリのデータを、データベースの格納形式で生成する（ステップＡ６，Ａ７及びＡ８）方法としてある。
つまり、データページ生成手段が、更新リストにもとづいて、第二回目の読み込みで取得した入力ファイルのエントリのデータを、データベースの格納形式で生成する。
【０１１５】
このように、本発明は、バルクロード方法の発明としても有効であり、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを一括ロードでき、エントリのデータを効率良くかつ高速でロードできる。
【０１１６】
「バルクロード方法の第二実施形態」
また、本発明は、バルクロード方法としても有効であり、上述した図９に示す概略フローチャート図を用いて、第二実施形態にかかるバルクロード方法を説明する。
本実施形態におけるバルクロード方法は、データ処理装置が、バルクロード制御手段と、バルクロード制御手段により制御される、入力ファイル解析手段，ハッシュテーブル管理手段，エントリソート実行手段，及び，データページ生成手段を備えたバルクロード方法であって、入力ファイル解析手段が、入力ファイルを解析して、一つのエントリごとにデータを読み込み、すなわち、１エントリのデータを取得する（ステップＡ１）。
【０１１７】
次に、ハッシュテーブル管理手段が、エントリの識別情報をハッシュテーブルで管理し、エントリソート実行手段が、エントリをエントリリストで管理する。つまり、各エントリの識別情報を、エントリリストに登録する（ステップＣ１）。
そして、各エントリを、エントリの識別情報によりソート処理する（ステップＣ２）。
【０１１８】
次に、データページ生成手段が、ソートしたエントリの順番で、入力ファイルから再び読み込まれたエントリのデータを、データベースの格納形式で生成する（ステップＣ３，Ａ６，Ａ７及びＣ４）方法としてある。
つまり、データページ生成手段が、エントリリストにもとづいて、第二回目の読み込みで取得した入力ファイルのエントリのデータを、データベースの格納形式で生成する。
【０１１９】
このように、本発明は、バルクロード方法の発明としても有効であり、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【０１２０】
「バルクロードプログラムの第一実施形態」
また、本発明は、バルクロードプログラムとしても有効であり、上述した図２に示す概略フローチャート図を用いて、第一実施形態にかかるバルクロードプログラムを説明する。
本実施形態におけるバルクロードプログラムは、コンピュータに、先ず、入力ファイルを解析して、一つのエントリごとにデータを読み込む第一回目の読み込みを行わせる処理（ステップＡ１）を実行させる。
【０１２１】
次に、バルクロードプログラムは、コンピュータに、エントリの識別情報をハッシュテーブルで管理する処理、及び、エントリの参照関係を参照リストで管理する処理、すなわち、入力ファイル解析手段が読み込んだエントリに対する実際の識別符号と異なる、入力ファイルのエントリに対する仮の識別符号を有する、仮の識別符号マップ及び／又は仮参照リストにもとづいて、参照リストを生成し、エントリの参照関係を参照リストで管理する処理（ステップＡ２）を実行させる。
続いて、参照リストにもとづいて、更新リストを生成する処理（ステップＡ４）を実行させ、さらに、入力ファイルのエントリのデータに対し、第二回目の読み込みを行わせる処理（ステップＡ５）を実行させる。
【０１２２】
続いて、バルクロードプログラムは、コンピュータに、参照リストから生成された更新リストにもとづいて、第二回目の読み込みで取得した入力ファイルのエントリのデータを、データベースの格納形式で生成する処理（ステップＡ６，Ａ７及びＡ８）を実行させる構成としてある。
【０１２３】
このように、本発明は、バルクロードプログラムの発明としても有効であり、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを一括ロードでき、エントリのデータを効率良くかつ高速でロードできる。
【０１２４】
「バルクロードプログラムの第二実施形態」
また、本発明は、バルクロードプログラムとしても有効であり、上述した図９に示す概略フローチャート図を用いて、第二実施形態にかかるバルクロードプログラムを説明する。
本実施形態におけるバルクロードプログラムは、コンピュータに、先ず、入力ファイルを解析して、一つのエントリごとにデータを読み込む処理（ステップＡ１）を実行させる。
【０１２５】
次に、バルクロードプログラムは、コンピュータに、エントリの識別情報をハッシュテーブルで管理し、エントリをエントリリストで管理する処理、つまり、各エントリの識別情報を、エントリリストに登録する処理（ステップＣ１）を実行させる。
次に、バルクロードプログラムは、コンピュータに、各エントリを、エントリの識別情報によりソート処理（ステップＣ２）を実行させる。
【０１２６】
次に、バルクロードプログラムは、コンピュータに、データページ生成手段が、ソートしたエントリの順番で、入力ファイルから再び読み込まれたエントリのデータを、データベースの格納形式で生成する処理（ステップＣ３，Ａ６，Ａ７及びＣ４）を実行させる。
つまり、データページ生成手段が、エントリリストにもとづいて、第二回目の読み込みで取得した入力ファイルのエントリのデータを、データベースの格納形式で生成する。
【０１２７】
このように、本発明は、バルクロードプログラムの発明としても有効であり、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【０１２８】
なお、このバルクロードプログラムは、コンピュータのＲＯＭに記憶される他、コンピュータ読み取り可能な記録媒体、例えば、外部記憶装置及び可搬記録媒体等に格納することができる。
ここで、外部記憶装置とは、磁気ディスク等の記録媒体を内蔵し、データ処理装置に外部接続される記憶増設装置をいう。一方、可搬記録媒体とは、記録媒体駆動装置（ドライブ装置）に装着でき、かつ、持ち運び可能な記録媒体であって、たとえば、ＣＤ−ＲＯＭ、フレキシブルディスク、メモリカード、光磁気ディスク等をいう。
【０１２９】
そして、記録媒体に記憶されたプログラムは、コンピュータのＲＡＭにロードされて、ＣＰＵにより実行される。この実行により、上述したバルクロードシステムの各機能が実現される。
さらに、コンピュータで制御プログラムをロードする場合、他のコンピュータで保有された制御プログラムを、通信回線を利用して自己の有するＲＡＭや外部記憶装置にダウンロードすることもできる。
【０１３０】
また、本発明のバルクロードシステムは、データ処理装置が、上記請求項１０又は請求項１１に記載のバルクロードプログラムを搭載した構成としてもよく、このバルクロードシステムについて、図面を参照して説明する。
【０１３１】
「バルクロードシステムの第三実施形態」
図１１は、本発明に係るバルクロードシステムの第三実施形態の基本構成を説明するための概略ブロック図を示している。
同図において、バルクロードシステムは、キーボード等の入力装置１，データ処理装置９，情報を記憶する記憶装置３，情報を格納して管理するデータベース管理装置４，記録媒体８，及び，ディスプレイ装置や印刷装置等の出力装置５とで構成してある。
この記録媒体８は、磁気ディスク、半導体メモリ、その他の記録媒体であってもよい。
【０１３２】
バルクロードプログラムは、記録媒体８からデータ処理装置９に読み込まれ、データ処理装置９の動作を制御する。
データ処理装置９は、バルクロードプログラムの制御により以下の処理、すなわち第一実施形態および第二実施形態におけるデータ処理装置２および６により行われた処理と同一の処理を実行する構成としてある。
なお、その他の構成は、第一又は第二実施形態におけるバルクロードシステムと同様の構成としてある。
【０１３３】
上記構成のバルクロードシステムは、入力装置１からバルクロード要求が与えられると、要求内容で指定された入力ファイルを記憶装置３内の入力ファイル記憶部３１から読み込み、たとえば、上記請求項１０記載のバルクロードプログラムを搭載したときは、ハッシュテーブル記憶部３２に各エントリの識別名とＩＤを登録し、仮ＩＤマップ記憶部３３および参照リスト記憶部３４を利用して更新リストを生成し、全エントリのデータを生成してデータベース管理装置４内のデータページ記憶部４１にデータベースのページ形式で格納する。
そして、これらの処理の終了は出力装置５に表示される。
なお、上記請求項１１記載のバルクロードプログラムを搭載したときは、かかるプログラムの効果を発揮することは、勿論である。
【０１３４】
このように、本実施形態のバルクロードシステムは、バルクロードプログラムを搭載したバルクロードシステムとすることによって、入力ファイルが複数のエントリからなる場合であっても、複数のエントリのデータを一括ロードでき、又は、複数のエントリのデータを、検索性能が向上するようにクラスタリングして一括ロードできる。
【０１３５】
なお、本発明は、バルクロードシステムとして説明したが、バルクロード方法，バルクロードプログラム，及び，このバルクロードプログラムを搭載したバルクロードシステムとしても、同様の効果を発揮できることは勿論である。
【０１３６】
【発明の効果】
以上のように、本発明におけるバルクロードシステム，バルクロード方法及びバルクロードプログラムによれば、入力ファイルを２回読み込むことにして、１回目の読み込みでエントリの参照関係を解消し、２回目の読み込みでその参照関係を利用してデータを生成しているため、一つのエントリごとにデータをロードせず、入力ファイルから複数のエントリデータを一括してロードすることができる。
【０１３７】
また、本発明におけるバルクロードシステム，バルクロード方法及びバルクロードプログラムによれば、エントリの識別情報によりソートし、ソートした順番にエントリデータを生成しているため、ディレクトリの階層構造を考慮したエントリデータのクラスタリングを行うことができる。
【図面の簡単な説明】
【図１】図１は、本発明に係るバルクロードシステムの第一実施形態の基本構成を説明するための概略ブロック図を示している。
【図２】図２は、第一実施形態にかかるバルクロードシステムの動作を説明するための、概略フローチャート図を示している。
【図３】図３は、第一実施形態にかかるバルクロードシステムにおける、ハッシュテーブルおよび参照リストの登録の動作を説明するための、概略フローチャート図を示している。
【図４】図４は、ＬＤＩＦファイルの一例を示す、概略図を示している。
【図５】図５は、ＬＤＩＦファイルをロードした場合の、ディレクトリ階層を説明するための概略図を示している。
【図６】図６は、ＬＤＩＦファイルを読み込んで構築した、仮ＩＤマップ，仮参照リスト及び参照リストを説明するための表を示している。
【図７】図７は、本発明の更新リストを説明するための表を示している。
【図８】図８は、本発明に係るバルクロードシステムの第二実施形態の基本構成を説明するための概略ブロック図を示している。
【図９】図９は、第二実施形態にかかるバルクロードシステムの動作を説明するための、概略フローチャート図を示している。
【図１０】図１０は、ＬＤＩＦファイルを読み込んで構築した、エントリリストを説明するための表を示している。
【図１１】図１１は、本発明に係るバルクロードシステムの第三実施形態の基本構成を説明するための概略ブロック図を示している。
【符号の説明】
１　入力装置
２，６，９　データ処理装置
３，７　記憶装置
４　データベース管理装置
５　出力装置
２１　バルクロード制御手段
２２　入力ファイル解析手段
２３　ハッシュテーブル管理手段
２４　参照リスト管理手段
２５　データページ生成手段
２６　エントリソート実行手段
３１　入力ファイル記憶部
３２　ハッシュテーブル記憶部
３３　仮ＩＤマップ記憶部
３４　参照リスト記憶部
３５　エントリソート記憶部
４１　データページ記憶部
８　記録媒体
１００〜１１０　ディレクトリ階層例のエントリ
２１０　仮参照リスト
２２０　参照リスト
２３０　仮ＩＤマップ
３０１〜３１０　１エントリのデータ
[0001]
[Technical Field of the Invention]
The present invention relates to a bulk loading system, a bulk loading method, and a bulk loading program, and more particularly to a bulk loading system, method, and program that make it possible, in a directory service, to load the identification information and attribute information of entries described in an input file into a directory server in a batch.
[0002]
[Prior art]
Conventionally, in directory services for personal information management, network management, file management, and the like, resources such as people and computers are represented as entries, and each entry holds several accompanying pieces of information, such as its name, as attribute information (attributes).
Each entry is generally classified into a hierarchical structure according to the organization or region to which the entry belongs, and is centrally managed using a directory server.
[0003]
One example of a technique using such a directory server is the directory search system proposed in Japanese Patent Application Laid-Open No. 2000-242538, which uses the ancestor relationships of entries to improve search speed and reduce storage requirements.
When the directory server used in this directory search system registers entry and attribute data in the storage device, it generates an index (a table of ancestor relationships) used to speed up directory searches and registers this index together with the data.
[0004]
In other words, a directory server normally manages entry data, attribute data, and index data as directory data. Various indexing methods have been developed, and the index data is used to perform directory searches at high speed.
[0005]
Various other techniques related to the directory service have also been disclosed, aimed at speeding up processing such as searches and improving memory efficiency so that large-scale directories can be handled more quickly and efficiently.
[0006]
There are two general methods of generating directory data: using a utility to generate the directory data one entry at a time, and using a tool that reads a file in which all entries are described in some format and loads the directory data in a batch. Because the latter loads the directory data in a batch, it can handle large-scale directory data quickly and efficiently.
[0007]
For example, for LDAP (Lightweight Directory Access Protocol), the standard directory service on the Internet, RFC 2849 defines a file format for data exchange called LDIF (LDAP Data Interchange Format), and many LDAP-compliant directory servers provide functions for reading and writing LDIF files in a batch.
[0008]
Two methods are usually used to read an LDIF file and load the directory data in a batch: one loads the data according to the LDAP protocol, and the other, when a database is used as the back end, loads the data as an application of that database.
In both methods, however, the data described in the input file has been loaded into the directory server one entry at a time to generate the directory data.
[0009]
(First conventional example)
As an example of the former method, Japanese Patent Application Laid-Open No. 11-345234 proposes a data processing apparatus that reads data in a batch from a file in a common format and performs arbitrary processing on it.
Using a file in the format called CSV (Comma Separated Value), this data processing apparatus registers candidate employees in advance and can read the data of these employees in a batch.
[0010]
(Second conventional example)
Similarly, as another example of the former method, Japanese Patent Application Laid-Open No. 10-326285 proposes a document management system that likewise uses CSV files to register documents in a batch.
This document management system consists of: a database system that stores the attributes, file names, and so on of all documents; a management data file in which the data required for document management is created in CSV format by reading data from the database system; a batch registration file in CSV format created by reading and editing the data of the management data file; a batch registration interface that reads the batch registration file and registers the documents; an object-oriented database system in which the batch registration interface actually performs the registration; and an error log file that is output when an error occurs in the registration process and a document cannot be registered.
[0011]
The batch registration process of the document management system having the above configuration operates as follows.
First, the first line of the batch registration file is read and analyzed, and the registration items are associated with the field order.
Subsequently, the second and subsequent lines of the batch registration file are read line by line, and a registration data array is generated based on the association obtained in the preprocessing.
If there is no error, registration is performed in the database using the generated registration data array.
If there is an error, the error content and error record are added to the log file in CSV format.
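The batch registration steps described above (map the header line to fields, build one record per following line, register it or append it to a CSV error log) can be sketched as follows. This is an illustrative sketch only: the cited publication's actual error conditions are not given here, so the check for empty required fields, the function name `batch_register`, and the field names in the usage example are assumptions.

```python
import csv
import io


def batch_register(csv_text, required):
    """Return (registered_records, error_log_csv) for a header-first CSV text."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)                       # line 1: field order
    registered = []
    log = io.StringIO()
    writer = csv.writer(log, lineterminator="\n")
    for row in reader:                          # line 2 onward: one record each
        record = dict(zip(header, row))         # apply the header mapping
        missing = [name for name in required if not record.get(name)]
        if missing:
            # Error: append the error description and the record to the log.
            writer.writerow(["missing: " + " ".join(missing)] + row)
        else:
            registered.append(record)           # stands in for the database insert
    return registered, log.getvalue()
```

For the input `"title,file\nSpec,spec.doc\n,orphan.doc\n"` with both fields required, the first record is registered and the second is written to the error log.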
[0012]
(Third conventional example)
As the latter method, on the other hand, there is the bulk loading method, which loads the data to be stored in a database from a file in a batch.
An example of applying the bulk loading method to an object-oriented database is the paper by Janet L. Wiener et al. entitled "Bulk Loading into an OODB: A Performance Study", published in Proceedings of VLDB Conference, 1994, pages 120-131.
[0013]
An example that further extends the bulk loading method described in that paper is the paper by Janet L. Wiener et al. entitled "OODB Bulk Loading Revisited: The Partitioned-List Approach", published in Proceedings of VLDB Conference, 1995, pages 30-41.
[0014]
The bulk loading method described in these papers is basically the following method.
The data (objects) to be newly loaded are prepared in advance as a text file, and each object is assigned a unique number. For the text file describing objects that have already been loaded, an ID (identification code) has been assigned to every object, and the objects are managed in an ID map that uses these IDs.
[0015]
In other words, each object that has already been loaded is managed in the ID map, and objects to be newly loaded (referencing objects) are managed using the ID map and a ToDo list in which temporary IDs are assigned (or an InvToDo list when there are inverse references).
[0016]
For the objects to be newly loaded, an Update list containing the information that must be updated is generated using the ID map and the ToDo list (or the InvToDo list).
The text file of the objects to be newly loaded is then read again, and the objects, including the newly loaded ones, are generated while referring to the Update list.
[0017]
[Problems to be solved by the invention]
However, the above-described prior art has the problem that, both when the utility provided by the directory service is used (the former method) and when the directory data is loaded collectively using an input file (the latter method), the data must be loaded entry by entry, so the processing speed cannot be improved.
[0019]
In addition, there is a problem that the directory data is not arranged efficiently in a way that improves the search performance of the directory service after the data is loaded.
[0020]
The present invention has been made to solve the above problems, and its object is to provide a bulk load system, a bulk loading method, and a bulk loading program that can load the directory data of a directory service collectively and at high speed, clustering the data so as to improve the search performance of the directory service.
[0021]
[Means for Solving the Problems]
In order to achieve this object, the bulk load system according to claim 1 of the present invention is configured such that a data processing device comprises bulk load control means and, controlled by the bulk load control means, input file analysis means, hash table management means, reference list management means, and data page generation means, wherein: the input file analysis means performs a first reading in which it analyzes the input file and reads data entry by entry; the hash table management means manages the identification information of the entries in a hash table; the reference list management means generates a reference list based on a temporary identification code map and/or a temporary reference list having temporary identification codes for the entries of the input file, the temporary identification codes being different from the actual identification codes for the entries read by the input file analysis means, and manages the reference relationships of the entries with the reference list; the bulk load control means generates an update list based on the reference list; the input file analysis means performs a second reading of the entry data of the input file; and the data page generation means generates, based on the update list, the entry data of the input file acquired in the second reading in the storage format of the database.
[0022]
In this way, the reference list management means manages the reference relationships of the entries with the reference list, that is, it acquires the reference relationships of the entries in advance, so that the entry data included in the input file can be loaded efficiently and at high speed. Further, the reference relationships of the entries can be stored in the temporary identification code map and/or the temporary reference list each time one entry is read. Furthermore, even when the input file is composed of a plurality of entries, the data of the plurality of entries can be loaded in a batch.
[0023]
The invention according to claim 2 is the bulk load system according to claim 1, wherein the reference list has the actual identification code for the entry read by the input file analysis means, a parent identification code for the entry representing its superordinate concept, and a child identification code for the entry representing its subordinate concept.
[0024]
In this way, the entry reference relationship can be expressed by a parent-child relationship based on the entry data, and the reference list can be easily used.
[0025]
The invention according to claim 3 is the bulk load system according to claim 2, wherein the reference list has a next identification code indicating the sibling relationship between entries at the same level.
[0026]
In this way, the entry reference relationship can be represented by a sibling relationship based on the entry data, and the reference list can be easily used.
[0031]
The invention according to claim 4 is the bulk load system according to any one of claims 1 to 3, wherein the input file analysis means performs the first reading and the second reading on the data of all entries of the input file.
[0032]
In this way, even if the input file consists of a plurality of entries, the data of all the entries in the input file can be loaded in a batch.
[0033]
Further, in the bulk load system according to claim 5 of the present invention, a data processing device comprises bulk load control means and, controlled by the bulk load control means, input file analysis means, hash table management means, entry sort execution means, and data page generation means, wherein: the input file analysis means analyzes the input file and reads data entry by entry; the hash table management means manages the identification information of the entries in a hash table; the entry sort execution means manages the entries in an entry list; and the data page generation means generates the entry data in the storage format of the database.
[0034]
In this way, the entry sort execution means manages the entries in the entry list, that is, acquires the data contents of the entries in advance, so that the entry data included in the input file can be loaded efficiently and at high speed.
[0035]
The invention according to claim 6 is the bulk load system according to claim 5, wherein: the input file analysis means performs a first reading of the entry data of the input file; the entry sort execution means performs a sorting process based on the entry list; the input file analysis means performs a second reading of the entry data of the input file; the hash table management means assigns an actual identification code to each entry read in the second reading by the input file analysis means, registers the identification information of the entry in the hash table together with the actual identification code, and obtains the parent identification code for the entry representing its superordinate concept; and the data page generation means generates, based on the sorting process, the entry data of the input file acquired in the second reading in the storage format of the database.
[0036]
In this way, even if the input file consists of a plurality of entries, the data of the plurality of entries can be clustered and loaded together so that the search performance is improved.
[0037]
The invention according to claim 7 is the bulk load system according to claim 6, wherein the input file analysis means performs the first reading and the second reading on the data of all entries of the input file.
[0038]
In this way, even if the input file consists of a plurality of entries, the data of all the entries in the input file can be loaded in a batch.
[0039]
Further, in the bulk loading method according to claim 8 of the present invention, a data processing device comprises bulk load control means and, controlled by the bulk load control means, input file analysis means, hash table management means, reference list management means, and data page generation means, and the method is such that: the input file analysis means performs a first reading in which it analyzes the input file and reads data entry by entry; the hash table management means manages the identification information of the entries in a hash table; the reference list management means generates a reference list based on a temporary identification code map and/or a temporary reference list having temporary identification codes for the entries of the input file, the temporary identification codes being different from the actual identification codes for the entries read by the input file analysis means, and manages the reference relationships of the entries with the reference list; the bulk load control means generates an update list based on the reference list; the input file analysis means performs a second reading of the entry data of the input file; and the data page generation means generates, based on the update list, the entry data of the input file acquired in the second reading in the storage format of the database.
[0040]
As described above, the present invention is also effective as an invention of a bulk loading method: even when an input file is composed of a plurality of entries, the data of the plurality of entries can be loaded at once, and the entry data can be loaded efficiently and at high speed.
[0041]
Further, in the bulk loading method according to claim 9 of the present invention, a data processing device comprises bulk load control means and, controlled by the bulk load control means, input file analysis means, hash table management means, entry sort execution means, and data page generation means, and the method is such that: the input file analysis means analyzes the input file and reads data entry by entry; the hash table management means manages the identification information of the entries in a hash table; the entry sort execution means manages the entries in an entry list and further sorts the entries according to their identification information; and the data page generation means generates, in the storage format of the database, the entry data read again from the input file in the order of the sorted entries.
[0042]
As described above, the present invention is also effective as an invention of a bulk loading method: even when an input file is composed of a plurality of entries, the data of the plurality of entries can be clustered and bulk loaded so that the search performance is improved.
[0043]
Further, the bulk loading program according to claim 10 of the present invention causes a computer to execute: a process of performing a first reading in which an input file is analyzed and data is read entry by entry; a process of managing the identification information of the entries in a hash table; a process of generating a reference list based on a temporary identification code map and/or a temporary reference list having temporary identification codes for the entries of the input file, the temporary identification codes being different from the actual identification codes for the entries read by the input file analysis means, and of managing the reference relationships of the entries with the reference list; a process of generating an update list based on the reference list; a process of performing a second reading of the entry data of the input file; and a process of generating, based on the update list, the entry data of the input file acquired in the second reading in the storage format of the database.
[0044]
As described above, the present invention is also effective as an invention of a bulk loading program: even when an input file is composed of a plurality of entries, the data of the plurality of entries can be loaded at once, and the entry data can be loaded efficiently and at high speed.
[0045]
Further, the bulk loading program according to claim 11 of the present invention causes a computer to execute: a process of analyzing an input file and reading data entry by entry; a process of managing the identification information of the entries in a hash table; a process of managing the entries in an entry list; a process of sorting the entries according to their identification information; and a process of generating, in the storage format of the database, the entry data read again from the input file in the order of the sorted entries.
[0046]
As described above, the present invention is also effective as an invention of a bulk loading program: even when an input file is composed of a plurality of entries, the data of the plurality of entries can be clustered and bulk loaded so that the search performance is improved.
[0047]
Further, the bulk load system according to claim 12 of the present invention comprises an input device such as a keyboard, a data processing device, a storage device that stores information, a database management device that stores and manages information, a storage medium, and an output device such as a display device or a printing device, wherein the bulk loading program according to claim 10 or claim 11 is installed.
[0048]
In this way, with a bulk load system in which the bulk loading program is installed, even if the input file consists of a plurality of entries, the data of the plurality of entries can be loaded at once, or the data of the plurality of entries can be clustered and loaded in a batch so that the search performance is improved.
[0049]
[Detailed Description of the Invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
First, a first embodiment of a bulk load system of the present invention will be described with reference to the drawings.
[0050]
"First embodiment of bulk loading system"
FIG. 1 is a schematic block diagram for explaining the basic configuration of the first embodiment of the bulk load system according to the present invention.
In the figure, the bulk load system includes an input device 1 such as a keyboard, a data processing device 2 that operates under program control, a storage device 3 that stores information, a database management device 4 that stores and manages information, and an output device 5 such as a display device or a printing device.
[0051]
The storage device 3 includes an input file storage unit 31, a hash table storage unit 32, a temporary ID map storage unit 33, and a reference list storage unit 34.
[0052]
The input file storage unit 31 stores file data such as an LDIF file to be input.
The hash table storage unit 32 stores a hash table, that is, a table used in the hash method (one of the data storage techniques) that holds arbitrary information using a character string or a number as a keyword.
[0053]
The temporary ID map storage unit 33 stores a temporary identification code map (abbreviated as the temporary ID map 230 as appropriate; see FIG. 6) that associates temporary identification codes (abbreviated as temporary IDs as appropriate) with actual identification codes (abbreviated as real IDs as appropriate).
The temporary identification code is an identification code for an entry of the input file; it is different from the actual identification code and is chosen so that it does not collide with any actual identification code. As shown in FIG. 6, for example, the temporary IDs -2 and -3 correspond to the real IDs 5 and 6.
The temporary identification code map is a map composed of a temporary identification code and an actual identification code corresponding to the temporary identification code.
  As described above, by using the temporary ID in addition to the real ID, it is possible to appropriately use the hash method so as not to cause an address collision in the hash table.
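The separation of the two ID ranges can be illustrated with a minimal Python sketch. It assumes, as in the example described later for FIG. 6, that real IDs count upward from 1 and temporary IDs count downward from -1; the class name `IdAllocator` and its methods are hypothetical and do not appear in the embodiment.

```python
from itertools import count


class IdAllocator:
    """Hands out real IDs (1, 2, 3, ...) and temporary IDs (-1, -2, -3, ...)."""

    def __init__(self):
        self._real = count(1)        # real IDs grow upward from 1
        self._temp = count(-1, -1)   # temporary IDs grow downward from -1
        self.temp_id_map = {}        # temporary ID -> real ID

    def new_real_id(self):
        return next(self._real)

    def new_temp_id(self):
        return next(self._temp)

    def resolve(self, temp_id, real_id):
        """Record that an entry first seen under temp_id now has real_id."""
        self.temp_id_map[temp_id] = real_id


alloc = IdAllocator()
t = alloc.new_temp_id()    # a parent referenced before it has been read
r1 = alloc.new_real_id()   # the child that referenced it
r2 = alloc.new_real_id()   # the parent itself, read later
alloc.resolve(t, r2)
```

Because one range is strictly negative and the other strictly positive, a temporary ID can never be mistaken for a real ID, whatever order the entries arrive in.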
[0054]
Furthermore, the reference list storage unit 34 stores a reference list 220 (see FIG. 6), that is, a list that represents the parent-child relationship of entries by using real IDs and temporary IDs assigned to the entries.
[0055]
The database management device 4 includes a data page storage unit 41 that holds information stored in the database in a page format.
In this way, by holding the information in the page format, when the information is output, it is not necessary to convert the information again according to the output format, and as a result, the processing speed can be increased.
[0056]
The data processing device 2 includes bulk load control means 21 and, controlled by the bulk load control means 21, input file analysis means 22, hash table management means 23, reference list management means 24, and data page generation means 25.
[0057]
When the bulk load control means 21 receives, from the input device 1, a user's request to load entries using an LDIF file, it controls the input file analysis means 22, the hash table management means 23, the reference list management means 24, and the data page generation means 25 in accordance with the request to perform the bulk load process.
Further, when the bulk load process is completed, the bulk load control means 21 displays on the output device 5 that the bulk load process has been completed.
[0058]
Here, the input file analysis unit 22 reads and analyzes the LDIF file from the input file storage unit 31 in accordance with an entry load instruction from the bulk load control unit 21 and acquires information on entries included in the LDIF file.
[0059]
Similarly, in accordance with an instruction from the bulk load control means 21, the hash table management means 23 creates the necessary hash table using the hash table storage unit 32, calculates a hash value from the keyword value, and manages the identification information of the entries contained in the LDIF file.
[0060]
Further, in accordance with an instruction from the bulk load control means 21, the reference list management means 24 uses the temporary ID map storage unit 33 and the reference list storage unit 34 to create the reference list 220 representing the parent-child relationships of the entries, and manages each entry to which a real ID or a temporary ID has been assigned.
[0061]
Here, the reference list 220 preferably has the real ID for the entry read by the input file analysis means 22, a parent identification code for the entry representing its superordinate concept (abbreviated as parent ID as appropriate), and a child identification code for the entry representing its subordinate concept (abbreviated as child ID as appropriate). In this way, the reference relationships of the entries can be expressed as parent-child relationships (combinations of a superordinate concept and a subordinate concept) based on the entry data, and the reference list can be used easily.
[0062]
In addition, the reference list 220 may have a next identification code (abbreviated as next ID as appropriate) indicating the sibling relationship between entries at the same level. In this way, the reference relationships of the entries can be expressed as sibling relationships based on the entry data, and the reference list can be used easily.
[0063]
In addition, the reference list 220 may be generated based on the temporary ID map 230 and/or the temporary reference list 210, which has temporary IDs for the entries of the input file. In this way, the reference relationships can be stored in the temporary ID map 230 and/or the temporary reference list 210 each time an entry is read, and the operation of the system can be simplified.
As shown in FIG. 6, the temporary reference list 210 is a list that includes a temporary ID, a parent ID, a child ID, and a next ID. It supplements the reference list 220 (and can be regarded as part of the reference list), holding the parent-child relationships and the like until they are re-registered from the temporary reference list into the reference list.
[0064]
Further, in accordance with an instruction from the bulk load control means 21, the data page generation means 25 uses the reference list storage unit 34 to arrange the data of each entry in the page format of the database and stores the data in the data page storage unit 41.
[0065]
Next, the operation of the bulk load system configured as described above will be described with reference to the drawings.
FIG. 2 is a schematic flowchart for explaining the operation of the bulk load system according to the first embodiment.
[0066]
In FIG. 2, first, the bulk load control means 21 receives the file name of the LDIF file entered from the input device 1, and the input file analysis means 22 reads the corresponding file from the input file storage unit 31 and acquires the data of one entry (referred to as one entry as appropriate) (step A1).
That is, the input file analysis means 22 performs the first reading of the entry data from the input file.
[0067]
Subsequently, the hash table management means 23 registers the identification information and the real ID of that entry in the hash table of the hash table storage unit 32, and the reference list management means 24 uses the temporary ID map storage unit 33 to register, in the reference list storage unit 34, a reference list representing the reference relationships (parent-child and sibling relationships) of that entry (step A2).
[0068]
Then, the bulk load system of this embodiment checks whether this entry is the last entry of the input file (step A3). If it is not, the process returns to step A1 and the same processing is performed for the next entry.
[0069]
When the processing has been completed up to the final entry in step A3, the reference list management means 24 generates an update list (see FIG. 7) using the temporary ID map storage unit 33 and the reference list storage unit 34 (step A4).
The update list is a list including a real ID, a parent ID, a child ID, and a next ID. Based on the update list, the data page generation means 25 generates the entry data of the input file acquired in the second reading in the storage format of the database.
[0070]
Subsequently, the input file analysis means 22 reads the corresponding input file again from the input file storage unit 31 and acquires the data of one entry (step A5). That is, the input file analysis means 22 performs the second reading of the entry data from the input file.
[0071]
Next, the data page generation means 25 generates entry data from the acquired entry (step A6), refers to the update list in the reference list storage unit 34 and registers the parent entry (the entry in the parent relationship) in the entry data (step A7), and generates a set of child entries (the entries in the child relationship) and registers it in the entry data (step A8).
That is, the data page generation unit 25 generates the input file entry data acquired in the second reading in the database storage format based on the update list.
[0072]
This entry data is arranged in a page format and is stored in the data page storage unit 41 when there is no free space left on the page. That is, according to the bulk load system of the present embodiment, the data processing device 2 collects the entry data, and the collected entry data can be loaded in a batch.
[0073]
Next, it is checked whether or not this entry is the last entry of the input file (step A9). If not, the process returns to step A5, and the same processing is performed for the next entry.
In step A9, when the process is completed up to the final entry, the entry data configured in the page format is stored in the data page storage unit 41 and displayed on the output device 5 to notify the end of the process.
[0074]
As described above, the reference list management means 24 manages the reference relationships of the entries with the reference list 220, that is, it acquires the reference relationships of the entries in advance, so that the entry data included in the input file can be loaded efficiently and at high speed.
In addition, by using the update list, data of a plurality of entries can be loaded in a batch even when the input file includes a plurality of entries.
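The overall two-pass flow of FIG. 2 can be sketched as follows. This is only a simplified Python illustration under stated assumptions: it derives the parent's identifier by cutting the entry's identifier at the first comma, it builds the complete identifier-to-ID table before linking (so it sidesteps the temporary IDs that the embodiment uses), and the function name `bulk_load`, the record layout, and the `page_capacity` parameter are hypothetical.

```python
def bulk_load(entries, page_capacity=2):
    """entries: list of (dn, attrs) in file order. Returns a list of pages."""
    # Pass 1 (steps A1-A4): assign real IDs and collect parent/child links.
    ids = {dn: i for i, (dn, _) in enumerate(entries, start=1)}
    children = {i: [] for i in ids.values()}
    parent = {}
    for dn, _ in entries:
        pid = ids.get(dn.partition(",")[2])   # None if the parent is not in the file
        parent[ids[dn]] = pid
        if pid is not None:
            children[pid].append(ids[dn])

    # Pass 2 (steps A5-A9): build each record from the links alone and fill pages.
    pages, page = [], []
    for dn, attrs in entries:
        eid = ids[dn]
        page.append({"id": eid, "dn": dn, "attrs": attrs,
                     "parent": parent[eid], "children": children[eid]})
        if len(page) == page_capacity:        # page is full: store it
            pages.append(page)
            page = []
    if page:
        pages.append(page)                    # store the last, partly filled page
    return pages
```

In the second pass each record is completed from the link tables alone, so no other entry's data has to be fetched while a page is being filled.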
[0075]
Next, operations for registering the hash table and the reference list will be described with reference to the drawings.
FIG. 3 is a schematic flowchart for explaining the operation of registering the hash table and the reference list in the bulk load system according to the first embodiment.
In FIG. 3, a real ID is assigned to the entry (step B1), and it is checked whether the entry itself (the entry to which the real ID has been assigned) is already registered in the hash table (step B2). Then, it is checked whether the parent entry (the entry of the superordinate concept) is already registered in the hash table (step B6).
[0076]
Here, if the parent entry is not registered, a temporary ID is assigned to the parent entry (step B10), the parent-child relationship is registered in the temporary reference list (step B11), and the identification information and temporary ID are registered in the hash table (step B12).
The process then returns to step B6.
[0077]
Next, since the parent entry is now registered with a temporary ID in step B6, the ID of the parent entry is acquired from the hash table (step B7), the parent-child relationship is registered in the reference list (step B8), and the identification information and real ID are registered in the hash table (step B9).
[0078]
If the parent entry has already been registered in step B6, the processes in steps B10 to B12 are omitted.
If the entry is already registered in step B2, the temporary ID and the real ID are registered in the temporary ID map (step B3), and the parent-child relationship is re-registered from the temporary reference list to the reference list (step B4). The identification information and real ID are re-registered in the hash table (step B5).
[0079]
As described above, the bulk load control means 21 uses temporary IDs in addition to real IDs, and the hash table management means 23 and the reference list management means 24 further use the temporary ID map, the temporary reference list, the reference list, and the update list, so that the hash method can be used appropriately without causing address collisions in the hash table.
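One possible reading of the registration procedure of FIG. 3 is sketched below in Python. Several points are assumptions made for illustration rather than details taken from the embodiment: the parent's identifier is obtained by cutting the entry's identifier at the first comma, an entry with no parent part is treated as a root, processing continues to step B6 after steps B3 to B5, and the helper table `last_child` is introduced only to chain siblings through the next ID. The function name `register_entries` is hypothetical.

```python
from itertools import count


def register_entries(dns):
    """First-pass registration over identifiers (DNs) in file order."""
    hash_table = {}      # DN -> real ID (positive) or temporary ID (negative)
    ref_list = {}        # real ID -> [parent ID, child ID, next ID]
    temp_ref_list = {}   # temporary ID -> [parent ID, child ID, next ID]
    temp_id_map = {}     # temporary ID -> real ID
    last_child = {}      # helper (assumption): parent ID -> last linked child
    real_ids, temp_ids = count(1), count(-1, -1)

    for dn in dns:
        real_id = next(real_ids)                            # B1
        known = hash_table.get(dn)                          # B2
        if known is not None and known < 0:
            temp_id_map[known] = real_id                    # B3
            ref_list[real_id] = temp_ref_list.pop(known)    # B4
            last_child[real_id] = last_child.pop(known)
            hash_table[dn] = real_id                        # B5
        else:
            ref_list[real_id] = [None, None, None]

        parent_dn = dn.partition(",")[2].strip()
        if parent_dn:                                       # assumption: no parent part means root
            if parent_dn not in hash_table:                 # B6
                temp_id = next(temp_ids)                    # B10
                temp_ref_list[temp_id] = [None, None, None]  # B11
                hash_table[parent_dn] = temp_id             # B12
            parent_id = hash_table[parent_dn]               # B7
            ref_list[real_id][0] = parent_id                # B8
            rows = temp_ref_list if parent_id < 0 else ref_list
            if rows[parent_id][1] is None:
                rows[parent_id][1] = real_id                # first child of this parent
            else:
                ref_list[last_child[parent_id]][2] = real_id  # chain to previous sibling
            last_child[parent_id] = real_id
        hash_table[dn] = real_id                            # B9
    return ref_list, temp_ref_list, temp_id_map
```

For the hypothetical input `["ou=X,o=A", "ou=Y,o=A", "o=A"]`, the parent `o=A` is first referenced under the temporary ID -1 and later receives the real ID 3; the two children keep -1 as their parent ID in the reference list until the update list is generated.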
[0080]
As described above, according to the first embodiment of the bulk load system, the entries of the input file are read twice, in step A1 and step A5, as shown in FIG. 2. Because the identification information of the entries and the reference relationships between entries are acquired in the first reading as shown in FIG. 3, there is no need to refer to the data of other entries when generating entry data in step A6.
For this reason, it is not necessary to load data of entries one by one, that is, it is possible to realize a bulk load system capable of loading data of a plurality of entries at once.
[0081]
Further, the input file analysis means 22 may be configured to perform the first reading and the second reading on the data of all the entries of the input file. In this case, even if the input file includes a plurality of entries, the data of all the entries in the input file can be loaded at once.
[0082]
<Example>
Next, the operation of the first embodiment will be described using a specific example.
FIG. 4 is a schematic diagram showing an example of an LDIF file.
In the figure, the entry having the identifier “c = JP” is the root (highest level) of the directory, and in this example lower-level entries are added under this entry.
[0083]
The “version: 1” in the first line represents the LDIF standard version.
The range delimited by blank lines from the next line is defined as “one entry data”, and this LDIF file contains ten “one entry data” (301 to 310).
In each “data of one entry”, a line beginning with “dn:” indicates identification information, and the other lines indicate attribute values (attribute data).
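The splitting of an LDIF file into entries can be sketched in Python as follows. This is a deliberately minimal illustration: it ignores line folding, comments, and base64-encoded values, which a complete LDIF reader would have to handle, and the sample text is hypothetical rather than the content of FIG. 4. The function name `parse_ldif` is likewise an assumption.

```python
def parse_ldif(text):
    """Split LDIF text into entries: [(dn, [(attribute, value), ...]), ...]."""
    entries = []
    dn, attrs = None, []
    for line in text.splitlines() + [""]:   # the trailing "" flushes the last entry
        if not line.strip():                # a blank line ends the current entry
            if dn is not None:
                entries.append((dn, attrs))
            dn, attrs = None, []
            continue
        name, _, value = line.partition(":")
        value = value.strip()
        if name == "version":               # the version line is not an entry
            continue
        if name == "dn":                    # identification information
            dn = value
        else:                               # attribute value
            attrs.append((name, value))
    return entries


SAMPLE = """version: 1

dn: o=A,c=JP
objectClass: organization
o: A

dn: ou=X,o=A,c=JP
objectClass: organizationalUnit
ou: X
"""

entries = parse_ldif(SAMPLE)
```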
[0084]
When a directory is loaded by this LDIF file, a directory hierarchy as shown in FIG. 5 is constructed.
Here, the entry 100 is the root of the directory hierarchy and is registered in advance in the directory server. In contrast, the entries 101 to 110 are newly loaded entries.
[0085]
The names shown in each of the entries 101 to 110 are relative distinguished names (RDN: Relative Distinguished Name). Connecting the relative distinguished names of all the entries on the path back to the root with commas (,) yields the distinguished name of the entry.
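The relationship between relative distinguished names and distinguished names can be shown with a short Python sketch. It assumes that no relative distinguished name contains an escaped comma, and the function names `join_rdns` and `split_dn` as well as the sample names are hypothetical.

```python
def join_rdns(rdns):
    """Build a distinguished name from RDNs ordered from the entry up to the root."""
    return ",".join(rdns)


def split_dn(dn):
    """Return (own RDN, parent DN); the parent DN is "" for a root entry."""
    rdn, _, parent = dn.partition(",")
    return rdn.strip(), parent.strip()


dn = join_rdns(["cn=Taro", "ou=X", "o=A", "c=JP"])
rdn, parent_dn = split_dn(dn)
```

Cutting the distinguished name at its first comma therefore gives the distinguished name of the parent entry, which is the key used to look the parent up in the hash table.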
[0086]
When the LDIF file is read and the parent-child relationship between entries is acquired, for example, a temporary reference list 210, a reference list 220, and a temporary ID map 230 as shown in FIG. 6 are obtained.
[0087]
In FIG. 6, the temporary ID map 230 indicates 1 as the real ID of the entry 100 that is the root. Real IDs are assigned to entries in increasing order starting from 1, and temporary IDs are assigned in decreasing order starting from -1.
In this embodiment, -1, -2, and -3 are used as temporary IDs, and they correspond to the real IDs 1 (301 in FIG. 4), 5 (305 in FIG. 4), and 6 (306 in FIG. 4).
[0088]
The temporary reference list 210 includes a temporary ID, a parent ID, a child ID, and a next ID. Apart from the temporary IDs, which are negative integers, the IDs that are positive integers (1 to 10) correspond to the "data of one entry" (301 to 310 in FIG. 4).
[0089]
Further, the temporary reference list 210 and the reference list 220 are configured to have one child ID and one next ID for each real ID. In this way, even if an entry has two or more children, the sibling entries are connected by their next IDs, so all the child entries can be obtained by following the next IDs in order, starting from the entry indicated by the child ID.
[0090]
The temporary ID map 230 is stored in the temporary ID map storage unit 33 in the format shown in FIG. 6, and the temporary reference list 210 and the reference list 220 are likewise stored in the reference list storage unit 34 in the format shown in FIG. 6.
[0091]
The reference list management means 24 uses the information stored in the temporary ID map storage unit 33 and the reference list storage unit 34 to generate the update list shown in FIG. 7 in step A4.
This update list can be generated by simply changing the temporary ID to the real ID using the temporary ID map 230 based on the reference list 220 of FIG.
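The replacement of temporary IDs by real IDs can be sketched in Python as follows. The correspondence -1 to 1, -2 to 5, and -3 to 6 follows the temporary ID map described above, but the reference-list rows used here are hypothetical sample data, not the content of FIG. 6, and the function name `build_update_list` is an assumption.

```python
def build_update_list(reference_list, temp_id_map):
    """Replace every temporary ID in the reference list by its real ID.

    reference_list: ID -> (parent ID, child ID, next ID); None means "no ID".
    temp_id_map:    temporary ID -> real ID.
    """
    def real(i):
        # IDs that are not in the map are already real IDs and pass through.
        return None if i is None else temp_id_map.get(i, i)

    return {real(key): tuple(real(v) for v in row)
            for key, row in reference_list.items()}


temp_id_map = {-1: 1, -2: 5, -3: 6}
reference_list = {            # hypothetical rows that still contain temporary IDs
    2: (-2, None, 3),
    3: (-2, None, None),
    4: (-3, None, None),
}
update_list = build_update_list(reference_list, temp_id_map)
```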
[0092]
Next, how to read the update list will be described.
The real ID 1 (“1 entry data” 301 in FIG. 4) has no parent ID (root entry) and the child ID is 5 (“1 entry data” 305 in FIG. 4).
Further, the real ID 5 (“1 entry data” 305 in FIG. 4), which appears as 5 in the child ID, has a parent ID of 1 (“1 entry data” 301 in FIG. 4), a child ID of 2 (“1 entry data” 302 in FIG. 4), and a next ID of 6 (“1 entry data” 306 in FIG. 4).
[0093]
Further, the real ID 6, which appears as 6 in the next ID, has a parent ID of 1, a child ID of 4 (“1 entry data” 304 in FIG. 4), and a next ID of 7 (“1 entry data” 307 in FIG. 4).
Further, the real ID 7 displayed as 7 in the next ID has a parent ID of 1, a child ID of 9 (“1 entry data” 309 in FIG. 4), and no next ID.
[0094]
The real ID 2 displayed as 2 in the child ID has a parent ID of 5, no child ID, and a next ID of 3 (“1 entry data” 303 in FIG. 4).
Further, the real ID 3 displayed as 3 in the next ID has a parent ID of 5 and has no child ID and next ID.
[0095]
The update list represents the parent-child relationships for the other real IDs 4, 8, 9, and 10 in the same way, and can thus express the parent ID, child ID, and next ID for any real ID easily and simply.
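How all the children of an entry are obtained from one child ID and the chain of next IDs can be sketched in Python. The table below contains only the rows spelled out in the text above (real IDs 1, 5, 6, 7, 2, and 3); the rows for real IDs 4, 8, 9, and 10 are omitted because their values are not stated, so the sketch should only be applied to entries whose children are all listed. The function name `children` is hypothetical.

```python
# real ID -> (parent ID, child ID, next ID); None means "no such ID".
UPDATE_LIST = {
    1: (None, 5, None),
    5: (1, 2, 6),
    6: (1, 4, 7),
    7: (1, 9, None),
    2: (5, None, 3),
    3: (5, None, None),
}


def children(update_list, entry_id):
    """Collect all children: start at the child ID, then follow the next IDs."""
    result = []
    _, child, _ = update_list[entry_id]
    while child is not None:
        result.append(child)
        _, _, child = update_list[child]   # move on to the next sibling
    return result


root_children = children(UPDATE_LIST, 1)
```

Each row stores a fixed number of IDs regardless of how many children an entry has, which keeps the list a simple fixed-width table.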
[0096]
The data page generation means 25 refers to the update list generated in the reference list storage unit 34, registers the parent entry in step A7, and generates a set of child entries in step A8 and registers it in the entry data.
[0097]
As described above, according to the bulk load system of the present invention, a set of child entries is generated by referring to the update list and is registered in the entry data. Therefore, a plurality of entries included in the input file can be loaded in a batch, and the reading process can be sped up.
[0098]
"Second embodiment of bulk loading system"
Next, a second embodiment of the bulk load system of the present invention will be described in detail with reference to the drawings.
FIG. 8 is a schematic block diagram for explaining the basic configuration of the second embodiment of the bulk load system according to the present invention.
In the figure, in the bulk load system of the second embodiment of the present invention, the data processing device 6 includes a bulk load control means 21, an input file analysis means 22, a hash table management means 23, an entry sort execution means 26, The data page generation means 25 is provided, and the storage device 7 includes an input file storage unit 31, a hash table storage unit 32, and an entry sort storage unit 35.
[0099]
As shown in FIG. 10, the entry sort storage unit 35 stores, as an entry list, the identification name of each entry and its line number in the LDIF file.
Further, in accordance with an instruction from the bulk load control means 21, the entry sort execution means 26 registers the identification name and line number of each entry in the entry list of the entry sort storage unit 35 and sorts the list using the identification name as the key.
Other configurations are the same as those of the bulk load system in the first embodiment.
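As a minimal sketch of such an entry list, the following Python code scans LDIF text, records the identification name and the line number of each entry, and sorts the list by identification name. The function name build_entry_list and the sample LDIF text are hypothetical (the sample is not the file of FIG. 4), and the sketch assumes plain "dn:" lines, ignoring base64-encoded names and folded lines.

```python
from typing import List, Tuple


def build_entry_list(ldif_text: str) -> List[Tuple[str, int]]:
    """Return (identification name, 1-based line number) pairs sorted by name."""
    entries = []
    for lineno, line in enumerate(ldif_text.splitlines(), start=1):
        if line.lower().startswith("dn:"):
            entries.append((line[3:].strip(), lineno))
    entries.sort(key=lambda e: e[0])  # plain string order on the name
    return entries


# Hypothetical input used only for this illustration.
SAMPLE = """dn: o=abc
o: abc

dn: ou=sales,o=abc
ou: sales

dn: cn=alice,ou=sales,o=abc
cn: alice
"""
entry_list = build_entry_list(SAMPLE)
```

Only the name and the line number are held in memory, so the attribute data of each entry can be fetched from the input file later, when the entry is actually processed.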
[0100]
Next, the operation of the bulk load system configured as described above will be described with reference to the drawings.
FIG. 9 is a schematic flowchart for explaining the operation of the bulk load system according to the second embodiment.
FIG. 10 shows a table for explaining an entry list constructed by reading an LDIF file.
[0101]
In FIG. 9, first, the bulk load control means 21 receives the file name of the LDIF file given from the input device 1, and the input file analysis means 22 reads the corresponding file from the input file storage unit 31 and acquires the data of one entry (step A1).
Subsequently, the bulk load control means 21 causes the entry sort execution means 26 to register the identification name of the entry and its line number in the file in the entry sort storage unit 35 (step C1).
[0102]
Next, it is checked whether or not this entry is the last entry of the input file (step A3). If it is not the last entry, the process returns to step A1.
When the processing is completed up to the last entry in step A3, the entry sort execution means 26 sorts the entries registered in the entry sort storage unit 35 by the identification name (step C2).
[0103]
The entry sort execution means 26 acquires one entry from the sorted entry list in the entry sort storage unit 35 (step C3) and assigns a real ID to the entry (step B1). Subsequently, the hash table management means 23 registers the identification name and the real ID in the hash table of the hash table storage unit 32 (step B9) and acquires the ID of the parent entry from the hash table (step B7).
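Steps B1, B9, and B7 can be sketched as follows. This Python example is an illustration under stated assumptions: real IDs are consecutive integers starting at 1, the parent's identification name is obtained by dropping the leftmost component of the name (a naive split that ignores escaped commas), and the entries arrive in an order in which each parent precedes its children. The names parent_dn and assign_ids are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def parent_dn(dn: str) -> Optional[str]:
    """Drop the leftmost name component; returns None for a top-level entry."""
    _, sep, rest = dn.partition(",")
    return rest if sep else None


def assign_ids(ordered_dns: List[str]) -> Tuple[Dict[str, int], Dict[int, Optional[int]]]:
    hash_table: Dict[str, int] = {}             # identification name -> real ID
    parent_ids: Dict[int, Optional[int]] = {}   # real ID -> parent real ID
    for real_id, dn in enumerate(ordered_dns, start=1):        # step B1
        hash_table[dn] = real_id                               # step B9
        parent_ids[real_id] = hash_table.get(parent_dn(dn))    # step B7
    return hash_table, parent_ids


table, parents = assign_ids(["o=abc", "ou=sales,o=abc", "cn=alice,ou=sales,o=abc"])
```

In this sketch a parent that has not yet been registered yields None, which is why the processing order matters for the lookup in step B7.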
[0104]
The data page generation means 25 generates entry data (step A6), registers the parent entry in the generated entry data (step A7), and registers the entry as a child entry in the entry data of the parent (step C4).
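The three steps above can be pictured with an in-memory stand-in for the entry data. The following Python sketch is an assumption-laden illustration, not the database page format of the invention: the dictionary layout, the function name generate_entry_data, and the sample entries are hypothetical, and the parent is assumed to have been generated before its children.

```python
from typing import Dict, Optional


def generate_entry_data(pages: Dict[int, dict], real_id: int, dn: str,
                        parent_id: Optional[int]) -> None:
    # Steps A6 and A7: create the entry data and record its parent.
    pages[real_id] = {"dn": dn, "parent": parent_id, "children": []}
    # Step C4: record this entry as a child in the parent's entry data.
    if parent_id is not None:
        pages[parent_id]["children"].append(real_id)


pages: Dict[int, dict] = {}
generate_entry_data(pages, 1, "o=abc", None)
generate_entry_data(pages, 2, "ou=sales,o=abc", 1)
generate_entry_data(pages, 3, "ou=dev,o=abc", 1)
```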
[0105]
Next, it is checked whether or not this entry is the last entry in the entry list (step A9). If it is not the last entry, the process returns to step C3.
When the processing has been completed up to the last entry in step A9, the end of the processing is displayed on the output device 5.
[0106]
As described above, in the bulk load system according to the present embodiment, the entry sort execution means manages the entries in the entry list; that is, it sorts all the entries described in the input file by identification name so that entries close to one another in the directory hierarchy can be registered together. This reduces the possibility of unnecessary processing such as rereading already registered entries.
In addition, since the data contents of the entries are acquired in advance, even when the input file is composed of a plurality of entries, the data of the plurality of entries can be clustered and loaded in a batch so as to improve search performance.
[0107]
<Example>
Next, the operation of the second embodiment will be described using a specific example.
As in the example of the first embodiment, it is assumed that the LDIF file shown in FIG. 4 is an input file and a directory hierarchy as shown in FIG. 5 is constructed.
[0108]
The LDIF file is read, and the identification name and the line number in the LDIF file are registered in the entry list for each entry. The entry list that results from sorting by identification name is as shown in FIG. 10.
[0109]
The entry sort execution means 26 registers the entries in the entry list in the entry sort storage unit 35 in step C1 and, in step C2, sorts the registered entries by identification name.
In performing this sort, the entry sort execution means 26 can cluster the entry data in consideration of the directory hierarchy, so that the search performance of the directory service can be improved.
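The text does not state the comparison rule used for this sort. One possible rule, shown here purely as an assumption, is to compare identification names component by component from the root downward, so that each parent is immediately followed by its descendants. The function name hierarchy_key and the sample names are hypothetical, and the comma split is naive (escaped commas are not handled).

```python
from typing import Tuple


def hierarchy_key(dn: str) -> Tuple[str, ...]:
    """Sort key that lists the name components from the root down."""
    return tuple(part.strip().lower() for part in reversed(dn.split(",")))


dns = [
    "cn=bob,ou=dev,o=abc",
    "ou=sales,o=abc",
    "o=abc",
    "cn=alice,ou=sales,o=abc",
    "ou=dev,o=abc",
]
clustered = sorted(dns, key=hierarchy_key)
```

Under this key, the entries of one subtree become adjacent in the sorted list, which is the property that allows neighboring entries in the hierarchy to be registered together.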
[0110]
In step C3, one entry is extracted from the entry list, and the input file analysis means 22 can read the information of the corresponding entry from the input file storage unit 31 by using the line number of the entry.
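Reading one entry back from the input file by its position can be sketched as follows. This Python example assumes that the file is held as a list of lines, that the recorded position is a 1-based line number, and that an entry ends at the first blank line; the function name read_entry_at and the sample lines are hypothetical. An implementation could equally record a byte offset and seek to it.

```python
from typing import Dict, List


def read_entry_at(lines: List[str], lineno: int) -> Dict[str, List[str]]:
    """Parse one entry starting at a 1-based line number, up to a blank line."""
    attrs: Dict[str, List[str]] = {}
    for line in lines[lineno - 1:]:
        if not line.strip():
            break  # a blank line separates entries
        name, _, value = line.partition(":")
        attrs.setdefault(name.strip(), []).append(value.strip())
    return attrs


# Hypothetical input used only for this illustration.
LINES = [
    "dn: o=abc",
    "o: abc",
    "",
    "dn: ou=sales,o=abc",
    "ou: sales",
    "objectClass: organizationalUnit",
]
entry = read_entry_at(LINES, 4)
```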
[0111]
In this way, the bulk load system of the present embodiment sorts all the entries described in the input file by identification name and registers together the entries that are near one another in the directory hierarchy. Therefore, even when the input file is composed of a plurality of entries, the data of the plurality of entries can be clustered and loaded in a batch so as to improve search performance.
[0112]
“First Embodiment of Bulk Loading Method”
  The present invention is also effective as a bulk loading method. The bulk loading method according to the first embodiment will be described using the schematic flowcharts shown in FIGS. 2 and 3 described above.
  In the bulk loading method of the present embodiment, the data processing device includes a bulk load control means and, controlled by the bulk load control means, an input file analysis means, a hash table management means, a reference list management means, and a data page generation means. First, the input file analysis means analyzes the input file and performs a first reading in which the data is read entry by entry; that is, the data of one entry is acquired (step A1).
[0113]
  Next, the hash table management means manages the identification information of the entry in the hash table. As shown in FIG. 3, the reference list management means generates a reference list based on a temporary identification code map and/or a temporary reference list, each of which holds, for an entry of the input file, a temporary identification code different from the actual identification code of the entry read by the input file analysis means, and manages the reference relationships of the entries in the reference list. That is, the hash table and the reference list are registered (step A2).
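One plausible reading of the temporary identification codes is sketched below in Python; it is not the procedure of FIG. 3. The sketch assumes that a temporary code is issued when an entry refers to a parent that has not yet been read, and that the temporary code is replaced by the actual code once all entries have been seen. The name first_pass, the use of negative integers as temporary codes, and the sample names are all hypothetical.

```python
from typing import Dict, List, Optional


def first_pass(dns: List[str]) -> Dict[int, Optional[int]]:
    """Resolve child -> parent references even when a child precedes its parent."""
    hash_table: Dict[str, int] = {}            # name -> actual ID
    temp_id_map: Dict[str, int] = {}           # name of unseen parent -> temporary ID
    temp_refs: Dict[int, int] = {}             # child actual ID -> temporary parent ID
    reference: Dict[int, Optional[int]] = {}   # child actual ID -> parent actual ID
    for real_id, dn in enumerate(dns, start=1):
        hash_table[dn] = real_id
        _, sep, parent = dn.partition(",")     # naive split, no escaped commas
        if not sep:
            reference[real_id] = None          # top-level entry
        elif parent in hash_table:
            reference[real_id] = hash_table[parent]
        else:
            # Negative numbers keep temporary IDs distinct from actual IDs.
            temp = temp_id_map.setdefault(parent, -(len(temp_id_map) + 1))
            temp_refs[real_id] = temp
    # Replace each temporary ID with the actual ID (None if the parent never appeared).
    actual_of_temp = {t: hash_table.get(d) for d, t in temp_id_map.items()}
    for child, temp in temp_refs.items():
        reference[child] = actual_of_temp[temp]
    return reference


refs = first_pass(["cn=alice,ou=sales,o=abc", "o=abc", "ou=sales,o=abc"])
```

Because every reference is resolved by the end of this pass, a second reading of the file can generate each entry with its final parent already known.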
[0114]
  Next, the bulk load control means generates an update list based on the reference list (step A4), and the input file analysis means then reads the data of the entries of the input file a second time (step A5). The data page generation means then generates, in the database storage format, the data of the entries read again from the input file, based on the update list generated from the reference list (steps A6, A7 and A8).
  That is, the data page generation means generates the data of the input file entries acquired in the second reading in the database storage format based on the update list.
[0115]
As described above, the present invention is also effective as a bulk loading method. Even when the input file is composed of a plurality of entries, the data of the plurality of entries can be loaded at a time, so that the entry data can be loaded efficiently and at high speed.
[0116]
"Second embodiment of bulk loading method"
The present invention is also effective as a bulk loading method, and the bulk loading method according to the second embodiment will be described using the schematic flowchart shown in FIG. 9 described above.
In the bulk loading method according to the present embodiment, the data processing device includes a bulk load control means and, controlled by the bulk load control means, an input file analysis means, a hash table management means, an entry sort execution means, and a data page generation means. First, the input file analysis means analyzes the input file and reads the data entry by entry; that is, the data of one entry is acquired (step A1).
[0117]
Next, the hash table management means manages entry identification information using a hash table, and the entry sort execution means manages entries using an entry list. That is, the identification information of each entry is registered in the entry list (step C1).
Then, each entry is sorted by the entry identification information (step C2).
[0118]
Next, the data page generation means generates, in the database storage format, the data of the entries read again from the input file, in the order of the sorted entries (steps C3, A6, A7 and C4).
That is, the data page generation unit generates the data of the entry of the input file acquired by the second reading in the database storage format based on the entry list.
[0119]
As described above, the present invention is also effective as an invention of a bulk loading method, and even when an input file is composed of a plurality of entries, data of a plurality of entries is clustered so that search performance is improved. Bulk loading is possible.
[0120]
"First embodiment of bulk loading program"
  The present invention is also effective as a bulk load program. The bulk load program according to the first embodiment will be described with reference to the schematic flowcharts of the first embodiment described above.
  The bulk load program of this embodiment first causes the computer to execute a process of analyzing the input file and performing a first reading in which the data is read entry by entry (step A1).
[0121]
  Next, the bulk load program causes the computer to execute a process of managing the identification information of the entries in a hash table, generating a reference list based on a temporary identification code map and/or a temporary reference list, each of which holds, for an entry of the input file, a temporary identification code different from the actual identification code of the entry read by the input file analysis means, and managing the reference relationships of the entries in the reference list (step A2).
Subsequently, the bulk load program causes the computer to execute a process of generating an update list based on the reference list (step A4) and a process of performing a second reading of the entry data of the input file (step A5).
[0122]
  Subsequently, the bulk load program causes the computer to execute a process of generating, in the database storage format, the data of the input file entries acquired in the second reading, based on the update list generated from the reference list (steps A6, A7 and A8).
[0123]
As described above, the present invention is also effective as a bulk load program. Even when the input file is composed of a plurality of entries, the data of the plurality of entries can be loaded at a time, so that the entry data can be loaded efficiently and at high speed.
[0124]
"Second embodiment of bulk loading program"
The present invention is also effective as a bulk load program, and the bulk load program according to the second embodiment will be described using the schematic flowchart shown in FIG. 9 described above.
The bulk load program according to the present embodiment first causes the computer to analyze the input file and execute a process of reading data for each entry (step A1).
[0125]
Next, the bulk load program causes the computer to execute a process of managing the identification information of the entries in a hash table and managing the entries in an entry list, that is, a process of registering the identification information of each entry in the entry list (step C1).
Next, the bulk load program causes the computer to execute a sort process (step C2) for each entry based on the identification information of the entry.
[0126]
Next, the bulk load program causes the computer to execute a process of generating, in the database storage format, the data of the entries read again from the input file, in the order of the sorted entries (steps C3, A6, A7 and C4).
That is, the data page generation unit generates the data of the entry of the input file acquired by the second reading in the database storage format based on the entry list.
[0127]
As described above, the present invention is also effective as an invention of a bulk loading program, and even when an input file is composed of a plurality of entries, the data of a plurality of entries are clustered so that the search performance is improved. Bulk loading is possible.
[0128]
The bulk load program can be stored in a computer-readable recording medium such as an external storage device and a portable recording medium in addition to being stored in the ROM of the computer.
Here, the external storage device refers to a storage expansion device that incorporates a recording medium such as a magnetic disk and is externally connected to the data processing device. The portable recording medium, on the other hand, is a recording medium that can be mounted on a recording medium driving device (drive device) and can be carried, for example, a CD-ROM, a flexible disk, a memory card, or a magneto-optical disk.
[0129]
The program stored in the recording medium is loaded into the RAM of the computer and executed by the CPU. By this execution, each function of the above-described bulk load system is realized.
Furthermore, when a control program is loaded by a computer, the control program held by another computer can be downloaded to its own RAM or external storage device using a communication line.
[0130]
Further, the bulk load system of the present invention may be configured such that the bulk load program according to claim 12 or claim 13 is installed in the data processing device. Such a bulk load system will be described with reference to the drawings.
[0131]
“Third embodiment of bulk loading system”
FIG. 11 is a schematic block diagram for explaining the basic configuration of the third embodiment of the bulk load system according to the present invention.
As shown in the figure, the bulk load system includes an input device 1 such as a keyboard, a data processing device 9, a storage device 3 for storing information, a database management device 4 for storing and managing information, a recording medium 8, and an output device 5 such as a display device or a printing device.
The recording medium 8 may be a magnetic disk, a semiconductor memory, or other recording medium.
[0132]
The bulk load program is read from the recording medium 8 into the data processing device 9 and controls the operation of the data processing device 9.
The data processing device 9 is configured to execute the following processing under the control of the bulk load program, that is, the same processing as that performed by the data processing devices 2 and 6 in the first and second embodiments.
Other configurations are the same as those of the bulk load system in the first or second embodiment.
[0133]
  When a bulk load request is given from the input device 1, the bulk load system configured as described above reads the input file specified by the request from the input file storage unit 31 in the storage device 3. When the bulk load program according to claim 10 is installed, the identification name and ID of each entry are registered in the hash table storage unit 32, an update list is generated using the temporary ID map storage unit 33 and the reference list storage unit 34, and the data of all entries is generated in the database page format and stored in the data page storage unit 41 in the database management device 4.
  The end of these processes is displayed on the output device 5.
  Of course, when the bulk load program according to claim 11 is installed, the effect of that program is obtained.
[0134]
As described above, the bulk load system of the present embodiment is a bulk load system equipped with a bulk load program, so that even when an input file is composed of a plurality of entries, data of a plurality of entries can be loaded at once. Alternatively, data of a plurality of entries can be clustered and loaded in a batch so that search performance is improved.
[0135]
Although the present invention has been described as a bulk load system, it is needless to say that the same effect can be achieved by a bulk load method, a bulk load program, and a bulk load system equipped with this bulk load program.
[0136]
[Effects of the Invention]
As described above, according to the bulk load system, the bulk loading method, and the bulk load program of the present invention, the input file is read twice: the reference relationships of the entries are resolved in the first reading, and the data is generated in the second reading by using those reference relationships. Therefore, the data of a plurality of entries can be loaded from the input file at a time, without loading the data entry by entry.
[0137]
Further, according to the bulk load system, the bulk loading method, and the bulk load program of the present invention, the entries are sorted by their identification information and the entry data is generated in the sorted order, so that the entry data can be clustered.
[Brief description of the drawings]
FIG. 1 is a schematic block diagram for explaining a basic configuration of a first embodiment of a bulk load system according to the present invention.
FIG. 2 is a schematic flowchart for explaining the operation of the bulk load system according to the first embodiment.
FIG. 3 is a schematic flowchart for explaining an operation of registering a hash table and a reference list in the bulk load system according to the first embodiment.
FIG. 4 is a schematic diagram showing an example of an LDIF file.
FIG. 5 is a schematic diagram for explaining a directory hierarchy when an LDIF file is loaded.
FIG. 6 shows a table for explaining a temporary ID map, a temporary reference list, and a reference list constructed by reading an LDIF file.
FIG. 7 shows a table for explaining an update list of the present invention.
FIG. 8 is a schematic block diagram for explaining a basic configuration of the second embodiment of the bulk load system according to the present invention.
FIG. 9 is a schematic flowchart for explaining the operation of the bulk load system according to the second embodiment.
FIG. 10 is a table for explaining an entry list constructed by reading an LDIF file.
FIG. 11 is a schematic block diagram for explaining a basic configuration of a third embodiment of a bulk load system according to the present invention.
[Explanation of symbols]
1 Input device
2, 6, 9 Data processing device
3, 7 Storage device
4 Database management device
5 Output device
21 Bulk load control means
22 Input file analysis means
23 Hash table management means
24 Reference list management means
25 Data page generation means
26 Entry sort execution means
31 Input file storage unit
32 Hash table storage unit
33 Temporary ID map storage unit
34 Reference list storage unit
35 Entry sort storage unit
41 Data page storage unit
8 Recording media
100-110 Directory hierarchy example entries
210 Temporary reference list
220 Reference list
230 Temporary ID map
301-310 Data of one entry

Claims

A bulk load system in which a data processing apparatus comprises a bulk load control means and, controlled by the bulk load control means, an input file analysis means, a hash table management means, a reference list management means, and a data page generation means, wherein
the input file analysis means analyzes the input file and performs a first reading in which data is read for each entry,
the hash table management means manages the identification information of the entry in a hash table,
the reference list management means generates a reference list based on a temporary identification code map and/or a temporary reference list having, for the entry of the input file, a temporary identification code different from the actual identification code for the entry read by the input file analysis means, and manages the reference relationship of the entry in the reference list,
the bulk load control means generates an update list based on the reference list,
the input file analysis means performs a second reading of the entry data of the input file, and
the data page generation means generates, in a database storage format, the data of the entry of the input file acquired by the second reading, based on the update list.

The bulk load system according to claim 1, wherein the reference list includes:
an actual identification code for the entry read by the input file analysis means;
a parent identification code for the entry representing a superordinate concept; and
a child identification code for the entry representing a subordinate concept.

The bulk load system according to claim 2, wherein the reference list further includes
a next identification code indicating a sibling relationship between entries at the same level.

The bulk load system according to any one of claims 1 to 3, wherein the input file analysis means performs the first and second readings on the data of all entries of the input file.

A bulk load system in which a data processing apparatus comprises a bulk load control means and, controlled by the bulk load control means, an input file analysis means, a hash table management means, an entry sort execution means, and a data page generation means, wherein
the input file analysis means analyzes the input file and reads data for each entry,
the hash table management means manages the identification information of the entry in a hash table,
the entry sort execution means manages the entries in an entry list, and
the data page generation means generates data of the entry in a database storage format.

The bulk load system according to claim 5, wherein
the input file analysis means performs a first reading of the entry data of the input file,
the entry sort execution means performs a sort process based on the entry list,
the input file analysis means performs a second reading of the entry data of the input file,
the hash table management means assigns an actual identification code to the entry read the second time by the input file analysis means, registers the identification information of the entry in the hash table together with the actual identification code, and obtains a parent identification code for the entry representing the superordinate concept, and the data page generation means generates, in a database storage format, the data of the entry of the input file acquired by the second reading, based on the sort process.

The bulk load system according to claim 6, wherein the input file analysis unit performs first and second readings on data of all entries of the input file.

A bulk loading method in which a data processing apparatus comprises a bulk load control means and, controlled by the bulk load control means, an input file analysis means, a hash table management means, a reference list management means, and a data page generation means, wherein
the input file analysis means analyzes the input file and performs a first reading in which data is read for each entry,
the hash table management means manages the identification information of the entry in a hash table,
the reference list management means generates a reference list based on a temporary identification code map and/or a temporary reference list having, for the entry of the input file, a temporary identification code different from the actual identification code for the entry read by the input file analysis means, and manages the reference relationship of the entry in the reference list,
the bulk load control means generates an update list based on the reference list,
the input file analysis means performs a second reading of the entry data of the input file, and
the data page generation means generates, in a database storage format, the data of the entry of the input file acquired by the second reading, based on the update list.

A bulk loading method in which a data processing apparatus comprises a bulk load control means and, controlled by the bulk load control means, an input file analysis means, a hash table management means, an entry sort execution means, and a data page generation means, wherein
the input file analysis means analyzes the input file and reads data for each entry,
the hash table management means manages the identification information of the entry in a hash table,
the entry sort execution means manages the entries in an entry list and further sorts the entries according to the identification information of the entries, and
the data page generation means generates, in a database storage format, the data of the entries read again from the input file, in the order of the sorted entries.

A bulk load program for causing a computer to execute:
a process of analyzing an input file and performing a first reading in which data is read for each entry;
a process of managing identification information of the entry in a hash table;
a process of generating a reference list based on a temporary identification code map and/or a temporary reference list having, for the entry of the input file, a temporary identification code different from the actual identification code for the entry read by the input file analysis means, and managing the reference relationship of the entry in the reference list;
a process of generating an update list based on the reference list;
a process of performing a second reading of the entry data of the input file; and
a process of generating, in a database storage format, the entry data of the input file acquired in the second reading, based on the update list.

A bulk load program for causing a computer to execute:
a process of analyzing an input file and reading data for each entry;
a process of managing identification information of the entry in a hash table;
a process of managing the entries in an entry list and further sorting the entries according to the identification information of the entries; and
a process of generating, in a database storage format, the data of the entries read again from the input file, in the order of the sorted entries.

A bulk loading system comprising an input device such as a keyboard, a data processing device, a storage device for storing information, a database management device for storing and managing information, a storage medium, and an output device such as a display device or a printing device,
wherein the bulk loading system is equipped with the bulk loading program according to claim 10 or 11.