JP3653333B2 - Database management method and system

Info

Publication number: JP3653333B2
Application number: JP11731196A
Authority: JP
Inventors: 憲宏原; 信男河村; 健一北村
Original assignee: Hitachi Software Engineering Co Ltd; Hitachi Ltd
Current assignee: Hitachi Software Engineering Co Ltd; Hitachi Ltd
Priority date: 1996-05-13
Filing date: 1996-05-13
Publication date: 2005-05-25
Anticipated expiration: 2016-05-13
Also published as: JPH09305622A

Description

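[0001]
[Technical Field of the Invention]
The present invention relates to a database management method and a database management system having a document search function.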
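[0002]
[Prior Art]
As storing and reusing document information has grown in importance, there is demand for a database management system that stores and manages the document information itself in a table-format database, in the same way as sales data and the like, and that searches documents under various conditions given an arbitrary input character string. To meet this demand, systems have been provided for relational databases, whose logical structure consists of the rows and columns of tables, in which a document type for storing documents is supported as a column data type and the demand is met in the form of search processing on a document-type column. In these database management systems, a pointer to the document storage area is stored in the part of the record (the storage form of a row) that corresponds to the column, so that the document body appears to reside in the row.
[0003]
Meanwhile, a general document search system builds an index in a preprocessing step so that a large volume of documents can be searched at high speed, because accessing every document body to determine which documents satisfy a search condition for an input character string is inefficient. Character strings contained in a document are extracted from it, an index is built with those strings as keys, and the index is used at search time to determine efficiently, without accessing the document bodies, which documents satisfy the search condition. How the character strings are extracted from a document, that is, how the keys are formed, strongly affects the size of the index. If single characters are extracted, every index key is linked to its documents without omission, and the document bodies need not be accessed when evaluating a search condition; however, the index becomes enormous, and in the end search efficiency does not improve sufficiently. Conversely, as the extracted strings grow longer, the links to documents become coarser, and the document bodies must be accessed to compensate. To improve search efficiency, it is important to build the index so that no link to a document is missed while keeping the index small.
[0004]
An ordinary relational database management system improves search efficiency by using an index whose keys are the column values of each row. A B-tree index, which has a B-tree structure, is commonly used. An index key is associated with a row through a "row identifier" that uniquely identifies the row; that is, the index consists of index entries, each made up of a key and a row identifier. Besides being a component of an index entry, the row identifier serves as the identifier of a row when rows are the target of relational data processing, such as set operations on search result sets. So that the row body can be accessed quickly, the row identifier often consists of the page number of a page, which is the access unit of the file storing the database, and information on the row's storage position within that page.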
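[0005]
[Problems to be Solved by the Invention]
Conventionally, a relational database management system with a document search function has used, as the pointer to a document that forms part of the document index, the row identifier of the row to which the document belongs, just as in a B-tree index. This makes it possible to obtain from the document index the set of row identifiers that satisfy a search condition and to perform set operations between it and the result sets of search conditions on other columns. However, a row identifier, which indicates a storage position within an entire file, is generally larger than the document pointer or document identification means managed in the document index of a document management system. The document index therefore becomes enormous, and search efficiency suffers as a result.
[0006]
To solve these problems, an object of the present invention is to provide a relational database management method with a document search function that uses in the document index a "document number", which uniquely identifies a stored document and is smaller than a row identifier, thereby reducing the size of the document index and achieving high search efficiency for query requests that include document search conditions.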
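[0007]
[Means for Solving the Problems]
To achieve the above object, the database management system of the present invention has the following configuration.
[0008]
A document number management unit having means for assigning, when a document object is registered in the documents of the database, a document number that uniquely identifies the document object and is used in the document index;
a document index management unit having means for obtaining, at the time of a search request, the document numbers of the document objects that match a document search condition as a set, based on the document index created for the documents, and means for managing the document index at the time of document registration; and
a conversion table management unit that provides a conversion table associating the document number assigned to each document object in the stored documents by the document number management unit with a data identifier that uniquely identifies each data object in the data associated with the document objects, and that has means for converting, during a search operation, the set of document numbers obtained by the document index management unit into the associated data identifiers.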
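[0009]
[Embodiments of the Invention]
An embodiment of the present invention is described in detail below with reference to the drawings. The invention is implemented on the computer system shown in FIG. 14. The computer system of FIG. 14 comprises a central processing unit (CPU) 100, an input/output terminal (VDT) 200, and a disk device 300. The disk device stores the database (DB) 4 described later and a program 500 that executes the processing procedure of the invention. The program 500 is loaded into the main memory of the CPU and executed.
[0010]
First, the conceptual diagram of the invention in FIG. 13 is described. When the database management system receives from a query source a search request 1 containing a conditional expression on data attribute values and a keyword, it uses the keyword to consult the document index 142, which is created for the documents 145, and obtains the document number(s) (document No.) of the document objects that contain the keyword. It then obtains the record identifier(s) 51 (record IDs) stored in the entries of the conversion table 141 that correspond to those document numbers (each entry's storage position is calculated from the document number). A record identifier 51 indicates where a data record 23 is stored in the data 144, and consists of the identifier of a page in the data and the number of the slot that holds the storage position within the page. The conversion table 141 entry holding an obtained record identifier 51 is associated, through the data 144, with a document object that contains the keyword of search request 1.
[0011]
In addition, the index 143, which is created for the data attribute values referenced in the conditional expression of search request 1, is used to obtain the record ID(s) of the rows that match the conditional expression. The set of record IDs obtained through the conversion table 141 is then narrowed down using the record IDs obtained from the index 143.
[0012]
The record IDs in the narrowed-down result are then used to access the data records 23 (the storage form of table rows) stored in pages 20, and pointers to the document objects and the like are output as the search result (5).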
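[0013]
FIG. 1 shows the configuration of the database management system of the invention. As shown in FIG. 1, the system comprises a query request/result processing unit 2, which receives and analyzes a query request 1 from a query source, and a database processing unit 3, which searches and updates the database 4 in response to the query request. The query request/result processing unit 2 receives and analyzes the query request 1 from the user (121), requests the database processing unit 3 to execute the data processing that corresponds to the request (122), processes the query result returned by the database processing unit 3 (123), and outputs the query result 5 to the query source.
[0014]
The database processing unit 3 searches or updates the database 4 in response to requests from the query request/result processing unit 2 and returns the results to it. The search and update processing of the database 4 is handled by the document index management unit 131, the index management unit 133, the data management unit 134, the conversion table management unit 132, and the document number management unit 135.
[0015]
The arrows in FIG. 1 show the main flow of processing for a search request. The document index management unit 131 and the index management unit 133 are used to narrow down the search results efficiently. Depending on the request, the data management unit 134 refers to the data selected as the search result. The document number management unit assigns document numbers when documents are registered.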
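[0016]
The database 4 managed by the database management system consists of: data 144, a collection of data objects that are the target of database operations; an index 143 created for the data 144; documents 145, a collection of document objects each associated with a data object of the data 144; a document index 142 created for the documents 145; and a conversion table 141 that logically links the document objects of the documents 145 to the data objects of the data 144.
[0017]
One example of a data object of the data 144 is a row, the component of a table, which is the data model of a relational database management system. A data object of the data 144, that is, a row in this embodiment, is stored as a data record in a page, which is the unit of access to the database 4. The data management unit 134 of the database processing unit 3 stores and reads data records as directed by the query execution control 122 of the query request/result processing unit 2. To speed up searches, an index 143 is often created on the data and consulted at search time. The index management unit 133 handles lookups and updates of the index 143.
[0018]
To provide a document as the attribute value of a table column, the database management system stores the data 144 and the documents 145 in separate areas and associates them with each other. The conversion table 141 is the means of association. The system also has the document index 142 as a means of fast document search, and the document index management unit 131 in the database processing unit 3 maintains it. On a document search request, the document index management unit 131 consults the document index 142 to obtain information about the documents that match the document search condition. Each document object in the documents 145 is identified by a document number, which the document number management unit 135 assigns when the object is stored in the database so that the object can be uniquely identified. The information the document index management unit 131 obtains about a document object is its document number.
[0019]
To identify table rows, for example when performing set operations on the set of rows that match a condition or when accessing a particular row, the record identifier (data identifier) of the stored data record that corresponds to the row is used. Data objects and document objects are associated by means of the document number and the record identifier.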
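[0020]
FIG. 2 shows one form of the storage structure of data records within each page of the data 144. One page 20 can hold multiple data records 23, and the storage position of a data record 23 within the page is indicated by a slot 21. The slot 21 stores the position, measured from the start of the page 20, of the data record 23 it points to. The page control information 22 is used for slot management, such as tracking slot allocation, and for managing the space within the page. A data record 23 includes a document field 24 that corresponds to a column whose attribute value is a document. The document field 24 consists of a document number 25, which uniquely identifies the document, and a pointer 26 (document storage position information) for accessing the document body. The document number 25 links the data record 23 logically to a document object in the documents 145, whereas the pointer 26 links the two (data record and document object) physically.
[0021]
FIG. 3 shows one form of the structure of each record identifier stored in the conversion table 141. A record identifier 51 consists of a page identifier 31, which uniquely identifies the page (20 in FIG. 2) that stores the data record (23 in FIG. 2), and a slot number 32, which designates the slot (21 in FIG. 2) that gives the record's storage position within the page. Slot numbers 32 are assigned sequentially, starting from the page control information (22 in FIG. 2) side of the page storage structure. FIG. 3 uses the layout "page identifier + slot number", but "slot number + page identifier" would work equally well. A data record is accessed through its record identifier 51: the page identifier 31 of the record identifier is used to access the storage page, and the record's storage position is read from the slot that corresponds to the slot number 32, which makes the access fast.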
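[0022]
FIG. 4 shows specific examples of a document index and an index. FIG. 4(a) is a detailed example of the structure of the document index 142 in the database 4 of FIG. 1 (also shown in the conceptual diagram of FIG. 13). FIG. 4(b) is a detailed example of the structure of the index 143 in the database 4 of FIG. 1 (also shown in the conceptual diagram of FIG. 13).
[0023]
The document index 142 contains multiple indexes 41, one for each index keyword. In the example, the leading "本" is an index keyword, and the document numbers 11, 12, ..., 1n that follow it are the document numbers of the document objects that contain the index keyword "本". "発" and "明" are registered in the same way, as illustrated. With this structure, whatever keyword arrives as the search string, the document numbers that match the search condition can be obtained without accessing the document object bodies.
[0024]
The index 143 records index entries 42, each consisting of an attribute value and the record ID(s) (record identifiers) of the rows whose column has that attribute value. Given an attribute value, the record IDs with that value can be obtained easily. The index entries here form a table structure, but a B-tree structure or a hashing-based structure may be used instead.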
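[0025]
FIG. 5 shows an example of the conversion table. It is a detailed example of the conversion table 141 in the database 4 of FIG. 1 (also shown in the conceptual diagram of FIG. 13). The conversion table converts the document numbers that the document index returns as matching the search condition into the record identifiers (30 in FIG. 3) that the relational database management system uses in its various operations. The conversion table 141 consists of conversion table entries 51. In this embodiment, a conversion table entry 51 consists of a record identifier (30 in FIG. 3).
[0026]
The table 141 is structured so that the storage position of each conversion table entry 51 can be easily determined from the document number by calculation. More specifically, document numbers are assigned serially from 1 and storage areas are allocated incrementally in that order, so that the record identifier 30 for each serial document number maps to the corresponding entry of the conversion table. As a result, the conversion table entry can be accessed from the document number, and the record identifier recorded in the entry can be obtained.
[0027]
Each conversion table entry consists only of a record identifier made up of a page identifier and a slot number, and needs neither the document number nor information about the entry itself, so the size of the conversion table is kept to the minimum necessary. Furthermore, if the document index held record identifiers, the same record identifier would appear in it many times, which would lower access efficiency; it is more efficient to consult the conversion table and convert document numbers to record identifiers as the final step. To make conversion table access even more efficient, the conversion table is preferably kept resident in memory.
[0028]
Next, database search processing is described in detail with reference to FIGS. 6 and 7, based on the configuration described with FIGS. 2 through 5.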
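[0029]
FIG. 6 is a flowchart detailing the database processing performed when a search request is input from a query source; it covers the processing from the query execution control 122 in FIG. 1 onward. First, step 601 determines whether the requested search operation uses the document index (122 in FIG. 1). Whether the document index is used is decided from the search conditions in the query request during query request acceptance and analysis 121 in FIG. 1. (At this point control passes to the database processing unit 3 in FIG. 1.) If use of the document index is specified in step 601, processing proceeds to step 602 and onward, and the search based on the document search condition is executed. If it is not specified, processing proceeds to step 609, which determines whether to search using the index.
[0030]
From step 602, the document index management unit 131 in FIG. 1 performs the following processing. It accesses the document index (step 602) and obtains the set of document numbers that satisfy the document search condition (step 603). The document numbers are then evaluated one by one to convert the set into the corresponding set of record identifiers. That is, step 604 determines whether any element, that is, any document number, remains in the document number set. If none remains, the whole set is regarded as converted to a record identifier set, and processing proceeds to step 609.
[0031]
If a document number remains, steps 605 onward convert that one document number. The conversion is performed through the conversion table management unit in FIG. 1. Step 605 takes one document number out of the document number set. The storage position of the corresponding conversion table entry is calculated from the document number, and that entry is accessed (step 606). The record identifier is read from the entry (step 607) and added, as the conversion result, to record identifier set 1 (step 608). Processing then returns to step 604 to convert the remaining document numbers.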
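The loop of steps 604 through 608 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the list-based conversion table, and the use of `None` for an invalidated entry are assumptions made for the example.

```python
def convert_to_record_ids(doc_numbers, conversion_table):
    """Convert a set of document numbers into record identifier set 1.

    conversion_table is assumed to be a list in which the entry for
    document number n sits at index n - 1 (document numbers start at 1).
    """
    pending = set(doc_numbers)
    record_id_set_1 = set()
    while pending:                            # step 604: any document number left?
        doc_no = pending.pop()                # step 605: take one document number
        entry = conversion_table[doc_no - 1]  # step 606: position computed from the number
        if entry is not None:                 # assumption: None marks an invalidated entry
            record_id_set_1.add(entry)        # steps 607-608: read and collect the record id
    return record_id_set_1


# Hypothetical table for document numbers 1..4; entry 3 has been invalidated.
table = ["rid-n", "rid-m", None, "rid-p"]
result = convert_to_record_ids({1, 2}, table)
```

With the table above, document numbers 1 and 2 map to `rid-n` and `rid-m`, which mirrors the worked example of FIG. 7.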
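[0032]
After the whole document number set has been converted, or if no search using the document index was performed, step 609 determines whether to search using the index. If so, control passes to the index management unit 133 in FIG. 1, which proceeds to step 610, accesses the index, and obtains record identifier set 2, which satisfies the search condition (step 611). If step 609 determines that the index is not used, processing proceeds to step 612. Step 612 performs set operations according to how the search conditions are combined. Specifically, if the query request joins a search condition on a document and a search condition that uses the index with AND, the intersection of record identifier set 1 and record identifier set 2 becomes the result record identifier set.
[0033]
For an OR query, the union of record identifier set 1 and record identifier set 2 becomes the result record identifier set. If only one of the two conditions is present, no set operation is performed, and that condition's record identifier set becomes the result record identifier set as is. Then the result record identifier set is used to access the records as the request requires (step 613), and the results are returned to the query source (step 614).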
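The set operation of step 612 maps directly onto set intersection and union. The sketch below is illustrative only; the function name and the convention of passing `None` for a condition that is absent are assumptions.

```python
def combine_results(record_id_set_1, record_id_set_2, operator="AND"):
    """Step 612: merge the document-index result with the index result.

    Pass None for a side whose condition is absent from the query;
    the other side is then returned unchanged.
    """
    if record_id_set_1 is None:
        return set(record_id_set_2 or ())
    if record_id_set_2 is None:
        return set(record_id_set_1)
    if operator == "AND":
        return record_id_set_1 & record_id_set_2   # intersection
    if operator == "OR":
        return record_id_set_1 | record_id_set_2   # union
    raise ValueError(f"unsupported operator: {operator}")


from_document_index = {"rid-n", "rid-m"}   # record identifier set 1
from_author_index = {"rid-m", "rid-k"}     # record identifier set 2
and_result = combine_results(from_document_index, from_author_index, "AND")
or_result = combine_results(from_document_index, from_author_index, "OR")
```

The AND case leaves only `rid-m`, the record that satisfies both conditions.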
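[0034]
FIG. 7 illustrates the search operation of the invention with a specific example that follows the flowchart of FIG. 6. The database 4 stores and manages, as data 144 and documents 145, the table shown in FIG. 7, which has an "author" column and a "document" column (document type). As query request 1, the query source requests the rows where "author = HARA" and the document contains "database". Process 122 receives and analyzes the query request and decides the access method. In this example an index is defined on the "author" column and a document index is available, so it is decided to perform the search using both the index and the document index.
[0035]
The search is then controlled as follows, according to the access method decided in process 122. First, the document index management unit 131 accesses the document index 142 and obtains the document numbers that match the search condition (document number 1 and document number 2) (step 603 in FIG. 6). Next, the conversion table management unit 132 consults the conversion table 141 and converts the obtained document number set into a record identifier set (record identifier n and record identifier m) (steps 605 through 608 in FIG. 6). Then, for the search condition "author = HARA", the index management unit 133 accesses the index 143 and obtains the matching record identifiers (record identifier m and record identifier k). The search result processing unit 123 merges these result record identifier sets (in this example, by taking their intersection) (step 612 in FIG. 6), obtains the final result, record identifier m, and returns it to the query source as search result 709 (step 614 in FIG. 6).
[0036]
In this way, a search operation that includes a document search condition on the "document" column, where documents are stored, can be executed in the same manner as a search that uses an index created on another column, thanks to the easy conversion from document numbers to record identifiers. Combining it with the index also narrows down the result set efficiently. The example uses a single index created on a column other than the "document" column, but multiple indexes may be used depending on the search conditions. The choice may also be optimized, taking into account factors such as the number of I/O operations on the database, so that an appropriate combination of indexes is used.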
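[0037]
Next, registration into the database is described in detail with reference to FIGS. 8 and 9. FIG. 8 is a flowchart of the registration operation of the invention and, like FIG. 6, covers the processing from the query execution control 122 in FIG. 1 onward.
[0038]
When new data and a new document are registered, the database processing unit 3 first assigns a new document number in step 801. This is done by the document number management unit 135 in FIG. 1. In one form of document number management, document numbers are managed with a "numbering counter": when a new document is registered, the counter value plus 1 is assigned as its document number, and the counter itself is incremented by 1. The number of bits (size) needed to represent a document number (the numbering counter) is smaller than the number of bits (size) needed to represent the page identifier and slot number that make up a record identifier. This also follows from the fact that record identifiers are allocated sparsely, whereas document numbers are always assigned in sequence.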
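The numbering counter and the size argument can be illustrated as follows. The class name and the bit widths chosen for the page identifier and slot number are assumptions for the example; the source gives no concrete sizes.

```python
class DocumentNumberCounter:
    """Hypothetical numbering counter: hands out 1, 2, 3, ... in order."""

    def __init__(self) -> None:
        self.value = 0

    def next_number(self) -> int:
        self.value += 1        # the counter itself is incremented by 1
        return self.value      # and the incremented value becomes the document number


counter = DocumentNumberCounter()
numbers = [counter.next_number() for _ in range(4)]

# Size comparison under assumed figures: one million stored documents,
# and a record identifier made of a 32-bit page id plus a 16-bit slot number.
doc_number_bits = (1_000_000).bit_length()
record_id_bits = 32 + 16
```

Because the numbers are dense, their width depends only on how many documents exist, whereas the record identifier must be wide enough to address any page and slot in the file.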
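[0039]
Next, the new document object is stored in the database 4 (step 802). A new data record is created using the new document number and the storage position of the new document (step 803). In creating it, the document number 25 and the pointer 26 to the document object (document object storage position) are set in the document field 24 of the data record 23 described with FIG. 2. The page that will store the new data record is then chosen (step 804), a slot for the record is allocated through the page control information of that page (step 805), and the record is stored in the page (step 806). The record identifier is fixed once the storage page and slot number are decided.
[0040]
After the data record is stored, it is determined whether an index exists on any column other than the document column (step 807); if so, that index is maintained using the record identifier (step 808). Then, in step 809, the document index is maintained using the document number assigned in step 801. The position of the conversion table entry is calculated from the document number (step 810), and the record identifier fixed by step 805 is set in that entry (step 811).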
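Steps 801 through 811 can be followed end to end in a toy model. Everything here is a simplification made for illustration: the class and attribute names are hypothetical, all records live on a single pretend page, and the document index uses single characters as keys.

```python
from collections import defaultdict


class MiniDatabase:
    """Toy stand-in for database 4; not the patented implementation."""

    def __init__(self) -> None:
        self.counter = 0                          # numbering counter
        self.documents = {}                       # documents 145: pointer -> text
        self.records = {}                         # data 144: record id -> (author, doc_no, pointer)
        self.author_index = defaultdict(set)      # index 143 on the "author" column
        self.document_index = defaultdict(set)    # document index 142: character -> doc numbers
        self.conversion_table = []                # conversion table 141

    def register(self, author: str, text: str):
        self.counter += 1                         # step 801: assign a new document number
        doc_no = self.counter
        pointer = len(self.documents)             # step 802: store the document object
        self.documents[pointer] = text
        record = (author, doc_no, pointer)        # step 803: build the data record
        record_id = ("page-0", len(self.records)) # steps 804-806: page and slot fix the record id
        self.records[record_id] = record
        self.author_index[author].add(record_id)  # steps 807-808: maintain the column index
        for ch in set(text):                      # step 809: maintain the document index
            self.document_index[ch].add(doc_no)
        # steps 810-811: numbers are serial, so appending lands at index doc_no - 1
        self.conversion_table.append(record_id)
        return doc_no, record_id


db = MiniDatabase()
db.register("HARA", "データベース")
doc_no, record_id = db.register("NISHI", "インターネット")
```

Note that the document index stores only `doc_no`, while the column index and the conversion table store `record_id`; that split is the point of the design.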
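[0041]
Here the document index is maintained after the index, but as long as the document number has been assigned and the record identifier has been fixed, the two maintenance steps may be performed in either order. Running them in parallel is of course preferable for speed. The conversion table entry may likewise be maintained as soon as the document number and the corresponding record identifier are fixed.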
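[0042]
FIG. 9 illustrates the registration operation with a specific example that follows the flowchart of FIG. 8. As in FIG. 7, the database 4 stores and manages a table with an "author" column and a "document" column as data 144 and documents 145. As query request 1, the query source requests registration of new data with "author = NISHI" and the document object "… The Internet …". Process 121 receives and analyzes the query request. Process 122 then controls the registration as follows.
[0043]
Before the data management unit 134 stores the document, the document number management unit 135 assigns a document number, and "document number 4" is obtained (step 801 in FIG. 8). The new document object is then stored in the database 4 (step 802 in FIG. 8), and the data record is stored using the pointer to it and "document number 4" (step 806 in FIG. 8). At that point "record identifier p" is fixed. Because an index has been created on the "author" column, the index management unit 133 maintains the index 143 using the index key "NISHI" and "record identifier p" (step 808 in FIG. 8). In addition, the document index management unit 131 maintains the document index 142 using "document number 4" (step 809 in FIG. 8). The conversion table management unit 132 also calculates the conversion table entry position from "document number 4" and sets "record identifier p" in the entry, which completes the new entry (step 811 in FIG. 8).
[0044]
In this way, the data record containing "author = NISHI" and the new document object "… The Internet …" are registered in association with each other.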
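[0045]
Next, deletion from the database is described in detail with reference to FIGS. 10 and 11. FIG. 10 is a flowchart of the deletion operation and, like FIG. 6, covers the processing from the query execution control 122 in FIG. 1 onward. When data and its associated document are deleted, the database processing unit first deletes the target data record in step 1001, using its record identifier. The document number and document storage pointer of the associated document object are saved at this point.
[0046]
Next, step 1002 deletes the document object using the saved document storage pointer. Step 1003 then begins maintenance of the indexes related to the deleted data record by determining whether an index has been created. If there is one, step 1004 maintains it (deletes the index entry) using the column value and the record identifier of the deleted record, and processing proceeds to step 1005. If there is none, processing proceeds directly to step 1005. The document index is then maintained using the document number saved from the deleted data record (step 1005). Finally, the conversion table position is calculated from the document number (step 1006), and the record identifier in the conversion table entry is initialized, which invalidates the entry (step 1007).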
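The deletion steps 1001 through 1007 can be sketched on a small dictionary-based state. The layout of `state`, the function name, and the choice of `None` as the "initialized" entry value are assumptions for the example; the document text is read before the object is dropped so that its index keys are known.

```python
def delete_record(state: dict, record_id: str) -> None:
    """Delete one data record and everything that refers to it."""
    # step 1001: delete the data record, remembering its document number and pointer
    author, doc_no, pointer = state["records"].pop(record_id)
    # step 1002: delete the document object through the saved pointer
    text = state["documents"].pop(pointer)
    # steps 1003-1004: if an index exists on the column, drop its entry
    if author in state["author_index"]:
        state["author_index"][author].discard(record_id)
    # step 1005: maintain the document index using the saved document number
    for ch in set(text):
        state["document_index"][ch].discard(doc_no)
    # steps 1006-1007: locate the conversion table entry and invalidate it
    state["conversion_table"][doc_no - 1] = None


state = {
    "records": {"rid-p": ("NISHI", 4, "ptr-4")},
    "documents": {"ptr-4": "インターネット"},
    "author_index": {"NISHI": {"rid-p"}},
    "document_index": {ch: {4} for ch in set("インターネット")},
    "conversion_table": ["rid-n", "rid-m", "rid-k", "rid-p"],
}
delete_record(state, "rid-p")
```

The entry for document number 4 stays in place but no longer points at a record, so positions of the other entries are unaffected.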
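[0047]
FIG. 11 illustrates the deletion operation with a specific example that follows the flowchart of FIG. 10. As in FIGS. 7 and 9, the database 4 stores and manages a table with an "author" column and a "document" column as data 144 and documents 145. As query request 1, the query source requests deletion of the data (row) where "author = NISHI". Process 121 receives and analyzes the request. Process 122 then controls the deletion as follows.
[0048]
First, the data management unit 134 determines "record identifier p" of the record to be deleted and deletes the data record (step 1001 in FIG. 10). The corresponding document object is then deleted using the pointer to the associated document object that was stored in the data record (step 1002 in FIG. 10). In FIG. 11, the deleted record in the data 144 and the deleted document object in the documents 145 are drawn with dotted lines.
[0049]
Using "record identifier p" of the deleted data record, the value of the indexed column, and "document number 4" of the associated document object stored in the deleted record, the index management unit 133 performs index maintenance, the document index management unit 131 performs document index maintenance, and the conversion table management unit 132 initializes the corresponding entry (steps 1004, 1005, and 1007 in FIG. 10, respectively). To initialize the entry, the conversion table position is calculated from "document number 4" and "record identifier p" in the entry is initialized.
[0050]
This completes the deletion request. As the flow in FIG. 11 shows, once the data record has been deleted, the associated document number, the storage pointer of the document to delete, and the index key are all known, so there is no reason to execute the remaining steps sequentially. They can therefore be run in parallel for speed.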
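[0051]
In the embodiment above, the document number management unit 135 assigns document numbers, but the data management unit 134, which stores documents in the database, may do so instead. The conversion table may also, of course, be implemented as a B-tree rather than as the array of entries described above.
[0052]
FIG. 12 shows a further embodiment of the document index: an example of a document index that uses bitmap indexes. It is an alternative to the document index example in FIG. 4. Whereas the document index in FIG. 4 consists of document number lists keyed by character string, the example in FIG. 12 creates a bitmap index for each character string. One bit is allocated to each stored document object, and each bit indicates whether the character string occurs in that document object. For example, if the 1st and n-th bits are 1 in the bitmap index for the string "本", the 1st and n-th document objects contain "本" (1200).
[0053]
With bitmap indexes, a document is identified by its position in the bitmap: the first bit refers to document number 1, and the n-th bit to document number n. Because document numbers are assigned serially from 1 in this embodiment, a bitmap index is easy to implement and efficient to access. With bitmap indexes, a search result temporarily takes the form of a bitmap. This means that when a table defines multiple document columns and a query request specifies a search condition on each of them, the AND/OR of the results from the multiple document indexes can be computed efficiently by bitwise operations on the bitmaps, without first converting to record identifiers. It also means that the approach is not limited to document search with a document index and extends to fast search on any column that uses a bitmap index.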
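A bitmap document index of this kind can be sketched with Python integers used as bit strings. This is an illustrative sketch: the function names and the sample documents are assumptions, single characters serve as keys, and the two bitmaps combined here come from one column, although the same bitwise operations apply to bitmaps from different document columns.

```python
def build_bitmaps(documents: dict) -> dict:
    """Map each character to an integer whose bit (n - 1) is set when
    document number n contains that character."""
    bitmaps = {}
    for doc_no, text in documents.items():
        for ch in set(text):
            bitmaps[ch] = bitmaps.get(ch, 0) | (1 << (doc_no - 1))
    return bitmaps


def doc_numbers(bitmap: int) -> list:
    """List the document numbers whose bits are set, lowest first."""
    return [pos + 1 for pos in range(bitmap.bit_length()) if (bitmap >> pos) & 1]


documents = {1: "本発明", 2: "発明者", 3: "日本"}   # hypothetical stored documents
bitmaps = build_bitmaps(documents)

both = bitmaps["本"] & bitmaps["発"]     # AND of two search results
either = bitmaps["本"] | bitmaps["発"]   # OR of two search results
```

Only the final bitmap needs to be turned into document numbers, and from there into record identifiers through the conversion table.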
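[0054]
That is, even when an index on a column that does not have the document attribute is built as a bitmap index, the conversion table of the invention makes it easy to map bit positions in the bitmap index to record identifiers (row identifiers), and because the index holds no record identifiers, its size is kept small.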
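[0055]
Because a document contains many character strings, the same document number appears many times in the document index. Introducing the conversion table allows a document number of minimal size to be used, which makes the document index smaller than one built with record identifiers. The conversion table is used only to obtain the record identifier for a document number; as the flowcharts show, none of the queries requires the reverse conversion from record identifier to document number. Moreover, since the conversion from document number to record identifier is easy to perform, searching, registering, and deleting data that has associated documents can all be done efficiently.
[0056]
The embodiment described has a one-to-one correspondence between data objects and document objects, but in other embodiments the relationship may be many-to-one, one-to-many, or many-to-many. Also, the associated document object need not be determined or stored when a data object is first stored. In that case, information indicating that the document object is undetermined (specifically, a null value) is recorded in the document field of the data object. This shows that data objects and document objects can be handled independently, for example in bulk data loading by a utility. Because registering a document object generally involves many more I/O operations than storing a data object, it is easy to bulk-load the data objects first and register the document objects later.
[0057]
[Effects of the Invention]
As described above, the present invention adopts a "document number", smaller than a record identifier, as the means of identifying stored documents; manages the association with index character strings in the document index by document number; converts the "document number" at search time into the "record identifier" of the associated row easily by means of a conversion table; and applies the conditional expression to the record identifiers. Query requests that include document search conditions can thereby be searched efficiently. Using an index as well narrows down the record identifiers efficiently.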
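[Brief Description of the Drawings]
[FIG. 1] Diagram of the principle configuration of the invention.
[FIG. 2] Diagram showing one form of the storage structure of data records.
[FIG. 3] Diagram showing one form of the structure of a record identifier.
[FIG. 4] Diagram showing one form of the document index.
[FIG. 5] Diagram showing an example of the structure of the document number to record identifier conversion table of the invention.
[FIG. 6] Flowchart of the search operation of the invention.
[FIG. 7] Diagram illustrating the search operation of the invention.
[FIG. 8] Flowchart of the registration operation of the invention.
[FIG. 9] Diagram illustrating the registration operation of the invention.
[FIG. 10] Flowchart of the deletion operation of the invention.
[FIG. 11] Diagram illustrating the deletion operation of the invention.
[FIG. 12] Example of a document index that uses bitmap indexes.
[FIG. 13] Conceptual diagram of the invention.
[FIG. 14] Configuration diagram of a computer system that implements the invention.
[Explanation of Reference Numerals]
1: query request, 131: document index management unit, 132: conversion table management unit,
133: index management unit, 134: data management unit, 135: document number management unit,
2: query request/result processing unit, 3: database processing unit, 4: database,
141: conversion table, 142: document index, 143: index, 144: data,
145: documents, 5: query result
[0001]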
BACKGROUND OF THE INVENTION
The present invention relates to a database management method and database management system having a document search function.
[0002]
[Prior art]
As storing and reusing document information has grown in importance, there is demand for a database management system that stores and manages the document information itself in a table-format database, in the same way as sales data and the like, and that searches documents under various conditions given an arbitrary input character string. To meet this demand, systems have been provided for relational databases, whose logical structure consists of the rows and columns of tables, in which a document type for storing documents is supported as a column data type and the demand is met in the form of search processing on a document-type column. In these database management systems, a pointer to the document storage area is stored in the part of the record (the storage form of a row) that corresponds to the column, so that the document body appears to reside in the row.
[0003]
Meanwhile, a general document search system builds an index in a preprocessing step so that a large volume of documents can be searched at high speed, because accessing every document body to determine which documents satisfy a search condition for an input character string is inefficient. Character strings contained in a document are extracted from it, an index is built with those strings as keys, and the index is used at search time to determine efficiently, without accessing the document bodies, which documents satisfy the search condition. How the character strings are extracted from a document, that is, how the keys are formed, strongly affects the size of the index. If single characters are extracted, every index key is linked to its documents without omission, and the document bodies need not be accessed when evaluating a search condition; however, the index becomes enormous, and in the end search efficiency does not improve sufficiently. Conversely, as the extracted strings grow longer, the links to documents become coarser, and the document bodies must be accessed to compensate. To improve search efficiency, it is important to build the index so that no link to a document is missed while keeping the index small.
[0004]
An ordinary relational database management system improves search efficiency by using an index whose keys are the column values of each row. A B-tree index, which has a B-tree structure, is commonly used. An index key is associated with a row through a "row identifier" that uniquely identifies the row; that is, the index consists of index entries, each made up of a key and a row identifier. Besides being a component of an index entry, the row identifier serves as the identifier of a row when rows are the target of relational data processing, such as set operations on search result sets. So that the row body can be accessed quickly, the row identifier often consists of the page number of a page, which is the access unit of the file storing the database, and information on the row's storage position within that page.
[0005]
[Problems to be solved by the invention]
Conventionally, a relational database management system with a document search function has used, as the pointer to a document that forms part of the document index, the row identifier of the row to which the document belongs, just as in a B-tree index. This makes it possible to obtain from the document index the set of row identifiers that satisfy a search condition and to perform set operations between it and the result sets of search conditions on other columns. However, a row identifier, which indicates a storage position within an entire file, is generally larger than the document pointer or document identification means managed in the document index of a document management system. The document index therefore becomes enormous, and search efficiency suffers as a result.
[0006]
To solve these problems, an object of the present invention is to provide a relational database management method with a document search function that uses in the document index a "document number", which uniquely identifies a stored document and is smaller than a row identifier, thereby reducing the size of the document index and achieving high search efficiency for query requests that include document search conditions.
[0007]
[Means for Solving the Problems]
In order to achieve the above object, a database management system according to the present invention has the following configuration.
[0008]
A document number management unit having means for assigning, when a document object is registered in the documents of the database, a document number that uniquely identifies the document object and is used in the document index;
a document index management unit having means for obtaining, at the time of a search request, the document numbers of the document objects that match a document search condition as a set, based on the document index created for the documents, and means for managing the document index at the time of document registration; and
a conversion table management unit that provides a conversion table associating the document number assigned to each document object in the stored documents by the document number management unit with a data identifier that uniquely identifies each data object in the data associated with the document objects, and that has means for converting, during a search operation, the set of document numbers obtained by the document index management unit into the associated data identifiers.
[0009]
DETAILED DESCRIPTION OF THE INVENTION
An embodiment of the present invention is described in detail below with reference to the drawings. The invention is implemented on the computer system shown in FIG. 14. The computer system of FIG. 14 comprises a central processing unit (CPU) 100, an input/output terminal (VDT) 200, and a disk device 300. The disk device stores the database (DB) 4 described later and a program 500 that executes the processing procedure of the invention. The program 500 is loaded into the main memory of the CPU and executed.
[0010]
First, the conceptual diagram of the invention in FIG. 13 is described. When the database management system receives from a query source a search request 1 containing a conditional expression on data attribute values and a keyword, it uses the keyword to consult the document index 142, which is created for the documents 145, and obtains the document number(s) (document No.) of the document objects that contain the keyword. It then obtains the record identifier(s) 51 (record IDs) stored in the entries of the conversion table 141 that correspond to those document numbers (each entry's storage position is calculated from the document number). A record identifier 51 indicates where a data record 23 is stored in the data 144, and consists of the identifier of a page in the data and the number of the slot that holds the storage position within the page. The conversion table 141 entry holding an obtained record identifier 51 is associated, through the data 144, with a document object that contains the keyword of search request 1.
[0011]
Further, the record ID (group) of the row that matches the conditional expression is acquired using the index 143 created corresponding to the attribute value of the data included in the conditional expression of the search request 1. Here, the set of record IDs obtained by referring to the conversion table 141 is narrowed down using the record IDs obtained from the previous index 143.
[0012]
The record IDs in the narrowed-down result are then used to access the data records 23 (the storage form of table rows) stored in pages 20, and pointers to the document objects and the like are output as the search result (5).
[0013]
FIG. 1 shows the configuration of the database management system of the invention. As shown in FIG. 1, the system comprises a query request/result processing unit 2, which receives and analyzes a query request 1 from a query source, and a database processing unit 3, which searches and updates the database 4 in response to the query request. The query request/result processing unit 2 receives and analyzes the query request 1 from the user (121), requests the database processing unit 3 to execute the data processing that corresponds to the request (122), processes the query result returned by the database processing unit 3 (123), and outputs the query result 5 to the query source.
[0014]
The database processing unit 3 searches or updates the database 4 in response to a request from the query request / result processing unit 2 and returns the result to the query request / result processing unit 2. The document index management unit 131, the index management unit 133, the data management unit 134, the conversion table management unit 132, and the document number management unit 135 are responsible for the search or update processing of the database 4.
[0015]
Here, the arrow in FIG. 1 shows the main flow of the process in the case of a search request. By using the document index management unit 131 and the index management unit 133, the search results are efficiently narrowed down. Depending on the request, the data management unit 134 refers to the data narrowed down as a search result. The document number management unit assigns document numbers at the time of document registration.
[0016]
The database 4 managed by the database management system consists of: data 144, a collection of data objects that are the target of database operations; an index 143 created for the data 144; documents 145, a collection of document objects each associated with a data object of the data 144; a document index 142 created for the documents 145; and a conversion table 141 that logically links the document objects of the documents 145 to the data objects of the data 144.
[0017]
An example of the data object of the data 144 is a row that is a component of a table that is a data model in a relational management system. Data objects of the data 144, that is, rows in this embodiment, are stored in the form of data records in a page that is a unit of access to the database 4. The data management unit 134 of the database processing unit 3 is in charge of storing / reading the data record according to the instruction of the query processing execution control 122 of the query request / result processing unit 2. An index 143 is often created for the data for speeding up the search and referenced during the search. The index management unit 133 performs reference and update processing of the index 143.
[0018]
To provide a document as the attribute value of a table column, the database management system stores the data 144 and the documents 145 in separate areas and associates them with each other. The conversion table 141 is the means of association. The system also has the document index 142 as a means of fast document search, and the document index management unit 131 in the database processing unit 3 maintains it. On a document search request, the document index management unit 131 consults the document index 142 to obtain information about the documents that match the document search condition. Each document object in the documents 145 is identified by a document number, which the document number management unit 135 assigns when the object is stored in the database so that the object can be uniquely identified. The information the document index management unit 131 obtains about a document object is its document number.
[0019]
To identify table rows, for example when performing set operations on the set of rows that match a condition or when accessing a particular row, the record identifier (data identifier) of the stored data record that corresponds to the row is used. Data objects and document objects are associated by means of the document number and the record identifier.
[0020]
FIG. 2 shows one form of the storage structure of data records within each page of the data 144. One page 20 can hold multiple data records 23, and the storage position of a data record 23 within the page is indicated by a slot 21. The slot 21 stores the position, measured from the start of the page 20, of the data record 23 it points to. The page control information 22 is used for slot management, such as tracking slot allocation, and for managing the space within the page. A data record 23 includes a document field 24 that corresponds to a column whose attribute value is a document. The document field 24 consists of a document number 25, which uniquely identifies the document, and a pointer 26 (document storage position information) for accessing the document body. The document number 25 links the data record 23 logically to a document object in the documents 145, whereas the pointer 26 links the two (data record and document object) physically.
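The page layout of FIG. 2 resembles a slotted page. The sketch below is a simplified in-memory illustration, not the patented on-disk format: the class name is hypothetical, and each slot records a length alongside the offset, which the source does not mention.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    """Hypothetical stand-in for page 20: a byte area plus a slot directory."""

    body: bytearray = field(default_factory=bytearray)
    slots: list = field(default_factory=list)   # slot 21: (offset from page start, length)

    def store(self, record: bytes) -> int:
        """Append a data record and return the slot number that points at it."""
        offset = len(self.body)
        self.body += record
        self.slots.append((offset, len(record)))
        return len(self.slots) - 1

    def fetch(self, slot_no: int) -> bytes:
        """Follow the slot to the record's position and read it back."""
        offset, length = self.slots[slot_no]
        return bytes(self.body[offset:offset + length])


page = Page()
first = page.store(b"HARA|doc_no=1|ptr=0x10")    # record with its document field
second = page.store(b"NISHI|doc_no=4|ptr=0x40")
```

Because callers hold only the slot number, a record can move within the page without changing its identifier; only the slot's offset needs updating.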
[0021]
FIG. 3 shows one form of the structure of each record identifier stored in the conversion table 141. A record identifier 51 consists of a page identifier 31, which uniquely identifies the page (20 in FIG. 2) that stores the data record (23 in FIG. 2), and a slot number 32, which designates the slot (21 in FIG. 2) that gives the record's storage position within the page. Slot numbers 32 are assigned sequentially, starting from the page control information (22 in FIG. 2) side of the page storage structure. FIG. 3 uses the layout "page identifier + slot number", but "slot number + page identifier" would work equally well. A data record is accessed through its record identifier 51: the page identifier 31 of the record identifier is used to access the storage page, and the record's storage position is read from the slot that corresponds to the slot number 32, which makes the access fast.
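The "page identifier + slot number" layout can be illustrated by packing both fields into one integer. The 16-bit slot width and the function names are assumptions for the example; the source does not specify field sizes.

```python
SLOT_BITS = 16  # assumed width of the slot number field


def make_record_id(page_id: int, slot_no: int) -> int:
    """Pack a page identifier and slot number into a single record identifier."""
    if page_id < 0:
        raise ValueError("page identifier must be non-negative")
    if not 0 <= slot_no < (1 << SLOT_BITS):
        raise ValueError("slot number out of range")
    return (page_id << SLOT_BITS) | slot_no


def split_record_id(record_id: int) -> tuple:
    """Recover (page identifier, slot number) from a record identifier."""
    return record_id >> SLOT_BITS, record_id & ((1 << SLOT_BITS) - 1)


rid = make_record_id(page_id=7, slot_no=3)
```

Swapping the two fields, as the text allows, would only change the shift and mask, not the access path.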
[0022]
FIG. 4 shows specific examples of a document index and an index. FIG. 4(a) is a detailed example of the structure of the document index 142 in the database 4 of FIG. 1 (also shown in the conceptual diagram of FIG. 13). FIG. 4(b) is a detailed example of the structure of the index 143 in the database 4 of FIG. 1 (also shown in the conceptual diagram of FIG. 13).
[0023]
The document index 142 contains multiple indexes 41, one for each index keyword. In the example, the leading "本" is an index keyword, and the document numbers 11, 12, ..., 1n that follow it are the document numbers of the document objects that contain the index keyword "本". "発" and "明" are registered in the same way, as illustrated. With this structure, whatever keyword arrives as the search string, the document numbers that match the search condition can be obtained without accessing the document object bodies.
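A document index keyed by single characters can be sketched as a mapping from character to a set of document numbers. This is a simplification and the names are hypothetical: the sketch stores no character positions, so intersecting the per-character sets yields candidate documents that contain every character of the keyword, not proof that the characters appear adjacently. An index that avoids reading document bodies, as the text describes, would need additional information such as positions.

```python
from collections import defaultdict


def build_document_index(documents: dict) -> dict:
    """Map each character to the set of document numbers that contain it."""
    index = defaultdict(set)
    for doc_no, text in documents.items():
        for ch in set(text):
            index[ch].add(doc_no)
    return index


def candidate_documents(index: dict, keyword: str) -> set:
    """Intersect the posting sets of every character in the keyword."""
    postings = [index.get(ch, set()) for ch in keyword]
    return set.intersection(*postings) if postings else set()


documents = {1: "本発明", 2: "発明者", 3: "日本"}   # hypothetical stored documents
index = build_document_index(documents)
hits = candidate_documents(index, "発明")
```

The result is a set of document numbers only; turning them into record identifiers is left to the conversion table.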
[0024]
The index 143 records index entries 42, each consisting of an attribute value and the record ID(s) (record identifiers) of the rows whose column has that attribute value. Given an attribute value, the record IDs with that value can be obtained easily. The index entries here form a table structure, but a B-tree structure or a hashing-based structure may be used instead.
[0025]
FIG. 5 is a diagram illustrating an example of the conversion table. This is a detailed example of the conversion table 141 (also described in the conceptual diagram of FIG. 13) in the database 4 of FIG. This conversion table is for converting a document number that matches the search condition acquired by the document index into a record identifier (30 in FIG. 3) used in various operations by the relational database management system. The conversion table 141 includes a conversion table entry 51. In this embodiment, the conversion table entry 51 includes a record identifier (30 in FIG. 3).
[0026]
The conversion table 141 is structured so that the storage position of each conversion table entry 51 can be determined from the document number by a simple calculation. More specifically, storage areas are allocated sequentially in document number order starting from 1, so that the record identifier 30 corresponding to each serial document number is mapped to the corresponding entry in the conversion table. As a result, the conversion table entry can be accessed directly from the document number, and the record identifier recorded in that entry can be acquired.
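The position calculation described here amounts to array indexing. The following Python sketch shows one possible shape, assuming an in-memory list whose entry for document number n sits at offset n - 1; the class and method names are hypothetical.

```python
class ConversionTable:
    """Array of record identifiers addressed by document number (starting at 1)."""

    def __init__(self):
        # Entries hold only the record identifier; the document number is
        # implied by the entry position and is not stored.
        self._entries = []

    def set_entry(self, doc_no, record_id):
        if doc_no < 1:
            raise ValueError("document numbers start at 1")
        while len(self._entries) < doc_no:
            self._entries.append(None)  # None marks an unused entry
        self._entries[doc_no - 1] = record_id

    def get_entry(self, doc_no):
        """Return the record identifier for doc_no, or None if there is none."""
        if 1 <= doc_no <= len(self._entries):
            return self._entries[doc_no - 1]
        return None
```

A lookup costs one index calculation and one array access, regardless of how many documents are registered.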
[0027]
Since each conversion table entry consists only of a record identifier composed of a page identifier and a slot number, and needs to hold neither the document number nor any information about the entry itself, the capacity of the conversion table can be minimized. Furthermore, if record identifiers were held in the document index, a large number of identical record identifiers would exist in the document index, which would reduce access efficiency. Holding document numbers there and converting them to record identifiers at the end by referring to the conversion table allows more efficient access. To further improve conversion table access efficiency, it is desirable to keep the conversion table resident in memory.
[0028]
Next, the database search process will be described in detail with reference to FIGS. 6 and 7, based on the configuration described above.
[0029]
FIG. 6 is a flowchart showing details of the database processing when a search request is input from the inquiry source, and shows the processing from the query execution control (122 in FIG. 1) onward. First, in step 601, it is determined whether the requested search operation is a search using the document index. Whether the document index is to be used is decided from the search condition included in the query request by the query request acceptance/analysis 121. (At this point, control is passed to the database processing unit 3 in FIG. 1.) If use of the document index is designated in step 601, the process proceeds to step 602 and the subsequent steps, and a search is executed based on the document search condition. If use of the document index is not designated, the process proceeds to step 609 to determine whether to perform a search using the index.
[0030]
When the processing proceeds to step 602, the document index management unit 131 in FIG. 1 performs the following processing. The document index is accessed (step 602), and the set of document numbers that satisfy the document search condition is acquired (step 603). Next, each document number is examined in turn in order to convert the obtained document number set into the corresponding record identifier set. That is, in step 604, it is determined whether an element, that is, a document number, remains in the document number set. If no document number remains, the entire document number set is regarded as having been converted into a record identifier set, and the process proceeds to step 609.
[0031]
If a document number remains as an element of the set, one document number is converted in step 605 and the subsequent steps. The conversion is performed by the conversion table management unit 132. In step 605, one document number is taken out of the document number set. The storage position of the corresponding conversion table entry is then calculated from the document number, and that entry is accessed (step 606). The record identifier is acquired from the conversion table entry (step 607) and added to record identifier set 1 as the conversion result (step 608). The process then returns to step 604 to continue converting the remaining document numbers.
[0032]
After conversion of the entire document number set is complete, or when no search using the document index is performed, it is determined in step 609 whether to perform a search using the index. If so, processing control is passed to the index management unit 133 in FIG. 1, which performs the following processing: the index is accessed (step 610), and a record identifier set 2 satisfying the search condition is acquired (step 611). If it is determined in step 609 that the index is not used, the process proceeds to step 612. In step 612, a set operation is performed according to how the search conditions are combined. More specifically, when the query request combines a search condition on a document and a search condition that uses an index with AND, the intersection (product set) of record identifier set 1 and record identifier set 2 becomes the result record identifier set.
[0033]
When the conditions are combined with OR, the union of record identifier set 1 and record identifier set 2 becomes the result record identifier set. When only one of the two kinds of condition is specified, no set operation is performed and the record identifier set obtained for that condition is used as the result record identifier set as is. Thereafter, the records are accessed according to the request using the result record identifier set (step 613), and the result is returned to the inquiry source (step 614).
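The flow of steps 601 to 612 can be traced with a short Python sketch. It assumes simplified in-memory stand-ins: the document index and the column index are dictionaries, the conversion table is a list addressed by document number minus one, and record identifiers are opaque strings. The function name and parameters are hypothetical.

```python
def search_record_ids(document_index, conversion_table, column_index,
                      keyword=None, attr_value=None, combine="AND"):
    """Return the result record identifier set for a query.

    keyword    -- document search condition, or None if the document index is not used
    attr_value -- condition on the indexed column, or None if the index is not used
    combine    -- "AND" or "OR" when both conditions are given
    """
    set1 = None
    if keyword is not None:                                # step 601
        doc_numbers = document_index.get(keyword, [])      # steps 602-603
        set1 = set()
        for doc_no in doc_numbers:                         # steps 604-605
            record_id = conversion_table[doc_no - 1]       # steps 606-607
            if record_id is not None:
                set1.add(record_id)                        # step 608

    set2 = None
    if attr_value is not None:                             # step 609
        set2 = set(column_index.get(attr_value, []))       # steps 610-611

    # Step 612: combine according to the query condition.
    if set1 is not None and set2 is not None:
        return set1 & set2 if combine == "AND" else set1 | set2
    if set1 is not None:
        return set1
    if set2 is not None:
        return set2
    return set()
```

Steps 613 and 614 (fetching the records and returning them to the inquiry source) are left out, since they only consume the returned set.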
[0034]
FIG. 7 is an explanatory diagram of the search operation of the present invention. It is a specific example of the search described according to the flowchart of FIG. 6. In the database 4, the table shown in FIG. 7, which has an “author” column and a “document” column (document type), is stored and managed as data 144 and document 145. As query request 1, the inquiry source requests a search for rows where “author = HARA” and the document includes “database”. In process 122, the inquiry request is received and analyzed to determine the access means. In this specific example, since an index is defined on the “author” column and a document index has been prepared, it is decided to perform the search using both the index and the document index.
[0035]
The search process is then controlled as follows according to the access means determined in process 122. First, the document index management unit 131 accesses the document index 142 to acquire the document numbers that match the search condition (document number 1, document number 2) (corresponding to step 603 in FIG. 6). Then, the conversion table management unit 132 refers to the conversion table 141 and converts the acquired document number set into a record identifier set (record identifier n, record identifier m) (corresponding to steps 605 to 608 in FIG. 6). Next, for the search condition “author = HARA”, the index management unit 133 accesses the index 143 to acquire the record identifiers that match the search condition (record identifier m, record identifier k). The search result processing unit 123 merges the record identifier sets (in the present embodiment, the product set is taken) (corresponding to step 612 in FIG. 6), obtains the final result record identifier m, and returns the search result 709 to the inquiry source (corresponding to step 614 in FIG. 6).
[0036]
As described above, by using the easy conversion from document number to record identifier, a search operation that includes a document search condition on the “document” column, in which documents are stored, can be executed in the same way as a search using an index created on another column. Moreover, by using it together with the index, the result set can be narrowed down efficiently. Here, an example using one index created on a column other than the “document” column has been shown, but a plurality of indexes may be used depending on the search condition. Further, optimization may be performed in consideration of the number of I/Os to the database, and appropriate indexes may be used in combination.
[0037]
Next, the registration operation to the database will be described in detail with reference to FIGS. 8 and 9. FIG. 8 is a flowchart of the registration operation according to the present invention, and shows the processing from the query execution control 122 onward.
[0038]
When registering new data and a new document, the database processing unit 3 first assigns a new document number in step 801. This is performed by the document number management unit 135. As one form of document number management, document numbers are managed by a “numbering counter”: when a new document is registered, the value obtained by adding to the “numbering counter” is assigned as the document number, and the value of the “numbering counter” is incremented by one at this time. Here, the number of bits (size) needed for the document number (numbering counter) is smaller than the number of bits (size) needed for the page identifier and slot number that make up the record identifier. This can also be seen from the fact that document numbers are always assigned in consecutive order, whereas record identifiers are assigned sparsely.
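A minimal Python sketch of such a numbering counter follows, together with a rough size comparison. The class and function names are hypothetical, the counter is assumed to start at 0 so that the first document number is 1, and the 48-bit record identifier width used in the usage note is an assumption, not a figure from the description.

```python
class DocumentNumberCounter:
    """Hands out consecutive document numbers: 1, 2, 3, ..."""

    def __init__(self, start=0):
        self._counter = start  # assumed to start at 0

    def assign(self):
        """Increment the counter by one and return it as the new document number."""
        self._counter += 1
        return self._counter


def bits_needed(max_value):
    """Smallest number of bits that can represent max_value."""
    return max(1, max_value.bit_length())
```

For example, `bits_needed(1_000_000)` is 20, so one million serially numbered documents fit in 20 bits, whereas a record identifier built from an assumed 32-bit page identifier and 16-bit slot number would occupy 48 bits however few rows exist.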
[0039]
Next, the new document object is stored in the database 4 (step 802). A new data record is created using the new document number assigned above and the storage position of the new document (step 803). In creating the new data record, the document number 25 and the pointer 26 to the document object (the document object storage position) are set in the document field 24 of the data record 23 described with reference to FIG. 2. Then, the page in which the new data record is to be stored is determined (step 804), a slot for the new data record is allocated from the page control information in the storage page (step 805), and the new data record is stored in the page (step 806). Once the storage page and slot number are determined, the record identifier is determined.
[0040]
After the data record is stored, it is determined whether an index exists in a column other than the document column (step 807). If the index exists, the index is maintained using the record identifier (step 808). In step 809, the document index is maintained using the document number assigned in step 801. The position of the conversion table entry is calculated from the document number (step 810), and the record identifier determined up to step 805 is set in the conversion table entry (step 811).
[0041]
Here, the document index is maintained after the index maintenance process. However, as long as the assignment of the document number and the determination of the record identifier are completed, the order of maintenance of the index and the document index is not limited. Of course, it is desirable to execute both maintenance processes in parallel in order to increase the processing speed. Further, the maintenance of the conversion table entry may be performed when the document number and the corresponding record identifier are determined.
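The registration steps 801 to 811 can be sketched in Python as follows. Everything about the storage layout is a simplifying assumption: the database is a dictionary of in-memory lists, a record identifier is a (page, slot) tuple, pages hold a fixed number of slots, and keywords are whitespace-separated words. The names `new_db` and `register` are hypothetical.

```python
def new_db(slots_per_page=2):
    """Create an empty in-memory stand-in for database 4 (illustrative only)."""
    return {"counter": 0, "documents": [], "pages": [],
            "slots_per_page": slots_per_page, "author_index": {},
            "document_index": {}, "conversion_table": []}


def register(db, author, text):
    """Register one row with its document; return (document number, record identifier)."""
    db["counter"] += 1                                   # step 801
    doc_no = db["counter"]
    db["documents"].append(text)                         # step 802
    doc_ptr = len(db["documents"]) - 1
    record = {"author": author, "doc_no": doc_no, "doc_ptr": doc_ptr}  # step 803

    page_id = len(db["pages"]) - 1                       # step 804: pick the last page,
    if page_id < 0 or len(db["pages"][page_id]) >= db["slots_per_page"]:
        db["pages"].append([])                           # or open a new one if it is full
        page_id += 1
    db["pages"][page_id].append(record)                  # steps 805-806
    slot_no = len(db["pages"][page_id]) - 1
    record_id = (page_id, slot_no)

    db["author_index"].setdefault(author, []).append(record_id)       # steps 807-808
    for word in set(text.split()):                                    # step 809
        db["document_index"].setdefault(word, []).append(doc_no)
    table = db["conversion_table"]                                    # steps 810-811
    while len(table) < doc_no:
        table.append(None)
    table[doc_no - 1] = record_id
    return doc_no, record_id
```

The three maintenance blocks at the end are independent of one another once `doc_no` and `record_id` are known, which is why their order can be changed or the work run in parallel.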
[0042]
FIG. 9 is an explanatory diagram of the registration operation of the present invention. It is a specific example of the registration described according to the flowchart of FIG. 8. In the database 4, a table having an “author” column and a “document” column is stored and managed as data 144 and document 145, as in FIG. 7. As inquiry request 1, the inquiry source requests registration of new data with “author = NISHI” and the document object “... In process 121, the inquiry request is received and analyzed. In process 122, the registration process is controlled as follows.
[0043]
First, the data management unit 134 stores the document, but prior to that, the document number management unit 135 assigns a document number, obtaining “document number 4” (corresponding to step 801 in FIG. 8). Next, the new document object is stored in the database 4 (corresponding to step 802 in FIG. 8), and a data record is stored using the pointer and “document number 4” (corresponding to step 806 in FIG. 8). At this time, “record identifier p” is determined. Since an index has been created on the “author” column, the index management unit 133 performs maintenance of the index 143 using the index key “NISHI” and “record identifier p” (corresponding to step 808 in FIG. 8). At the same time, the document index management unit 131 performs maintenance of the document index 142 using “document number 4” (corresponding to step 809 in FIG. 8). Also at the same time, the conversion table management unit 132 calculates the conversion table entry position from “document number 4” and sets “record identifier p” in the entry, completing the setting of the new entry (corresponding to step 811 in FIG. 8).
[0044]
As described above, the data record including “author = NISHI” and the new document object “...
[0045]
Next, the deletion operation from the database will be described in detail with reference to FIGS. 10 and 11. FIG. 10 is a flowchart of the deletion operation according to the present invention, and shows the processing from the query execution control 122 onward. When deleting data and the related document, the database processing unit first deletes the target data record using the record identifier in step 1001. At this time, the document number and the document storage pointer of the related document object are saved.
[0046]
In step 1002, the document object is deleted using the saved document storage pointer. Next, any index related to the deleted data record is maintained. That is, it is determined in step 1003 whether an index has been created. If there is an index, index maintenance (deletion of the index entry) is performed in step 1004 using the column value and the record identifier of the deleted record, and the process proceeds to step 1005. If no index exists, the process proceeds directly to step 1005. Next, the document index is maintained using the document number saved from the deleted data record (step 1005). Further, the conversion table position is calculated from the document number (step 1006), and the corresponding conversion table entry is invalidated by initializing the record identifier in that entry (step 1007).
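Steps 1001 to 1007 can be sketched in Python as follows, under stated assumptions: the database is a dictionary of in-memory structures, a record identifier is a (page, slot) tuple, deleted items are marked with `None`, and the keywords to remove from the document index are recovered from the document text before it is discarded. The function name `delete_row` and the layout are hypothetical.

```python
def delete_row(db, record_id):
    """Delete one row and its document from an in-memory stand-in for database 4."""
    page_id, slot_no = record_id

    # Step 1001: delete the data record, saving the document number and pointer.
    record = db["pages"][page_id][slot_no]
    db["pages"][page_id][slot_no] = None
    doc_no, doc_ptr = record["doc_no"], record["doc_ptr"]

    # Step 1002: delete the document object using the saved pointer.
    text = db["documents"][doc_ptr]
    db["documents"][doc_ptr] = None

    # Steps 1003-1004: if an index exists on the column, delete its entry.
    if "author_index" in db:
        ids = db["author_index"][record["author"]]
        ids.remove(record_id)
        if not ids:
            del db["author_index"][record["author"]]

    # Step 1005: maintain the document index using the saved document number.
    for word in set(text.split()):
        postings = db["document_index"][word]
        postings.remove(doc_no)
        if not postings:
            del db["document_index"][word]

    # Steps 1006-1007: locate the conversion table entry and invalidate it.
    db["conversion_table"][doc_no - 1] = None
```

The document number itself is not reused in this sketch; the invalidated conversion table entry simply stays empty.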
[0047]
FIG. 11 is an explanatory diagram of the deletion operation of the present invention. It is a specific example of the deletion described according to the flowchart of FIG. 10. In the database 4, a table having an “author” column and a “document” column is stored and managed as data 144 and document 145, as in FIGS. 7 and 9. As inquiry request 1, the inquiry source requests deletion of the data (row) with “author = NISHI”. In process 121, the inquiry request is received and analyzed. In process 122, the deletion process is controlled as follows.
[0048]
First, the data management unit 134 determines the “record identifier p” of the deletion target record and deletes the data record (corresponding to step 1001 in FIG. 10). Then, the corresponding document object is deleted using the pointer to the related document object stored in the data record (corresponding to step 1002 in FIG. 10). As shown in FIG. 11, the records to be deleted and the document objects of the data 144 and the document 145 are indicated by dotted lines.
[0049]
Using the “record identifier p” of the deleted data record, the value of the indexed column, and the “document number 4” of the related document object stored in the deleted data record, the index management unit 133 performs index maintenance, the document index management unit 131 performs document index maintenance, and the conversion table management unit 132 initializes the corresponding entry (corresponding to step 1004, step 1005, and steps 1006 to 1007 in FIG. 10, respectively). In initializing the corresponding entry, the position in the conversion table is calculated from “document number 4”, and “record identifier p” in the entry is initialized.
[0050]
This completes the deletion request process. As can be seen from the flow of FIG. 11, once the data record has been deleted, the related document number, the storage pointer of the document to be deleted, and the index key are all determined, so there is no reason to execute the subsequent processes sequentially. This means the processing can be sped up by executing them in parallel.
[0051]
In the embodiment described above, the document number assignment is performed by the document number management unit 135, but it may instead be performed by the data management unit 134, which stores the document in the database. Further, the conversion table may of course be implemented with a B-tree structure instead of the entry array structure described above.
[0052]
FIG. 12 shows a further embodiment of the document index. It is a diagram illustrating an example of a document index using a bitmap index, and is an embodiment different from the document index example of FIG. 4. In the example of FIG. 4, the document index consists of document number lists keyed by character strings, whereas in the example of FIG. 12, a bitmap index is created for each character string. One bit is assigned to each stored document object, and each bit indicates whether the character string is contained in that document object. For example, in the bitmap index for the character string “book”, if the first and nth bits are 1, “book” is contained in the first and nth document objects (1200).
[0053]
When a bitmap index is used, a document is identified by its position in the bitmap. The first bit points to document number 1 and the nth bit points to document number n. In the present embodiment, since document numbers are assigned serially starting from 1, implementation with a bitmap index is easy and access is also efficient. In addition, when a bitmap index is used, the search result temporarily takes the form of a bitmap. This means that when multiple document columns are defined in a table and a query request specifying a search condition for each column is input, the AND/OR operations on the search results from the multiple document indexes can be performed efficiently as bit operations between bitmaps, without first converting the results to record identifiers. It also means that the approach is not limited to document search using a document index, but extends to high-speed search on other columns using a bitmap index.
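The bitmap form can be sketched in Python using arbitrary-precision integers as bitmaps, with bit n - 1 standing for document number n. Splitting text on whitespace to obtain the indexed character strings is an assumption for this sketch, and the function names are hypothetical.

```python
def build_bitmap_index(documents):
    """Build keyword -> bitmap, where bit (n - 1) is 1 if document n contains it.

    `documents` maps document number (starting at 1) -> text.
    """
    bitmaps = {}
    for doc_no, text in documents.items():
        for word in set(text.split()):
            bitmaps[word] = bitmaps.get(word, 0) | (1 << (doc_no - 1))
    return bitmaps


def bitmap_to_doc_numbers(bitmap):
    """List the document numbers whose bits are set, in ascending order."""
    doc_numbers = []
    position = 1
    while bitmap:
        if bitmap & 1:
            doc_numbers.append(position)
        bitmap >>= 1
        position += 1
    return doc_numbers
```

With this representation, an AND or OR between two search results is a single `&` or `|` on the bitmaps, and only the final bitmap needs to be expanded into document numbers and converted to record identifiers.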
[0054]
That is, even when an index on a column that does not have the document attribute is built as a bitmap index, the correspondence between the record identifier (row identifier) and the bit position in the bitmap index is easily established by using the conversion table of the present invention, and since no record identifiers are held in the index, the capacity of the index can be kept small.
[0055]
Since a document includes a large number of character strings, a large number of identical document numbers exist in the document index. By introducing a conversion table, the document number having the minimum size can be adopted, and the capacity of the document index can be reduced as compared with the case where the document index is configured using the record identifier. Further, the conversion table is used only for the purpose of acquiring the corresponding record identifier from the document number, and reverse conversion from the record identifier to the document number with respect to various queries is unnecessary as can be seen from the flowchart. In addition, since the conversion from the document number to the record identifier can be easily executed, the search, registration, or deletion operation of the data accompanying the document can be performed efficiently.
[0056]
As an aspect of the present invention, a form in which a data object and a document object correspond one-to-one has been described, but other forms of correspondence are also acceptable. In addition, when a data object is newly stored, the related document object need not yet be determined or stored. In that case, information indicating that the document object is undetermined (specifically, a null value) may be recorded in the document field of the data object. This shows that data objects and document objects can be handled independently, for example in batch data registration by a utility. Since registering document objects generally involves more I/Os than storing data objects, this makes it easy to register only the data objects first and register the document objects later.
[0057]
[Effects of the invention]
As described above, according to the present invention, a “document number” smaller in size than the record identifier is adopted as the means of identifying stored documents, and the document index manages the association with index character strings using the document number. At search time, the “document number” is easily converted, using the conversion table, into the “record identifier” of the row associated with it, and the conditional expression is applied to the record identifiers, so that an inquiry request with a document search condition can be searched efficiently. Furthermore, by using an index together, record identifiers can be narrowed down efficiently.
[Brief description of the drawings]
FIG. 1 is a principle configuration diagram of the present invention.
FIG. 2 is a diagram showing one form of a storage structure of data records.
FIG. 3 is a diagram showing an example of a configuration of a record identifier.
FIG. 4 is a diagram showing one form of a document index.
FIG. 5 is a diagram showing an example of a document number record identifier conversion table structure according to the present invention.
FIG. 6 is a flowchart of a search operation according to the present invention.
FIG. 7 is an explanatory diagram of a search operation according to the present invention.
FIG. 8 is a flowchart of a registration operation according to the present invention.
FIG. 9 is a diagram illustrating a registration operation according to the present invention.
FIG. 10 is a flowchart of a deletion operation according to the present invention.
FIG. 11 is an explanatory diagram of a deletion operation according to the present invention.
FIG. 12 is an example of a document index using a bitmap index.
FIG. 13 is a conceptual diagram of the present invention.
FIG. 14 is a configuration diagram of a computer system that implements the present invention.
[Explanation of symbols]
1: Inquiry request, 131: Document index management unit, 132: Conversion table management unit,
133: Index management unit, 134: Data management unit, 135: Document number management unit,
2: Query request / result processing unit, 3: Database processing unit, 4: Database,
141: Conversion table, 142: Document index, 143: Index, 144: Data,
145: Document, 5: Query result

Claims

1. A relational database management method for managing a document object and row data associated with the document object, comprising:
a first step of extracting a first record identifier that corresponds to the number of at least one document including a keyword included in an input search request and that corresponds to at least one of the rows;
a second step of extracting a second record identifier of at least one data record corresponding to an attribute value satisfying a conditional expression included in the search request;
a third step of selecting a first record identifier that matches the second record identifier; and
a fourth step of extracting a data record corresponding to the record identifier selected in the third step and a document object associated with the data record.

2. The relational database management method according to claim 1, wherein the first step includes:
searching the document index by the keyword to extract at least one document number; and
extracting, from a conversion table, the first record identifier stored in a storage position corresponding to the extracted document number.

3. A relational database management system having a document search function, for managing a document object and row data associated with the document object, comprising:
means for extracting a first record identifier that corresponds to the number of at least one document including a keyword included in an input search request and that corresponds to at least one of the rows;
means for extracting a second record identifier of at least one data record corresponding to an attribute value satisfying a conditional expression included in the search request;
means for selecting the first record identifier that matches the second record identifier; and
means for extracting a data record corresponding to the selected record identifier and a document object associated with the data record.