JP2003058401A - Data structure for storage in computer memory

Info

Publication number: JP2003058401A
Application number: JP2002149067A
Authority: JP
Inventors: Douglas L Baskins; ダグラス・エル・バスキンズ; Alan Silverstein; アラン・シルバースタイン
Original assignee: Hewlett Packard Co
Current assignee: HP Inc
Priority date: 2001-06-04
Filing date: 2002-05-23
Publication date: 2003-02-28
Also published as: EP1265162A2; EP1265162A3; US6816856B2; US20030061189A1

Abstract

PROBLEM TO BE SOLVED: To provide techniques and tools for optimizing the performance characteristics of digital trees and similar structures. SOLUTION: A data structure includes branch nodes (105, 106, 113) selected from the group consisting of linear branch nodes, bitmap branch nodes 400, and expanded branch nodes, the selection depending on the number of populated subexpanses and the overall state of the digital tree, and leaf nodes (108, 110, 116 to 123) selected from the group consisting of linear leaf nodes and bitmap leaf nodes. Each leaf node holds a plurality of indexes and contains only the undecoded portions of those indexes, according to the level of the leaf in the digital tree and the number of indexes in the leaf.

Description

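[Detailed Description of the Invention]

[0001] [Technical Field of the Invention] The present invention relates generally to the field of data structures, and more particularly to a hierarchical data structure in which the organization of the data depends on the data stored, with data structure elements that are compressed to match the data.

[0002] [Prior Art] Computer processors and their associated memory continue to increase in speed. As hardware approaches physical speed limits, however, it becomes necessary to find other appreciable reductions in data access time. Even when such limits are not the primary factor, maximizing software efficiency maximizes the efficiency of the hardware platform and extends the capability of the hardware/software system as a whole.

[0003] One way to increase system efficiency is effective data management, achieved by the appropriate choice of data structure and of the associated storage and retrieval algorithms. For example, various prior art data structures and associated storage and retrieval algorithms have been developed for data management, including arrays, hashing, binary trees, AVL trees (height-balanced binary trees), b-trees, and skip lists.

[0004] In each of these prior art data structures and associated storage and retrieval algorithms, there is an inherent trade-off between faster access time and lower memory overhead. For example, an array allows fast indexing through the calculation of the address of a single array element, but requires the entire array to be preallocated in memory before a single value is stored, and unused intervals of the array waste memory resources. In contrast, binary trees, AVL trees, b-trees, and skip lists do not require memory to be preallocated for the data structure and attempt to minimize the allocation of unused memory, but exhibit access times that grow as the population increases.

[0005] An array is a prior art data structure that has a simple structure and provides fast access to stored data. However, memory must be allocated for the entire array, and the structure is inflexible. An array value is looked up positionally (that is, digitally) by multiplying the index by the size of each array element and adding the result to the base address of the array.

[0006] Typically, a single CPU cache line fill is required to access an array element and the value stored there. As described and typically implemented, arrays are memory inefficient and relatively inflexible. Access, however, is O(1), that is, independent of the size of the array (ignoring disk swapping).

[0007] Alternatively, the other data structures mentioned above, including binary trees, b-trees, skip lists, and hash tables, are more memory efficient but have undesirable features. For example, hashing is used to convert sparse, possibly multi-word indexes (such as strings) into array indexes. A typical hash table is a fixed-size array, and each index into it is the result of a hashing algorithm applied to the original index.

[0008] For hashing to be efficient, however, the hash algorithm must be matched to the indexes that are to be stored. Hash tables also require every data node to contain a copy of (or a pointer to) the original index so that the nodes in each synonym chain can be distinguished. Like an array, hashing requires some preallocation of memory. If the table is well designed, however, that is, if the characteristics of the data to be stored are well known, well behaved, and matched to the hashing algorithm, collision resolution technique, and storage structure implemented, the memory required is only a fraction of what would have to be allocated for a flat array.

[0009] In particular, digital trees, or tries, provide fast access to data but are generally memory inefficient. Memory efficiency may be improved for sparse sets of indexes by keeping tree branches narrow, but this results in a deeper tree and an increase in the average number of memory references, indirections, and cache line fills, all of which slow access to the data. This latter factor, that is, maximizing cache efficiency, is often ignored when such structures are discussed, yet it may be a dominant factor affecting system performance.

[0010] A trie is a tree of smaller arrays, or branches, where each branch decodes one or more bits of the index. Prior art digital trees have branch nodes that are simple arrays of pointers or addresses. Typically, the size of the pointers or addresses is minimized to improve the memory efficiency of the digital tree.

[0011] At the "bottom" of the digital tree, the last branch decodes the last bits of the index, and its element points to storage specific to the index. The "leaves" of the tree are these memory chunks for specific indexes, and they have application-specific structures.

[0012] Digital trees have the advantage that memory need not be allocated to branches that have no indexes, that is, zero population (also called empty subexpanses). In this case, the pointer that points to the empty subexpanse is given a unique value and is called a null pointer, indicating that it does not represent a valid address value.

[0013] Additionally, the indexes stored in a digital tree can be accessed in sorted order, which allows the identification of neighbors. The "expanse" of a digital tree, as used herein, is the range of values that could be stored within the digital tree, while the "population" of the digital tree is the set of values that are actually stored within the tree.

[0014] Similarly, the expanse of a branch of a digital tree is the range of indexes that could be stored within the branch, and the population of a branch is the number (e.g., count) of values actually stored within the branch. (As used herein, the term "population" refers either to the set of indexes or to the count of those indexes; the meaning is apparent to those skilled in the art from the context in which the term is used.)

"Adaptive Algorithms for Cache-efficient Trie Search" by Acharya, Zhu and Shen describes cache-efficient algorithms for trie search. Each of the algorithms uses different data structures, including partitioned arrays, B-trees, hash tables, and vectors, to represent different nodes in a trie. The data structure selected depends on cache characteristics as well as on the fan-out of the node.

[0015] The algorithms further adapt to changes in the fan-out at a node by dynamically switching the data structure used to represent the node. Finally, the size and layout of the individual data structures are determined based on the size of the symbols in the alphabet as well as on cache characteristics. The publication further includes an evaluation of performance on real and simulated memory hierarchies.

[0016] Other publications known to and used by those skilled in the art that describe data structures include: Fundamentals of Data Structures in Pascal, 4th edition, Horowitz and Sahni, pp. 582-594; The Art of Computer Programming, Volume 3, Knuth, pp. 490-492; Algorithms in C, Sedgewick, pp. 245-256, 265-271; "Fast Algorithms for Sorting and Searching Strings", Bentley, Sedgewick; "Ternary Search Trees", 5871926, INSPEC Abstract Number: C9805-6120-003, Dr. Dobb's Journal; "Algorithms for Trie Compaction", ACM Transactions on Database Systems, 9(2):243-63, 1984; "Routing on longest-matching prefixes", 5217324, INSPEC Abstract Number: B9605-6150M-005, C9605-5640-006; "Some results on tries with adaptive branching", 6845525, INSPEC Abstract Number: C2001-03-6120-024; "Fixed-bucket binary storage trees", 01998027, INSPEC Abstract Number: C83009879; "DISCS and other related data structures", 03730613, INSPEC Abstract Number: C90064501; and "Dynamic sources in information theory: a general analysis of trie structure", 6841374, INSPEC Abstract Number: B2001-03-6110-014, C2001-03-6120-023.

[0017] An enhanced storage structure is described in U.S. Patent Application No. 09/457164, filed December 8, 1999, entitled "A Fast Efficient Adaptive, Hybrid Tree" (hereinafter the '164 application), which is commonly assigned with the present application. The data structure and storage methods described therein provide a self-adapting structure that self-tunes and configures "expanse"-based storage nodes to minimize storage requirements, providing efficient and scalable data storage, search, and retrieval capabilities. The structure described therein, however, does not take full advantage of certain data distribution situations.

[0018] An enhancement to the storage structure described in the '164 application is detailed in U.S. Patent Application No. 09/725373, filed November 29, 2000, entitled "A Data Structure And Storage And Retrieval Method Supporting Ordinality Based Searching and Data Retrieval", which is also commonly assigned with the present application. This latter application describes a data structure and an associated data storage and retrieval method that rapidly provide a count of the elements stored or referenced by a hierarchical structure of ordered elements, access to elements based on their ordinal value in the structure, and identification of the ordinality of elements.

[0019] In an ordered tree implementation of the structure, a count of the indexes present in each subtree is stored. That is, the cardinality of each subtree is stored either at or in association with a higher-level node pointing to that subtree, or at or in association with the head node of the subtree. In addition to the requirements specific to the data structure (e.g., creation of a new node, reassignment of pointers, balancing, and so on), data insertion and deletion include steps that update the affected counts.

[0020] [Problems to Be Solved by the Invention] However, this structure fails to take full advantage of certain sparse data situations.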
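Accordingly, a need exists for techniques and tools that optimize the performance characteristics of digital trees and similar structures.

[0021] [Means for Solving the Problems] The data structure according to the invention is based on a digital tree (or "trie") data structure stored in memory. It can be treated as a dynamic array and includes a self-modifying data structure that is accessed through a root pointer. For an empty tree, this root pointer is null; otherwise it points to the first level of branch nodes of the digital tree.

[0022] Low-fanout branches are avoided or replaced with alternative structures that waste less memory while retaining most or all of the performance advantages of a conventional digital tree structure, including index insertion, search, access, and deletion operations. This improvement reduces or eliminates the memory that would otherwise be wasted on null pointers, which are prevalent in sparsely populated and wide or shallow digital trees.

[0023] The additional processing time required to effect and accommodate the branch modifications is minimal, particularly in comparison with the processing advantages inherent in reducing the size of the structure. As a result, data fetches from memory become more efficient, capturing more data and fewer null pointers in each CPU cache line fill.

[0024] The invention includes linear and bitmap branches and leaves, implemented, for example, using a rich pointer structure. Opportunistic reconfiguration of nodes automatically readjusts the structure as subexpanse populations change.

[0025] [Embodiments of the Invention] The invention includes a data storage system and method in computer memory for access by an application program executing on a data processing system. The system includes a data structure and associated information stored in memory, including a root pointer that points to a "wide/shallow" digital tree. The digital tree has a plurality of nodes, in the form of branches (branch nodes) and multi-index leaves (leaf nodes), which are arranged hierarchically and adaptively compressed using hybrid abstract data types (ADTs).

[0026] In this application, an ADT refers to multiple data structures that have the same virtual meaning but different literal expansions. Further, the term "index" as used herein encompasses a key or set of fields constituting a number, string, token, symbol, or other such designation or representation.

[0027] Implemented as a digital tree, the data (a set of indexes or keys) is organized primarily "by expanse" rather than purely "by population". This simplifies the tree traversal and modification algorithms and offers various advantages.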
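The difference between organizing "by expanse" and "by population" can be made concrete with a small sketch. It assumes fixed 32-bit indexes decoded one byte per level (the digit size is an assumption at this point in the text), and the function name `index_digits` is hypothetical. Because each digit is taken from the index itself, the path through the tree does not depend on which other indexes happen to be stored.

```python
def index_digits(index: int, width_bytes: int = 4) -> list:
    """Split a fixed-size index into one-byte digits, most significant first.

    Assumption for illustration: one byte is decoded per tree level.
    """
    if not 0 <= index < 1 << (8 * width_bytes):
        raise ValueError("index does not fit in the given width")
    return [(index >> (8 * shift)) & 0xFF
            for shift in range(width_bytes - 1, -1, -1)]


# The branch slot chosen at each level is fixed by the index alone
# ("by expanse"), with no comparison against other stored keys.
path = index_digits(0x00FD1234)
```

A population-organized structure such as a binary search tree would instead place the same key differently depending on the keys inserted before it.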
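In particular, a wide digital tree has potentially high fan-out at each branch, which keeps the tree shallow, and therefore fast to traverse, even for large populations. The tree thus "scales well".

[0028] Using compressed branches preserves the performance advantages of wide branches while reducing the actual fan-out, and hence the memory used, to match the data (indexes or keys) being stored. With this technique, only populated subexpanses, that is, those containing stored indexes, need be represented in a compressed branch. Empty subexpanses are typically (though not necessarily) absent.

[0029] Further, storing multiple indexes (or keys) and their associated values, if any, in a "multi-index leaf" makes the tree shallower by one or more levels, thereby using less memory and speeding access. A compressed multi-index leaf holds more indexes, so that more branches need not be inserted in the tree to hold the same set of indexes.

[0030] Such "cache-efficient" compressed branches and leaves are preferably designed in terms of CPU cache lines so as to minimize "cache fills", each of which results in a relatively slow access to random access memory (RAM).

[0031] Accordingly, the invention includes several types of branch and leaf compression for optimizing the performance of data structures such as digital trees. These improvements include linear and bitmap branches (i.e., interior nodes), linear and bitmap leaves, and rules and methods for effectuating the use of these nodes. These include, for example, global, memory-efficiency-driven, opportunistic decompression of compressed branches, and the use of leaf index compression.

[0032] A linear branch node according to the invention addresses low-fanout branches by providing a list of populated subexpanses and corresponding next-level pointers. More generally, a linear branch contains a list of subexpanse descriptors that contain criteria for selecting the subexpanse corresponding to a key or to one or more of a set of fields constituting a key.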
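A linear branch of this kind can be sketched as two parallel lists. The following is a minimal illustration, not the patented layout: it assumes one-byte subexpanse descriptors (digits) kept in sorted order, and the class and method names are hypothetical.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class LinearBranch:
    """Illustrative linear branch: sorted digits plus parallel child pointers."""
    digits: list = field(default_factory=list)    # populated subexpanses, sorted
    children: list = field(default_factory=list)  # children[i] belongs to digits[i]

    def lookup(self, digit):
        """Return the child for a populated subexpanse, or None if it is empty."""
        i = bisect_left(self.digits, digit)
        if i < len(self.digits) and self.digits[i] == digit:
            return self.children[i]
        return None

    def insert(self, digit, child):
        """Add or replace the child for a subexpanse, keeping both lists aligned."""
        i = bisect_left(self.digits, digit)
        if i < len(self.digits) and self.digits[i] == digit:
            self.children[i] = child
        else:
            self.digits.insert(i, digit)
            self.children.insert(i, child)
```

Only populated subexpanses occupy space, so a branch with four children stores four digits and four pointers rather than 256 slots.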
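According to a preferred embodiment of the invention, a subexpanse descriptor is a one-byte segment of a 32-bit index. Preferably, a linear branch is constrained to a single CPU cache line of the target platform. As the subexpanses become more heavily populated, a bitmap branch node can be used, which includes a binary vector indicating which subexpanses are populated (i.e., not empty). The binary vector is followed by a list of pointers to the populated subexpanses (or by an equivalent multi-level data structure).

[0033] A linear leaf node according to the invention is similarly directed to low populations of indexes through the use of a multi-index leaf, which contains a list of the valid indexes. For medium to high population densities at low levels of the tree, a bitmap leaf node provides a binary vector of the valid indexes, possibly including a value area corresponding to each valid index.

[0034] The invention also incorporates global, memory-efficiency-driven, opportunistic decompression of compressed branches. According to this aspect of the data structure, a linear or bitmap branch is replaced with the uncompressed form of the branch (i.e., an expanded branch node) when the entire data set stored in the data structure occupies less memory per index than some threshold value (possibly measured in bytes per index), or when the population of the subexpanse under the linear or bitmap branch is sufficiently large, even if the global metric is not adequate. The result is less computation time and fewer cache fills to traverse that level, at the cost of additional memory. When this option is used with large populations of indexes, particularly well-clustered indexes, the invention "amortizes" the excess memory needed to maintain fast access to the indexes and associated data.

[0035] Note the degree of symmetry between branches and leaves, that is, between linear branches and linear leaves, and also between bitmap branches and bitmap leaves. This symmetry is most apparent in the embodiment in which each index is mapped to an associated value.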
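The interior nodes of the tree map portions (digits) of indexes to pointers to subsidiary nodes, while the terminal nodes of the tree map fully decoded indexes to value areas that, in practice, often contain addresses, or pointers, defined by the caller and external to the tree.

[0036] The symmetry fails, however, in that there is no leaf equivalent to an expanded branch. When a higher-level leaf exceeds a specific population, it is converted to a subtree under a new branch or, where appropriate, to a more compressed leaf at a lower level. When a lowest-level linear leaf exceeds a specific population, it is converted to a bitmap leaf.

[0037] According to another aspect of the invention, leaf indexes are compressed by exploiting the fact that a portion of the target index is decoded at each level of the digital tree. Because indexes are partially decoded while traversing the tree, only the remaining undecoded portion of each index need be stored in a leaf, and the number of bits or bytes making up this undecoded portion shrinks at each lower level.

[0038] As a result, a lower-level leaf (i.e., a leaf farther from the root) stores more indexes in the same space than a higher-level leaf, which requires more bits to represent the larger undecoded portion of each index. Consequently, even worst-case index insertions and deletions are localized and do not cascade more than one level up or down the tree, which minimizes worst-case insertion and deletion times. Note that this type of compression is most applicable to fixed-size indexes and is less useful for variable-size indexes such as character strings or bit strings.

[0039] Note that a digital tree can be compressed so that bits common to multiple keys (indexes) are skipped (not represented). Such a tree, whether its keys are fixed or variable in size, must store a copy of the whole key in its leaf nodes to disambiguate the leaves (except in rare cases where disambiguation is not required).

[0040] This is distinguished from leaf compression as implemented in the invention, in which the decoded portion of an index, being common to all indexes in the subexpanse, is always stored in or recoverable from the branch nodes, whether it is needed for tree traversal or skipped (compressed out), and need not be stored in the leaf nodes.

[0041] The invention provides an appropriate combination of various cache-efficient ADTs for branches and leaves. The combination depends on the unpredictable data set to be stored in a given instance, and results in a wide digital tree that is both memory efficient and fast to access or modify over a wide dynamic range.

[0042] A wide dynamic range means from small to large data sets, that is, from few to many (billions of) indexes, and across data set types with sequential, clustered, periodic, or random indexes or keys.

[0043] A well-designed hybrid digital tree with a wide dynamic range can be presented at the software interface as a simple dynamic array that requires (or even allows) no initialization, tuning, or configuration.

[0044] The invention may be implemented using a wide range of constructs for traversing a data structure, including pointers and other schemes for linking nodes and providing for traversal of the data structure.

[0045] For purposes of illustration, a preferred embodiment of the invention may be implemented within a digital tree construct that includes an enhanced pointer, as disclosed in a co-pending U.S. patent application entitled "System for and Method of Cache-Efficient Digital Tree with Rich Pointers".

[0046] Such a pointer may take a first form, as shown in FIG. 2A, when used as a null pointer or to point to a branch or leaf node, or the form shown in FIG. 2B when it contains immediate indexes. The use of rich pointers provides for designation of the type of object being pointed to, e.g., linear or bitmap, branch or leaf.

[0047] Other embodiments of the invention may use other constructs, such as conventional pointers. For example, the least significant bits of the pointer itself may be used to identify the type of the target object (recognizing that the pointers may point to 8-byte-aligned objects, so that the three least significant bits would otherwise be unused), or the pointed-at object may identify itself (that is, type information is stored in the child node rather than in the parent).

[0048] As shown in FIG. 2A, the basic pointer structure, for example on a 32-bit platform, comprises two 32-bit words. One entire word is used by a pointer that redirects tree traversal flow to another node; the other word holds a decoded index field of between 0 and 2 bytes, a population field of between 1 and 3 bytes, and a type field of 1 byte. For a null pointer, all bytes except the type field are zero. Otherwise, the first word is a pointer to a subsidiary branch or leaf node. The decode and population fields together fill all but 1 byte of the second word.

[0049] The pointer construct containing immediate indexes, shown in FIG. 2B, eliminates the need to redirect or point to another node to access the indexes. As explained in the referenced patent application, other variations of these pointer constructs may be used to associate values with their respective indexes, and adaptations are made to accommodate various machine word sizes.

[0050] The invention uses these pointers to form ADTs that include branches, i.e., interior nodes, and leaves, i.e., terminal nodes. According to this data structure, a digital tree includes a combination of branch nodes (linear, bitmap, or expanded) and leaf nodes (linear or bitmap). Each branch is a literal (expanded) or virtual (linear or bitmap) array of pointers, preferably 256 such rich pointers. That is, each node has a fan-out of up to 256 subexpanses.

[0051] In the preferred embodiment, indexes are decoded 8 bits, i.e., 1 byte, at a time. In other words, each digit is 1 byte, and the real or virtual fan-out of each branch node is 256. It will be apparent to those skilled in the art that a digital tree can have any fan-out in its branch nodes, including fan-outs that are not a power of 2, such as 26 when the tree decodes a 26-letter alphabet.

[0052] A binary tree is normally a tree divided by population (referred to as a binary storage tree), in which keys are compared with whole key values stored in each node. However, a binary tree can also be a tree divided by expanse (binary digital) with a fan-out of 2, in which each digit is 1 bit. Furthermore, a hybrid tree may have different fan-outs at different branches or levels. Nevertheless, the inventors have found that a consistent fan-out of 256, that is, a digit size of 1 byte, is the most efficient, because computers naturally process byte-sized objects efficiently, in addition to word-sized objects.

[0053] Compressed branches include the linear and bitmap types and supplement the expanded type of branch. This latter branch type supports conventional digital tree functions using, for example, an array of 256 subexpanse pointers. When the actual fan-out (i.e., the number of populated subexpanses) is relatively limited, as is typically the case when a new branch is created during index insertion, a "compressed" branch is used instead.

[0054] This compressed branch may be viewed as a virtual array of 256 subexpanse pointers, but it requires much less memory (although, for reasons explained below, it may require two cache fills rather than one to traverse the associated node).

[0055] Referring to FIGS. 1A-1E, root pointer node 101 is used to access the underlying data structure of the digital tree. Root pointer node 101 contains address information, shown diagrammatically as an arrow pointing to a first or "top" level node 102, which in this illustration is a branch node. (Note that the terminology used herein assumes a 32-bit implementation in which indexes are single words, as opposed to character strings. It therefore labels the top node of the tree, pointed to by the root, as "level 4"; the children of the level 4 node are designated "level 3" nodes, and so on. On a 64-bit machine, the root pointer points to a level 8 node, whose children are at level 7, and so on. Thus the level of any branch or leaf node equals the number of digits (bytes) remaining to be decoded in the indexes stored at or below that node. This numbering scheme has the further advantage of making the lowest levels of 32-bit and 64-bit trees the same, which simplifies the source code required for use with trees of various sizes. Note further that this convention, although typical, is adopted for purposes of this explanation, and other conventions may be adopted, including, for example, designating leaf nodes as constituting the highest (e.g., fourth) level of the tree.)

Top-level node 102 is an expanded branch node.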
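The level numbering described above determines the expanse that any node covers. The sketch below assumes 32-bit indexes decoded one byte per level, as in the preferred embodiment; the function name `node_expanse` is hypothetical.

```python
def node_expanse(prefix_digits, index_bytes=4):
    """Return (level, lowest_index, highest_index) for the node reached after
    decoding prefix_digits, one byte per level, from the root.

    The level equals the number of bytes still undecoded at that node.
    """
    if len(prefix_digits) > index_bytes:
        raise ValueError("more digits than the index contains")
    level = index_bytes - len(prefix_digits)
    base = 0
    for digit in prefix_digits:
        base = (base << 8) | digit
    low = base << (8 * level)
    high = low | ((1 << (8 * level)) - 1)
    return level, low, high


# The root's child covers every index; each decoded byte narrows the expanse.
top = node_expanse([])              # level 4, 00000000 through FFFFFFFF
child = node_expanse([0x00])        # level 3, 00000000 through 00FFFFFF
grandchild = node_expanse([0x00, 0xFD])  # level 2, 00FD0000 through 00FDFFFF
```

On a 64-bit platform the same function with `index_bytes=8` would report level 8 for the top node, which matches the convention that levels count remaining bytes.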
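Expanded branch node 102 contains an array of 256 rich pointers for referencing up to 256 lower-level nodes and represents the entire expanse of the data structure, i.e., indexes 00000000 through FFFFFFFF in hexadecimal notation. Top-level node 102 contains a first rich pointer 103 (also referred to as an adaptable object) that corresponds to expanse 00000000 through 00FFFFFF and points to level 3 linear branch 105. Another rich pointer 104 is shown corresponding to the final portion of the expanse, comprising indexes FF000000 through FFFFFFFF. Rich pointer 104 points to expanded branch 106, which covers the uppermost 1/256th of the expanse at level 3.

[0056] The first subexpanse at level 3 contains a subsidiary node in the form of linear branch 105. As shown, linear branch 105 contains a fan-out (NumRP, a count of the child nodes referenced by the branch), a sorted list of the index portions (digits) corresponding to the subexpanses referenced by the branch, and a list of pointers to the indicated subexpanses. Only the pointer to the final subexpanse, listed as E4 and covering 00FD0000 through 00FDFFFF, is shown in this illustration, although similar pointers (not shown) emanate from the slots for subexpanses E1 through E3. Thus the fourth rich pointer of linear branch 105 is shown referencing level 2 bitmap branch 113, which in turn references linear leaves 118-122 and bitmap leaves 116, 117, and 123.

[0057] At the other end of top-level node 102, the level 3 expanded branch is referenced by rich pointer 104. Expanded branch 106 typically references a large number of subsidiary nodes, although only two such references are shown for purposes of illustration. Note that a sparsely populated branch would otherwise be converted to a linear or bitmap branch format to conserve memory while still providing access to its nodes using one or two cache line fills.

[0058] As shown in FIGS. 1A-1E, level 3 expanded branch 106 contains an array of 256 rich pointers, including rich pointer 107 to level 1 linear leaf node 108. Note that the use of rich pointers, according to one implementation of the invention, allows a pointer to "skip" a level of the tree, which avoids an unnecessary indirection when an intermediate branch would contain only a single reference. Another rich pointer 109 points to level 2 linear leaf node 110, which contains two 2-byte indexes.

[0059] Rich pointers may be used to implement a data structure that is compatible with, and further incorporates, branch and leaf compression according to the invention. Although not required, the use of rich pointers is compatible with and supports one embodiment of the invention. Such a rich pointer structure encompasses at least two types of rich pointers, or adaptable objects: the pointer type described above and depicted in FIG. 2A, and the immediate type depicted in FIG. 2B. The immediate type supports immediate indexes.

[0060] That is, when the population of an expanse is relatively sparse, a rich pointer can be used to store the indexes "immediately" within a digital tree branch, rather than requiring traversal to the lowest level of the digital tree to access the indexes. This format is akin to an immediate machine instruction, in which the instruction specifies an immediate operand that directly follows any displacement bytes.

[0061] Thus an immediate index, or a small number of indexes, is stored in the node, avoiding one or more of the redirections otherwise required to traverse the tree and reach a distant leaf node. Immediate indexes thereby provide a way of packing small populations (a small number of indexes) directly into the rich pointer structure, instead of allocating more memory and requiring multiple memory references and possible cache fills to access the data.

[0062] The two-word format of the preferred embodiment readily supports the inclusion of immediate indexes. This is accomplished by storing index digits in the entire rich pointer except for the type field. A rich pointer implemented on a 32-bit system may store anywhere from a single 3-byte immediate index to seven 1-byte indexes, while a rich pointer on a 64-bit system may store up to fifteen 1-byte immediate indexes.

[0063] The generalized structure of a rich pointer (also referred to as an adaptable object) that supports immediate indexes is shown in FIG. 2B. The rich pointer contains one or more indexes "I", depending on the word size of the platform and the size of the index, and an 8-bit type field that also encodes the index size and the number of immediate indexes.

[0064] FIG. 3 illustrates details of a linear branch construct according to the invention as implemented on a 32-bit platform. The linear branch consists of one byte indicating the fan-out, i.e., the number of populated subexpanses referenced by the branch, followed by a sorted array consisting of one byte (i.e., digit) per populated subexpanse, indicating the subexpanse number (e.g., 0 through 255).

[0065] The populated subexpanse digits are followed by a corresponding array of subexpanse pointers. The invention incorporates padding at the end of the two arrays, which allows them to "grow in place" for faster insertions and deletions. Both subexpanse arrays (i.e., digits and pointers) are organized or packed purely by population and are not addressed uniformly by expanse, although they can be thought of as being organized or accessed by expanse.

[0066] Typically, a linear branch node as shown in FIG. 3 is used when the actual fan-out, i.e., the number of populated subexpanses, is relatively small, for example up to seven rich pointers out of a possible 256 subexpanses per branch. A linear branch node according to one embodiment of the invention includes the three consecutive regions described above: a count of populated subexpanses, a sorted list of populated subexpanses (1 byte each), and a list of corresponding rich pointers, each two words long. (As will be recognized by those skilled in the art, other configurations of the number, types, sizes, and ordering of regions may be employed in alternative embodiments of the invention.) Using this particular scheme, a maximum linear branch containing seven rich pointers requires 1 byte for the number of subexpanses and 7 bytes for the subexpanse list, hence two words (on a 32-bit system) for the combination. The combination of count and subexpanse list is followed by 14 words for the rich pointers themselves, so that the entire construct fits in 16 words, or one complete cache line. Referring back to FIG. 3, a total of four populated subexpanses are referenced by the pointers for E (expanse) 1 through E (expanse) 4, respectively.

[0067] FIG. 4 shows a bitmap branch, again as implemented on a platform with a 32-bit word size. The bitmap branch node has a first portion 401 comprising 256 bits (32 bytes) indicating populated and empty subexpanses, followed by a second portion 402 comprising ordinary pointers to independent subarrays of rich pointers to the populated subexpanses. This construct may be thought of as compressing the byte per valid index required in a linear branch to a bit per index, a saving of up to 7/8, except that the bitmap also contains 0 bits for invalid indexes.

[0068] Conceptually, the subexpanse pointers are held in a single array (portion 402) following the bitmap. However, according to a preferred embodiment of the invention, to keep memory management simple and insertion and deletion fast, the bitmap may be followed by eight ordinary pointers, each to an independent subarray 408, 409 of between 0 and 32 subexpanse pointers. The bitmap is thereby organized by expanse, since it is addressable by digit (0-255), while the subexpanse pointers are listed "by population", since they are packed into subarrays corresponding only to the bits that are set in the bitmap.

[0069] According to another embodiment of the invention, once any bitmap branch subarray of rich pointers reaches maximum memory usage, that is, once the number of pointers and the amount of memory allocated to the subarray are sufficient to hold 32 subexpanse pointers, the subarray is expanded to save time during access, insertion, and deletion.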
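The two-tier bitmap branch described above can be sketched as follows. This is an illustration under stated assumptions, not the patented memory layout: the 256-bit bitmap is held as eight 32-bit integers, the least significant bit of each word is assumed to stand for the lowest digit of its group, Python lists stand in for the packed rich pointer subarrays, and the class name is hypothetical. The position of a child inside its subarray is the number of set bits below its own bit.

```python
class BitmapBranch:
    """Illustrative bitmap branch: eight 32-bit bitmap words, each paired with
    a packed subarray that holds one child per set bit."""

    def __init__(self):
        self.bitmaps = [0] * 8                   # one word per 32 subexpanses
        self.subarrays = [[] for _ in range(8)]  # packed "by population"

    @staticmethod
    def _locate(word, bit):
        """Return (is_set, offset) for a bit, where offset counts lower set bits."""
        below = word & ((1 << bit) - 1)
        return bool((word >> bit) & 1), bin(below).count("1")

    def lookup(self, digit):
        group, bit = divmod(digit, 32)
        is_set, offset = self._locate(self.bitmaps[group], bit)
        return self.subarrays[group][offset] if is_set else None

    def insert(self, digit, child):
        group, bit = divmod(digit, 32)
        is_set, offset = self._locate(self.bitmaps[group], bit)
        if is_set:
            self.subarrays[group][offset] = child
        else:
            self.bitmaps[group] |= 1 << bit
            self.subarrays[group].insert(offset, child)
```

Under this bit ordering, populating subexpanses 42, 44, 45, 46, 4C, 4D, and 4F (hexadecimal) sets the third bitmap word to 0000b074 and leaves seven packed entries in the third subarray.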
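Expanding a subarray of rich pointers means setting all of the bits in the corresponding subexpanse of the bitmap, even for subexpanses of indexes that are unpopulated; unpacking the rich pointer subarray into a simple, positionally accessed array; and representing unpopulated subexpanses with null rich pointers.

[0070] Thus, as shown in FIG. 4, a bitmap branch is a two-tier object and is somewhat more complex than either a linear or an expanded branch. The first level (portion 401) is the bitmap itself which, according to a 32-bit word size implementation of the invention, comprises 256 bits (32 bytes) subdivided into 8 subexpanses, followed by 8 pointers (portion 402) to second-level ADTs or subarrays (e.g., arrays 408 and 409).

[0071] Each ADT 400 consists of a packed linear list of rich pointers, with one rich pointer for each bit set in the associated bitmap. On a 32-bit system, 8 words are required for the bitmap (32/4) and 8 words for the pointers, for a total of 16 words.

[0072] This total of 16 words is important to system performance because, according to an embodiment of the invention, it equals one CPU cache line. Note that on a 64-bit system only 4 words are needed for the bitmap while 8 words are still needed for the pointers, so that 4 words are wasted assuming a 16-word cache line.

[0073] For example, bitmap 404 has the hexadecimal value 0000b074, which corresponds to the following binary vector and index values.

[0074] [Table 1]

In this example, the binary vector shown in the last column of Table 1 indicates the presence of indexes in subexpanses 42, 44, 45, 46, 4C, 4D, and 4F within the range 40 hex to 5F hex. The associated ordinary pointer 406 for this range (FIG. 4) points to array 408, which contains an individual rich pointer for each of the subexpanses indicated by the associated binary vector.

[0075] For comparison, an expanded branch is depicted in FIG. 5. This construct comprises a simple array of rich pointers, in this case 256 such rich pointers, with null rich pointers used to represent empty expanses. Assuming two words per rich pointer, such an expanded branch requires 512 words.

[0076] The invention further provides global memory efficiency. That is, when the fan-out (i.e., the number of populated subexpanses) increases to the point where a linear branch occupies too many cache lines (according to a preferred embodiment of the invention, the limit is a single 16-word cache line), the branch is converted to a bitmap branch. Note that such a bitmap construct can handle "full fan-out" and need never be converted to an expanded branch. Neither a linear nor a bitmap branch wastes any memory on null subexpanses.

[0077] However, when the population under a linear or bitmap branch is large enough to "amortize" the memory required for an expanded branch, or when the overall or global memory efficiency of the data structure (preferably measured in bytes per index) still does not exceed some selected "tunable" value, the branch is opportunistically converted to the expanded type.

[0078] While this wastes some memory on null subexpanse pointers, it ensures a single indirection (and cache fill) to traverse the branch. Note that, to support the latter parameter, i.e., global memory efficiency, at least in trees with larger populations, the root pointer may point to an intermediate data structure that stores the total number of bytes used by the tree and the total count of indexes stored in the tree. This intermediate data structure may reside adjacent to the top branch node of the tree or may in turn point to the top branch of the tree.

[0079] Leaf compression is also used according to the invention in the form of multi-index leaves, including the linear and bitmap leaf types mentioned above. Typically, each lookup in one branch of a digital tree reduces the expanse, or range, of the indexes that can possibly be stored under the next lower subexpanse pointer.

[0080] Therefore, only the respective unique remaining bits not yet decoded need be stored. As previously explained, when the population (i.e., the number of valid indexes) in an expanse is small, it becomes useful to store the indexes in a single object that is searched sequentially or otherwise immediately, rather than proceeding hierarchically through more tree branches to application-specific leaves, each related to a single index. In its simplest case, according to one embodiment, an index-only leaf is a list of valid indexes.

[0081] The inventors have determined experimentally that the optimal size of a leaf is relatively small, e.g., less than or equal to two cache lines, i.e., 32 words or 128 bytes on a typical 32-bit word size platform. It was found that even a serial search of a sorted list of indexes in two full cache lines takes, on average, 1.5 cache fills (assuming that the data is not already in cache), since half of the time the index is found in the first cache line (1 fill) and half of the time in the second line (2 fills). That is, when the population is sufficiently small, it is preferable to store it as a list, bitmap, or other ADT of indexes in one or two cache lines, rather than in more levels of a digital tree.

[0082] FIGS. 6A-6D and 7A-7C show examples of linear leaves according to the invention. A linear leaf is an ordered list of indexes, each consisting of N undecoded bytes, where N is the level in the tree under a convention in which the lowest level, i.e., the level farthest from the root, is level 1. (Note that this is the opposite of how trees are conventionally described, where level numbering starts at the topmost node with level 1 and each child is at a higher-numbered level than its parent.)

According to a preferred embodiment, the population of the leaf (the number of indexes, which equals the size of the leaf) is stored with the pointer to the leaf rather than in the leaf itself (except for an implementation used for very small arrays that consist entirely of a single root-level linear leaf).

[0083] As shown in FIGS. 6A-6D, a linear leaf is a packed array of sorted indexes that stores, for each index, only the minimum number of bytes remaining to be decoded at the level of the leaf in the tree. FIGS. 7A-7C show an alternative implementation, used when values are associated with the respective indexes, in which a separate value area containing a list of such values is added. Note also that, unlike the root-level leaf, linear leaves need not include a population field for the index count. Instead, according to a preferred embodiment of the invention, the parent node carries the population field.

[0084] Table 2 includes the arrangement and capacities of leaves at various levels of the tree (higher-level leaves require more bytes to represent the remaining portion of each index) for 32-bit and 64-bit word size platforms, and for systems having values associated with the indexes.

[0085] [Table 2]

Note that, in each case, the index size of a leaf, i.e., the number of remaining undecoded bytes in each index, is enumerated in the type field of the referencing rich pointer structure. The minimum leaf populations are based on how many indexes an immediate rich pointer can hold, so that smaller populations are "immediatized", i.e., stored in the rich pointer structure itself.

[0086] In contrast, the maximum leaf populations are limited by the capacity of two cache lines (e.g., 32 words) in the case of index-only leaves, or four cache lines (e.g., 64 words) in the case of leaves in which values are associated with indexes. In another embodiment of the invention on a 64-bit platform, an index-only leaf is reconfigured from the immediate index type directly to a bitmap leaf as soon as it reaches 16 indexes.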
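The linear leaf layout described in the preceding paragraphs can be sketched briefly. The example assumes fixed-size indexes decoded one byte per level and models the packed byte array as a sorted Python list of suffix values; the class and function names are hypothetical, and the population field that the parent pointer would carry is represented simply by the list length.

```python
from bisect import bisect_left


def undecoded_suffix(index, level):
    """Keep only the low `level` bytes, i.e. the part not yet decoded."""
    return index & ((1 << (8 * level)) - 1)


class LinearLeaf:
    """Illustrative linear leaf: a sorted list of undecoded index suffixes."""

    def __init__(self, level):
        self.level = level    # bytes remaining to decode at this leaf
        self.suffixes = []    # sorted; one entry per stored index

    def insert(self, index):
        suffix = undecoded_suffix(index, self.level)
        i = bisect_left(self.suffixes, suffix)
        if i == len(self.suffixes) or self.suffixes[i] != suffix:
            self.suffixes.insert(i, suffix)

    def contains(self, index):
        suffix = undecoded_suffix(index, self.level)
        i = bisect_left(self.suffixes, suffix)
        return i < len(self.suffixes) and self.suffixes[i] == suffix

    def packed_size_bytes(self):
        """Bytes a packed array of these suffixes would occupy."""
        return len(self.suffixes) * self.level
```

A level 2 leaf holding two indexes therefore needs 4 bytes of index storage, whereas storing the same two indexes uncompressed would need 8.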
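The purpose of converting directly from immediate indexes to a bitmap leaf is to avoid creating a linear leaf for a single population size, only to create a bitmap leaf on the next insertion, upon reaching 17 indexes in the same subexpanse.

[0087] A bitmap leaf is useful when the memory cost of a linear leaf exceeds a particular threshold, for example upon reaching the 17 indexes mentioned above. Thus, at the lowest level of the tree, where only a single index digit (e.g., byte) remains to be decoded, when a 256-index subexpanse has sufficient population (e.g., 17 indexes), memory is conserved by representing the leaf as a bitmap with 1 bit for each index in the subexpanse, hence 256 bits or 32 bytes in total.

[0088] An example of an index-only bitmap leaf implemented on a 32-bit word platform is depicted in FIG. 8. In the figure, each horizontal rectangle represents one word. On a 64-bit platform the leaf would look similar, except that the words are larger and there are half as many words in the bitmap. The bits in the bitmap indicate which of the possible indexes in the expanse of the leaf are actually present, i.e., stored.

[0089] FIG. 9 is a diagram of an alternative embodiment in which the data structure associates values with the stored indexes. As shown, a value area including one word per valid index is included in the bitmap leaf. Like a bitmap branch, this embodiment of the bitmap leaf is a two-tier construct, except that instead of rich pointer arrays (two words per element) it has value area subarrays, that is, lists of values with one word per element.

[0090] On a 64-bit platform, the bitmap would instead require four words, with four words unused. The result of using a two-tier construct is that value list modification is faster, because fewer bytes of memory and fewer cache lines are involved.

[0091] As with bitmap branches, when an expanse is sufficiently small, for example a 256-way node with 8 bits, or 1 byte, remaining to be decoded, and the population of the expanse is sufficiently large, e.g., equal to or greater than 25 indexes, it may be advantageous (i.e., "cheaper in terms of memory") to represent the valid indexes in the expanse as a bitmap rather than as a list of indexes.

[0092] This characteristic holds true only at level 1 of the tree (i.e., at the leaves farthest from the root node), where there is just one undecoded byte per index. According to a preferred embodiment of the invention, the use of bitmap leaves is limited to level 1 leaves, that is, to indexes containing only one undecoded byte.

[0093] The data structure may further include leaf index compression. As described in connection with linear leaves, traversing a digital tree involves decoding index bits that represent portions (e.g., 1-byte segments) of the target index being sought, inserted, or deleted. In many cases, upon reaching a leaf, some or most of the bits in the indexes stored at the leaf have already been decoded, that is, stored positionally (i.e., digitally) in the tree.

[0094] Thus only the remaining undecoded index bits (the suffix) need be stored in the leaf. On a 32-bit platform with 4-byte indexes decoded 1 byte at a time (i.e., at each branch of the tree), a (terminal) leaf having the size of two 64-byte-wide cache lines (i.e., 128 bytes) might accommodate the number of compressed indexes shown in Table 3.

[0095] [Table 3]

Referring to Table 3, in the case of 1 byte per index, once the population exceeds 24 indexes, a 32-byte (i.e., 256-bit) object is sufficient to hold a bitmap representing all of the possible indexes in a low-level leaf.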
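The capacity of a fixed-size leaf follows from simple division. The sketch below assumes a 128-byte leaf (two 64-byte cache lines) filled entirely with whole-byte suffixes; the function and constant names are hypothetical, and the results are plain arithmetic under those assumptions rather than a reproduction of Table 3.

```python
LEAF_BYTES = 128             # two 64-byte cache lines (assumed leaf size)
BITMAP_LEAF_BYTES = 256 // 8  # one bit per possible 1-byte suffix


def max_indexes_per_leaf(undecoded_bytes, leaf_bytes=LEAF_BYTES):
    """How many suffixes of the given width fit in a leaf of leaf_bytes."""
    if undecoded_bytes < 1:
        raise ValueError("at least one undecoded byte is required")
    return leaf_bytes // undecoded_bytes


# Fewer undecoded bytes per index means more indexes in the same space.
capacities = {n: max_indexes_per_leaf(n) for n in (3, 2, 1)}
```

With three, two, and one undecoded bytes the same 128 bytes hold 42, 64, and 128 suffixes respectively, while a bitmap over all 256 one-byte suffixes occupies a fixed 32 bytes regardless of population.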
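Note also that leaf index compression has a further advantage.

[0096] In particular, each lower-level leaf in the tree can hold more indexes than a leaf at the current level. As a result, even without immediate indexes, a cascade caused by inserting a single index that overflows an existing leaf never creates more than one additional level in the tree. Similarly, a decascade caused by deleting a single index never removes more than one level from the tree. In other words, leaf compression supports good locality of changes during modification.

[0097] As previously noted, while the data structure has been described in terms of fixed-size indexes, it may readily be modified to accommodate indexes of variable size, such as character strings and bit strings of arbitrary length. For example, with bit strings of arbitrary length as indexes, the unique remaining suffix of a single index may, if sufficiently small, be stored immediately in a rich pointer or, if longer, be stored in a variable-size single-index suffix leaf.

[0098] FIG. 10 is a diagram of a computer system capable of supporting and running a memory storage program that implements and maintains a data structure according to the invention. Thus, although the invention is applicable to a wide range of data structures, programming languages, operating systems, hardware platforms, and systems, FIG. 10 illustrates one such computer system 1000 comprising a platform suitable for supporting the invention.

[0099] Computer system 1000 includes a central processing unit (CPU) 1001 coupled to system bus 1002. CPU 1001 may be any general-purpose CPU, such as an HP PA-8500 or an Intel Pentium processor. However, the invention is not restricted by the architecture of CPU 1001 as long as CPU 1001 supports the inventive operations described herein, e.g., the use of pointers. System bus 1002 is coupled to random access memory (RAM) 1003, which may be SRAM, DRAM, or SDRAM.

[0100] ROM 1004, which may be PROM, EPROM, or EEPROM, is also coupled to system bus 1002. RAM 1003 and ROM 1004 hold user and system data and programs, as is well known in the art.

[0101] System bus 1002 is also coupled to input/output (I/O) controller card 1005, communications adapter card 1011, user interface card 1008, and display card 1009. I/O card 1005 connects storage devices 1006, such as a hard disk drive, CD drive, floppy (registered trademark) disk drive, or tape drive, to the computer system. Communications card 1011 is adapted to couple computer system 1000 to network 1012, which may be a telephone network, a local or wide area network (LAN/WAN), an Ethernet (registered trademark) network, or the Internet, and which may be wired or wireless. User interface card 1008 couples user input devices, such as keyboard 1013 and pointing device 1007, to computer system 1000. Display card 1009 is driven by CPU 1001 to control display device 1010.

[0102] While the invention has been described in connection with what are presently considered to be the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments but, on the contrary, is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims. By way of example, the invention includes the following embodiments.

[0103] (1) A data structure for storage in a computer memory, the data structure being accessible by an application program executed on a data processing system, the data structure comprising: a root pointer (101); and a digital tree (102, 105, 106, 112-113) pointed to by the root pointer and including a first plurality of hierarchically arranged nodes, wherein a second plurality of the nodes includes branch nodes (105, 106, 113) selected from the group consisting of linear branch nodes (FIG. 3), bitmap branch nodes 400, and expanded branch nodes (FIG. 5), the selection depending on the number of populated subexpanses and the overall state of the digital tree, and leaf nodes (108, 110, 116-123) selected from the group consisting of linear leaf nodes (FIGS. 6 and 7) and bitmap leaf nodes (FIGS. 8 and 9), each leaf node holding a plurality of indexes and containing only undecoded portions of the indexes according to the level of the leaf in the digital tree and the number of indexes in the leaf.

[0104] (2) The data structure of (1), wherein the second plurality of nodes includes linear branch nodes (FIG. 3), bitmap branch nodes (FIG. 4), and expanded branch nodes (FIG. 5).

[0105] (3) The data structure of (1), wherein the second plurality of nodes includes linear leaf nodes (FIGS. 6 and 7) and bitmap leaf nodes (FIGS. 8 and 9).

[0106] (4) The data structure of (1), wherein the linear branch node (FIG. 3) includes at least linear lists, a first list containing subexpanse descriptors that include at least the corresponding index bits of each associated populated subexpanse, and a second list containing pointers to one or more subsidiary nodes of each associated subexpanse, the pointers corresponding to the subexpanse descriptors of the first list.

[0107] (5) The data structure of (4), wherein each of the pointers comprises a rich pointer (FIGS. 2A and 2B).

[0108] (6) The data structure of (1), wherein the linear branch node (105 and FIG. 3) includes at least two linear lists, a first list (E1-E4) containing subexpanse descriptors that include at least the corresponding index bits of each associated populated subexpanse, and a second list (the rich pointers for expanses 1-4) containing pointers to one or more subsidiary nodes of each associated subexpanse, the pointers corresponding to the subexpanse descriptors of the first list.

[0109] (7) The data structure of (1), wherein the bitmap branch 113, 400 includes at least a first list 401 of bits comprising 1 bit for each possible subexpanse under the bitmap branch node, each of the bits indicating whether the corresponding subexpanse contains any indexes, and a second list of pointers 402 pointing to at least one subsidiary node for each of the subexpanses, the pointers corresponding to the status of the bits in the first list.

[0110] (8) The data structure of (1), wherein the bitmap leaf node (FIGS. 8 and 9) includes at least a first list comprising 1 bit for each possible index in the leaf, each of the bits indicating whether the corresponding one of the indexes is valid.

[0111] (9) A method of storing an index in a data structure, comprising the steps of: identifying a compressed branch node (FIGS. 3 and 4) of the data structure (FIG. 1) to which the index belongs; determining a value of a parameter of the data structure; selectively converting the compressed branch node to an expanded branch node (FIG. 5) in response to the value; and storing the index under the expanded branch node, wherein the parameter includes one of the overall memory used per index for the data structure and the population under the compressed branch node.

[0112] (10) The method of (9), wherein the data structure is stored in a computer memory 1003, 1004 so as to be accessible by an application program executed on a data processing system 1000, the data structure comprising a root pointer 101 and a digital tree 102, 105, 106, 112-113 pointed to by the root pointer and including a plurality of hierarchically arranged nodes, the nodes respectively comprising branch nodes 105, 106, 113 selected from the group consisting of the compressed branch node (FIGS. 3 and 4) and the expanded branch node (FIG. 5), and leaf nodes 108, 110, 116-123 selected from the group consisting of linear leaf nodes (FIGS. 6 and 7) and bitmap leaf nodes (FIGS. 8 and 9), each leaf node holding a plurality of indexes and containing only undecoded index bits according to the level of the leaf in the digital tree and the number of indexes in the leaf.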
In the field of construction, in particular, the data in which the structure of the data organization is stored.
Data depending on the data and compressed to match the data
It relates to a hierarchical data structure with structural elements. [0002] BACKGROUND OF THE INVENTION Computer processors and corresponding
Memory continues to increase in speed. Hardware is the thing
However, as the speed limit approaches,
Data access time must be significantly reduced
It is. Even if these limitations are not a major factor, software
Hardware platform by maximizing hardware efficiency
Maximize form efficiency and reduce hardware / software
Expand the capacity of the entire software system. One way to increase the efficiency of a system is to use
Although it is based on efficient data management,
Smart selection, related storage, and retrieval algorithms
Is done. Various data structures, eg related to the prior art
Storage and retrieval algorithms for arrays, hash
Tree, binary tree (binary tree), AVL tree (height)
Binary tree balanced with b), b-tree, and skip
It has been developed for managing data, including lists. [0004] These prior art data structures, related
For each successive storage and search algorithm,
Faster access time and minimized memory overhead
There is an inherent trade-off between For example,
Rays are fast-read through address calculations for single array elements.
Be prepared for indexing, but before a single value is stored
Requesting that the entire array be pre-arranged in memory,
And unused intervals in the array waste memory resources
I do. Instead, binary trees, AVL trees, b-trees, and
Kiplist pre-allocates memory for data structures
And minimize the allocation of unused memory.
But as populations increase,
It becomes clear that the access time increases. The array has a simplified structure,
Prior art data for high-speed access to encrypted data
Structure. Memory must be located throughout the array
No, but its structure is not flexible. Array values are
Increase the index placed on each element of the ray by size
Palm, adding the offset of the base address of the array
, And can be checked in numerical units for each position. Typically, a single central processing unit key is used.
Cache linefill is required for arrays
Necessary to access the element and the value stored here
It has been. Explained and typically embodied
As such, arrays are inefficient and have relatively inflexible memory.
It is. However, access is given as O (1)
It is. In other words, independent of the array size (disk
Ignoring swaps). [0007] Alternatively, the data structure previously described is
Branch tree, b-tree, skip list and hash table
Memory is efficient but not desirable.
It will be available in a form that includes signs. For example, hashing
Is the scattered, possibly multi-word index
(For example, a string) to an array index
Used to do. A typical hash table is
Is a fixed-size array, where each index is
Hashing algo performed with original index
The result of the rhythm. [0008] However, efficient hashing
In order for the hash algorithm to be
Box. Hash table
Also, all data nodes have the original index
Seeking to include a copy of (or a pointer to)
Identify the nodes in each synonym chain
Like an array, hashing requires some memory to be pre-allocated. If the
table is well designed, however, that is, if the characteristics of the
data to be stored are well known and are matched to the hashing
algorithm, collision resolution technique, and storage structure
implemented, this is normally only a fraction of the memory that must
be allocated for an array.

In particular, digital trees, or tries, provide fast access to data but
are generally memory inefficient. Memory efficiency can be improved for
sparse sets of indexes by keeping tree branches narrow, but the result
is a deeper tree and an increase in the average number of memory
references, indirections, and cache line fills, all of which slow
access to the data. This latter factor, maximizing cache efficiency, is
often ignored when such structures are discussed, yet it may be a
dominant factor affecting system performance. A trie is a tree of
smaller arrays, or branches.
Each branch decodes one or more bits of the index. Prior art digital
trees have branch nodes that are arrays of simple pointers or
addresses. Typically, the size of the pointers or addresses is
minimized to improve the memory efficiency of the digital tree.

At the "bottom" of the digital tree, the last branch decodes the last
bits of the index, and the element points to storage specific to that
index. The "leaves" of the tree are these memory chunks for specific
indexes, and they have application-specific structures.
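
The conventional digital tree just described can be sketched as
follows, purely as an illustration. The sketch assumes 32-bit indexes
decoded one byte per branch; the names new_branch, insert, and lookup
are hypothetical, and None stands in for a pointer that refers to
nothing.

```python
FANOUT = 256      # assumed: each branch decodes one 8-bit digit
INDEX_BYTES = 4   # assumed: 32-bit indexes


def new_branch():
    # A conventional branch node: one pointer slot per possible digit.
    return [None] * FANOUT


def insert(root, index, value):
    node = root
    digits = index.to_bytes(INDEX_BYTES, "big")
    for d in digits[:-1]:
        if node[d] is None:
            node[d] = new_branch()
        node = node[d]
    # The last branch decodes the last digit; its element refers to the
    # storage that is specific to this index (the "leaf").
    node[digits[-1]] = value


def lookup(root, index):
    node = root
    digits = index.to_bytes(INDEX_BYTES, "big")
    for d in digits[:-1]:
        node = node[d]
        if node is None:
            return None
    return node[digits[-1]]
```

Each branch created here holds 256 slots even when only one is used,
which is the memory inefficiency that narrow or compressed branches are
meant to address.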
A digital tree has the advantage that no memory needs to be allocated
to branches that have no indexes, that is, zero population (also called
an empty subexpanse). In this case, the pointer that points to the
empty subexpanse is given a unique value and is called a null pointer,
indicating that it does not represent a valid address.

[0013] Furthermore, the indexes stored in a digital tree can be
accessed in sorted order, which allows neighbors to be identified. As
used herein, the "expanse" of a digital tree is the range of values
that could be stored in the tree, while the population of the tree is
the set of values actually stored in it. Similarly, the expanse of a
branch of a digital tree is the range of indexes that could be stored
in the branch, and the population of a branch is the number of values
(e.g., the count) actually stored in the branch.
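
As a hedged illustration of these two terms, the sketch below computes
the expanse reached after decoding some leading digits and counts the
population stored within it. It assumes 32-bit indexes and 1-byte
digits; the function names and the sample index values are invented for
the example.

```python
INDEX_BYTES = 4  # assumed: 32-bit indexes, one byte per digit


def subexpanse(prefix_digits, index_bytes=INDEX_BYTES):
    """Range (low, high) of indexes that could be stored below a branch
    reached by decoding the given leading digits."""
    remaining = index_bytes - len(prefix_digits)
    low = int.from_bytes(bytes(prefix_digits), "big") << (8 * remaining)
    high = low | ((1 << (8 * remaining)) - 1)
    return low, high


def population(stored, prefix_digits, index_bytes=INDEX_BYTES):
    """Count of the indexes actually stored within that expanse."""
    low, high = subexpanse(prefix_digits, index_bytes)
    return sum(1 for index in stored if low <= index <= high)
```

With no digits decoded, the expanse is the whole 32-bit range; each
decoded byte narrows it by a factor of 256.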
(As used herein, the term "population" refers to either the set of
indexes or the count of those indexes; the meaning is apparent to those
skilled in the art from the context in which the term is used.)

"Adaptive Algorithms for Cache-efficient Trie Search", by Acharya, Zhu,
and Shen, describes cache-efficient algorithms for trie search. Each
algorithm uses different data structures, including partitioned arrays,
B-trees, hash tables, and vectors, to represent different nodes in a
trie. The data structure selected depends on the cache characteristics
as well as the fanout of the node.

[0015] The algorithms further adapt to changes in the fanout at a node
by dynamically switching the data structure used to represent the node.
Finally, the size and layout of each data structure are determined from
the characteristics of the cache and the size of the symbols in the
alphabet.
The publication further includes an evaluation of the performance of
the algorithms on real and simulated memory hierarchies.

Other publications that describe data structures and are known and
available to those skilled in the art include: Fundamentals of Data
Structures in Pascal, 4th edition; Horowitz and Sahni; pp. 582-594; The
Art of Computer Programming, Volume 3; Knuth; pp. 490-492; Algorithms
in C; Sedgewick; pp. 245-256, 265-271; "Fast Algorithms for Sorting and
Searching Strings"; Bentley, Sedgewick; "Ternary Search Trees";
5871926, INSPEC Abstract Number: C9805-6120-003; Dr. Dobb's Journal;
"Algorithms for Trie Compaction", ACM Transactions on Database Systems,
9(2):243-63, 1984; "Routing on longest-matching prefixes"; 5217324,
INSPEC Abstract Number: B9605-6150M-005, C9605-5640-006; "Some results
on tries with adaptive branching"; 6845525, INSPEC Abstract Number:
C2001-03-6120-024; "Fixed-bucket binary storage trees"; 01998027,
INSPEC Abstract Number: C83009879; "DISCS and other related data
structures"; 03730613, INSPEC Abstract Number: C90064501; and
"Dynamical sources in information theory: a general analysis of trie
structures"; 6841374, INSPEC Abstract Number: B2001-03-6110-014,
C2001-03-6120-023.

An enhanced storage structure is disclosed in U.S. patent application
Ser. No. 09/457164, filed December 8, 1999, entitled "A Fast Efficient
Adaptive, Hybrid Tree" (hereinafter the '164 application), which is
assigned in common with the present application. The data structure and
storage methods described therein provide a self-adapting structure
that self-tunes and configures "expanse"-based storage nodes to
minimize storage requirements and to provide efficient, scalable data
storage, search, and retrieval capabilities. The structure described
therein, however, does not take full advantage of certain data
distribution situations.
An enhancement to the storage structure described in the
above-mentioned patent application is detailed in U.S. patent
application Ser. No. 09/725373, filed November 29, 2000, entitled "A
Data Structure And Storage And Retrieval Method Supporting Ordinality
Based Searching and Data Retrieval", which is likewise assigned in
common with the present application. This latter application describes
a data structure and a related data storage and retrieval method that
rapidly provide a count of the elements stored in or referenced by a
hierarchical structure of ordered elements, access to elements based on
their ordinal value in the structure, and identification of the
ordinality of elements. In an ordered tree implementation of the
structure, a count of the indexes present in each subtree is stored. In
other words, the cardinality of each subtree is stored at, or
associated with, either the higher-level node that points to the
subtree or the head node of the subtree. In addition to the
requirements specific to the data structure (e.g., creating new nodes,
reassigning pointers, balancing), data insertion and deletion include
steps of updating the affected counts.

[0020] However, this structure also fails to take full advantage of
certain sparse data situations.
Therefore, a need exists for techniques and tools that optimize the
performance characteristics of digital trees and similar structures.

[0021] SUMMARY OF THE INVENTION
A data structure according to the present invention is a self-modifying
data structure, based on a digital tree (or "trie") data structure
stored in memory, that can be treated as a dynamic array and is
accessed through a root pointer. For an empty tree, this root pointer
is null; otherwise, it points to the first of a hierarchy of branch
nodes of the digital tree. Low-fanout branches are avoided or replaced
with alternative structures that waste less memory while retaining most
or all of the performance advantages of a conventional digital tree
structure, including index insertion, search, access, and deletion
performance. This improvement reduces or eliminates the memory
otherwise wasted on the null pointers that are prevalent in sparsely
populated and/or wide/shallow digital trees. The additional processing
time required to effect and accommodate the branch modifications is
minimal, particularly in comparison with the processing advantage
inherent in reducing the size of the structure, so that fetching data
from memory is more efficient, with each CPU cache line fill capturing
more data and fewer null pointers.
The invention includes linear and bitmap branches and leaves
implemented using, for example, a rich pointer structure. Opportunistic
reconfiguration of nodes automatically readjusts for changes in
subexpanse population.

[0025] DETAILED DESCRIPTION OF THE INVENTION
The present invention relates to a system and method for storing data
in computer memory for access by an application program running on a
data processing system. The system includes a data structure and
related information stored in memory, including a root pointer that
points to a "wide/shallow" digital tree. The digital tree has a
plurality of nodes, in the form of branches (branch nodes) and
multi-index leaves (leaf nodes), that are arranged hierarchically and
adaptively compressed using hybrid abstract data types (ADTs). In this
application, an ADT refers to multiple data structures that have the
same virtual meaning but different literal expansions. In addition, the
term "index" as used herein encompasses a key, or a set of fields
constituting a key, including a number, string, token, symbol, or other
such designation or representation.

A digital tree implementation allows the data (the set of indexes or
keys) to be organized primarily "by expanse" rather than purely "by
population", which has various advantages in simplifying tree traversal
and modification algorithms.
In particular, a wide digital tree has potentially high fanout at each
branch, which allows the tree to be shallow, and hence fast to
traverse, even for large populations; it is thus "well scalable". Using
compressed branches retains the performance benefit of wide branches
while allowing the actual fanout, and hence memory usage, to shrink to
match the data (indexes or keys) being stored. With this technique,
only populated subexpanses, that is, those containing stored indexes,
within the expanse of a branch must be represented in a compressed
branch; empty subexpanses are typically (although not necessarily)
absent. Further, storing multiple indexes (or keys) and, if they exist,
their associated values in a "multi-index leaf" makes the tree
shallower by one or more levels, and hence both smaller in memory usage
and faster to access. Compressed multi-index leaves hold more indexes,
so that more branches need not be inserted into the tree to hold the
same set of indexes.

Such "cache-efficient" compressed branches and leaves are designed
optimally with respect to CPU cache lines, to minimize the "cache
fills" that result in relatively slow access to random access memory
(RAM).
Thus, the invention includes several types of branch and leaf
compression to optimize the performance of a digital tree. These
improvements include linear and bitmap branches (i.e., interior nodes),
linear and bitmap leaves (i.e., terminal nodes), and rules and methods
for effecting the use of these nodes, including, for example, global,
memory-efficiency-driven, opportunistic decompression of compressed
branches and the use of leaf index compression.

A linear branch node according to the invention addresses low-fanout
branches by providing a list of populated subexpanses and the
corresponding next-level pointers. More generally, a linear branch
contains a list of subexpanse descriptors that contain criteria for
selecting the subexpanse corresponding to a key, or to one or more of a
set of fields constituting a key.
According to a preferred embodiment of the invention, the subexpanse
descriptors are 1-byte segments of 32-bit indexes. Preferably, a linear
branch is constrained to a single CPU cache line of the target
platform. As a subexpanse becomes more heavily populated, a bitmap
branch node may be used, which includes a binary vector indicating
which subexpanses are populated (i.e., not empty), followed by a list
of pointers to the populated subexpanses (or an equivalent multi-level
data structure).

Linear leaf nodes according to the invention are likewise directed to
low populations of indexes, through the use of multi-index leaves that
contain lists of valid indexes. For medium to high population densities
at low levels of the tree, bitmap leaf nodes instead provide a binary
vector of valid indexes, possibly including value areas corresponding
to each valid index.

The invention further incorporates global, memory-efficiency-driven,
opportunistic decompression of compressed branches. According to this
aspect of the data structure, when the entire data set stored in the
data structure uses less memory per index than some threshold value
(possibly measured in bytes per index), or when the population of the
subexpanse under a linear or bitmap branch is sufficiently high even if
the global metric is not adequate, the linear or bitmap branch is
replaced with an uncompressed (expanded) form of the branch (i.e., an
uncompressed branch node). The result is less computation and fewer
cache fills to traverse the level, at the cost of some additional
memory. Using this option for larger populations of indexes,
particularly for data with well-clustered indexes, "amortizes" the
excess memory needed to maintain fast access to the indexes and any
related data.

Note that there is a degree of symmetry between branches and leaves,
that is, between linear branches and linear leaves, and also between
bitmap branches and bitmap leaves.
This symmetry is most apparent in the embodiment in which each index is
mapped to an associated value. The interior nodes of the tree map
portions (digits) of indexes to pointers to subsidiary nodes, while the
terminal nodes of the tree map fully decoded indexes to value areas
that, in practice, often contain pointers to caller-defined objects
outside the tree. The symmetry fails, however, in that there is no leaf
equivalent to an uncompressed branch. When a higher-level leaf exceeds
a specific population, it is converted, as appropriate, either to a
subtree under a new branch or to a lower-level, more compressed leaf.
When a lowest-level linear leaf exceeds a specific population, it is
converted to a bitmap leaf.

According to another aspect of the invention, the data structure takes
advantage of the fact that a portion of a target index is decoded at
each level of the digital tree in order to compress leaf indexes.
Because indexes are partially decoded while the tree is traversed, only
the remaining undecoded portion of each index needs to be stored in a
leaf, and the number of bits or bytes constituting this undecoded
portion shrinks at each lower level.
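
The idea of storing only the undecoded portion of an index can be
sketched as below. This is an illustrative sketch, not the
application's implementation: it assumes 1-byte digits, takes "level"
to mean the number of bytes still undecoded at a leaf, and uses a
hypothetical fixed leaf size when estimating capacity.

```python
def undecoded_suffix(index, level):
    """Keep only the `level` low-order bytes that have not been decoded."""
    return index & ((1 << (8 * level)) - 1)


def restore_index(decoded_prefix, suffix, level):
    """Rebuild the full index from the digits decoded in the branches
    above the leaf and the suffix stored in the leaf."""
    return (decoded_prefix << (8 * level)) | suffix


def indexes_per_leaf(leaf_bytes, level):
    """How many suffixes of `level` bytes fit in a leaf of a given size."""
    return leaf_bytes // level
```

For the same hypothetical 128-byte leaf, a leaf with one byte left to
decode holds 128 entries while a leaf with three bytes left holds 42,
so leaves farther from the root pack more indexes into the same space.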
As a result, a lower-level leaf (i.e., a leaf more distant from the
root) stores more indexes in the same space than a higher-level leaf,
since the latter requires more bits to represent the larger undecoded
portion of each index. Hence, even worst-case index insertions and
deletions are localized and do not cascade more than one level down or
up the tree, which minimizes the worst-case insertion and deletion
time. Note that this type of compression is most applicable to
fixed-size indexes and is less useful for variable-size indexes such as
character strings or bit strings.

Note also that a digital tree can be compressed so that bits common to
multiple keys (indexes) are skipped (not represented). Such a tree must
store copies of whole keys, of whatever fixed or variable size, in its
leaf nodes to disambiguate the leaves (except in cases in which
disambiguation is not required). This is distinguished from leaf
compression as implemented by the invention, in which the decoded
portions of indexes, whether required for tree traversal or skipped
(compressed out) as common to all indexes in a subexpanse, are always
stored in, and recoverable from, the branch nodes.
They therefore need not be stored in the leaf nodes.

The invention provides an appropriate combination (hybrid) of various
cache-efficient ADTs for branches and leaves. The combination depends
on the unpredictable data set (indexes or keys) stored in a given
instance, and it results in a wide digital tree that is both memory
efficient and fast to access or modify over a wide dynamic range. A
wide dynamic range covers small to large data sets, that is, few to
many (billions of) indexes or keys, and all types of data sets, whether
the indexes or keys are sequential, clustered, periodic, or random. A
well-designed hybrid digital tree with a wide dynamic range can be
presented at the software interface as a simple dynamic array that
requires no initialization, tuning, or configuration (and does not even
allow them).
The invention may be implemented using a wide range of constructs for
traversing a data structure, including pointers and other schemes for
linking nodes and providing for traversal of the data structure. For
purposes of illustration, a preferred embodiment of the invention may
be implemented within a digital tree construct that includes an
enhanced pointer, as disclosed in the pending U.S. patent application
entitled "System and Method for Cache-Efficient Digital Tree with Rich
Pointers". Such a pointer may take the first form, shown in FIG. 2A,
when used as a null pointer or to point to a branch or leaf node, or
the form shown in FIG. 2B when it contains immediate indexes.
Using rich pointers provides a designation of the type of the object
pointed to, for example linear or bitmap, branch or leaf.

Alternative embodiments of the invention may use other constructs, such
as conventional pointers. For example, the least significant bits of
the pointer itself may be used to identify the target object
(recognizing that pointers may point to 8-byte-aligned objects, so that
the least significant 3 bits are not otherwise used), or the pointed-to
object may be self-identifying (that is, the type information is stored
in the child node rather than in the parent).

As shown in FIG. 2A, the basic pointer structure, for example on a
32-bit platform, contains two 32-bit words: one entire word is used by
a pointer that redirects tree traversal to another node, and the other
holds a Decoded Index field of between 0 and 2 bytes, a Population
field of between 1 and 3 bytes, and a Type field of 1 byte. For a null
pointer, all bytes except the Type field are zero. Otherwise, the first
word is a pointer to a subsidiary branch or leaf node, and the Decoded
Index and Population fields together fill all but one byte of the
second word.

A pointer construct containing immediate indexes is shown in FIG. 2B;
it eliminates the need to redirect to, or point to, another node in
order to access the indexes. As explained in the referenced patent
application, other variations of these pointer constructs may be used
to map a value to each index, and adaptations are provided to
accommodate various machine word sizes.
The invention uses these pointers to form ADTs that include branches,
i.e., interior nodes, and leaves, i.e., terminal nodes. According to
this data structure, a digital tree includes some combination of branch
nodes (linear, bitmap, or uncompressed) and leaf nodes (linear or
bitmap). Each branch is a literal (uncompressed) or virtual (linear or
bitmap) array of pointers, preferably 256 such rich pointers. That is,
each node has a fanout of up to 256 subexpanses.

In a preferred embodiment, indexes are decoded 8 bits, that is, 1 byte,
at a time. In other words, each digit is 1 byte, and the real or
virtual fanout of each branch node is 256. It will be apparent to those
skilled in the art that a digital tree can have any fanout at its
branch nodes, including fanouts that are not a power of 2, such as 26
when the tree decodes a 26-character alphabet. A binary tree is
normally a divide-by-population tree (a binary storage tree) in which
keys are compared with the whole key values stored in each node.
However, a binary tree can also be a divide-by-expanse (binary digital)
tree with a fanout of 2, in which each digit is 1 bit. Furthermore, a
hybrid tree may have varying fanouts at different branches or levels.
However, because computers naturally process byte-sized objects
efficiently, in addition to word-sized objects, the inventors have
found that a consistent fanout of 256, that is, a digit size of 1 byte,
is the most efficient.
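
A minimal sketch of this digit decoding follows. The function names
and the parameterization by digit width are assumptions added for
illustration; only the 1-byte digit and the fanout of 256 come from the
description.

```python
def decode_digits(index, index_bits=32, digit_bits=8):
    """Split an index into digits, most significant digit first."""
    count = index_bits // digit_bits
    mask = (1 << digit_bits) - 1
    return [(index >> (digit_bits * (count - 1 - i))) & mask
            for i in range(count)]


def fanout(digit_bits):
    """Number of subexpanses a branch can address for a given digit size."""
    return 1 << digit_bits
```

With 8-bit digits a 32-bit index yields four digits and each branch has
a fanout of 256; with 1-bit digits the same scheme describes a binary
digital tree with a fanout of 2.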
Compressed branches, of the linear and bitmap types, supplement the
uncompressed branch type. This latter branch type supports conventional
digital tree functions using, for example, an array of 256 subexpanse
pointers. When the actual fanout (i.e., the number of populated
subexpanses) is relatively limited, as is typically the case when a new
branch is created during index insertion, a "compressed" branch is used
instead. This compressed branch may be viewed as a virtual array of 256
subexpanse pointers, but it requires much less memory (although, for
reasons described below, it often requires two cache fills rather than
one to traverse the associated node).
Referring to FIGS. 1A to 1E, a root pointer node 101 is used to access
the underlying data structure of the digital tree. Root pointer node
101 contains address information, shown in the figures as an arrow
pointing to a first, or "top", level node 102, which in this
illustration is a branch node. (Note that the terminology used here
assumes a 32-bit implementation in which indexes are single words, as
opposed to character strings. Consequently, the top node of the tree
pointed to by the root is labeled "level 4", the children of the level
4 node are designated level 3 nodes, and so on. On a 64-bit machine,
the root pointer points to a level 8 node, its children are at level 7,
and so on. Thus, the level of any branch or leaf node is equal to the
number of digits (bytes) remaining to be decoded in the indexes stored
at or below that node. This numbering scheme has the additional
advantage that the lowest levels of 32-bit and 64-bit trees are the
same, which simplifies the source code needed for use with trees of
various sizes. Note further that, while this convention is
representative, it is adopted for purposes of this description, and
other conventions may be adopted, such as designating leaf nodes as the
highest (e.g., fourth) level of the tree.) Top-level node 102 is an
uncompressed branch node.
It includes an array of 256 rich pointers for referencing up to 256
lower-level nodes, and it represents the entire expanse of the data
structure, i.e., indexes 00000000 through FFFFFFFF in hexadecimal
notation. Top-level node 102 includes a first rich pointer 103 (also
called an adaptable object), which corresponds to expanse 00000000
through 00FFFFFF and points to level 3 linear branch 105. Another rich
pointer 104 is shown corresponding to the final portion of the expanse,
which contains indexes FF000000 through FFFFFFFF. Rich pointer 104
points to uncompressed branch 106, which covers the most significant
1/256th of the expanse at level 3.

The first subexpanse of level 3 contains a subsidiary node in the form
of linear branch 105. As shown, linear branch 105 includes the fanout
(NumRP, a count of the number of child nodes referenced by the branch),
a sorted list of the index portions (digits) corresponding to the
subexpanses referenced by the branch, and a list of pointers to those
subexpanses. Only the pointer to the final subexpanse, listed as E4 and
representing the subexpanse containing indexes 00FD0000 through
00FDFFFF, is shown, although similar pointers exist for the slots of
subexpanses E1 through E3. Thus, the fourth rich pointer of linear
branch 105 is shown referencing bitmap branch 113, which in turn
references linear leaves 118-122 and bitmap leaves 116, 117, and 123.

At the upper end of top-level node 102, a level 3 uncompressed branch
is referenced by rich pointer 104.
For purposes of illustration only two such references are shown,
although uncompressed branch 106 would typically reference a larger
number of subsidiary nodes. Note that a sparsely populated branch would
instead be converted to the linear or bitmap branch format to save
memory, while still providing access to its nodes with one or two cache
line fills.

As shown in FIGS. 1A to 1E, level 3 uncompressed branch 106 includes an
array of 256 rich pointers, including rich pointer 107 to level 1
linear leaf node 108. Note that the use of rich pointers according to
an embodiment of the invention allows pointers to "skip" levels of the
tree, avoiding the unnecessary indirection otherwise required to
traverse intermediate branches. Another rich pointer 109 points to
level 2 linear leaf node 110, which contains two 2-byte indexes.

Rich pointers may be used to implement a data structure that is
compatible with, and incorporates, branch and leaf compression
according to the invention. Although not required, the use of rich
pointers is compatible with, and supports, one embodiment of the
invention. Such a rich pointer structure encompasses at least two types
of rich pointers, or adaptable objects: a pointer type, depicted in
FIG. 2A, and an immediate type, depicted in FIG. 2B. The immediate type
supports immediate indexes.
That is, when the population of an expanse is relatively sparse, a rich
pointer can be used to store the indexes "immediately" within a digital
tree branch, rather than requiring traversal of the digital tree down
to the lowest level to access the index. This format is akin to the
"immediate" machine instruction format, in which an instruction
specifies an immediate operand that follows any displacement bytes.

Thus, an immediate index, or a small number of indexes, is stored in
the node, avoiding one or more of the redirections otherwise required
to traverse the tree and reach a distant leaf node. Immediate indexes
thereby provide a way of packing a small population (a small number of
indexes) directly into a rich pointer structure, instead of allocating
more memory and requiring multiple memory references and possible cache
fills to access the data.

[0062] The two-word format of the preferred embodiment readily supports
the inclusion of immediate indexes. This is accomplished by storing
index digits in the entire rich pointer except for the Type field. A
rich pointer implemented on a 32-bit system may store anything from a
single 3-byte immediate index up to seven 1-byte indexes, while a rich
pointer on a 64-bit system may store up to fifteen 1-byte immediate
indexes.
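
The byte arithmetic behind these capacities can be illustrated with the
following sketch. It assumes a two-word rich pointer with a 1-byte Type
field; placing the Type field in the last byte, the zero padding, and
the names immediate_capacity and pack_immediate are hypothetical
choices made for the example. For index sizes larger than 1 byte the
function gives only an upper bound, and an actual embodiment may store
fewer.

```python
def immediate_capacity(word_bytes, index_bytes):
    """Upper bound on immediate indexes in a two-word rich pointer:
    every byte except the 1-byte Type field can hold index digits."""
    return (2 * word_bytes - 1) // index_bytes


def pack_immediate(indexes, type_code, word_bytes=4):
    """Pack 1-byte immediate indexes into a rich pointer image.

    Assumed layout: index bytes first, zero padding, Type byte last.
    The Type field is what tells a reader how many indexes are valid."""
    room = 2 * word_bytes - 1
    body = bytes(indexes)
    if len(body) > room:
        raise ValueError("too many immediate indexes for one rich pointer")
    return body.ljust(room, b"\0") + bytes([type_code])
```

On a 32-bit platform this leaves 7 bytes for index digits, and on a
64-bit platform 15 bytes, which matches the counts of 1-byte immediate
indexes given above.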
The generalized structure of a rich pointer (also called an adaptable
object) that supports immediate indexes is shown in FIG. 2B. The rich
pointer includes one or more indexes "I", depending on the word size of
the platform and the size of the index, and an 8-bit Type field that
also encodes the index size and the number of immediate indexes.

FIG. 3 illustrates details of a linear branch construct according to
the invention as implemented on a 32-bit platform. The linear branch
consists of one byte indicating the fanout, i.e., the number of
populated subexpanses referenced by the branch (NumRP), followed by a
sorted array consisting of 1 byte (i.e., one digit) per populated
subexpanse, indicating the subexpanse number (e.g., 0 through 255). The
list of populated subexpanses is followed by a corresponding array of
subexpanse pointers. The invention incorporates some padding at the end
of the two arrays, which allows them to "grow in place" for faster
insertions and deletions. Both subexpanse arrays (i.e., digits and
pointers) are organized, or packed, purely by population and are not
addressed uniformly by expanse, although they can thereby be organized
or accessed by expanse.
Typically, a linear branch node as shown in FIG. 3 is used when the
actual fanout, that is, the number of populated subexpanses, is
relatively small, for example up to seven rich pointers out of the 256
possible subexpanses per branch. A linear branch node according to one
embodiment of the invention includes the three consecutive regions
mentioned above: a count of populated subexpanses, a sorted list of
populated subexpanses (1 byte each), and a list of the corresponding
rich pointers, each two words long. (As those skilled in the art will
recognize, other configurations of the number, type, size, and ordering
of regions may be used in alternative embodiments of the invention.)
With this particular scheme, the largest linear branch, containing
seven rich pointers, requires 1 byte for the number of subexpanses and
7 bytes for the subexpanse list, and thus two words (on a 32-bit
system) for the combination. The count and subexpanse list are followed
by fourteen words for the rich pointers themselves, so the entire
construct fits in sixteen words, or one cache line.
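
The layout and the word count above can be sketched as follows,
assuming a 32-bit platform with 4-byte words and two-word rich
pointers. The class LinearBranch, its method names, and the use of
Python lists in place of packed memory are illustrative assumptions;
the padding mentioned earlier is ignored in the size calculation.

```python
import bisect

WORD_BYTES = 4           # assumed: 32-bit platform
RICH_POINTER_WORDS = 2   # each rich pointer is two words long


def linear_branch_words(fanout):
    """Words used by a linear branch: count byte plus one digit byte per
    populated subexpanse, rounded up to whole words, then the pointers."""
    header_bytes = 1 + fanout
    header_words = -(-header_bytes // WORD_BYTES)  # ceiling division
    return header_words + RICH_POINTER_WORDS * fanout


class LinearBranch:
    """Sorted digits with a parallel list of pointers, packed by population."""

    def __init__(self):
        self.digits = []
        self.pointers = []

    def insert(self, digit, pointer):
        pos = bisect.bisect_left(self.digits, digit)
        if pos < len(self.digits) and self.digits[pos] == digit:
            self.pointers[pos] = pointer
        else:
            self.digits.insert(pos, digit)
            self.pointers.insert(pos, pointer)

    def lookup(self, digit):
        # A short sequential scan over at most a handful of digits.
        for d, p in zip(self.digits, self.pointers):
            if d == digit:
                return p
            if d > digit:
                break
        return None
```

For a fanout of seven the function returns 16 words, which is the
single cache line described above.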
Returning to FIG. 3, a total of four populated subexpanses are
referenced, by the pointers for E[xpanse]1 through E[xpanse]4.

FIG. 4 illustrates a bitmap branch, again as implemented on a 32-bit
word size platform. The bitmap branch node has a first portion 401
containing 256 bits (32 bytes) that indicate populated and empty
subexpanses, followed by a second portion 402 containing ordinary
pointers to independent subarrays of rich pointers to the populated
subexpanses. This construct may be thought of as compressing the
byte-per-valid-index required in a linear branch to a
bit-per-any-index, a potential saving of up to 7/8, except that the
bitmap also contains 0 bits for invalid indexes. Conceptually, the
subexpanse pointers are held in a single array (portion 402) following
the bitmap. However, according to a preferred embodiment of the
invention, to keep memory management simple and insertion and deletion
fast, the bitmap is followed by eight ordinary pointers, each to an
independent subarray 408, 409 of between 0 and 32 subexpanse pointers.
The bitmap is thereby organized by expanse, since it is addressable by
the digit (0-255), while the subexpanse pointers are listed "by
population", packed into subarrays that correspond only to the bits set
in the bitmap.
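
One way to locate a pointer in such a packed subarray is to count the
bits set below the digit's own bit, as in the sketch below. The
description does not spell out this counting step, so it is an
assumption here, as are the class name BitmapBranch, the convention
that the least significant bit of each word corresponds to the lowest
digit, and the use of Python lists for the subarrays.

```python
class BitmapBranch:
    """256-bit bitmap (8 words of 32 bits) plus 8 packed subarrays."""

    def __init__(self):
        self.bitmap = [0] * 8
        self.subarrays = [[] for _ in range(8)]

    def _locate(self, digit):
        word, bit = divmod(digit, 32)
        # Pointers are packed by population, so the position in the
        # subarray is the number of set bits below this digit's bit.
        below = self.bitmap[word] & ((1 << bit) - 1)
        return word, bit, bin(below).count("1")

    def insert(self, digit, pointer):
        word, bit, offset = self._locate(digit)
        if (self.bitmap[word] >> bit) & 1:
            self.subarrays[word][offset] = pointer
        else:
            self.bitmap[word] |= 1 << bit
            self.subarrays[word].insert(offset, pointer)

    def lookup(self, digit):
        word, bit, offset = self._locate(digit)
        if not (self.bitmap[word] >> bit) & 1:
            return None  # empty subexpanse: no pointer is stored at all
        return self.subarrays[word][offset]
```

The bitmap is addressed directly by the digit, while each subarray
grows only as subexpanses in its 32-digit range become populated.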
In another embodiment of the invention, once any bitmap branch subarray
of rich pointers reaches maximum memory usage, that is, once the number
of pointers is such that the memory allocated to the subarray is
sufficient to hold 32 subexpanse pointers, the subarray is made
uncompressed to save time during accesses, insertions, and deletions.
Uncompressing a rich pointer subarray means setting all of the bits in
the corresponding subexpanse of the bitmap, even for subexpanses of
indexes that are unpopulated; unpacking the rich pointer subarray into
a simple, positionally accessed array; and representing unpopulated
subexpanses with null rich pointers.
Thus, as shown in FIG. 4, a bitmap branch is a two-tier object,
somewhat more complex than either a linear or an uncompressed branch.
The first tier (portion 401) is the bitmap itself which, in a 32-bit
word size implementation, contains 256 bits (32 bytes) subdivided into
8 subexpanses, followed by 8 pointers (portion 402) to second-tier
ADTs, or subarrays (e.g., arrays 408 and 409).

Each ADT 400 consists of a packed linear list of rich pointers, one
rich pointer for each bit set in the associated bitmap. On a 32-bit
system, 8 words are required for the bitmap (32/4) and 8 words for the
pointers, for a total of 16 words.

This total of 16 words is important to system performance because,
according to a preferred embodiment of the invention, it is equal to
one CPU cache line. Note that on a 64-bit system only 4 words would be
needed for the bitmap, while 8 words would still be needed for the
pointers, so that 4 words are wasted, again assuming a 16-word cache
line.

For example, bitmap 404 has a hexadecimal value of 0000b074, which
gives the following binary vector and index values.

[0074] [Table 1]

According to this example, the binary vector shown in the bottom row of
Table 1 indicates that indexes are present in subexpanses 42, 44, 45,
46, 4C, 4D, and 4F within the range 40 hex to 5F hex.
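
The correspondence between the hexadecimal value 0000b074 and the
populated subexpanses listed above can be checked with a short sketch.
It assumes that the least significant bit of the word corresponds to
the lowest subexpanse in the range (40 hex); the function name is
hypothetical.

```python
def populated_subexpanses(bitmap_word, base, width=32):
    """List the subexpanse numbers whose bits are set in one bitmap word."""
    return [base + bit for bit in range(width) if (bitmap_word >> bit) & 1]


digits = populated_subexpanses(0x0000B074, base=0x40)
labels = [format(d, "02X") for d in digits]
```

Under that bit-ordering assumption, labels holds the seven values 42,
44, 45, 46, 4C, 4D, and 4F, so the subarray for this range would hold
seven rich pointers.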
The associated ordinary pointer 406 for this range (FIG. 4) points to
array 408, which contains an individual rich pointer to each of the
subexpanses indicated by the associated binary vector.

For comparison, an uncompressed branch is shown in a separate figure.
This construct is a simple array of rich pointers, in this case 256
such rich pointers, with null rich pointers used to represent empty
expanses. Again assuming two words per rich pointer, such an
uncompressed branch requires 512 words.
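
The word counts given for the uncompressed and bitmap branches can be
reproduced with the sketch below. It assumes two-word rich pointers and
eight ordinary subarray pointers of one word each, and it ignores any
allocator padding; the function names are hypothetical.

```python
RICH_POINTER_WORDS = 2   # assumed: each rich pointer is two words


def uncompressed_branch_words(slots=256):
    """A literal array with one rich pointer per possible digit."""
    return slots * RICH_POINTER_WORDS


def bitmap_branch_top_words(word_bits=32):
    """First tier of a bitmap branch: the 256-bit bitmap plus eight
    ordinary pointers to the subarrays."""
    return 256 // word_bits + 8


def bitmap_branch_words(populated, word_bits=32):
    """First tier plus the packed rich pointers in the subarrays."""
    return bitmap_branch_top_words(word_bits) + populated * RICH_POINTER_WORDS
```

With seven populated subexpanses, for instance, the bitmap branch uses
30 words in total against 512 words for the uncompressed branch, most
of which would hold null rich pointers.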
Required. In the present invention, the general-purpose memory efficiency is further increased.
Have. That is, the cache occupied by the linear branch
Too many lines (according to a preferred embodiment of the present invention)
The limit is a single 16 word cache line)
Until fan-out (ie subexpans with content)
Number of branches), the branch is a bitmap brand
Is converted to j. Such bitmap constructs are
Can handle the `` added fan-out ''
It should be noted that no exchange is required. On the linear
Is a null subex in any of the bitmap branches.
Don't waste memory on Pounce. However, linear or bitmap blocks
Population under lunch is needed for extension branch
Memory is large enough to “amortize”
Data structure (preferably measured in bytes per index)
Overall or general memory efficiency remains
Not exceed the selected / tunable value.
The punch is converted into a stretch type as appropriate. This is the null subexpansion pointer
While wasting memory on the
One round trip (and cache fill) is secured.
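The bitmap example given earlier in this passage (value 0000b074 for the range 40 hex to 5F hex) can be checked with a short sketch. The function names are hypothetical, and two points are assumptions rather than statements from the text: that the least significant bit of the word stands for the lowest subexpanse in the range, and that a subexpanse's slot in the packed pointer array is found by counting the set bits below it.

```python
def populated_subexpanses(bitmap, base, width=32):
    """List the subexpanse digits whose bit is set in one bitmap word.

    Assumption: bit 0 corresponds to subexpanse `base`, bit 1 to base + 1, ...
    """
    return [base + bit for bit in range(width) if (bitmap >> bit) & 1]


def pointer_slot(bitmap, base, digit, width=32):
    """Return the position of `digit` in the packed pointer array, or None.

    Assumption: the array holds one entry per set bit, in ascending order,
    so the slot equals the number of set bits below the digit's own bit.
    """
    bit = digit - base
    if not 0 <= bit < width or not (bitmap >> bit) & 1:
        return None  # empty subexpanse: no pointer is stored for it
    lower_bits = bitmap & ((1 << bit) - 1)
    return bin(lower_bits).count("1")


if __name__ == "__main__":
    found = populated_subexpanses(0x0000B074, 0x40)
    print([format(d, "X") for d in found])
    print(pointer_slot(0x0000B074, 0x40, 0x4C))
```

Under these assumptions the decoded list agrees with the subexpanses named in the text, and an empty subexpanse such as 43 costs no pointer storage at all, which is the memory saving the bitmap branch is meant to provide.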
Note that, to support the latter parameter (global memory efficiency), at least in a higher-population tree the root pointer may refer to an intermediate data structure that stores the total number of bytes used by the tree and the total count of stored indexes. This intermediate data structure may reside adjacent to the top branch node, or may in turn point to the top branch of the tree.
According to the present invention, leaf compression is also used, in the form of multi-index leaves that include the linear and bitmap leaf types mentioned above. Typically, each lookup in one branch of a digital tree reduces the expanse, or range, of indexes that can be stored under the next lower-level subexpanse pointer. Therefore only the unique remaining bits of each index that have not yet been decoded need to be stored. As already explained, when the population within an expanse (that is, the number of valid indexes) is small, it is useful to store the indexes in a single object that can be searched sequentially, or otherwise immediately, rather than proceeding through more tree branches to application-specific leaves that each relate to a single index.
In the simplest case according to one embodiment,
an index-only leaf is a list of valid indexes.
As the present inventors have determined empirically, the most desirable leaf size is relatively small, for example two cache lines or less, that is, 32 words or 128 bytes on a typical 32-bit word-size platform. A sequential search of a sorted list of indexes that fills two cache lines finds the index in the first line half of the time (1 fill) and in the second line the other half of the time (2 fills), so it requires 1.5 cache fills on average (assuming the data is not already in the cache). In other words, when a population is sufficiently small, it is preferable to store it as a list, bitmap, or other ADT in one or two cache lines rather than in more levels of the digital tree.
FIGS. 6 (A)-(D) and FIGS. 7 (A)-(C)
show examples of linear leaves according to the invention. A linear leaf is an ordered list of indexes, each consisting of N undecoded bytes, where N is the level in the tree under the convention that the lowest level, that is, the level farthest from the root, is level 1. (Note that this is the opposite of how trees are conventionally described. Conventionally, level numbering starts with the top node at level 1, and each child is at a higher-numbered level than its parent.) According to a preferred embodiment, the population of a leaf (the number of indexes, which equals the leaf size) is stored with the pointer to the leaf and not in the leaf itself (the exception being an implementation used for very small arrays that consist entirely of a single root-level linear leaf).
As shown in FIGS. 6 (A)-(D),
the linear leaf is a packed array of sorted indexes that stores, for each index, only the minimum number of bytes remaining to be decoded at the leaf's level in the tree. FIGS. 7 (A)-(C) show an alternative used when values are associated with the indexes, in which a separate value area holding the list of these values is added. Note also that, unlike a root-level leaf, a linear leaf does not need to include a population field for its index count. Instead, according to a preferred embodiment of the present invention, the parent node carries the population field.
Table 2 shows the arrangement and capacity of leaves at various levels of the tree (the leaf's level determines how many bytes are needed to represent an index), for 32-bit and 64-bit word-size platforms and for systems in which values are associated with the indexes.
[0085] [Table 2]
Note that in each case the index size of the leaf, that is, the number of undecoded bytes remaining in each index, is enumerated in the type field of the referencing rich pointer structure. The minimum leaf population is based on how many indexes an immediate rich pointer can hold, so that smaller populations are "immediate", that is, stored in the rich pointer structure itself. In contrast, the maximum leaf population is limited by the capacity of two cache lines (for example, 32 words) for an index-only leaf, or by the capacity of four cache lines (for example, 64 words) for a leaf in which values are associated with the indexes. On a 64-bit platform, in an alternative embodiment of the invention, an index-only leaf is rearranged directly from the immediate-index type to a bitmap leaf as soon as it reaches 16 indexes. The purpose is to avoid creating a linear leaf for a single population size and then, on the next insertion, a bitmap leaf when the same subexpanse reaches 17 indexes.
Bitmap leaves are useful when the memory cost of a linear leaf exceeds a predetermined threshold, for example on reaching the 17 indexes mentioned above. Thus, at the lowest level of the tree, where only a single index digit (for example, one byte) remains to be decoded, when a 256-index subexpanse has a sufficient population (for example, 17 indexes), memory is conserved by representing the leaf as a bitmap with one bit for each index in the subexpanse, that is, 256 bits or 32 bytes. An example of an index-only bitmap leaf implemented on a 32-bit word platform is represented in FIG. 8.
In the figure, each horizontal
rectangular area represents one word. On a 64-bit platform the leaf looks the same, except that the words are larger and there are half as many words in the bitmap. The bits in the bitmap indicate which of the possible indexes in the expanse of the leaf are actually present, that is, stored.
FIG. 9 shows an alternative embodiment in which the data structure in question associates values with the stored indexes.
As shown,
a value area containing one word per index is included in the bitmap leaf. Like the bitmap branch, this bitmap leaf embodiment is a two-tier construct; the difference is that, instead of rich pointer arrays (two words per element), it has value area subarrays, that is, lists of values with one word per element. On a 64-bit platform the bitmap instead needs four words, and four words go unused. Because the two-tier construct involves fewer bytes of memory and fewer cache lines, modifying the value list is faster.
As in the case of the bitmap branch,
when an expanse is sufficiently small, for example a 256-way node with 8 bits (1 byte) remaining to be decoded, and the population of the expanse is sufficiently large, for example 25 indexes or more, it can be advantageous (that is, "cheaper in memory") to represent the valid indexes in the expanse as a bitmap rather than as a list of indexes. This property holds only at level 1 of the tree (that is, at the leaves farthest from the root node), where each index has just one undecoded byte. According to a preferred embodiment of the present invention, the use of bitmap leaves is limited to level 1 leaves, that is, to indexes containing only one undecoded byte.
The data structure further includes leaf index
compression. As described in connection with linear leaves, traversing the digital tree involves decoding index bits that represent portions (for example, 1-byte segments) of the target index being searched for, inserted, or deleted. In many cases, by the time a leaf is reached, some or most of the bits of the indexes stored in the leaf have already been decoded, that is, stored positionally (digitally) in the tree. Therefore only the remaining undecoded index bits (the suffix) are stored in the leaf. Thus, on a 32-bit platform where 4-byte indexes are decoded one byte at a time (that is, at each branch of the tree), a (terminal) leaf the size of two 64-byte-wide cache lines (that is, 128 bytes) may hold the numbers of compressed indexes shown in Table 3.
[0095] [Table 3]
Referring to Table 3, in the case of one byte per index, once the population exceeds 24 indexes, a 32-byte (that is, 256-bit) object is enough to hold a bitmap representing all of the 256 possible indexes in a low-level leaf.
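Leaf index compression as described above can be illustrated with a minimal sketch. All function names are hypothetical, and the byte order of the packed suffixes (big-endian) and the bit order inside the bitmap are assumptions. The capacity figures are a plain division of the 128-byte leaf by the suffix size and may differ from the values in Table 3, which is not reproduced in this text.

```python
def leaf_capacity(leaf_bytes, bytes_per_index):
    """How many packed suffixes of a given size fit in a leaf (plain division)."""
    return leaf_bytes // bytes_per_index


def pack_linear_leaf(indexes, level):
    """Build a linear leaf: sorted suffixes of `level` undecoded bytes each.

    The decoded prefix is not stored; it is implied by the leaf's position
    in the tree, so every index passed in is assumed to share that prefix.
    """
    mask = (1 << (8 * level)) - 1
    suffixes = sorted({index & mask for index in indexes})
    return b"".join(s.to_bytes(level, "big") for s in suffixes)


def leaf_contains(leaf, level, index):
    """Sequentially search the packed suffixes for the suffix of `index`."""
    mask = (1 << (8 * level)) - 1
    target = (index & mask).to_bytes(level, "big")
    for offset in range(0, len(leaf), level):
        if leaf[offset:offset + level] == target:
            return True
    return False


def to_bitmap_leaf(level1_leaf):
    """Convert a level 1 linear leaf (1 byte per index) to a 256-bit bitmap."""
    bitmap = bytearray(32)  # 256 bits = 32 bytes
    for suffix in level1_leaf:  # iterating over bytes yields integers
        bitmap[suffix // 8] |= 1 << (suffix % 8)
    return bytes(bitmap)


if __name__ == "__main__":
    leaf = pack_linear_leaf([0x12345678, 0x12345601, 0x123456FF], level=1)
    print(leaf.hex(), leaf_contains(leaf, 1, 0x12345678))
    print([leaf_capacity(128, n) for n in (1, 2, 3, 4)])
```

The bitmap form has a fixed size of 32 bytes regardless of population, which is why it only pays off at level 1 once the linear leaf has grown past some threshold.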
Note also that leaf index compression has additional advantages. In particular, each lower-level leaf in the tree can hold more indexes than a leaf at the current level. As a result, even without immediate indexes, a cascade caused by inserting a single index that overflows an existing leaf never creates more than one additional level in the tree. Similarly, a decascade caused by deleting a single index never removes more than one level from the tree. In other words, leaf compression supports good locality of change during modification.
As noted above, this data structure is
described in terms of fixed-size indexes, but it may readily be modified to accommodate indexes of various sizes, such as character strings and bit strings of indefinite length. For example, when bit strings of indefinite length are used as indexes, the unique remaining suffix of a single index may be stored immediately in the rich pointer if it is small enough or, if it is longer, stored in a variable-size single-index suffix leaf.
FIG. 10 is a schematic diagram of a computer system that can support and run a memory storage program implementing a data structure embodying the present invention. Although the present invention is applicable to a wide variety of data structures, programming languages, operating systems, and hardware platforms, FIG. 10 shows one such computer system 1000 that includes a platform suitable for supporting the invention.
The computer system 1000 has a system
bus 1002 and a central processing unit (CPU) 1001 coupled to it. The CPU 1001 may be any general-purpose CPU, such as an HP PA-8500 or Intel Pentium processor. However, the present invention is not restricted by the architecture of the CPU 1001 as long as the CPU 1001 supports the inventive operations described here. The system bus 1002 is coupled to a random access memory (RAM) 1003, which may be SRAM, DRAM, or SDRAM. A ROM 1004, which may be a PROM, EPROM, or EEPROM, is also coupled to the system bus 1002. The RAM 1003 and ROM 1004 hold user and system data and programs, as is well known in the art.
The system bus 1002 is also coupled to an input/output (I/O) controller
card 1005, a communication adapter card 1011, a user interface card 1008, and a display card 1009. The I/O card 1005 connects storage devices 1006, such as a hard drive, CD drive, floppy (registered trademark) disk drive, or tape drive, to the computer system. The communication card 1011 couples the computer system 1000 to a network 1012, which may be a network such as a local or wide area network (LAN/WAN), an Ethernet (registered trademark) network, or the Internet, and may be wired or wireless. The user interface card 1008 couples user input devices such as a keyboard 1013 and a pointing device 1007 to the computer system 1000. The display card 1009 is driven by the CPU 1001 to control the display device 1010.
Although the present invention has been described in connection with what is presently considered to be the preferred embodiment, it should be understood that the scope of the invention is not limited to the disclosed embodiment and is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the claims.
The present invention includes the following embodiments as examples.
(1) A data structure for storage in a computer memory,
the data structure being accessible by an application program executed on a data processing system, the data structure comprising: a root pointer (101); and a digital tree (102, 105, 106, 112-113) pointed to by the root pointer and including a first plurality of nodes arranged hierarchically, wherein a second plurality of the nodes includes: branch nodes (105, 106, 113) selected from the group consisting of linear branch nodes (FIG. 3), bitmap branch nodes 400, and uncompressed branch nodes (FIG. 5), selected according to the number of populated subexpanses and the overall status of the digital tree; and leaf nodes (108, 110, 116-123) selected from the group consisting of linear leaf nodes (FIGS. 6 and 7) and bitmap leaf nodes (FIGS. 8 and 9), each holding a plurality of indexes and containing only the undecoded index bits according to the level of the leaf in the digital tree and the number of indexes in the leaf.
(2) The data structure according to (1), wherein the second plurality of nodes includes a linear branch node (FIG. 3), a bitmap branch node (FIG. 4), and an uncompressed branch node (FIG. 5).
(3) The data structure according to (1), wherein the second plurality of nodes includes a linear leaf node (FIGS. 6 and 7) and a bitmap leaf node (FIGS. 8 and 9).
(4) The data structure according to (1), wherein the linear branch node (FIG. 3) includes at least two linear lists, the first list including at least a subexpanse descriptor with the corresponding index bits of each relevant subexpanse, and the second list including pointers to one or more auxiliary nodes of each relevant subexpanse, the pointers corresponding to the subexpanse descriptors in the first list.
(5) The data structure according to (4), wherein each of the pointers includes a rich pointer (FIGS. 2A and 2B).
(6) The data structure according to (1), wherein the linear branch nodes (105 and FIG. 3) include at least two linear lists, the first list (E1-E4) including a subexpanse descriptor that includes at least the corresponding index bits of each relevant subexpanse, and the second list (rich pointers 1-4 of the respective subexpanses) including pointers to one or more auxiliary nodes of each relevant subexpanse, the pointers corresponding to the subexpanse descriptors of the first list.
(7) The data structure according to (1), wherein the bitmap branch node (113, 400) includes at least a first list of bits 401 containing one bit for each possible subexpanse under the bitmap branch node, each of the bits indicating whether the corresponding subexpanse holds an index, and a second list of pointers 402 referring to at least one auxiliary node for each subexpanse, the pointers corresponding to the status of the bits in the first list.
(8) The data structure according to (1), wherein the bitmap leaf node (FIGS. 8 and 9) includes at least a first list containing one bit for each possible index in the leaf, each of the bits indicating whether the corresponding one of the indexes is valid.
(9) A method of storing an index in a data structure, comprising: identifying a compressed branch node (FIGS. 3 and 4) in the data structure (FIG. 1) to which the index belongs; determining a parameter of the data structure; selectively converting the compressed branch node into an uncompressed branch node (FIG. 5) in response to the value of the parameter; and storing the index under the uncompressed branch node, wherein the parameter includes one of an index value for the data structure and the total memory used per population under the compressed branch node.
(10) The method according to (9), wherein the data structure is stored in a computer memory (1003, 1004) accessible to an application program running on a data processing system 1000, the data structure including a root pointer 101 and a digital tree (102, 105, 106, 112-113) that is pointed to by the root pointer and includes nodes arranged hierarchically, the nodes including branch nodes selected from the group consisting of the compressed branch nodes (FIGS. 3 and 4) and uncompressed branch nodes (FIG. 5), and leaf nodes (108, 110, 116-123) selected from the group consisting of linear leaf nodes (FIGS. 6 and 7) and bitmap leaf nodes (FIGS. 8 and 9), each leaf node holding a plurality of indexes and including only undecoded index bits according to the level of the leaf in the digital tree and the number of indexes in the leaf.

BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1A is a schematic diagram of an example of a digital tree combining hybrid abstract data type structures (ADTs) according to the present invention to maximize memory utilization while minimizing index access time.
FIG. 1B is a schematic diagram of an example of a digital tree combining hybrid abstract data type structures (ADTs) according to the present invention.
FIG. 1C is a schematic diagram of an example of a digital tree combining hybrid abstract data type structures (ADTs) according to the present invention.
FIG. 1D is a schematic diagram of an example of a digital tree combining hybrid abstract data type structures (ADTs) according to the present invention.
FIG. 1E is a schematic diagram of an example of a digital tree combining hybrid abstract data type structures (ADTs) according to the present invention.
FIG. 1F is a schematic diagram of an example combining hybrid abstract data type structures (ADTs) according to the present invention.
FIG. 2A is a general schematic of an adaptable object or "rich pointer".
FIG. 2B is a general schematic of a rich pointer that incorporates immediate storage of indexes.
FIG. 3 is a schematic diagram of an example of a linear branch.
FIG. 4 is a schematic diagram of an example of a bitmap branch.
FIG. 5 is a schematic diagram of an uncompressed branch.
FIG. 6 is a schematic diagram illustrating an example of a linear leaf for a structure that refers only to indexes.
FIG. 7 is a schematic diagram illustrating an example of a linear leaf for a structure having a value associated with each valid index stored in the structure.
FIG. 8 is a schematic diagram of a bitmap leaf structure for a structure that refers only to indexes.
FIG. 9 is a schematic diagram of a bitmap leaf structure including a value associated with each index.
FIG. 10 is a block diagram of a computer system in which the subject digital tree is embodied.
DESCRIPTION OF SYMBOLS
101 Root pointer
102 Top level node
103, 104, 107, 109 Rich pointer
105 Linear branch
106 Uncompressed branch
108, 110 Linear leaf node
116, 117, 123 Bitmap leaf
118-122 Linear branch leaf
400 Bitmap branch

Continuation of front page (72) Inventor: Alan Silverstein, 618 Sydney Drive, Fort Collins, Colorado 80525, United States

Claims

1. A data structure for storage in a computer memory, the data structure being accessible by an application program running on a data processing system, the data structure comprising: a root pointer; and a digital tree pointed to by the root pointer and including a first plurality of nodes arranged in a hierarchy, wherein a second plurality of the nodes includes: branch nodes selected from the group consisting of linear branch nodes, bitmap branch nodes, and uncompressed branch nodes, selected according to the number of populated subexpanses and the overall status of the digital tree; and leaf nodes selected from the group consisting of linear leaf nodes and bitmap leaf nodes, each holding a plurality of indexes and containing only undecoded index bits according to the level of the leaf in the digital tree and the number of indexes in the leaf.