EP1049990A1 - Database - Google Patents

Database

Info

Publication number
EP1049990A1
EP1049990A1 EP99901096A EP99901096A EP1049990A1 EP 1049990 A1 EP1049990 A1 EP 1049990A1 EP 99901096 A EP99901096 A EP 99901096A EP 99901096 A EP99901096 A EP 99901096A EP 1049990 A1 EP1049990 A1 EP 1049990A1
Authority
EP
European Patent Office
Prior art keywords
index
data
the
key
block
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
EP99901096A
Other languages
German (de)
English (en)
Other versions
EP1049990A4 (fr
Inventor
Moshe Shadmon
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ori Software Development Ltd
Ori Software Dev Ltd
Original Assignee
Ori Software Development Ltd
Ori Software Dev Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ori Software Development Ltd, Ori Software Dev Ltd
Publication of EP1049990A1
Publication of EP1049990A4

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees

Definitions

  • This invention relates to databases and database management systems.
  • a database system is a collection of interrelated data files, indexes and a set of programs that allow one or more users to add, retrieve and modify the data stored in these files.
  • the fundamental concept of a database system is to provide users with a so called “abstract” and simplified view of the data (referred to also as data model or conceptual structure) which exempts a conventional user from dealing with details such as how the data is physically organized and accessed.
  • Other concepts introduced by the relational model are high level operators that operate on tables (i.e., both their parameters and results are tables) and comprehensive data languages (now called 4th generation languages) in which one specifies what the required results are rather than how these results are to be produced.
  • Such non-procedural languages (SQL - Structured Query Language) have become an industry standard.
  • the relational model suggests a very high level of data independence. There should not be any effect on the programs written in these languages due to changes in the manner data are organized, stored, indexed and ordered.
  • the relational model has become a de-facto standard for data analysts.
  • Network Model - In the relational model, data (and relationship between data) are regarded as a collection of tables. In distinction therefrom in the network model data are represented as a collection of records whereas relationship between the records (data) are represented as links.
  • a record in the network model is similar to an "entity" in the sense that it is a collection of fields each holding one type of data.
  • the links may be effectively viewed preferably (but not necessarily) as pointers.
  • a collection of records and the relation therebetween constitutes a collection of graphs.
  • Hierarchical Model resembles the network model in the manner that data and relations between data are treated, i.e. as records and links. However, in distinction from the network model, the records and the relations between them constitute a collection of trees rather than of arbitrary graphs.
  • the structure of the Hierarchical Model is simple and straightforward particularly in the case that the data that needs to be organized in a database are of inherent hierarchical nature.
  • the hierarchical model has some inherent shortcomings, e.g. in many real life scenarios data cannot be easily arranged in hierarchical manner. Moreover, even if data may be organized in hierarchical manner, it may require larger volumes as compared to other database models.
  • the object-oriented approach views all entities as objects. Each object belongs to a class, and with each class there are associated methods and fields.
  • some of the fields are private, accessible only to methods of the class, while others are public, accessible to all.
  • "Joe Smith" belongs to the class of persons.
  • the private field age can be defined.
  • Applying the class method update_age() to the object Joe will change his age.
  • the methodology allows to define sub-classes which inherit all the methods and fields of the super-class.
  • the employee class can be defined as a subclass of the person class.
  • the employee class could support a salary field, and the get_raise ( ) method.
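The person/employee example above can be sketched in a few lines of Python. This is only an illustration: the constructor arguments, the `age()` accessor and the sample values are assumptions, and Python marks a field as private by naming convention only.

```python
class Person:
    """A person object; `_age` is private by convention."""

    def __init__(self, name: str, age: int) -> None:
        self.name = name   # public field
        self._age = age    # private field, changed only through methods

    def update_age(self, new_age: int) -> None:
        """Method named in the text: changes the person's age."""
        self._age = new_age

    def age(self) -> int:
        """Hypothetical accessor, added so the private field can be read."""
        return self._age


class Employee(Person):
    """Subclass that inherits all methods and fields of Person."""

    def __init__(self, name: str, age: int, salary: float) -> None:
        super().__init__(name, age)
        self.salary = salary

    def get_raise(self, amount: float) -> None:
        """Method named in the text: increases the salary field."""
        self.salary += amount


joe = Employee("Joe Smith", 40, 50_000.0)
joe.update_age(41)      # inherited from Person
joe.get_raise(2_500.0)  # defined on Employee only
```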
  • Object Relational Model allows an object view on relational-organized data. Thus, one is able to operate on the data as if it is organized as objects and at the same time, support the relational approach.
  • data models deal with the conceptual or logical level of data representation and "hide" details such as how the data are physically arranged and accessed.
  • the latter characteristics are normally dealt with by a so-called database file management system.
  • the database file management system maps the logical structure (in terms of database model) to a data structure, pertinent operations and possibly other data.
  • the data structure includes index and data records.
  • the index enables accessing or updating the data records by a key. In the context of search, the term search key is used.
  • Database file management system should preferably operate on the data records so as to accomplish enhanced performance in terms of time (i.e. from the user's standpoint fast response time of the database), and space (i.e. to minimize the storage volume that is allocated for the database files). As is well known in the art, normally, there is a trade off between the time and space requirements.
  • the performance of the database depends on the efficiency of the data structures that are used to represent the data and how efficiently the system can operate on these data. A detailed discussion on conventional file and management systems is given for example in Chapters 7 (file system structure) and 8 (indexing ) in "Database System Concepts", ibid.
  • Known database file management systems typically utilize indexing schemes that fall into several main categories, including multi-way tree indexes and others.
  • Multi-way tree indexes - These techniques can be used to create one or more access paths (referred to also as search paths) to the same data record.
  • the search paths form a multi-way tree.
  • Its main disadvantages are that it requires space (usually all the keys to the records plus some pointers) and maintenance (addition and/or deletion of keys whenever an update transaction (see definition below) occurs, i.e. a record is added and/or deleted).
  • the nature of the indexing scheme as well as the volume of the data held in the files determine the number of accesses that are required to find or update (update encompasses, insert, delete or modify) a given data record.
  • when the storage medium under consideration is an external memory, the number of accesses is effectively the number of I/O accesses.
  • a block of data is loaded into the memory.
  • Trie indexing scheme - An example of the latter is the trie discussed in G. Wiederhold, "File organization for Database design"; McGraw-Hill, 1987, pp. 272, 273, or in D.E. Knuth, "The Art of Computer Programming"; Addison-Wesley Publishing Company, 1973, pp. 481-505, 681-687.
  • the trie indexing scheme enables a rapid search whilst avoiding the duplication of keys as manifested for example by the B tree technique.
  • the trie indexing scheme has the general structure of a tree wherein the search is based on partitioning the search according to search key portions (e.g. search key digit or bit).
  • each node in the trie indexing file represents an offset of the search key and the link to any one of its children represents the character's value at said offset.
  • the trie structure affords an efficient data structure in terms of the memory space that is allocated therefor, since, as specified before, the search key is not held, as a whole, in internal nodes and hence the duplication that is exhibited for example in the B-tree indexing technique is avoided.
  • a trie indexing file should be built by selecting the digits (or bits) from the search key such that the best possible partition of the search space is obtained, or in other words so as to accomplish a tree which is as balanced as possible. This, however, requires a priori knowledge of the data records of the trie and is accomplished at the penalty of obtaining unsorted data, which in many real-life scenarios is inapplicable. It is noteworthy that if sorted data is mandatory, a balanced structure cannot be guaranteed even if there is sufficient a priori knowledge of the data records of the trie. It should be noted that the specified trie does not support sequential sub-range processing.
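As a rough illustration of the trie scheme described above, the sketch below stores in each node the offset of the search key that it examines, and labels each link with the character's value at that offset. It is a plain, uncompressed, in-memory trie written for clarity; the names `TrieNode`, `insert` and `search` and the sample records are assumptions, not taken from the cited references.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class TrieNode:
    offset: int  # position in the search key that this node examines
    children: Dict[str, "TrieNode"] = field(default_factory=dict)
    record: Optional[str] = None  # data record whose key ends at this node


def insert(root: TrieNode, key: str, record: str) -> None:
    """Add a key; assumes the root examines offset 0."""
    node = root
    for ch in key:
        # The link label is the character's value at node.offset.
        if ch not in node.children:
            node.children[ch] = TrieNode(offset=node.offset + 1)
        node = node.children[ch]
    node.record = record


def search(root: TrieNode, key: str) -> Optional[str]:
    """Follow one link per key position; no whole key is stored in any node."""
    node = root
    while node.offset < len(key):
        nxt = node.children.get(key[node.offset])
        if nxt is None:
            return None
        node = nxt
    return node.record


root = TrieNode(offset=0)
insert(root, "car", "record-car")
insert(root, "cat", "record-cat")
```

The two keys share the nodes for "c" and "a", which is the duplication saving the text contrasts with the B-tree.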
  • the specified B-tree indexing scheme constitutes an inherent balanced tree structure, even after the tree has been subject to update transactions.
  • the inherent balanced (or essentially balanced) structure is accomplished, however, and as explained above, at the penalty of inflating the contents of the blocks in the tree and, consequently, unduly increasing the file size that holds the index, particularly insofar as large trees which hold multitude of data records are concerned.
  • the large volume of the files adversely affects the performance of the data management system in terms of number of accesses (and consequently in terms of accessing time) to the storage medium in order to reach a sought data record, which is obviously undesired.
  • the representatives of level i constitute the nodes of level i - 1.
  • Level h+1 is the first empty level.
  • the index scheme includes here three index files. This obviously poses undesired overhead insofar as data volumes and additional integrity maintenance and checking are concerned.
  • removal of a given book from the book file requires a preliminary test to inquire whether it exists in the borrower-book index file.
  • Block - a storage unit which can be accessed by a single I/O operation.
  • a block may contain data arranged in any desired manner, e.g. nodes arranged as a tree and possibly also links to actual data records.
  • a block may reside in main (referred to also as internal) or secondary (referred to also as external) storage.
  • Tree - a data structure which is either empty or consists of a root node linked by means of d (d ≥ 0) pointers (or links) to d disjoint trees called subtrees of the root.
  • a node all the subtrees of which are empty is called a leaf node.
  • the nodes in the tree that are not leaves are designated as internal nodes.
  • leaf nodes are also nodes that are associated with data records.
  • tree encompasses also a tree of blocks wherein each node constitutes a block.
  • descendant blocks of a said block are all the blocks that can be accessed from the block.
  • on trees, refer also to the book by Cormen, Leiserson and Rivest, or Lewis and Denenberg, "Data Structures and Their Algorithms".
  • the association (e.g. link) of a leaf node with a data record encompasses any realization that enables accessing data records from leaf nodes.
  • a data record may be accessed directly (i.e. through pointer) from the leaf node.
  • the leaf node points to a data structure (e.g. a table) which, in turn, enables access to data records.
  • Other variants are, of course, also feasible.
  • Depth of an index - is defined as the maximum number of blocks from a root block to a block associated with a data record.
  • An index is balanced if there exists a constant c such that the number of accesses needed to reach any data record is at most c log n, where n is the number of records in the structure.
  • Accessing in an index would be considered as a process of moving from a node to another node within a block or to another block usually, although not necessarily, in order to reach sought data records.
  • Navigating is considered as accessing data records, usually (although not necessarily), in order to collect them in an ordered manner by their key.
  • Search scheme - meaning the algorithm that is associated with an index that is used for accessing a given data record by key; intra-block search scheme - meaning the algorithm that is used inside the block for accessing a given data record or another block. The data record is not necessarily accommodated within said block.
  • the common key of a block is the longest prefix of all keys of the data records that can be accessed from the block by the relevant search scheme. If desired, part or all of the common key may be held explicitly in the block.
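The common key defined above is simply a longest common prefix. A minimal sketch, assuming keys are plain strings and that the keys reachable from a block are available as a list (the function name and the sample keys are illustrative only):

```python
from typing import Iterable


def common_key(keys: Iterable[str]) -> str:
    """Longest prefix shared by every key reachable from a block."""
    keys = list(keys)
    if not keys:
        return ""
    # The lexicographically smallest and largest keys bound the shared prefix.
    lo, hi = min(keys), max(keys)
    i = 0
    while i < len(lo) and i < len(hi) and lo[i] == hi[i]:
        i += 1
    return lo[:i]
```

For the keys `branch07-acct1`, `branch07-acct9` and `branch07-b`, the common key is `branch07-`.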
  • Update transactions - a transaction consisting of either inserting a new data record, or deleting an existing data record, or modifying an existing data record or a portion thereof.
  • Horizontal oriented trie structure having h levels of vertical orientated trie structures, with the first level standing for the uppermost level and the h-th level standing for the lowermost level (constituting the trie that is susceptible to an unbalanced structure), which is normally associated with data records, and allows to move from a block in the i-th level to a block in the (i+1)-st level according to a common key value of the block.
  • the h upper levels constitute a representative index over the common keys of the blocks of the lowermost level tree.
  • Storage medium - Any medium that may be used to store data, including either or both of internal and external memory.
  • External memory may be one or more of the following: magnetic tape, magnetic disk, optical disk, or any other physical medium used for storing data.
  • Internal memory includes any known main memory including cache memory as well as any other physical storage medium that serves as internal memory.
  • Short link - (referred to also as near link) a link labeled k between a node a having the value r to node b in the same block such that the keys of the data records that include node b on their access path have the value k at key position r.
  • Long link - (referred to also as far link) a link between a node v in block B of level i to block B' of level i - 1 or to a data record. If v has value r and the label of the link is k, then the value of the common key of block B' or the key of the data record is k at position r.
  • the label of a short link or a far link is also referred to as the value or direction of the link.
  • After the split, the split link is the link between node a and block B (that is accommodating node b).
  • a split link is a labeled link.
  • Direct link - a link between node v in block B of level i to block B' of level i - 1, that includes a node v' such that nodes v and v' have the same value. If a search path to a data record with a key k includes node v but does not include any of its near and far links then it should contain the direct link to block B'. A direct link has no label.
  • v is considered a duplicated node of v'.
  • a duplicated node maintains a direct link to the block that includes node v'. (A duplicated node is also referred to as a copied node.)
  • Data records consist as a rule of several fields, some of which are designated as keys. Sometimes the records are ordered by one of the keys, called the primary key. An index (or index scheme) over the keys of data records or over representative keys (for the definition of the latter see below) is a data structure that facilitates search by one or more of the keys. Examples of index are any of the specified multi-way tree index schemes. An index according to the invention may be constituted by using more than one index scheme.
  • The index may be stored in a file or files that reside partially or entirely in the internal memory or external memory.
  • an index that includes a partitioned index — a dynamic data structure - that allows search by key, and is partitioned into blocks, each of which contains a representative key.
  • the representative keys should be sufficient to find the block associated with a record whose key equals the search key (if one exists). Having located the block, the data record may easily be retrieved.
  • the representative keys are not necessarily stored physically in the block.
  • partitioned index examples are:
  • partitioned index contains its key and its link.
  • These pairs are ordered by non-decreasing value of the key.
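A hedged sketch of how such ordered (key, link) pairs can be searched: the pairs are kept sorted by key and a binary search picks the link to follow. The assumption here, not stated in the text, is that a representative key is the smallest key of its block, as in a B-tree-like partition; block ids stand in for links.

```python
from bisect import bisect_right
from typing import List, Tuple


def find_block(pairs: List[Tuple[str, int]], search_key: str) -> int:
    """Return the link (block id) to follow for `search_key`.

    `pairs` holds (representative key, block id), sorted by non-decreasing key.
    The chosen block is the one with the largest representative key that is
    <= search_key; smaller search keys fall into the first block.
    """
    keys = [key for key, _ in pairs]
    i = bisect_right(keys, search_key) - 1
    return pairs[max(i, 0)][1]


pairs = [("a", 0), ("h", 1), ("q", 2)]
```

With these pairs, `find_block(pairs, "kiwi")` selects block 1, since "h" is the largest representative key not exceeding "kiwi".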
  • a partitioned index over the keys of data records is called a basic partitioned index and is denoted index layer I_0.
  • This partitioned index might become non-balanced, thus giving rise to some long search paths.
  • an additional index layer (an index layer is denoted in short also index) I_1 is constructed over the representative keys of I_0.
  • If I_1 is also a partitioned index then an additional index I_2 may be constructed over the representative keys of the blocks of I_1. This process may be repeated until creating an index I_h (hereinafter root index) which preferably is fully contained within a single block.
  • The root index I_h is not necessarily a partitioned index.
  • the layered index is not necessarily a partitioned index.
  • I_1, ..., I_h constitute a so-called representative index.
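The layering process (build I_1 over the representative keys of I_0, repeat until the root index fits in one block) can be sketched as follows. This is a deliberately simplified stand-in: blocks are sorted lists of keys rather than tries, the representative key is assumed to be the first key of a block, and `block_size` is a hypothetical capacity parameter.

```python
from typing import List

Layer = List[List[str]]  # a layer is a list of blocks; a block is a list of keys


def build_layers(keys: List[str], block_size: int) -> List[Layer]:
    """Return [I_0, I_1, ..., I_h], where I_h consists of a single block."""
    if block_size < 2:
        raise ValueError("block_size must be at least 2")
    layers: List[Layer] = []
    current = sorted(keys)
    while True:
        blocks = [current[i:i + block_size]
                  for i in range(0, len(current), block_size)]
        layers.append(blocks)
        if len(blocks) <= 1:  # root index fits in a single block
            return layers
        # Representative key of a block = its first key (assumption).
        current = [block[0] for block in blocks]


small = build_layers([f"k{i:03d}" for i in range(100)], block_size=10)
large = build_layers([f"k{i:04d}" for i in range(1000)], block_size=10)
```

Here `small` has two layers (10 blocks, then a root block) and `large` has three (100 blocks, 10 blocks, one root block).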
  • a search is performed as above to find the block B. Having found B in I_0, r is added to B.
  • The overflow of block B_1 in I_1 entails a splitting of B_1, and the representative of B_1 in I_2 is replaced by the representatives of the new blocks, etc. If the block of I_h overflows, an additional layer I_(h+1) is created and added to the layered index. It should be noted that an "overflow" state may be determined according to the particular application, and is not necessarily triggered when a block is rendered full. Thus, for example, by one embodiment overflow occurs when a block is at least half full.
  • Deletion is similar to insertion, and might involve merging, the reverse process of splitting.
  • the update or the split need not necessarily be performed on the fly, but may be delayed (i.e. performed post factum).
  • construction of the layered index preferably retains a balanced index.
  • the inherent limitations of a basic partitioned index (e.g. trie)
  • more efficient means that the number of accesses to the storage medium through the layered index in order to perform an update transaction (e.g. insert, delete or modify) on a data record or to access a data record is smaller compared to the number of accesses to the storage medium through the basic partitioned index.
  • Number of accesses should be construed such that in each access a block is handled (e.g. loaded or processed) from the storage medium.
  • There may be exceptional scenarios where the latter "more efficient" provision does not apply, e.g. in the case of a very small file having only a few blocks, where accessing a data record through the basic partitioned index may require the same or even fewer operations than through said layered index.
  • each key is regarded as a character or bit string.
  • if the trie cannot be accommodated in a single block, it is partitioned into blocks, such that each block contains a single subtree of the trie.
  • the representative key of the block is the string associated with the root node of the trie in the block, i.e., the sequence of labels of the path from the root of the trie of I_i to the root of the trie of the block.
  • the representative keys of I_i are the keys of I_(i+1).
  • To search a key k in I_i one searches for the longest prefix of k in the blocks of I_(i+1) and from there moves to the appropriate block of I_i.
  • inserting a record entails adding its key to I_0, i.e., adding a value to the trie of I_0. If as a result a block overflows, the block is split: it is partitioned into typically two (in some implementations more) blocks, such that each block contains a (connected) trie. To accomplish this a link between a node w and its child v is severed, and the subtree rooted at v is moved to another block. The representative key of the new block is added to I_1. As in the general layered index scheme, this process is continued in the higher layers.
  • these savings affect the manner in which the search is performed. In such compressed tries usually only nodes of degree greater than or equal to two are maintained. If the search key k does not belong to the compressed trie, the search might terminate at some record r, and we have to check whether k is equal to the key of r. If the keys are different then the trie does not contain a record with key k.
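The final comparison step described above is easy to overlook, so here is a minimal sketch of a search in a compressed trie. Inner nodes keep only the key offset they test, and positions between tested offsets are skipped, which is why the key of the reached record must be checked. The class names, the hand-built sample trie and its keys are illustrative assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Leaf:
    key: str  # full key of the data record


@dataclass
class Inner:
    offset: int  # key position tested at this node
    links: Dict[str, Union["Inner", Leaf]] = field(default_factory=dict)


def search(node: Union[Inner, Leaf], k: str) -> Optional[Leaf]:
    while isinstance(node, Inner):
        if node.offset >= len(k):
            return None
        nxt = node.links.get(k[node.offset])
        if nxt is None:
            return None
        node = nxt
    # Skipped positions were never compared, so verify the whole key.
    return node if node.key == k else None


# Keys: "romane", "romanus", "rubens". Only nodes of degree >= 2 are kept.
root = Inner(offset=1, links={
    "o": Inner(offset=5, links={"e": Leaf("romane"), "u": Leaf("romanus")}),
    "u": Leaf("rubens"),
})
```

Searching for `robote` follows the same links as `romane` (it agrees at offsets 1 and 5) and ends at the record `romane`; only the final comparison shows that the key is absent.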
  • These links do not have a direction, and are taken when the appropriate position of the search key does not agree with any one of the directions of the node.
  • The search continues from the block of I_(i-1) pointed at by that direct link. (If no such node exists, we go to the first block of the index I_(i-1).)
  • each layer might require one extra access.
  • 3 layers are sufficient to address billions of records and usually 2 layers can be maintained in the internal memory of a computer.
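The arithmetic behind the "3 layers" remark can be checked with a short calculation. The fan-out of 1000 links per block used below is purely an assumption for illustration; the text does not give a figure. With a fan-out f, h layers can address about f^h records.

```python
def layers_needed(records: int, fanout: int) -> int:
    """Smallest h such that fanout**h >= records (integer arithmetic only)."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    h, capacity = 0, 1
    while capacity < records:
        capacity *= fanout
        h += 1
    return h
```

Under the assumed fan-out, `layers_needed(10**9, 1000)` gives 3, i.e. three layers reach a billion records.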
  • the split process also has to accommodate direct links. Suppose that the access path to block B_(i-1) of I_(i-1) consists of blocks B_i of layer I_i, B_(i-1)
  • Block B_i has now to contain links to all its descendant blocks in I_(i-1). This can be accomplished by the following non-limiting technique:
  • let k' be the representative key of B'; this key is inserted into T_i, the compressed trie of B_i, so that the search for the keys of descendants of B' reaches B', and the search for the descendants of B_(i-1) reaches B_(i-1).
  • a non-limiting method for accomplishing the split process is as follows:
  • At least one short link among the short links of a node (hereon split node) in the block is deleted (hereon split link) in a way that at least two tries exist in the block.
  • each of the sub-trees is moved to a separate block.
  • B_i is created and a copied node of the split node is created in B_i.
  • the far link can be replaced by a direct link from the child node to block B.
  • a split of a block in I_k, k>0, is performed such that the split links (of I_k) are links between copied nodes of
  • the invention provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a layered index arranged in blocks; the layered index includes a basic partitioned index that is associated with data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
  • The invention further provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a basic partitioned index that is associated with the data records; the basic partitioned index enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
  • Still further, the invention provides for, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index arranged in blocks and being constructed over the keys of data records; the index includes a trie that is associated with the data records; the trie enables accessing or updating the data records by key or keys, and being susceptible to an unbalanced structure of blocks; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
  • the invention provides for, in a database file management system for accessing data records and being executed on a data processing system;
  • the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium;
  • the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks;
  • a method for constructing a layered index arranged in blocks comprising the steps of:
  • The invention further provides for, in a database file management system for accessing data records and being executed on a data processing system;
  • the data records are associated with a basic partitioned index arranged in blocks and being stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks;
  • a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of:
  • there is further provided, in a database file management system for accessing data records and being executed on a data processing system; the data records are associated with a trie arranged in blocks and being stored in a storage medium; the trie enables accessing or updating the data records by key or keys and being susceptible to an unbalanced structure of blocks;
  • The index is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specified index schemes.
  • A typical, yet not exclusive, example of multi-way tree indexes is the B-tree indexing scheme.
  • said basic partitioned search scheme being a trie that is constituted by a digital tree of the kind disclosed in U.S. patent no. 5,495,609.
  • said trie is constituted by a so-called Probabilistic Access Indexing File (PAIF).
  • a data structure that includes at least one probabilistic access indexing file (PAIF) having a plurality of nodes and links; the leaf nodes of said PAIF are associated each with at least one data record accessible to said user application program and wherein at least a portion of said data record constitutes at least one search key; selected nodes in said PAIF represent, each, a given offset of a search key portion within said search key; link(s) originating from each given node from among said selected nodes represent, each, a unique value of said search key portion; the PAIF having at least two sub-PAIFs being arranged, each, in a block;
  • said database file management system is further capable of arranging said blocks as a balanced structure of blocks.
  • one or more of said nodes may include other information, such as portions of the keys and/or other information, all as required and appropriate.
  • the indexing scheme is constituted by a search scheme substantially identical to that of the PAIF trie.
  • a database file management system that employs a layered index of the invention is advantageous, in terms of enhanced performance as compared to hitherto known techniques, inter alia owing to the following characteristics:
  • the proposed layered index constitutes an advantage over e.g. hashing schemes and some implementations of digital trees.
  • further provides for, in a computer system having a storage medium of at least an internal memory that ranges between 10 to 20
  • The invention further provides for, in a computer system having a storage medium, a data structure that includes an index over the keys of data records; the index is arranged in a balanced structure of blocks and enables sequential operations to be performed on said data records; the index size is essentially not affected by the size of said keys.
  • the data records may reside in the blocks of the layered index, or may reside in separate data files (one or more). In the latter embodiment the data records should be associated, of course, to the corre-
  • a given data record may accommodate more than one search key.
  • The index is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specified index schemes.
  • normally data consists of records of several types (e.g. in the example above books and borrowers).
  • the type of the record determines its fields (attributes) and its keys.
  • the type of each key is not kept with the record and is not considered part of the key.
  • The program "knows" the type of the record, and therefrom the fields of the data records and their structure.
  • Each type of key is assigned a designator: a string of bits, e.g. a series of one or more characters, which normally, but not necessarily, is (are) added as a prefix to all keys of this type.
  • a designated key is a key with its designator.
  • the designator is treated as part of the key (for search or update purposes), and therefore is part of the index scheme.
  • by looking at the designated key, one obtains the designator and hence can deduce the type of the record; one need not know the record type a priori.
  • Data records in which the keys are designated are called designated data records.
  • a designated index is an index that enables search on designated data records.
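A small sketch of designated keys, assuming single-character designators and using the book and borrower record types mentioned earlier in the text. The designator values "B" and "R", the function names and the sample keys are all hypothetical; a sorted list stands in for the designated index.

```python
from typing import Dict, List

# Hypothetical designators: one character per key type.
DESIGNATORS: Dict[str, str] = {"book": "B", "borrower": "R"}
TYPE_OF: Dict[str, str] = {d: t for t, d in DESIGNATORS.items()}


def designate(key_type: str, key: str) -> str:
    """Prefix the key with the designator of its type."""
    return DESIGNATORS[key_type] + key


def record_type(designated_key: str) -> str:
    """Deduce the record type from the designator alone."""
    return TYPE_OF[designated_key[0]]


# One index holds the designated keys of both record types.
designated_index: List[str] = sorted([
    designate("book", "Moby Dick"),
    designate("borrower", "Smith"),
    designate("book", "Emma"),
])
```

Because the designator is part of the key, keys of the same type end up adjacent in the ordered index, and a single index can replace separate per-type index files.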
  • there follows a description of another feature according to the second aspect: subordination of data records.
  • the designated key of R2 is the composite key K1',K2', where K2' consists of the key K2 prefixed by a designator D2.
  • the subordination relationship is extended also to records. If K2 is subordinated to K1, the designator of K2' is D2 and the designator of R2 is also D2 (or D1, D2). If R2 is subordinated to R1, the key of R2 is composed by concatenating K2' to K1'. Note that in K2', D2 is prefixed to K2.
  • the type of record R1 and the type of record R2 may stand in a one-to-many relationship, meaning that several records of type R2 may be related to a single record of type R1.
  • Such a relation can be implemented by the subordination relation: several records of type R2 will be subordinated to a single record of type R1 (e.g., several books can be borrowed by the same borrower).
  • this relationship is one-to-one (e.g.
  • the subordinated record can itself have a record subordinated to it, and accordingly n levels of subordination may be accomplished.
  • For example, consider a banking database, where the account records are subordinated to the branch records, and deposit records are subordinated to accounts.
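The composition of subordinated keys can be illustrated with the banking example. The sketch below follows the composite form K1',K2' given above; the designators "B", "A" and "D" and all key values are hypothetical, chosen only to make the nesting visible.

```python
def designated(designator: str, key: str) -> str:
    """K' = the designator prefixed to the key K."""
    return designator + key


def subordinated(parent_key: str, designator: str, key: str) -> str:
    """Key of a subordinated record: the parent's key followed by K2' = D2 + K2."""
    return parent_key + designated(designator, key)


# Hypothetical designators and key values (assumptions, not from the text).
branch = designated("B", "017")
account = subordinated(branch, "A", "4521")
deposit = subordinated(account, "D", "0003")

print(branch)   # B017
print(account)  # B017A4521
print(deposit)  # B017A4521D0003

# Every record subordinated to a branch carries the branch key as a prefix,
# so related records sit next to each other in key order.
keys = sorted([deposit, designated("B", "020"), account, branch])
print([k for k in keys if k.startswith(branch)])
```

With this layout, n levels of subordination are simply n nested prefixes.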
  • let R be a record that is identified by either of two keys K1 and K2.
  • Then, the designated index should contain two search paths to R, one by the designated key K1' and one by the designated key K2'. Accordingly, R constitutes a multi-dimensional record.
  • a multi-dimensional index includes the designated index and the
  • the above discussion and example considered a multi-dimensional index where the data records do not necessarily exhibit a subordination relationship.
  • the multi-dimensional index may optionally be applied also to subordinated data records.
  • For example, consider a banking database, where the deposits are subordinated to both accounts and depositors.
  • a single designated index provides access to accounts (by the designated key k1', account-number), to depositors (by the designated key k2', depositor-name) and to deposits by both k1'k2' and k2'k1'. (It is possible, of course, to use different designators for k1 when it is subordinated to k2 and for k2 when it is subordinated to k1.)
  • the designator of a car record (FIAT, 127) is A when searching or updating the record by the key AFIAT, and is B when accessing it via the license plate number B127.
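The car example can be sketched with an ordinary dictionary standing in for the designated index. The dictionary, the field names `make` and `plate`, and the helper `insert_record` are assumptions made for illustration; the point shown is only that two designated keys lead to the same record.

```python
from typing import Dict, List


def insert_record(index: Dict[str, dict], record: dict, designated_keys: List[str]) -> None:
    """Register one record under every designated key that identifies it."""
    for key in designated_keys:
        index[key] = record


# A plain dict stands in for the designated index (an assumption).
index: Dict[str, dict] = {}
car = {"make": "FIAT", "plate": "127"}

# Designator "A" for the make key, designator "B" for the license plate key.
insert_record(index, car, ["A" + car["make"], "B" + car["plate"]])

print(index["AFIAT"] is index["B127"])  # True
```

Both search paths end at one stored record, which is what makes the record multi-dimensional.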
  • the meta-data includes information on the different records as a function of their type. Thus, it is needed to identify the designator and as a result the
  • The search scheme in the designated index is oblivious to the meta-data. It locates the record, identifies the designator (for example the designator can be prefixed to the record) and constructs the (composite) designated key.
  • a data structure that includes: an index over the keys of data records; the data records being of at least two types where data records of the second type are subordinated to the data records of the first type.
  • there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a designated index over designated keys of data records; the data records, constituting designated data records, being of at least two types where designated data records of the second type are subordinated to the designated data records of the first type.
  • the data structure that includes designated index and designated data can maintain the relations between different data items.
  • the data structure that includes designated index and designated data can link logically related items.
  • the data structure that includes designated index and designated data can support several data models simultaneously and efficiently.
  • the data structure that includes designated index and designated data allows high efficiency in retrieving related data.
  • the data records may constitute part of the PAIF, or may reside in one or more separate data files.
  • In the latter embodiment the data records should be linked, of course, to the corresponding PAIF.
  • a given data record may accommodate more than one search key.
  • a data structure that includes: an index being stored in the storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block.
  • the index being constituted by a trie.
  • the invention provides for, in a storage medium used
  • a data structure that includes: an index being stored in a storage medium and constructed over the keys of said data records that are stored in blocks; the index being arranged in blocks with the leaf blocks being linked to data records by means of links; said index is characterized in that at least one of said links is shared by at least two data records stored in the same block; said index constituting a layered index according to claim 1, and blocks of said basic partitioned index are linked to said data records.
  • Fig. 1 shows a generalized block diagram of a system employing a database file management system
  • Fig. 2 shows a sample database structure represented as an Entity Relationship Diagram (ERD), and serving for illustrative purposes;
  • Fig. 3 shows the database of Fig. 2, represented as tables in accordance with the relational data model, with each table holding few data occurrences;
  • Fig. 4 shows the "CLIENT" table of Fig. 3, in accordance with a file management system employing a conventional B+ tree index scheme;
  • Fig. 5 shows the "CLIENT" table of Fig. 3, in accordance with a file management system employing a conventional trie index scheme;
  • Figs. 6A-6C show the "CLIENT" table of Fig. 3, in accordance with a file management system employing a PAIF index scheme;
  • Figs. 7A-7H show schematic illustrations exemplifying construction of a layered index, according to one embodiment of the invention;
  • Figs. 8A-B show schematic illustrations exemplifying construction of a layered index, according to yet another embodiment of the invention;
  • Figs. 9A-G show schematic illustrations exemplifying construction of a layered index, according to yet another embodiment of the invention;
  • Figs. 10A-B show schematic illustrations exemplifying construction of a layered index, according to another embodiment of the invention;
  • Fig. 11 shows a schematic illustration exemplifying construction of a layered index, according to still yet another embodiment of the invention;
  • Fig. 12 shows a schematic illustration for exemplifying use of designators in a designated index in accordance with one embodiment of the invention
  • Figs. 13A-E show five schematic illustrations for exemplifying the feature of subordination of data records in a designated index in accordance with one embodiment of the invention;
  • Fig. 14 shows a schematic illustration of a designated index exemplifying a multi-dimensional record according to an embodiment of the invention;
  • Fig. 15 shows a schematic illustration of a designated index according to another embodiment of the invention;
  • Fig. 16 shows a schematic illustration for exemplifying the feature of relations among data records provided in accordance with one embodiment of the invention;
  • Figs. 17A-B show a schematic illustration of a compressed representation of links to data records in accordance with one embodiment of the invention;
  • Figs. 18A-D show four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Ctree based database; and
  • Figs. 19A-D show four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system of the invention vs. a commercially available Btree based database.
  • a general purpose computer, e.g. a personal computer (P.C.) employing a Pentium microprocessor 3 commercially available from Intel Corp. U.S.A., has an operating system module 5, e.g. Windows NT® commercially available from Microsoft Inc. U.S.A., which communicates with processor 3 and controls the overall operation of computer 1.
  • P.C. 1 further accommodates a plurality of user application programs of which only three, 7, 9 and 11 respectively, are shown.
  • The user application programs are executed by processor 3 under the control of operating system 5, in a known per se manner, and are responsive to user input fed through keyboard 13 by the intermediary of I/O port 15 and the operating system 5.
  • the user application programs further communicate with monitor 16 for displaying data, by the intermediary of I/O port 17 and operating system 5.
  • the user application programs can access data stored in a database by means of database management system module 20.
  • the generalized database management system includes high level management system 22 which views, as a rule, the underlying data in a "logical" manner and is responsive to the user application program by means known per se such as, e.g., SQL Data Definition and Data Manipulation language (DDL and DML).
  • the database management system typically exploits, in a known per se manner, a data dictionary 24 that includes meta-data which maintains information on the underlying data.
  • The underlying structure of the data is governed by database file management system 26 which is associated with the index scheme and actual data records 28.
  • The "high-level" logical instructions (e.g. SQL commands) of the high-level management system 22 are converted into "lower level" commands that access or update the data records that are stored in the database file(s), and to this end the database file management system considers the actual structure and organization of the data records.
  • the "high level" and "low level" portions of the database file management system can communicate through a known per se Application Programmers Interface (API), e.g. the Microsoft open database connectivity (ODBC) interface commercially available from Microsoft.
  • the utilization of the ODBC enables "high level" modules of the database file management system or application program to transparently communicate with different "database file management systems" that support the ODBC standard.
  • Fig. 1 further shows, schematically, a storage medium in the form of internal memory module 29 (e.g. 16 megabytes and possibly employing a cache memory sub-module) and an external memory module 29' (e.g. 1 gigabyte).
  • external memory 29' is accessed through an external, relatively slow communication bus (not shown), whereas the internal memory is normally accessed by means of a faster internal bus (not shown).
  • the database management system utilizes operating system services (i.e. an I/O operation) in order to load, through the external communication bus, one or more blocks of data from the external to the internal memory. If the sought data records are not found in the loaded blocks, additional I/O operations are required until the sought data records are targeted.
  • Computer 1 may serve as a workstation forming part of a Local Area Network (LAN) (not shown) which employs a server having also essentially the same structure of Fig. 1.
  • a predominant portion of said modules (including the database records themselves 28) reside in the server.
  • the database may be an on-line database residing in an Internet Web site.
  • The invention is, of course, not limited to the specified partition of small internal memory and large external memory.
  • large internal and external memories may be employed, and by yet another modified embodiment only internal memory is employed.
  • the ERD 30 of Fig. 2 consists of the entities "CLIENT" 32 and "ACCOUNT" 34 as well as an "n to m" "DEPOSIT" 36 relationship indicating that a given client may have more than one account and by the same token a given account may be owned by more than one client.
  • the entity "CLIENT" has the following attributes (fields): "Client_Id" 38 being a key attribute that uniquely identifies each client, "Name" 39 standing for the client's name and "Address" 40 standing for the client's address.
  • the entity "ACCOUNT" has the following attributes (fields): "Acc_No" 42 being a key attribute that uniquely identifies each account, and "Balance" 43 holding the balance of the account.
  • the relationship "DEPOSIT" consists of pairs of keys of the "CLIENT" and "ACCOUNT" entities, such that each pair is indicative of a particular account owned by a specific client.
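The ERD just described maps naturally onto two record types and a set of key pairs. The sketch below is one possible rendering; the Python field names mirror the attributes in the text, while the account numbers, names, addresses and balances are made-up sample values.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True)
class Client:
    client_id: str  # key attribute "Client_Id"
    name: str
    address: str


@dataclass(frozen=True)
class Account:
    acc_no: str     # key attribute "Acc_No"
    balance: float


# DEPOSIT: pairs of (Client_Id, Acc_No). All values below are hypothetical.
clients = [Client("12345", "Client A", "Address A"), Client("12445", "Client B", "Address B")]
accounts = [Account("100001", 250.0), Account("100002", 80.0)]
deposits: Set[Tuple[str, str]] = {("12345", "100001"), ("12445", "100001"), ("12345", "100002")}


def accounts_of(client_id: str) -> List[str]:
    """Accounts owned by a client (a client may have several accounts)."""
    return sorted(acc for cid, acc in deposits if cid == client_id)


def owners_of(acc_no: str) -> List[str]:
    """Clients owning an account (an account may have several owners)."""
    return sorted(cid for cid, acc in deposits if acc == acc_no)


print(accounts_of("12345"))  # ['100001', '100002']
print(owners_of("100001"))   # ['12345', '12445']
```

The two lookups run over the same set of pairs in opposite directions, which is the "n to m" property of DEPOSIT.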
  • In Fig. 3 there is shown the database of Fig. 2, represented as three tables 50, 51 and 52 that correspond, in the relational data model, to 32, 34 and 36 respectively, with each table holding a few data occurrences for illustrative purposes.
  • the length of the key field ("Client_ID") of the "CLIENT" table is 5 digits
  • the length of the key field ("Acc_ID") of the "ACCOUNT" table is 6 digits.
  • The client table holds 5 data occurrences 55-59
  • the account table holds 2 data occurrences 65, 66 and the deposit table holds 3 data occurrences 70-72.
  • Fig. 4 illustrates an underlying indexing file of the "CLIENT" table of Fig. 3, in accordance with a file management system employing the conventional B+ tree indexing scheme.
  • the indexing file 80 consists of three blocks 80a-c, standing for a root block and two leaf blocks respectively.
  • the data records are organized randomly in a separate file 81 holding the five data records 83-87.
  • Each block consists of a succession of pairs of fields (e.g. 82a-b and 83a-b in block 80a).
  • the first field stands for a search key value
  • the second field stands for a link, such as a number that identifies the next block to search or, in the case of a leaf block, a link to the data record, such as a number identifying the data record.
  • a search for a record whose key is 12355 (82a) starts in root block 80a and is directed by the link 82b to block 80b.
  • the search key 12355 (86a) is associated with link 86b indicating the address of the data record identified by this search key in the data file 81.
  • the data record that is identified by search key "12355" (57 in Fig. 3) is the fourth in order in data file 81.
  • the B+ tree indexing file of Fig. 4 exhibits one of the significant shortcomings of this approach in that the keys (i.e. search keys) are duplicated, i.e. they are held both in the internal blocks (i.e. in the index scheme) and in the data records associated with the B+ tree index.
  • the search key of data record 57 (in Fig. 3) is not only held as an integral part of the data record 86 in file 81 but also in block 80b (search key 86a) and sometimes in parent blocks such as 80a (search key 82).
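The key duplication described above can be made concrete with a toy two-level index. Only three facts in this sketch come from the text: the five client keys, the root entry for 12355 that points to block 80b, and the fact that record 12355 is fourth in the data file. The remaining block contents, the second root entry and the order of the other records are assumptions.

```python
# Data file 81: records in arbitrary order; only the key field is shown.
data_file = [
    {"client_id": "12445"},
    {"client_id": "11346"},
    {"client_id": "12546"},
    {"client_id": "12355"},  # fourth record, as stated in the text
    {"client_id": "12345"},
]

# Leaf blocks: (search key, position of the record in the data file).
leaf_80b = [("11346", 1), ("12345", 4), ("12355", 3)]
leaf_80c = [("12445", 0), ("12546", 2)]

# Root block: (highest key in the child block, child block). Assumed layout.
root_80a = [("12355", leaf_80b), ("12546", leaf_80c)]


def search(key):
    """Follow the root entry covering the key, then scan the leaf block."""
    for separator, leaf in root_80a:
        if key <= separator:
            for leaf_key, position in leaf:
                if leaf_key == key:
                    return data_file[position]
            return None
    return None


def copies_of(key):
    """Count how many times a key is physically stored."""
    in_root = sum(1 for separator, _ in root_80a if separator == key)
    in_leaves = sum(1 for leaf in (leaf_80b, leaf_80c) for k, _ in leaf if k == key)
    in_data = sum(1 for record in data_file if record["client_id"] == key)
    return in_root + in_leaves + in_data


print(search("12355"))     # {'client_id': '12355'}
print(copies_of("12355"))  # 3
```

In this layout the key 12355 is stored three times: once in the root block, once in the leaf block and once in the record itself.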
  • Fig. 5 illustrates a different indexing scheme of the "CLIENT" table of Fig. 3, in accordance with a file management system employing a known trie indexing scheme.
  • the trie indexing file 90 includes a plurality of nodes and links, wherein each node stands for an offset position and the link stands for a value at this offset.
  • Table 91 has four columns. The first column indicates which digit position is to be used, the second column the value of that digit. A digit value partitions the keys into two subsets. Columns three and four direct the search procedure to the next step.
  • a digit at the position indicated by the root is compared to the value specified at the second column of the same line (value "5", indicated also by link 90b in the trie index). Since the digit at position 5 of the sought search key 12355 is indeed 5, control is transferred to line 2 (as indicated by the third column of line 1 of table 91).
  • the digit at position 3 of the sought search key (90c in the tree, being also the value of the first column of the second line in table 91) is compared to the value 3 (link 90d, being also the second column in the second line of the table 91). Since a match occurs, control is transferred to line 3 in the table.
  • the digit at position 4 of the sought search key does not match the value specified at the second column of line three (i.e. "5" vs. "4") and accordingly, as indicated in the fourth column of table 91 ("not equal"), a link to the sought data record 57 (86 in Fig. 4) is obtained.
  • the above trie is associated with some shortcomings: it retains an even distribution of the data at the cost of knowing
  • a new trie index scheme designated PAIF. As will be shown below, the PAIF is not confined to a tree structure.
  • various embodiments of the layered index are described, with reference to Figs. 7-9, which include a representative index constructed over the representative keys of the PAIF.
  • the index scheme of the representative index and that of the basic partitioned index being substantially the same PAIF.
  • there is described yet another embodiment of the layered index, with a different trie.
  • This, however, is not obligatory, as is exemplified, e.g. with reference to Fig. 11, where the trie and the representative index are different.
  • In Figs. 6A-C there is shown a succession of schematic illustrations of the "CLIENT" table of Fig. 3, in accordance with the file management system employing the PAIF.
  • the terms "transaction" and "operation" are used interchangeably.
  • The Client's data record 103 (56 in table Client of Fig. 3) having search key "12345" (i.e. a 5-byte-long search key).
  • The PAIF of Fig. 6A (100) is, of course, trivial and consists of a single node 101 (standing for both the root node and the leaf node) linked by means of a long link 102 to data record 103.
  • the data record 103 is associated with a search path being a unit that consists of a node 101 and a link 102 which defines an offset and a pertinent search key portion value that conforms to the corresponding search key portion value at that particular offset within the search key of the specified data record. More specifically, the value of the one-byte search-key-portion at offset 0 within search key "12345" is indeed
  • In Fig. 6B-1 there is shown a PAIF 108 after the termination of a successive transaction in which the data record having Client_Id_No "12445" 107 has been inserted (data occurrence 58 in table Client of Fig. 3).
  • The search keys of data records 103 and 107 are distinguished only in the third byte (offset 2), i.e. "3" and "4" respectively.
  • root node 101 and the link 102 are not sufficient to
  • Figs. 6B-2 and 6B-3 illustrate two other options of realizing the PAIF of Fig. 6B-1, where in Fig. 6B-2 the full key is represented in the PAIF (e.g. all the digits of the record 12445 are specified in the links commencing from the root node and ending at the data record).
  • The latter realization is more explicit and less efficient in terms of space, as compared to the sparse realization of Fig. 6B-3 where only the nodes which are absolutely necessary appear in the tree.
  • Other variants are, of course, applicable.
  • the preferred procedure for inserting a new data record into an existing PAIF includes the execution of the following steps: i. advancing along a reference path commencing from the root node and ending at a data record associated to a leaf node (referred to as "reference data record"); in each node in the reference path, advancing along a link originated from said node if the value represented by the link equals the value of the 1-bit-long key portion at the offset specified by said node; in the case that the offset specified in the node is beyond any corresponding key portion in the key, or if there is no link with said value, advancing along an arbitrary path to any reference data record;
  • the new node is assigned a value of the discerning offset
  • iii.2.2 connect the reference data record and the new node (which now becomes a leaf node) and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the reference data record
  • iii.2.3 connect by means of a link the new data record and the new node and assign to the link (long link) a value of the search-key-portion at the discerning offset taken from the search key of the new data record; or iii.3 if conditions iii.0, iii.1 and iii.2 are not met, there ex
  • iii.3.2 for cases A and B, connect by means of a link (long link) the new data record and said new internal node; the value assigned to the link is that of the search-key-portion at the discerning offset, as taken from the search key of the new data record; iii.3.3 for cases A and B, connect by means of a new link the new node and, for case A, the child node or, for case B, the root node (i.e. the new node becomes, for case A, a new father node and, for case B, a new root node), and the value assigned to said link is the search-key-portion at the offset indicated by the new node, taken from the search key of the reference data record.
  • It should be noted that for a different reference path a different PAIF may be obtained.
  • search key "12546" (59 in table Client of Fig. 3) is inserted to the PAIF of Fig. 6B.
  • a move is made along the reference path commencing from the root 101 and ending, for example, at data record 103 which stands for the "reference data record".
  • The comparison operation stipulated in step (ii) results in that the search key of the new data record is distinguished from the search key of the reference data record (103) at offsets 2 ("5" vs. "3") and 4 ("6" vs. "5"). The smallest offset ("discerning offset") is therefore 2.
  • In step (iii) the condition of step iii.1 is met since the discerning offset is equal to that assigned to node 104. Accordingly, and as is shown in Fig. 6C-1, new link 111 connects node 104 to the new data record 112. The value assigned to link 111 is 5, being the byte value at position 2 in the search key of the new data record 112. PAIF 110 of Fig. 6C-1 is therefore the result of inserting the data record 112 into the PAIF 108 of Fig. 6B-1.
  • the CLIENT data record having Client_Id (or search key) "12355" (57 in table Client of Fig. 3) is inserted into the PAIF of Fig. 6B-1. Steps i and ii, stipulated above, result in a reference path starting at node 101 and ending at data record 103.
  • In step iii, the condition of step iii.2 is met since the discerning offset 3 is larger than the offset 2 of leaf node 104 in the reference search path. Accordingly, in compliance with step iii.2.1 and as is shown in the resulting PAIF 120 of Fig. 6C-2, the link 106 is disconnected from reference data record 103 and is connected to a new node 121. The new node
  • In step iii.2.2, the reference data record 103 and the new node 121 are connected by means of new link 122.
  • the new link is assigned the value 4 (being the digit value at the discerning offset 3 taken from the search key "12345" of the reference data record 103); and finally, as stipulated in step iii.2.3, the new data record 123 is connected to node 121 by means of link 124 which is assigned the value "5" (being the digit at the discerning offset 3 taken from the search key "12355" of the new data record 123).
  • PAIF 120 of Fig. 6C-2 is, therefore, the result of inserting the data record 123 into the PAIF 108 of Fig. 6B-1.
  • the third and last example concerns inserting the CLIENT data record having Client_Id (or search key) "11346" (55 in table Client of Fig. 3) into the PAIF of Fig. 6B-1.
  • Applying the aforementioned steps i and ii results in advancing from node 101 to data record 103 (in Fig. 6B) and establishing that the discerning offset is 1.
  • In step iii, the condition of step iii.3 is met. Accordingly, in compliance with step iii.3.1 and as is shown in the resulting PAIF 130 of Fig. 6C-3, the link 102 is shifted to a new internal node 131.
  • the new internal node 131 is assigned the value 1 (being the discerning offset).
  • the new data record 132 and node 131 are directly connected by means of new link 133.
  • the value assigned to link 133 is 1 (being the digit at the discerning offset 1 taken from the search key "11346" of the new data record 132), and finally, in compliance with step iii.3.3, the new internal node 131 is linked to node 104 by means of link 134 assigned the value 2 (being the digit at the discerning offset (1) taken from the search key "12345" of the reference data record 103).
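The three insertion examples above all start by comparing the new search key with the reference key and taking the smallest differing offset. The sketch below reproduces that comparison and the choice among steps iii.1, iii.2 and iii.3 as they are applied in the examples. The function names are hypothetical, keys are treated as strings of one-byte digits, and step iii.0 is omitted because its condition is not reproduced in this text.

```python
from typing import List, Optional


def discerning_offset(new_key: str, reference_key: str) -> Optional[int]:
    """Smallest offset at which the two search keys differ, or None if none does."""
    for offset, (new_digit, ref_digit) in enumerate(zip(new_key, reference_key)):
        if new_digit != ref_digit:
            return offset
    return None


def insertion_case(offset: int, path_offsets: List[int]) -> str:
    """Pick the step applied in the examples.

    path_offsets lists the offsets assigned to the nodes on the reference
    path, root first and leaf node last.
    """
    if offset in path_offsets:
        return "iii.1"   # a node with this offset exists: add a link to it
    if offset > path_offsets[-1]:
        return "iii.2"   # beyond the leaf node: add a new leaf node
    return "iii.3"       # otherwise: add a new internal (or root) node


# Reference path of Fig. 6B-1: node 101 (offset 0), node 104 (offset 2).
path = [0, 2]
for key in ("12546", "12355", "11346"):
    offset = discerning_offset(key, "12345")
    print(key, offset, insertion_case(offset, path))
# 12546 2 iii.1
# 12355 3 iii.2
# 11346 1 iii.3
```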
  • In step i.1, the value of the digit "1" at the offset assigned to the root node (offset 0) of the sought data record is compared to the one assigned to link 102 (being the sole link originated from node 101). Since a match is found, control is shifted to node 131.
  • In step i.1, the value of the digit ("2") at the offset assigned to node 131 (offset 1) of the sought data record is compared to the one assigned to link 134.
  • a match is found so control is shifted to node 104.
  • the value of the digit "4" at the offset assigned to node 104 (offset 2) of the sought data record is compared for each link originating from node 104.
  • the comparison results in a match for link 105 and accordingly control is shifted to data record 107.
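The search walk just traced can be sketched over a small in-memory model of the PAIF of Fig. 6C-3. The representation is an assumption: each node is a dictionary holding its offset and its outgoing links keyed by digit value, and each data record is represented only by its search key. The final comparison against the stored key is also an addition of this sketch, needed because the sparse structure tests only some offsets.

```python
from typing import Optional

# PAIF of Fig. 6C-3; comments give the reference numerals used in the text.
node_104 = {"offset": 2, "links": {"3": "12345", "4": "12445"}}  # link 105 -> record 107
node_131 = {"offset": 1, "links": {"1": "11346", "2": node_104}}  # links 133, 134
root_101 = {"offset": 0, "links": {"1": node_131}}                # link 102


def search(root: dict, key: str) -> Optional[str]:
    """Follow, at each node, the link whose value equals the key digit at the node's offset."""
    node = root
    while isinstance(node, dict):
        offset = node["offset"]
        if offset >= len(key):
            return None
        node = node["links"].get(key[offset])
        if node is None:
            return None
    # node is now a data record; confirm it really carries the sought key.
    return node if node == key else None


print(search(root_101, "12445"))  # 12445
print(search(root_101, "12399"))  # None
```

Searching for "12445" visits node 101, node 131 and node 104 in turn, matching the walk described above.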
  • the leaf node that is linked to the sought data record is referred to as the "target node".
  • the father of the target node is referred to as the "predecessor target node".
  • the link that connects the predecessor target node to the target node is referred to as the "predecessor link" and the link that connects the target node to a child node thereof (or to a data record other than the sought data record) is referred to as the "target link".
  • the latter record is searched in the PAIF according to the procedure described above. Having found the data record 132 and in compliance with step i above, the data record as well as the link 133 leading thereto are both deleted. Since after the latter deleting step the target node 131 remains only with the sole target link 134, steps iii and iii.1 apply, and accordingly the predecessor link 102 bypasses target node 131 and is directly linked to the child node thereof 104.
  • In step ii.2, target node 131 and the target link 134 are deleted, thereby obtaining the PAIF shown in Fig. 6B-1.
  • Another example is given with reference to the PAIF of Fig. 6C-1.
  • the latter record is searched in the PAIF according to the procedure described above.
  • the data record as well as the link (111) leading thereto are both deleted.
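The two deletion examples can be sketched on the same kind of in-memory model, where a node is a dictionary with an offset and digit-keyed links and a record is represented by its search key. This is an assumption-laden illustration, not the specification's procedure: the function name is hypothetical, and the bypass is applied only in the situation shown in the text, where the target node is left with a single link that leads to a child node.

```python
def delete(root: dict, key: str) -> bool:
    """Delete the record with the given key; return False if it is not found."""
    path = []  # (node, link value) pairs from the root down to the target node
    node = root
    while isinstance(node, dict):
        offset = node["offset"]
        if offset >= len(key) or key[offset] not in node["links"]:
            return False
        path.append((node, key[offset]))
        node = node["links"][key[offset]]
    if node != key:
        return False

    target, value = path[-1]
    del target["links"][value]  # remove the record and the link leading to it

    # If the target node keeps a single link to a child node, let the
    # predecessor link bypass the target node.
    if len(path) >= 2 and len(target["links"]) == 1:
        (child,) = target["links"].values()
        if isinstance(child, dict):
            predecessor, predecessor_value = path[-2]
            predecessor["links"][predecessor_value] = child
    return True


# Fig. 6C-3: deleting "11346" should give back the shape of Fig. 6B-1.
node_104 = {"offset": 2, "links": {"3": "12345", "4": "12445"}}
node_131 = {"offset": 1, "links": {"1": "11346", "2": node_104}}
root_101 = {"offset": 0, "links": {"1": node_131}}

print(delete(root_101, "11346"))            # True
print(root_101["links"]["1"] is node_104)   # True
```

In the second example of the text (Fig. 6C-1) the target node keeps two links after the deletion, so no bypass takes place.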
  • Another common primitive is the "Modify existing data record", e.g. change the home address of an existing client.
  • the "Modify" primitive is normally realized by selectively utilizing the aforementioned primitives. For executing a "Modify" command one should distinguish between the following cases:
  • the "modify" applies to a search key field (e.g. change an account
  • each search key is represented as a series of bytes and accordingly the search procedure is performed by partitioning the search key into search-key-portions each consisting of at least one byte.
  • different links in a given PAIF may be assigned search-key-portions of different length as long as the respective search-key-portion is known to the corresponding node.
  • the data records are held in a sorted form according to search key. Navigating, for example, in the PAIF of Fig. 6C-3 (from right to left) brings about the ordered series "11346", "12345" and "12445". This characteristic constitutes yet another advantage which eases data manipulation as compared to the tree of Fig. 5 where the data records are not sorted. As specified before, a node in the PAIF is not necessarily classified uniquely.
  • node 104 is at the same time a leaf node (linked, by means of a long link 105, to data record 107) and an internal node (linked by means of a short link 106 to node 121).
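The sorted-order property noted above can be checked with a short traversal. As before, the in-memory model is an assumption: nodes are dictionaries with an offset and digit-keyed links, and records are represented by their search keys. Visiting the links of each node in ascending order of their values yields the records in key order.

```python
from typing import Iterator, Union

Node = Union[dict, str]


def records_in_order(node: Node) -> Iterator[str]:
    """Yield the records below a node, visiting links in ascending value order."""
    if not isinstance(node, dict):
        yield node  # a data record, represented by its search key
        return
    for value in sorted(node["links"]):
        yield from records_in_order(node["links"][value])


# PAIF of Fig. 6C-3.
node_104 = {"offset": 2, "links": {"3": "12345", "4": "12445"}}
node_131 = {"offset": 1, "links": {"1": "11346", "2": node_104}}
fig_6c3 = {"offset": 0, "links": {"1": node_131}}
print(list(records_in_order(fig_6c3)))  # ['11346', '12345', '12445']

# PAIF of Fig. 6C-2, where node 104 links both to a record and to node 121.
node_121 = {"offset": 3, "links": {"4": "12345", "5": "12355"}}
fig_6c2 = {"offset": 0, "links": {"1": {"offset": 2, "links": {"3": node_121, "4": "12445"}}}}
print(list(records_in_order(fig_6c2)))  # ['12345', '12355', '12445']
```

The second structure shows that a node acting as both leaf and internal node does not disturb the ordering.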
  • In Figs. 7A-H there are shown schematic illustrations of a layered index constructed in response to a succession of split block operations, according to one embodiment of the invention.
  • Consider for example a block 140 in Fig. 7A (in the basic partitioned index) which overflows in terms of memory space.
  • a "split block" procedure is invoked which results in a layered index 142 of Fig. 7B consisting of root block 144 and a duplicated node A' (155) linked to leaf block 146 by means of direct link 145 and by means of long link 147 to a leaf block 148.
  • the split point was selected to be link 149 (Fig. 7A) (hereinafter "split link"), thereby shifting nodes A, B, E, D and F to new block 146 and nodes C, G, I, J, K, L and H to a new block 148.
  • The split link is preferably selected in order to accomplish an essentially even distribution of nodes and links between the new blocks (e.g. the size of the sub PAIFs that reside in blocks 148 and 146 is essentially the same).
  • a father block 144 (constituting I x) is created with a duplicated node A' (155) of the split node A (156).
  • the node is copied
  • nodes A and C may also be linked by means of a split link marked as dashed line 150.
  • direct link 154 connects the copied node C 153a to the block 148A of the original split node 153, whilst the link 155 is a far link to the split block 148B and the value of the link is as the original value of link 152 between nodes C and G before (and after) the split.
  • the layered index 151 is constituted by the trie that includes blocks 141, 148A and 148B forming and block 16 which forms a representative index over the common keys of the trie.
  • node A in block 141 and node C in block 148A are optionally disconnected and likewise node C of 148A and node G of 148B are optionally disconnected.
  • nodes A' and C are connected in block 140 to form a (connected) trie and it is
  • the resulting layered index constitutes a balanced structure of blocks, thereby keeping the index depth to a minimum and consequently minimizing the number of accesses (normally, although not necessarily, I/O operations) that are required in order to find, insert or delete a given data record.
  • the layered index maintains a substantially logarithmic function that depends on the number of records; the layered index is more efficient in terms of the number of I/O operations required to access a given data record as compared to the number of I/O operations required to access a data record through the trie.
  • the representative index and the trie comply with substantially the same index scheme, i.e. the PAIF.
  • By substantially the same scheme it is meant that there are some differences, as will be explained with reference to Fig. 9G below.
  • Node A being the lowest ancestor node of nodes B and I, and thus a (connected) trie is formed in block 402.
  • the value associated with short link 414 (between blocks A' and B' in block 402) is of the same value as link 412 (between A and B in block 405).
  • The value of the link 415 (between nodes A' and F) in block 402 is of the same value as that of link 413 which originates from node A in the direction needed to access node B.
  • the internal structure of block 402 is such that it allows a search to the representatives of blocks 405, 406 and 407.
  • The direct links 416, 417 of nodes 422 and 411 are optionally retained since it is possible to move along direct link 418 to block 405, seeing that node 410 is maintained in the access path to both nodes 422 and 411.
  • Fig. 7G shows the resulting layered index after splitting block 407 of Fig. 7F (in link 420) and Fig. 7H shows the resulting layered index after splitting block 402 (in the link between nodes I' and N').
  • the resulting layered index in Fig. 7H has, as shown, three layers, the first consisting of block 430, the second consisting of blocks 402 and 408 and the trie consisting of blocks 405, 407, 426 and 406.
  • Figs. 8A-8B showing respectively two illustrations exemplifying the application of the technique of the invention to a according to another embodiment of the invention.
  • Fig. 8A illustrates a given trie structure having vertical orientation (i.e. constituting a vertical tree) which, as shown, is unbalanced, i.e. three blocks depth (260, 261 and 262) vs. two blocks depth (260 and 264).
  • the description below does not aim at explaining the search scheme of the specified vertical tree but emphasizes only those aspects which are required to obtain balanced layered index.
  • nevertheless be noted that the nodes in trie structure 260 signify offsets in a half byte size (the node values are presented in hexadecimal representation) of the data records (a-k) that are shown in Fig. 8A.
  • Fig. 8B illustrates one possible embodiment of the invention.
  • a representative index that consists of one block 270 (forming I_1) is constructed with the result that horizontal balanced tree is obtained having a root block 270 from which all the blocks of the lower level vertical tree (the latter constitutes the unbalanced trie) are accessed through one I/O operation.
  • the common key of block 260 (in hexadecimal representation of half byte units) is 0x4, 0x1 and 0x3, where 0x4 stands for the most significant bits of the byte of the character A and 0x1 stands for the least significant bits of the character A, and 0x3 stands for the most significant bits of the characters which reside in offset 2 of the data records.
  • block 261 can accommodate a root node with value 8, thus, the common key, hereafter k of the block, is changed to be 0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, i.e. it consists of 8 units.
  • the representative of block 261 in I_1 should be changed accordingly.
  • the representative of 261 is k, even if the root node with the value 8 does not exist.
  • the index over the common keys is accomplished in the representative index (consisting of block 270) such that it constructs a trie that addresses the common keys of the first vertical tree.
  • example in order to find data record g, one follows node 290, link 291 to node 292. Then, one advances with the direct link 293 to block 261, which is associated with data record g. The resulting layered index is balanced.
  • the representative key of a block being a common key is the longest prefix of all keys of the data records that can be accessed from the block by the relevant index scheme.
  • the specified prefix size (calculated in l-bit-long units) equals the value of the root node in the block (which as recalled holds offset value). If the prefix size is expressed as number of bits, then the prefix size is calculated as the offset value multiplied by the l-bit-long value.
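  • The common key of a block, taken as the longest prefix shared by the keys reachable from it, can be sketched as below. The helper names `to_units` and `common_key` and the two sample keys are assumptions made for illustration; half-byte units are used as in Fig. 8A.

```python
def to_units(key: bytes, unit_bits: int = 4) -> list[int]:
    """Split a key into unit_bits-wide units, most significant first (4 gives half bytes)."""
    if 8 % unit_bits:
        raise ValueError("unit_bits must divide 8")
    mask = (1 << unit_bits) - 1
    units = []
    for byte in key:
        for shift in range(8 - unit_bits, -1, -unit_bits):
            units.append((byte >> shift) & mask)
    return units


def common_key(keys: list[bytes], unit_bits: int = 4) -> list[int]:
    """Longest prefix, in units, shared by every key reachable from a block."""
    rows = [to_units(key, unit_bits) for key in keys]
    prefix = []
    for column in zip(*rows):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


keys = [b"A1", b"A2"]          # hypothetical record keys
prefix = common_key(keys)      # units shared by both keys
prefix_bits = len(prefix) * 4  # prefix size in units multiplied by the unit length
```

  • For these two keys the shared units are 0x4, 0x1 and 0x3: the two halves of the character A followed by the upper half of the digit that comes next, giving a prefix of three units or twelve bits.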
  • There follows now a description of yet another embodiment of constructing a layered index of the invention with reference to Figs. 9A-9G.
  • Figs. 9A-9G showing a succession of modify (insert) transactions on a PAIF tree (constituting a trie that is susceptible to an unbalanced structure) and the so obtained layered index.
  • the data records are shown as forming part of the trie.
  • the actual manner in which the data records are associated to the trie may vary depending upon the particular application.
  • the first step (Fig. 9A) record A is inserted whereafter Block 300 includes node 301 having offset 0, being associated to first record A through link 302, having the value 0.
  • the tree consists of Block 100 having only one node.
  • The index scheme dictates that the search path to data record A is determined according to value 0 at offset 0 as depicted on link 302 and node 301, respectively.
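  • A search that is steered by the value found at each node's offset can be sketched as follows. This is a simplified stand-in rather than the PAIF itself: `TrieNode` and `search` are hypothetical names, keys are modelled as strings of '0' and '1', and the record reached is only a candidate until its key is compared with the sought key.

```python
from dataclasses import dataclass, field
from typing import Optional, Union


@dataclass
class TrieNode:
    offset: int  # position in the key that this node inspects
    links: dict[str, Union["TrieNode", str]] = field(default_factory=dict)


def search(root: TrieNode, key: str) -> Optional[str]:
    """Follow the link whose value matches the key at each node's offset."""
    current: Union[TrieNode, str] = root
    while isinstance(current, TrieNode):
        if current.offset >= len(key):
            return None  # the key is too short to be tested at this offset
        following = current.links.get(key[current.offset])
        if following is None:
            return None  # no link in the required direction
        current = following
    return current       # name of the candidate record


# Node with offset 0 whose link of value 0 leads to record A, as in the first step.
node_301 = TrieNode(offset=0, links={"0": "A"})
```

  • Searching `node_301` with any key whose first position holds 0 reaches record A, while a key that starts with 1 finds no link to follow.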
  • Since Block 300 accommodates nodes 301 and 305, it is not required, as yet, to split the block.
  • Fig. 9D data record D is inserted, and the structure of the block following the insert operation is shown in Fig. 9D. Since, however, the data block cannot accommodate more than two nodes (overflow occurs), it is now required to split Block 300.
  • Fig. 9E illustrates the tree structure after splitting.
  • link 306 is the split link with the motivation that approximately the contents of a half block will be retained in Block 300, and the contents of the remaining half block will be moved to another block 310.
  • other links could be likewise selected to be the split link.
  • block 300 in I_0 is replaced with two blocks 300 and
  • the basic partitioned index of Fig. 9E consists now of two blocks 300 and 310 (which in fact constitute the unbalanced trie).
  • the split node (313) is copied to the block (312) to thereby constitute a duplicated node (314).
  • Next, the duplicated node (314) is connected by means of direct link 316 to block 300, and the duplicated node 314 is linked by means of a far link 318 to the block 310.
  • This far link replaces the original split link 306 that is marked in Fig. 9E in a dashed line.
  • the value of the far link 318 is the same as the value of the split link.
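  • The duplicated node with its direct link and far link can be sketched as a small routing structure. The names `Block`, `RepNode`, `represent_split` and `route` are hypothetical, and keys are again modelled as strings of '0' and '1'; the sketch only shows how the far link keeps the value of the split link.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Block:
    name: str


@dataclass
class RepNode:
    offset: int                         # copied from the split node
    direct_link: Optional[Block] = None  # block that still holds the split node
    far_links: dict[str, Block] = field(default_factory=dict)  # link value -> moved block


def represent_split(offset: int, split_value: str, kept: Block, moved: Block) -> RepNode:
    """Duplicate the split node; the far link carries the value of the split link."""
    rep = RepNode(offset=offset, direct_link=kept)
    rep.far_links[split_value] = moved
    return rep


def route(rep: RepNode, key: str) -> Optional[Block]:
    """Take a far link when the key matches its value, otherwise the direct link."""
    value = key[rep.offset] if rep.offset < len(key) else None
    return rep.far_links.get(value, rep.direct_link)


# Node of value 1 with a far link of value 1 to block 310 and a direct link to block 300.
node_314 = represent_split(1, "1", kept=Block("300"), moved=Block("310"))
```

  • A key holding 1 at offset 1 is routed over the far link to block 310, and a key holding 0 there falls back to the direct link and block 300.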
  • the representative index (constituted by block 312) allows to search according to the common keys of the basic partitioned index.
  • data record E is inserted.
  • this case advancing in the horizontal tree (being one form of the layered index) from the first node 314 of block 312 (having a value 1) is not possible by means of the far link 318 since it represents direction 1 from node 314 (having a value 1), and a link in direction 0 is required.
  • Therefore advancing by means of the direct link 316 to block 300.
  • the block that needs to be associated with the new data record is found.
  • data record F is inserted resulting in a tree structure shown in Fig. 9F.
  • node 320 is copied to block 312 (designated 323 in Fig. 9G) and since it cannot be linked to node 314 of block 312 (since it will not retain the correct intra-block links of the nodes), node 311 of block 300 is also copied to block 312 (designated 322 in Fig. 9G) in order to create a (connected) trie that enables to search by the search scheme to blocks 300, 326, 310 according to the common keys of the blocks.
  • Figs. 9A-G and 8A-B illustrate two of many possible manners of realizing the split block mechanism that maintains the balance structure of the invention by constructing a layered index.
  • the flexibility in adopting another non-limiting variant is shown e.g. in Fig. 8B where the near link 271 and direct link 272 are represented by far link 273 (marked in dashed line) with direction as of link 271, rendering thus node 276 redundant.
  • the balance technique of the invention confers to the so obtained balanced horizontal oriented digital tree (being one form of the layered index structure) a so called "probabilistic access" characteristics.
  • a search in connection with an input data record e.g. search for a data record A
  • For a better understanding of the foregoing consider, for example, Fig. 9E.
  • The search path will follow node 314 and link 318 (offset 1, value 1, respectively) and then at offset '6' (root node of block 310) through link 319 (value '1') to data record C.
  • The latter example exemplifies the probabilistic search characteristics of the so obtained layered index.
  • the size of the common prefix of the key of the sought data record and the key of the data record is calculated.
  • the common key of the block (310) is the prefix portion of the key of the actual data record C.
  • the size of the common prefix is zero.
  • the search path follows the direct link from a node with the largest value on the search path (that maintains a direct link).
  • a comparison to the common key (if available) or to data records associated with nodes (if available) can lead to a decision as to whether or not to advance by the index scheme or to return to a node with a direct link. It should be noted that the common key is not necessarily physically attached to the data records.
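  • The comparison step that follows a probabilistic search can be sketched as below. The functions `common_prefix_size` and `confirm` are hypothetical helpers: they only measure how far the sought key agrees with the key that was reached, and leave the decision on where to resume the search to the caller.

```python
def common_prefix_size(first: str, second: str) -> int:
    """Number of leading positions on which the two keys agree."""
    size = 0
    for x, y in zip(first, second):
        if x != y:
            break
        size += 1
    return size


def confirm(sought_key: str, reached_key: str) -> tuple[bool, int]:
    """Return (exact match, size of the common prefix) for the record that was reached."""
    size = common_prefix_size(sought_key, reached_key)
    return size == len(sought_key) == len(reached_key), size
```

  • A common prefix of size zero, as in the case mentioned above, shows at once that the record reached is not the sought one, so the search has to return to a node that maintains a direct link.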
  • the criterion to know that the sought data record does not reside in the tree is that the size of the common key prefix of the sought data record and the common key of the block is greater than the value of the split node.
  • the value of the split node is 1 (of node 313), thus block 310 is not the block that accommodates record L (if such record exists). Therefore, the search for record L is continued from node 314 and link 316. This procedure applies to all modify transactions.
  • block 300 is found in the manner specified above and is associated with the new data record L.
  • Figs. 7 to 9 exemplified a layered index utilizing a PAIF based indexing scheme as the basic partitioned index and the representative index. Those versed in the art will readily appreciate that the layered index of the invention is not bound only to PAIF. Thus, for example, U.S. 5,495,609 illustrates a different trie. Consider, for example, the trie of Fig.
  • the layered index of Fig. 10B brings about, thus, a balanced tree of blocks, assuring that essentially the same number of I/O operations is required to reach each and every data record in the tree.
  • Those versed in the art will readily appreciate that preferably the number of I/O operations is a logarithmic function depending upon the number of data records and the number of links originated from a block.
  • a layered index with 3 levels allows access to 1,000,000,000 data records.
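  • The logarithmic relation between the number of records, the number of links leaving a block and the number of levels can be checked with a short calculation. The function names are hypothetical, and a fan-out of 1000 links per block is assumed, which is the figure used in the numerical example of this description.

```python
def reachable_records(fan_out: int, levels: int) -> int:
    """Records reachable when every block on every level has `fan_out` outgoing links."""
    return fan_out ** levels


def levels_needed(records: int, fan_out: int) -> int:
    """Smallest number of levels whose capacity covers `records`."""
    levels, capacity = 0, 1
    while capacity < records:
        capacity *= fan_out
        levels += 1
    return levels
```

  • Under that assumption three levels give 1000 x 1000 x 1000 = 1,000,000,000 reachable records, and a fourth level is needed only once this count is exceeded.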
  • there follows numerical example. Assuming that every block has 1000 far links. Assuming that the size of each far link is 4 bytes it readily arises that the size needed for representing the far links is 4000 bytes. Assuming further that the nodes and the near links within a block occupy another 4000 bytes, the resulting block
  • each block size is less than 10,000 bytes. For sake of discussion assuming that each block size is 20,000 bytes.
  • a layered index that consists of one block (e.g. block 144 in Fig. 7B) as index layer I_1 and assuming that it is linked to a thousand blocks in the layer I_0 (of which only two blocks 146 and 148 are shown in
  • the layered index amounts for a total of 1001 blocks each having a size of 20,000 bytes. Accordingly, the total space that should be allocated for holding the blocks of the layered index is about 20 megabytes. This order of size can be easily accommodated in the internal memory of say, for example, a personal computer. Assuming now that each block in I_0
  • the net effect is that by utilizing a layered index of the invention (according to the latter embodiment) which is wholly accommodated in the internal memory, a million data records can be accessed without I/O index.
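  • The figures of the numerical example can be reproduced as follows. The constant names are hypothetical, and the last line assumes that each of the thousand lower-layer blocks links to a thousand data records, which is one reading of the example that yields the stated million.

```python
FAR_LINKS_PER_BLOCK = 1000
FAR_LINK_BYTES = 4
NODES_AND_NEAR_LINKS_BYTES = 4000
ASSUMED_BLOCK_BYTES = 20_000  # rounded-up block size used for the sake of discussion

far_links_bytes = FAR_LINKS_PER_BLOCK * FAR_LINK_BYTES            # space for the far links
content_bytes = far_links_bytes + NODES_AND_NEAR_LINKS_BYTES      # stays under 10,000 bytes
blocks = 1 + FAR_LINKS_PER_BLOCK                                  # one upper block plus a thousand
total_bytes = blocks * ASSUMED_BLOCK_BYTES                        # about 20 megabytes
records = FAR_LINKS_PER_BLOCK * FAR_LINKS_PER_BLOCK               # assumed: 1000 records per block
```

  • The total of 20,020,000 bytes is the "about 20 megabytes" of the example, small enough to keep the whole layered index in internal memory.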
  • the resulting layered index of Fig. 10B includes two trees having vertical orientation, i.e. the first tree structure consisting of blocks 159B and 159C (being one form of the basic partitioned index I_0) and second tree having one block 159A (being one form of the basic partitioned index I_1).
  • the trie index with which the technique of the invention is of concern is not confined to the search tree disclosed in the '609 patent, and it may encompass other types of trees as explained above.
  • intra-block structure is not necessarily balanced, i.e. nodes inside block are not necessarily arranged in a balanced structure. Whilst this fact is seemingly a drawback, those versed in the art will readily appreciate that its implications on the overall database performance are virtually insignificant. This stems from the fact that intra-block search scheme is normally performed in the fast internal memory of the computer system.
  • the arrangement of a block within a layered index is retained in a balanced structure thereby the number of blocks in a search path is a logarithmic function depending on the number of data records and reflects therefore the number of I/O accesses to the external memory (an operation which is inherently slow) in order to load a desired block to the internal memory.
  • the offset size (in terms of numbers of bits) that is accommodated within each node may be altered, the manner of realizing empty pointers (i.e. pointers that point to null - having no children) and others.
  • the latter physical realization flexibility applies also to the inter-block portion.
  • The retention of the index scheme for both the trie and the representative index is not obligatory as will be exemplified with reference to Fig. 11.
  • Fig. 11 illustrates another approach of balancing an unbalanced tree of Fig. 8A (i.e. constructing a layered index) using a conventional B-tree as a representative index over the representative keys of the unbalanced trie.
  • the so obtained horizontal oriented balanced tree (layered index) includes blocks 272 at the upper level (index layer I_2), 270 and 271 at a lower level (index layer I_1) and the original blocks of the unbalanced vertical oriented tree of Fig.
  • The database file management system of the invention not only copes with the drawbacks of the conventional trie indexing file but also offers
  • the invention is by no means bound to the specified storage medium.
  • the storage medium with which the present invention is applicable may also be an internal memory.
  • There follows a description of the second aspect of the invention.
  • the database file management system of the invention enables to address different types of data records using a single index.
  • each data record belonging to a given type is associated with a given designator.
  • the latter forms part of the key of the data record constituting a designator key.
  • the designator is unique for every type of data.
  • a data dictionary maintains meta-data information, which provides information on the data records as a function of the type of the records.
  • the data records it is needed to maintain a designator, to be able to identify the designator and by using the meta-data information, to be able to identify or construct the designated key as well as other information such as the record size.
  • the search scheme of the index is oblivious to the meta-data. It locates the record from the designator (or composite) key without using the meta-data.
  • the meta-data is required to construct the (composite) designator key and, once the record is retrieved, to determine the properties of the record.
  • the designator -B- is identified, and information on the record designated B is available from the meta-data. For example, the size of the book record, its fields and the fields that are the key fields.
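  • The interplay between designators and the data dictionary can be sketched as below. The `RecordType` class, the `DATA_DICTIONARY` contents (type names, key lengths and record sizes) and both functions are hypothetical and serve only to show that the meta-data is consulted when a key is built and when a retrieved record is interpreted, not during the search itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    name: str
    key_length: int   # length of the key without the designator
    record_size: int  # bytes, illustrative value only


# Hypothetical data dictionary: one entry per designator.
DATA_DICTIONARY = {
    "A": RecordType("Borrower", key_length=5, record_size=64),
    "B": RecordType("Book", key_length=4, record_size=128),
}


def designated_key(designator: str, key: str) -> str:
    """Prepend the designator to the key after checking it against the meta-data."""
    meta = DATA_DICTIONARY[designator]
    if len(key) != meta.key_length:
        raise ValueError(f"{meta.name} keys must be {meta.key_length} characters long")
    return designator + key


def describe(designated: str) -> RecordType:
    """Identify the record type from the designator that leads a designated key."""
    return DATA_DICTIONARY[designated[0]]
```

  • Because the designator is part of the key, records of both types can live in one index, and `describe` recovers the properties of whichever record a search returns.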
  • designated data records is not bound to only one type, but rather (preferably) more than one type may be treated by the designated index and as will be explained below with subordination relationship.
  • data records of different types may be addressed from the same index.
  • the keys of data records that belong to different types do
  • a layered index which is also a designated index based on a trie as its basic partitioned layered index of the kind depicted in Fig. 8A.
  • The size of the key of the records that belong to the "Borrower" entity is 6 bytes long, whereas the size of the key of the records that belong to the "Book" entity is 5 bytes long. Inserting books to the designated index of Fig.
  • the data structure of Fig. 12 that includes a designated index that addresses 2 types of data records - data records a-k which are assigned with the designator A and data records w-x which are assigned with the designator B.
  • record of type X or record designated X are used to describe a record having a designated key and the designator is X.
  • the latter example illustrated one manner of realizing designated data (i.e. pre-pending as prefix a character, string or any number of bits) to the key of the data record
  • the proposed designator may be realized in any known manner provided that the designator distinguishes between different data records, treated as part of the key, and therefore forms part of the search.
  • whether the designator (i) forms part of the data record (or key portion), (ii) is stored elsewhere (e.g. in a different data structure), or (iii) it may be defined elsewhere, or even defined otherwise. An example of the latter is a trie structure that is associated with data records all of the same type (for example, all are designated with a character A).
  • data record d is accessed from node 266 by link 270.
  • the first character of data record d is A - the designator.
  • Fig. 13A illustrates a designated index 800 (in the form of PAIF) with four data records 802, 804, 806 and 808 (of which only the designator keys are shown) associated thereto.
  • the data records are all of the same type as readily arises from the designator 'A' that is prepended to each of the data records.
  • Fig. 13B there is shown the PAIF 800 with new data record (812) with a composite key A12355B940201333333 (the designator of record 812 is B). The new data record is subordinated to data record 806 whose key is A12355. According to the PAIF index, node 814 indicates that the discerning offset is 6 and that the value B links to data record 812 (having the value B at offset 6).
  • Fig. 13C illustrates the PAIF 800 in which another data record 820 is inserted.
  • Data record 820, which represents another instance of B type data record that is subordinated to A type data record (806), is inserted to the PAIF.
  • The discerning offset is 11 (the value of the new node 822) and the link values thereof are '0' and '1' to data records 812 and 820, respectively.
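  • The composite keys and discerning offsets of this example can be reproduced with a few lines. The functions `composite_key` and `discerning_offset` are hypothetical helpers, offsets are counted from 0, and the key fields chosen for record 820 are an assumption picked so that it differs from record 812 at offset 11.

```python
def composite_key(parent_key: str, designator: str, key_fields: str) -> str:
    """Key of a subordinated record: parent's designated key, designator, own key fields."""
    return parent_key + designator + key_fields


def discerning_offset(first: str, second: str) -> int:
    """First 0-based offset at which two keys differ."""
    for offset, (x, y) in enumerate(zip(first, second)):
        if x != y:
            return offset
    if len(first) == len(second):
        raise ValueError("identical keys have no discerning offset")
    return min(len(first), len(second))  # one key is a prefix of the other


key_806 = "A12355"
key_812 = composite_key(key_806, "B", "940201333333")  # composite key given in the text
key_820 = composite_key(key_806, "B", "940211333333")  # hypothetical key fields
```

  • Records 806 and 812 separate at offset 6, where record 812 holds B, and records 812 and 820 separate at offset 11 with the values '0' and '1', matching nodes 814 and 822.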
  • Fig. 13D illustrates the PAIF 800, where different types of records are subordinated to record 806.
  • Data record of type 'D' (824) being subordinated to data record of type 'A' is linked from node 814 by link 823 having the value D.
  • the PAIF already represents data record designated B where the latter is subordinated to the data record designated A.
  • Fig. 13E there is shown another embodiment of the PAIF of Fig. 13D implemented slightly differently.
  • the subordinated data records 812, 820 and 824 are represented and maintained in the data file without their key prefix that is the designator key of the record 806 (i.e. the prefixed key A12355 is omitted).
  • data record 812 the information available from the meta-data according to the designator B allows to extract the following information: (i) identify that part of the key is missing,
  • The implementation described above obviates the necessity to duplicate the representation of the designated key of data record 806 in respect of each subordinated data record (by the particular example of Fig. 13D, the specified prefix A12355 is duplicated three times for records 812, 820 and 824).
  • Replacing the key prefix with a link can save space (if the size of the prefix is larger than the representation of the link) and allows to access the record that the subordination relates to without necessitating a separate search.
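  • Storing a subordinated record without its key prefix and with a link to its parent instead can be sketched as follows. The `Record` class and its `full_key` method are hypothetical; the link is modelled as an ordinary object reference.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    stored_key: str                    # key as kept in the data file
    parent: Optional["Record"] = None  # link that replaces the omitted key prefix

    def full_key(self) -> str:
        """Rebuild the composite key by following the links to the subordinating records."""
        prefix = self.parent.full_key() if self.parent else ""
        return prefix + self.stored_key


record_806 = Record("A12355")
record_812 = Record("B940201333333", parent=record_806)  # prefix A12355 is not stored
```

  • Calling `record_812.full_key()` yields the composite key A12355B940201333333, and the same link gives direct access to record 806 without a separate search.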
  • Figs. 13D and 13E illustrate that the subordination relationship characteristics of the invention is not limited to any specific realization.
  • each of the subordinated records 812, 820, 824 can have records subordinated to it.
  • Moreover, there are some other advantages that are brought about using the proposed technique of the invention, e.g. maintaining data integrity.
  • an insert transaction that is applied to the PAIF 800 of Fig. 13E, of data record designated B with a composite key A12355B930101123456 subordinated to data record 806 (having designated key A12355).
  • The search leads to node 822.
  • the value at key offset 11 of the inserted data record is 0 thus record 812 is accessed.
  • the search key of record 812 needs to be constructed (by accessing record 806 via link 826) and the insertion of the new data record can be completed. It should be noted that the link to record 806 obviates the need to conduct a separate search for record 806 by its key in order to confirm its existence. Thus the maintenance of data integrity is more efficient.
  • Performing the same data integrity check using the specified B-tree index implies considerable overhead since a two-phase operation is required.
  • a search is applied to the index of data records of type 'A' in order to find data record whose key is 12355. Only upon finding it, record of type B can be inserted (and a separate index file is normally updated).
  • the data structure of fig 20E exemplifies other advantages resulting from the fact that subordinated data records are linked to their "parent" record. For example, if record from type A is a customer and record from type B is an invoice, it is usually needed to access the invoice details with the customer details. The link from the invoice to the customer obviates a separate search for the customer details.
  • the move from node 814 to node 812 can be by the split link. If the split link does not exist, for example in Fig. 7F, one needs to use the link 421 of node B' (422) when it is needed to advance by link 400 from node B (423) to node E (424).
  • there is shown a schematic illustration of a designated index according to one embodiment of the invention.
  • the index contains two search paths to one designated data record ("DEPOSIT" data record) such that the deposit can be accessed by each of the two composite keys - a designated key that includes the key fields account number, date and client number and a second designated key that includes the key fields client number, date and account number.
  • the account data record has a designated key 'A133333' (1201)
  • Updating a deposit for the account can be implemented by means of designated record 203 subordinated to designated record 201.
  • the PAIF would allow to access records 201, 203 from node 207 by link 206.
  • data record 204 represents a deposit of a client.
  • the key of record 202 is B133333. Updating a deposit 204 to a client 202 can be implemented by the index 200 and node 209 linked (208) to data record 204.
  • the key of data record 203 is 'A133333C01019811346' (k_1).
  • the key of record 204 is B11346D010198133333 (k_2)
  • This drawback may be overcome by representing a single DEPOSIT record as a multi-dimension record 210.
  • Data record 210 (Fig. 14) is a multi-dimension record that is updated and accessed by the designated index 200 according to the designator key k_1 (designator C) and according to the designator key k_2 (designator D) (note that when a data record is a multi-dimension record, the designator of the record depends on the key that is being used). The path in the index by k_1 leads to node 207 and from that node to the designator C of record 210.
  • the information in the meta-data according to the designator C allows to construct the relevant structure.
  • designator D of record 210. The information in the meta-data according to the designator D allows to construct the relevant structure, for example construct a data structure that includes the key k_2.
  • the search path defined by the search keys of record 203 leads to the first field 212 having a value 'C' (which is the designator according to search key k_1).
  • the third field points to data record 201.
  • The second field 215 (having a value 'D' - which is the designator according to search key k_2) of the same data structure 210 is accessible by search path that is defined by the search key of record 204.
  • the fourth field has a link to the actual data record 202.
  • data record 210 can include other fields.
  • the invention is by no means bound to a given realization and accordingly the manner of realizing data record 210 as depicted in Fig. 14 is only one out of many possible variants. The number of search paths is not limited. As had been explained above with reference also to Fig. 13E, if the sought data record is Axxxx (i.e.
  • the specified description which provides two (and in the general case at least two) search paths to one physical occurrence of data records constitutes the multi-dimensional data structure which is a designated index that contains at least two search paths to one data record (called multi-dimension record).
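  • Two search paths that end at one physical record can be sketched as below. A plain dictionary stands in for the designated index, and the `Deposit` class, its `amount` field and the helper names are assumptions made for illustration; the two composite keys follow the layout of the keys quoted above.

```python
from dataclasses import dataclass


@dataclass
class Deposit:
    account: str
    client: str
    date: str
    amount: int  # hypothetical payload field


def key_by_account(deposit: Deposit) -> str:
    """Account key, designator C, then date and client number."""
    return f"A{deposit.account}C{deposit.date}{deposit.client}"


def key_by_client(deposit: Deposit) -> str:
    """Client key, designator D, then date and account number."""
    return f"B{deposit.client}D{deposit.date}{deposit.account}"


index: dict[str, Deposit] = {}  # stand-in for the designated index


def insert(deposit: Deposit) -> None:
    """Register both search paths; the record itself is stored only once."""
    index[key_by_account(deposit)] = deposit
    index[key_by_client(deposit)] = deposit


deposit = Deposit(account="133333", client="11346", date="010198", amount=100)
insert(deposit)
```

  • Both `index["A133333C01019811346"]` and `index["B11346D010198133333"]` return the very same object, so an update made through one path is visible through the other.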
  • Relation among data elements - Fig. 15 illustrates another feature of
  • data record A (a book data record) has C, F, J, K and L data records subordinated thereto.
  • L one-to-one
  • one-to-many relations may easily be realized.
  • Consider for example, that a book has many categories (L), i.e. one-to-many, however, it has only one abstract (K), i.e. one-to-one.
  • a one-to-one data relationship is implemented by a designated (composite) key of two components: the first is the designated key of its subordinating record and the second is the designator of the subordinated record (since it is a one-to-one relation there is no need to use the key field of the subordinated record).
  • Whereas a one-to-many relationship is implemented by a designator (composite) key whose first component is the designator key of the subordinating record, and whose second component consists of the designator and key of the subordinated record.
  • the one-to-one relation between a book and its abstract is maintained by defining the key of K to be AxxxK, where Axxx is the designated key of A, K is the designator of the key of record K.
  • the one-to-many relation between a book and a category is maintained by defining the key of L to be AxxxLyyy, where Axxx is the designated key of A, L is the designator of the key and yyy are the key field(s) of record L.
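  • The two key layouts can be sketched as follows, taking K as the designator of the abstract and L as the designator of a category. The function names, the book key and the category codes are hypothetical.

```python
def one_to_one_key(parent_key: str, designator: str) -> str:
    """Parent's designated key followed by the designator only."""
    return parent_key + designator


def one_to_many_key(parent_key: str, designator: str, child_key: str) -> str:
    """Parent's designated key, the designator, then the child's own key fields."""
    return parent_key + designator + child_key


def subordinated(keys: list[str], parent_key: str, designator: str) -> list[str]:
    """Keys of all records of one type that are subordinated to the parent."""
    prefix = parent_key + designator
    return [key for key in sorted(keys) if key.startswith(prefix)]


book = "A0042"  # hypothetical designated key of a book
abstract_key = one_to_one_key(book, "K")
category_keys = [one_to_many_key(book, "L", code) for code in ("001", "002")]
```

  • The one-to-one key admits a single record per book, while any number of category keys can share the same parent prefix and be listed together with `subordinated`.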
  • the relational model considers all data as consisting of tables. Each table consists of records of the same structure, called tuples. Suppose the tuples consist of fields F1, F2 and F3. Each such field is a key. If key F2 is subordinate to key F1, and key F3 is subordinate to key F2, we can easily construct the table: to retrieve its tuples, follow the designator of key F1, and from there for each value of F1, follow the designator of F2, and in the same manner continue to F3. Each such triple defines a tuple of the table.
  • Performing the projection of (F2, F3) might be expensive, since it requires searching all values of F1 first. However, if this operation is common, the designated index should also maintain the search path (F2, F3, F1).
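  • Retrieving tuples along one search path and serving a projection from a second path can be sketched with nested dictionaries. The functions `build_path`, `tuples` and `project_first_two` and the sample rows are hypothetical; the nesting merely imitates following one key after another.

```python
from collections import defaultdict


def build_path(rows, order):
    """Nest the rows along one search path, e.g. order=(0, 1, 2) for (F1, F2, F3)."""
    tree = defaultdict(lambda: defaultdict(set))
    for row in rows:
        first, second, third = (row[i] for i in order)
        tree[first][second].add(third)
    return tree


def tuples(tree):
    """Walk the path level by level; every triple met is one tuple."""
    for first in sorted(tree):
        for second in sorted(tree[first]):
            for third in sorted(tree[first][second]):
                yield (first, second, third)


def project_first_two(tree):
    """Distinct leading pairs of a path, read without visiting the third field."""
    return [(first, second) for first in sorted(tree) for second in sorted(tree[first])]


rows = [("x", "p", "1"), ("y", "p", "1"), ("x", "q", "2")]  # hypothetical (F1, F2, F3) rows
by_f1 = build_path(rows, (0, 1, 2))  # search path (F1, F2, F3)
by_f2 = build_path(rows, (1, 2, 0))  # additional search path (F2, F3, F1)
```

  • The table is rebuilt from `by_f1`, whereas the projection on (F2, F3) is read directly from `by_f2` without scanning every value of F1.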
  • the designated index enables to represent additional data models, including a relational database, an object oriented system, and a hierarchical database, where substantially no data is duplicated.
  • The object oriented approach considers all data as objects. Every object belongs to a class, which determines its structure and which methods (functions) can be applied to it. The classes are organized in a hierarchy, from which structure and method may be inherited. The object-oriented approach is ephemeral: an object exists only while the program that created it is active. Objects that need to be supported for a longer period of time are defined as persistent. These objects are stored on the disk and are available to
  • the multi-model designated index can easily support such objects. Since their structure is uniformly encoded with the aid of designators, later incarnations of the program as well as other programs can access these persistent objects. Note that at the same time a persistent object can also be part of a relational table. There is no need to duplicate data.
  • the relational approach considers all data as tables.
  • the object-relational approach provides an interface to convert tables to objects.
  • the interface requires the user to specify the relationship between the objects and the table attributes. If some attributes themselves are tables, we need to allow relational algebra operations on these tables too. These conversions are performed by the application program.
  • The database is unable to optimize the queries.
  • the application program's queries are
  • a claim can be efficiently accessed both from the customer object and the policy object and being from a type structured as for example in Fig. 16 (structure 210).
  • the object-oriented approach allows users to add user-defined types (UDT) and user-defined functions (UDF).
  • the relation b ⁇ tween the photo data to the claim is handled in the same manner as with built in classes and relations.
  • the new UDT can be bas ⁇ d on or b ⁇ related (by subordination) to any other data type.
  • th ⁇ application can navigate to the new UDT from the defin ⁇ d classes from which the new UDT can inherent m ⁇ thods and other properties.
  • wh ⁇ n navigating in the index one would navigate to a claim from which on ⁇ could reach the photo as well as any other part of the claim's data.
  • The network and hierarchical models have been replaced by the relational model. However, even though these models are obsolete, they have some advantages (as well as many disadvantages) over the table-oriented implementation. Once a record is retrieved, the addresses of related records are readily available.
  • The B-tree implementation requires us to maintain two trees: one of the customers and home addresses, and the second of loans and customers.
  • For, having retrieved the data of a loan, the names of the customers that
  • In the proposed multi-model designated index (such as, for example, in Fig. 16), once reaching the node representing the loan, one can continue to a designator that identifies the customers that took that loan (for example, records of type B). Normally, at most one disk access is required for each customer.
  • The proposed multi-dimensional designated index has the advantages of the network model without its disadvantages. While the network model treated each node separately and was susceptible to long search paths, the multi-model designated index treats all data uniformly, and the length of the search paths is probably logarithmic, with the block size as the base of the logarithm. Thus, in practice, the search requires a single disk access.
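As a rough illustration of the logarithmic search-path claim above, the following sketch counts how many index levels are needed when one block can discriminate among up to `block_fanout` keys. The function name, the parameter names, and the sample fan-out of 1,000 are assumptions made for this illustration; the text does not state concrete block sizes.

```python
def levels_needed(num_keys: int, block_fanout: int) -> int:
    """Return the smallest h such that block_fanout ** h >= num_keys.

    Uses integer arithmetic only, so there is no floating-point rounding
    near exact powers of the fan-out.
    """
    if num_keys < 1 or block_fanout < 2:
        raise ValueError("need num_keys >= 1 and block_fanout >= 2")
    levels, reach = 0, 1
    while reach < num_keys:
        reach *= block_fanout  # one more level multiplies the reachable keys
        levels += 1
    return levels


# Hypothetical figures: one million keys, 1,000 keys per block.
print(levels_needed(1_000_000, 1000))
```

Under these assumed figures the path is only a couple of blocks long. If the upper levels are held in memory (again an assumption), this is consistent with the single disk access mentioned above.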
  • The client-server model enables efficient implementations of the relational model.
  • the server: the central computer
  • clients: other computers
  • When an application needs data, it formulates an SQL query, which is sent by the client to the server.
  • The server evaluates the query and returns the resulting table to the client.
  • The interface between the client and the server is via SQL queries; the server is unaware of the internal data structures and code of the application.
  • The designated index makes it possible to apply the client-server approach to the object-oriented and object-relational models.
  • The application program sends the path of keys and link designators leading to the desired node to the server. Based on this data, the server can fulfill the request without any knowledge of the data structure of the application program.
  • The client and the server should agree on the names of the fields and their designators.
  • The server need not be aware of the type of data of each such field or of its semantic content.
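The path-based request described above can be sketched as follows. This is a minimal illustration, not the patented index: a nested dictionary stands in for the designated index, and the designator letters, key strings, and the name `resolve_path` are all hypothetical.

```python
from typing import Any, Sequence, Tuple

# Hypothetical stand-in for the designated index: every node maps a
# (designator, key) pair to a child node; leaves hold opaque payloads.
INDEX = {
    ("A", "policy-17"): {
        ("C", "claim-3"): {
            ("P", "photo-1"): b"...image bytes...",
        },
    },
}


def resolve_path(index: dict, path: Sequence[Tuple[str, str]]) -> Any:
    """Server side: follow (designator, key) pairs sent by the client.

    The server never interprets what a designator means or what type the
    payload has; it only walks the path it was given.
    """
    node: Any = index
    for designator, key in path:
        node = node[(designator, key)]  # raises KeyError for an unknown step
    return node


# Client side: the application knows the path to the node it wants.
request = [("A", "policy-17"), ("C", "claim-3"), ("P", "photo-1")]
print(resolve_path(INDEX, request))
```

The only contract in this sketch is the agreed set of designators, which mirrors the statement that client and server need to agree on field names and designators but not on data types or semantics.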
  • One of the most important features of a trie-based data structure is the modest size of its representation.
  • The PAIF, for example, maintains an even smaller size than a conventional trie because of its compressed representation.
  • The last level of the PAIF index contains a trie with links that point to other trie nodes in the same block, and links that point to records.
  • The index contains exactly N pointers to these records. If each pointer requires 4 bytes, the size needed for the pointers is 4N bytes. In addition, each pointer has a direction (1 byte), thus the total is 5N bytes.
  • n ≤ N − 1 trie nodes. Let d denote the average number of children of a trie node; then n ≈ N/(d − 1). Since in practice d ≫ 2, n ≪ N. Each trie node has a level number (1 byte). Since each trie node has at most one incoming trie link, there are at most n − 1 trie links; each trie link has a label, which is a single character, and an intra-block pointer (1 byte), thus a total of 3n bytes. Thus 3n + 4N ≤ 7N bytes are needed in the worst case, and between 4N and 6N bytes in practice.
  • Performing the same analysis from another angle: consider two pointers p1 and p2 that emanate from a node v of level k. Let x1 be a key reachable from p1 and x2 a key reachable from p2. Then x1 and x2 share the first k − 1 characters. In a PAIF structure, each one of these characters is represented at most once. In the B-tree representation, it is necessary to explicitly represent the first k characters of each key.
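The size estimate above can be checked numerically with a short sketch. The function name and the sample values of N and d are assumptions for illustration. The sketch uses the exact internal-node count (N − 1)/(d − 1) of a tree with N leaves and average branching d, which the text approximates as N/(d − 1), and it follows the text's formula of 3 bytes per trie node plus 4 bytes per record pointer.

```python
def estimate_last_level_bytes(num_records: int, avg_children: float,
                              pointer_bytes: int = 4,
                              node_bytes: int = 3) -> float:
    """Estimate 3n + 4N bytes for the last index level.

    N = num_records (one pointer per record)
    n = (N - 1) / (d - 1) trie nodes, where d = avg_children
    """
    if num_records < 1 or avg_children <= 1:
        raise ValueError("need num_records >= 1 and avg_children > 1")
    trie_nodes = (num_records - 1) / (avg_children - 1)
    return node_bytes * trie_nodes + pointer_bytes * num_records


N = 1_000_000
for d in (2, 4, 16):  # hypothetical average branching factors
    size = estimate_last_level_bytes(N, d)
    print(f"d={d:>2}: {size / N:.2f} bytes per record")
```

With d = 2 the estimate sits just under the 7N worst case, and it moves toward 4N as d grows, which matches the range quoted in the text.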
  • first two records reside in the same block, then it is possible to keep a single full-sized pointer for the first pointer to a block and, instead of keeping a pointer for each of the remaining outgoing links to that block, to compute their displacement. That is, if the first two records reside in block number 2000 and the third record in block 7000, it is possible to maintain the structure 2000(e,f) 7000(h). The savings would be much more substantial if a larger number of outgoing links all point to the same block. If k such links point to
  • Fig. 17A shows a node 2000 of a trie with the links 2010, 2011, 2012 (values 5, 9, A respectively) that address 3 data records (2002, 2004, 2006) at disk addresses 3000, 5000, 7000 respectively.
  • The size needed to represent the link values (1 byte for each link) and the pointers to the data (4 bytes each) is 15 bytes.
  • represent the link is the address to block 2020 (4 bytes) and the link values to the data records 2002, 2004, 2006 that reside in the block (1 byte for each link value).
  • The size needed to represent the pointer to the data block and the values of the links is only 7 bytes: (3000:5,9,A).
  • Node 2000 can include links to other data records or data blocks (such as link 2024 to data block 2022 accommodating data record 2008).
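The byte counts quoted above (15 bytes versus 7 bytes) can be reproduced with a small sketch that groups outgoing links by target block. The function names are hypothetical, and the sketch counts only link labels (1 byte) and block pointers (4 bytes), ignoring any delimiter or count bytes, as the arithmetic in the text does. It also assumes that 3000 is the address of the shared block, as the notation (3000:5,9,A) suggests.

```python
POINTER_BYTES = 4  # full-sized pointer to a block or record
LABEL_BYTES = 1    # one character per link value


def group_links_by_block(links):
    """Map each block address to the labels of the links that point into it."""
    grouped = {}
    for label, block in links:
        grouped.setdefault(block, []).append(label)
    return grouped


def plain_size(links):
    """One label plus one full pointer per link."""
    return len(links) * (LABEL_BYTES + POINTER_BYTES)


def grouped_size(links):
    """One full pointer per distinct block plus one label per link."""
    grouped = group_links_by_block(links)
    return sum(POINTER_BYTES + LABEL_BYTES * len(labels)
               for labels in grouped.values())


def render(links):
    """Format the grouped links in the block(label,...) notation of the text."""
    grouped = group_links_by_block(links)
    return " ".join(f"{block}({','.join(labels)})"
                    for block, labels in grouped.items())


fig_17 = [("5", 3000), ("9", 3000), ("A", 3000)]
print(plain_size(fig_17), grouped_size(fig_17))

text_example = [("e", 2000), ("f", 2000), ("h", 7000)]
print(render(text_example), grouped_size(text_example))
```

The saving grows with the number of links that share a block, since each additional link into an already referenced block costs one label byte instead of five bytes.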
  • The database may be located in a central location, or distributed among two or more remote locations.
  • In Figs. 18A-D there are shown four benchmark graphs demonstrating the enhanced performance, in terms of response time and file size, of a database utilizing a file management system that employs a system of the invention vs. a commercially available Ctree-based database.
  • The inserts are realized through a Uniface application running in the Windows (for Workgroups) operating system.
  • The benchmark of Fig. 18A concerns measuring the time in minutes for inserting an ever-increasing number of a priori sorted data records into a file (0-1,000,000).
  • The larger the number of inserts, the greater the improvement in terms of response time of the database file management system of the invention.
  • Inserting 1 million records takes about 669 minutes in the Ctree-based database as compared to only 65 minutes in the system of the invention.
  • Moreover, the response time in the file management system of the invention increases only slightly as the number of records increases, as opposed to a significant increase in the response time of the counterpart system according to the prior art.
  • The benchmark of Fig. 18B illustrates the file size in megabytes as a function of the number of data records in the file (0-1,000,000). As shown in Fig. 18B, the larger the number of records, the greater the improvement in terms of file size in the database file management system of the invention. Thus, for 1 million records the file size of the Ctree-based file is about 151 megabytes as compared to only 22 megabytes in the database file management system of the invention.
  • Graphs 18C and 18D are similar to those shown in Figs. 18A and 18B, apart from the fact that in the former (18C and 18D) the data records are inserted randomly, whereas in the latter (18A and 18B) the data records are a priori sorted.
  • The system of the invention is more efficient in terms of both response time and file size.
  • Figs. 19A-D illustrate benchmark graphs of a system of the invention (operating under the DOS operating system) vs. a commercially available B-tree-based database system. The results are as before, i.e., the system of the invention is more efficient in terms of both response time and file size.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This invention relates to a database management system for accessing data records, executed in a data processing system. The data records are linked to a trie index arranged in blocks (402, 405, 406 and 407), which are stored in a storage medium. The trie index (A, B and I, element 402), which enables the data records to be accessed or updated by one or more keys, is susceptible to an unbalanced structure of blocks. The invention also relates to a method for constructing a layered index arranged in blocks, which consists of building the trie index and constructing a representative index over the representative keys of the trie index. This layered index, which enables the data records to be accessed or updated, constitutes a balanced structure of blocks.
EP99901096A 1998-01-22 1999-01-22 Base de donnees Ceased EP1049990A4 (fr)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
WOPCT/IL98/00029 1998-01-22
IL9800029 1998-01-22
PCT/IL1999/000038 WO1999038094A1 (fr) 1998-01-22 1999-01-22 Base de donnees

Publications (2)

Publication Number Publication Date
EP1049990A1 true EP1049990A1 (fr) 2000-11-08
EP1049990A4 EP1049990A4 (fr) 2004-09-08

Family

ID=11062302

Family Applications (1)

Application Number Title Priority Date Filing Date
EP99901096A Ceased EP1049990A4 (fr) 1998-01-22 1999-01-22 Base de donnees

Country Status (12)

Country Link
EP (1) EP1049990A4 (fr)
JP (1) JP2002501256A (fr)
CN (1) CN1292901A (fr)
AU (1) AU759360B2 (fr)
BR (1) BR9907227A (fr)
CA (1) CA2319177A1 (fr)
HU (1) HUP0101298A3 (fr)
NO (1) NO20003759L (fr)
NZ (1) NZ505767A (fr)
RU (1) RU2000122092A (fr)
TR (1) TR200002119T2 (fr)
WO (1) WO1999038094A1 (fr)

Families Citing this family (42)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6208993B1 (en) 1996-07-26 2001-03-27 Ori Software Development Ltd. Method for organizing directories
US6175835B1 (en) 1996-07-26 2001-01-16 Ori Software Development, Ltd. Layered index with a basic unbalanced partitioned index that allows a balanced structure of blocks
US6675173B1 (en) 1998-01-22 2004-01-06 Ori Software Development Ltd. Database apparatus
JP2003505791A (ja) * 1999-07-22 2003-02-12 オリ・ソフトウェア・ディベロップメント・リミテッド ディレクトリを構成する方法
GB0007868D0 (en) 2000-03-31 2000-05-17 Koninkl Philips Electronics Nv Methods and apparatus for editing digital video recordings and recordings made by such methods
GB2367917A (en) 2000-10-12 2002-04-17 Qas Systems Ltd Retrieving data representing a postal address from a database of postal addresses using a trie structure
GB2369695B (en) * 2000-11-30 2005-03-16 Indigo One Technologies Ltd Database
US6804677B2 (en) 2001-02-26 2004-10-12 Ori Software Development Ltd. Encoding semi-structured data for efficient search and browsing
US7287033B2 (en) 2002-03-06 2007-10-23 Ori Software Development, Ltd. Efficient traversals over hierarchical data and indexing semistructured data
EP1437662A1 (fr) * 2003-01-10 2004-07-14 Deutsche Thomson-Brandt Gmbh Procédé et dispositif d'accés d'une base de données
US7366725B2 (en) 2003-08-11 2008-04-29 Descisys Limited Method and apparatus for data validation in multidimensional database
US7734661B2 (en) 2003-08-11 2010-06-08 Descisys Limited Method and apparatus for accessing multidimensional data
US20050065960A1 (en) * 2003-09-19 2005-03-24 Jen-Lin Chao Method and system of data management
US7908242B1 (en) 2005-04-11 2011-03-15 Experian Information Solutions, Inc. Systems and methods for optimizing database queries
EP2074572A4 (fr) 2006-08-17 2011-02-23 Experian Inf Solutions Inc Système et procédé pour fournir une marque pour un véhicule d'occasion
US7912865B2 (en) 2006-09-26 2011-03-22 Experian Marketing Solutions, Inc. System and method for linking multiple entities in a business database
US8036979B1 (en) 2006-10-05 2011-10-11 Experian Information Solutions, Inc. System and method for generating a finance attribute from tradeline data
US8606666B1 (en) 2007-01-31 2013-12-10 Experian Information Solutions, Inc. System and method for providing an aggregation tool
US8204856B2 (en) 2007-03-15 2012-06-19 Google Inc. Database replication
US8285656B1 (en) 2007-03-30 2012-10-09 Consumerinfo.Com, Inc. Systems and methods for data verification
WO2008147918A2 (fr) 2007-05-25 2008-12-04 Experian Information Solutions, Inc. Système et procédé pour la détection automatisée de jeux de données jamais payés
US9690820B1 (en) 2007-09-27 2017-06-27 Experian Information Solutions, Inc. Database system for triggering event notifications based on updates to database records
US8312033B1 (en) 2008-06-26 2012-11-13 Experian Marketing Solutions, Inc. Systems and methods for providing an integrated identifier
US8478798B2 (en) 2008-11-10 2013-07-02 Google Inc. Filesystem access for web applications and native code modules
US9152727B1 (en) 2010-08-23 2015-10-06 Experian Marketing Solutions, Inc. Systems and methods for processing consumer information for targeted marketing applications
US9147042B1 (en) 2010-11-22 2015-09-29 Experian Information Solutions, Inc. Systems and methods for data verification
US9483606B1 (en) 2011-07-08 2016-11-01 Consumerinfo.Com, Inc. Lifescore
US20130013605A1 (en) * 2011-07-08 2013-01-10 Stanfill Craig W Managing Storage of Data for Range-Based Searching
US9853959B1 (en) 2012-05-07 2017-12-26 Consumerinfo.Com, Inc. Storage and maintenance of personal data
US9697263B1 (en) 2013-03-04 2017-07-04 Experian Information Solutions, Inc. Consumer data request fulfillment system
US10223637B1 (en) 2013-05-30 2019-03-05 Google Llc Predicting accuracy of submitted data
US10102536B1 (en) 2013-11-15 2018-10-16 Experian Information Solutions, Inc. Micro-geographic aggregation system
US9529851B1 (en) 2013-12-02 2016-12-27 Experian Information Solutions, Inc. Server architecture for electronic data quality processing
US10262362B1 (en) 2014-02-14 2019-04-16 Experian Information Solutions, Inc. Automatic generation of code for attributes
US9576030B1 (en) 2014-05-07 2017-02-21 Consumerinfo.Com, Inc. Keeping up with the joneses
US10242019B1 (en) 2014-12-19 2019-03-26 Experian Information Solutions, Inc. User behavior segmentation using latent topic detection
US10678894B2 (en) 2016-08-24 2020-06-09 Experian Information Solutions, Inc. Disambiguation and authentication of device users
CA3050139A1 (fr) 2017-01-31 2018-08-09 Experian Information Solutions, Inc. Ingestion de donnees heterogenes a grande echelle et resolution d'utilisateur
CN110807028B (zh) * 2018-08-03 2023-07-18 伊姆西Ip控股有限责任公司 用于管理存储系统的方法、设备和计算机程序产品
US10963434B1 (en) 2018-09-07 2021-03-30 Experian Information Solutions, Inc. Data architecture for supporting multiple search models
US11941065B1 (en) 2019-09-13 2024-03-26 Experian Information Solutions, Inc. Single identifier platform for storing entity data
US11880377B1 (en) 2021-03-26 2024-01-23 Experian Information Solutions, Inc. Systems and methods for entity resolution

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5276872A (en) * 1991-06-25 1994-01-04 Digital Equipment Corporation Concurrency and recovery for index trees with nodal updates using multiple atomic actions by which the trees integrity is preserved during undesired system interruptions
US5404510A (en) * 1992-05-21 1995-04-04 Oracle Corporation Database index design based upon request importance and the reuse and modification of similar existing indexes
JP2583010B2 (ja) * 1993-01-07 1997-02-19 インターナショナル・ビジネス・マシーンズ・コーポレイション 多層インデックス構造におけるローカルインデックステーブル及び大域インデックステーブルの間の一貫性を維持する方法
DE69401662T2 (de) * 1993-07-07 1997-08-21 Europ Computer Ind Res Datenbankstrukturen
US5651099A (en) * 1995-01-26 1997-07-22 Hewlett-Packard Company Use of a genetic algorithm to optimize memory space
US5765168A (en) * 1996-08-09 1998-06-09 Digital Equipment Corporation Method for maintaining an index

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
No further relevant documents disclosed *
See also references of WO9938094A1 *

Also Published As

Publication number Publication date
HUP0101298A3 (en) 2003-07-28
BR9907227A (pt) 2001-09-04
NZ505767A (en) 2003-09-26
CA2319177A1 (fr) 1999-07-29
EP1049990A4 (fr) 2004-09-08
NO20003759L (no) 2000-09-20
HUP0101298A2 (hu) 2001-08-28
TR200002119T2 (tr) 2000-12-21
CN1292901A (zh) 2001-04-25
AU759360B2 (en) 2003-04-10
WO1999038094A1 (fr) 1999-07-29
RU2000122092A (ru) 2002-07-27
NO20003759D0 (no) 2000-07-21
AU2071999A (en) 1999-08-09
JP2002501256A (ja) 2002-01-15

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20000821

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LI LU MC NL PT SE

AX Request for extension of the european patent

Free format text: AL PAYMENT 20000819;LT PAYMENT 20000819;LV PAYMENT 20000819;MK PAYMENT 20000819;RO PAYMENT 20000819;SI PAYMENT 20000819

A4 Supplementary search report drawn up and despatched

Effective date: 20040728

17Q First examination report despatched

Effective date: 20050823

REG Reference to a national code

Ref country code: HK

Ref legal event code: WD

Ref document number: 1033010

Country of ref document: HK

APBK Appeal reference recorded

Free format text: ORIGINAL CODE: EPIDOSNREFNE

APBN Date of receipt of notice of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA2E

APBR Date of receipt of statement of grounds of appeal recorded

Free format text: ORIGINAL CODE: EPIDOSNNOA3E

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

APAF Appeal reference modified

Free format text: ORIGINAL CODE: EPIDOSCREFNE

REG Reference to a national code

Ref country code: DE

Ref legal event code: R003

APBT Appeal procedure closed

Free format text: ORIGINAL CODE: EPIDOSNNOA9E

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN REFUSED

18R Application refused

Effective date: 20150316