CN1292901A

CN1292901A - Database apparatus

Info

Publication number: CN1292901A
Application number: CN998036986A
Authority: CN
Inventors: Moshe Shadmon (莫谢·沙德门)
Original assignee: ORI SOFTWARE DEVELOPMENT Ltd
Current assignee: ORI SOFTWARE DEVELOPMENT Ltd; Ori Software Dev Ltd
Priority date: 1998-01-22
Filing date: 1999-01-22
Publication date: 2001-04-25
Also published as: BR9907227A; AU759360B2; EP1049990A4; RU2000122092A; HUP0101298A2; CA2319177A1; NO20003759L; NZ505767A; EP1049990A1; JP2002501256A; WO1999038094A1; AU2071999A; HUP0101298A3; NO20003759D0; TR200002119T2

Abstract

A database file management system for accessing data records executes on a data processing system. The data records are linked to a trie index that is arranged in blocks (402, 405, 406 and 407) and stored in a storage medium. The trie index (A, B and I, element 402) enables the data records to be accessed or updated by key or keys, and it is susceptible to an unbalanced structure of blocks. A method is provided for constructing a layered index arranged in blocks, which includes the steps of providing the trie index and constructing a representative index over the representative keys of the trie index. The layered index enables the data records to be accessed or updated by key or keys, and it constitutes a balanced structure of blocks.

Description

Database apparatus

The present invention relates to databases and database management systems (DBMS).

As is well known, a database system is a collection of interrelated data files, indexes and programs, where the programs allow one or more users to add data, retrieve it, and modify the data stored in the files. A key concept of database systems is to provide users with a so-called "abstract", simplified overview of the data (also called a data model or conceptual structure). This overview spares ordinary users from dealing with details such as how the data is physically organized and accessed.

Some known data models are briefly reviewed below, namely the "hierarchical model", the "network model", the "relational model" and the "object-relational model". A more detailed discussion can be found, for example, in Henry F. Korth and Abraham Silberschatz, "Database System Concepts", McGraw-Hill International Editions, 1986 (or the 3rd Edition (1997)), Chapters 3-5, pp. 45-172.

In general, all the models discussed below share a common characteristic: they represent each "entity" by a "record" having one or more "fields", and each field represents a given attribute of the entity (for example, the record for a book may include the fields "BOOKID", "BOOK NAME" and "TITLE"). One or more attributes constitute a "key", that is, they identify the record. In the example above, "BOOKID" serves as the key. The models differ from one another chiefly in how the records are organized into more complex structures.

Relational model - The relational model, introduced by Codd, is a milestone in the history of database development. A relational database introduces an abstraction under which data is represented by tables (called "relations"), where each column represents a field and each row represents a record.

Associations between tables are purely conceptual. They are not part of the database definition. Two tables are implicitly related by the fact that they have one or more columns whose values are drawn from the same set of values (called a "domain").

Other concepts introduced by the relational model are high-level operators and a comprehensive data language (now called a fourth-generation language). The high-level operators operate on tables (that is, their parameters and results are tables), and in the comprehensive data language one specifies what the desired result is rather than how to produce it. This non-procedural language (SQL, Structured Query Language) has become an industry standard. In addition, the relational model provides a very high degree of data independence: changes in the way data is organized, stored, indexed and sorted should not affect programs written in these languages. The relational model has become the de facto standard in data analysis.

Network model - In the relational model, data (and the relationships among data) are viewed as a collection of tables. By contrast, in the network model data is represented by a collection of records, and the relationships between records (data) are represented by links.

A record in the network model is similar to an "entity" in the sense that it is a collection of fields, each holding one type of data. In practice a link is best (though not necessarily) regarded as a pointer. The collection of records and the relationships among them constitute a graph.

Hierarchical model - The hierarchical model is similar to the network model in the way it handles data and the relationships among data through records and links. It differs from the network model, however, in that the records and the relationships among them constitute a collection of trees rather than a collection of arbitrary graphs. Where the data to be organized in the database is inherently hierarchical, the structure of the hierarchical model is particularly simple and direct. The hierarchical model has certain inherent shortcomings; for example, in many real-life situations the data cannot easily be arranged hierarchically. Moreover, even when the data can be organized hierarchically, it may require more storage volume than under other database models.

Consider an example in which "employee" is the primary entity, with the subordinate attributes "employee-salary" and "employee-attendance". The latter may in turn have subordinate attributes, for example "employee-entry" and "employee-exit". In this case the data is inherently hierarchical and is therefore best organized under the hierarchical model. Now consider the case where an "employee" is assigned to several "projects", and the time he spends on each project ("time-spent") is an attribute that belongs to both entities, "employee" and "project". Such an arrangement of data does not fit the hierarchical model easily. One possible solution is to duplicate the "time-spent" item and keep it separately in the "employee" hierarchy and in the "project" hierarchy. Because the two "time-spent" values must always be kept identical, this approach is cumbersome and error-prone.

Object-oriented model - A full description can be found in "Object-Oriented Modeling and Design" by James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy and William Lorensen.

The object-oriented approach regards all entities as objects. Each object belongs to a class, and a number of methods and fields are associated with each class. For the purpose of encapsulation, some fields are "private" and can be accessed only by the methods of the class, whereas other fields are public and can be accessed by all methods. Thus "Joe Smith" belongs to the class "Person". For this class, a specific field "Age" can be defined. Applying the class method "Update-Age()" to the object Joe changes his age. The approach allows the definition of subclasses that inherit all the methods and fields of the super-class. Thus, for example, the "Employee" class can be defined as a subclass of the "Person" class. Additional fields and methods can be defined for the subclass. Thus the class "Employee" may support the field "Salary" and the method "Get-Raise()".

The object-relational model allows data objects to be viewed as organized by relations. Data can thus be manipulated as though it were organized as objects, while the relational approach is still supported.

As described above, a data model deals with the conceptual or logical level of data representation and "hides" details such as how the data is physically arranged and accessed. The latter are usually handled by a so-called database file management system.

A database file management system translates the logical structure (i.e. the database model) into data structures, the operations appropriate to them, and possibly other data. The data structures include indexes and data records. An index makes it possible to access or update data by key. In a search context the term "search key" is used. The database file management system should preferably operate on the data records so as to improve performance in terms of time (i.e. fast response time of the database from the user's point of view) and space (i.e. minimizing the storage allocated to the database files). As is known in the art, there is usually a trade-off between time requirements and space requirements. The performance of a database depends on the efficiency of the data structures used to represent the data and on how efficiently the system operates on that data. Conventional file and management systems are discussed in detail in, for example, Chapter 7 (system architecture) and Chapter 8 (indexing) of "Database System Concepts" cited above.

Known database file management systems typically employ indexing schemes that fall into the following main categories: multi-way tree indexes and others.

Multi-way tree indexes - These techniques can be used to establish one or more access paths (also called search paths) to the same data record. Their main drawbacks are the space they require (usually all the keys pointing to the record plus some pointers) and the maintenance they require whenever an update transaction (see the definition below) occurs, i.e. adding and/or deleting records requires adding and/or deleting keys. In general, the characteristics of the indexing scheme and the volume of data held in each file determine the number of accesses needed to find or update a given record (an update being an insertion, a deletion or a modification). Where the storage medium under consideration is external memory, this number of accesses is in fact the number of I/O accesses. As will be explained later, each access to the storage medium loads data from it into memory.

Various types of tree indexing methods have been developed; compared with dedicated direct-access indexing techniques, however, tree indexes are more costly to implement. On the other hand, tree indexes allow sequential and range processing. One of the most widely used indexing schemes is the B-tree (known under various commercial product names and implementation variants such as the B+ tree), in which the keys are held in a balanced tree structure and the lowest level points to the data itself. A detailed description of the B-tree indexing scheme can be found on pp. 275-282 of the above-mentioned "Database System Concepts". The number of I/O accesses follows the logarithmic expression log_k(N) + 1, where k is an implementation-dependent constant and N is the total number of records. This means that performance degrades logarithmically as the number of records increases.
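The logarithmic expression above can be illustrated with a short calculation. This is only a sketch: the function name `btree_io_accesses` and the value k = 100 are assumptions chosen for illustration and do not come from the cited book or from the invention.

```python
import math

def btree_io_accesses(n_records: int, k: int) -> float:
    """Evaluate log_k(N) + 1, the I/O access estimate quoted in the text."""
    if n_records < 1 or k < 2:
        raise ValueError("need n_records >= 1 and k >= 2")
    return math.log(n_records, k) + 1

# k = 100 is an assumed, implementation-dependent constant.
for n in (10**3, 10**6, 10**9):
    print(n, round(btree_io_accesses(n, k=100), 2))
```

Under this assumed constant, multiplying the number of records by a thousand adds only one and a half accesses, which is the logarithmic behaviour described above.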

It is of course possible to employ combinations of the above and other techniques, for example an indexing scheme implemented according to two or more of the above techniques.

One significant drawback of the commonly used B-tree indexing scheme described above is that each key must be kept not only as part of the data record but also as part of the index.

This naturally leads to an undesirable inflation of the index size, and the drawback is aggravated when large keys are used (i.e. when a relatively large number of bits is needed to represent a key).

One possible approach to this problem is the trie indexing scheme. Examples of the latter are discussed in G. Wiederhold, "File Organization for Database Design", McGraw-Hill, 1987, pp. 272-273, and in D. E. Knuth, "The Art of Computer Programming", Addison-Wesley Publishing Company, 1973, pp. 481-505, 681-687.

In brief, the trie indexing scheme enables fast search while avoiding the key duplication that occurs, for example, in the B-tree technique. A trie index has the general structure of a tree, but the search is partitioned according to segments of the search key (for example characters, digits or bits of the search key). Thus, for example, each node in a trie index file represents an offset into the search key, and each link to one of its children represents the character value at that offset. In terms of allocated storage space the trie is an efficient data structure, because whole search keys are no longer kept in the intermediate nodes, which avoids the key duplication that occurs, for example, in the B-tree indexing technique.
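The node and link roles described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the trie layout of the invention: the names `TrieNode`, `trie_insert`, `trie_search` and the terminator `END` are hypothetical, and every node simply examines the next character of the key.

```python
END = "\0"  # hypothetical terminator, so that no key is a prefix of another

class TrieNode:
    def __init__(self, offset):
        self.offset = offset   # position in the search key examined at this node
        self.links = {}        # character value at that offset -> child (or record)

def trie_insert(root, key, record):
    node = root
    for ch in key:
        child = node.links.get(ch)
        if child is None:
            child = TrieNode(node.offset + 1)
            node.links[ch] = child
        node = child
    node.links[END] = record   # the leaf link is associated with the data record

def trie_search(root, key):
    node = root
    while node.offset < len(key):
        node = node.links.get(key[node.offset])
        if node is None:
            return None
    return node.links.get(END)

root = TrieNode(0)
trie_insert(root, "cat", "record-1")
trie_insert(root, "car", "record-2")
print(trie_search(root, "car"), trie_search(root, "ca"))
```

The keys "cat" and "car" share the nodes for "c" and "a"; no intermediate node stores a whole key, which is the saving over the B-tree noted in the text.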

To improve performance in terms of response time, in a particular trie variant described, for example, in the above-mentioned "File Organization for Database Design", the trie index file should be built by selecting digit groups (or bit groups) from the search key so as to obtain the best possible partition of the search space, or in other words as balanced a tree as possible. However, this requires a priori knowledge of the data records of the trie, and it comes at the cost of obtaining unsorted data, which is unacceptable in many real-life situations. It should be noted that if sorted data is mandatory, a balanced structure cannot be guaranteed even with sufficient a priori knowledge of the data records of the trie. It should also be noted that this particular trie does not support sequential or range processing.

When large volumes of data are considered, it is particularly important to maintain a balanced structure of the so-called tree index, so as to avoid long access paths from the root node to the leaf node associated with the sought data record. The B-tree indexing scheme inherently forms a balanced tree structure, even after the tree has undergone update transactions. As explained above, however, this inherently balanced (or substantially balanced) structure comes at a cost, namely inflated block contents in the tree and a correspondingly excessive increase in the size of the file holding the index, especially for large trees holding a large number of data records. The large file size adversely affects the performance of the data management system in terms of the number of accesses to the storage medium needed to reach the sought data record (and the corresponding access time), which is clearly undesirable.

Turning now to the "other" category of indexing schemes, it includes, for example, the so-called skip list index. A skip list is a randomized data structure composed of multiple levels. The lowest level, level 0, consists of a list of all the records arranged in non-decreasing order. Each node of level i (i = 0, ..., h) is selected, with probability P, to be a representative in level i+1. The representatives of level i constitute the nodes of level i+1. These representatives are likewise organized as an ordered list. Level h+1 is the first empty level.
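The level construction just described can be sketched as follows. This is an illustration only: the function name `build_skip_levels` and its parameters are assumptions, and a practical skip list would link the nodes of adjacent levels rather than keep separate lists.

```python
import random

def build_skip_levels(sorted_keys, p=0.5, rng=None):
    """Return the levels of a skip list; the last entry is the first empty level."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1 so that the levels die out")
    rng = rng or random.Random()
    levels = [list(sorted_keys)]          # level 0: all keys in non-decreasing order
    while levels[-1]:
        # each node of level i becomes a representative in level i+1 with probability p
        levels.append([k for k in levels[-1] if rng.random() < p])
    return levels

levels = build_skip_levels(range(16), p=0.5, rng=random.Random(1))
for i, level in enumerate(levels):
    print(i, level)
```

Because each level is filtered from the one below it, every level stays ordered and is a subset of its predecessor.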

Having discussed the main drawbacks of hitherto known indexing schemes, namely inflated data volume (e.g. the B-tree and its variants) and susceptibility to unbalanced structure (e.g. the trie), other aspects are discussed below that relate to various characteristics of data records, including their dependency and their multidimensionality.

Thus, consider for example two types of data records representing two entities (tables), namely "book" and "borrower". Each entity is associated with its unique key: a borrower is identified by "borrower-ID" and a book by "book-ID". In real life, for example in a public library, one may be interested in all the books lent to a given borrower. This transaction is an example of dependency of data records, where "book" is subordinate to "borrower". To answer this query, two queries must be issued: one for the borrower's information and another for the books he has borrowed (according to the composite key borrower-book).

With the B-tree indexing scheme, supporting data dependency in the manner specified requires the following separate index files: a book index file, accessible by the "book-ID" key; a borrower index file, accessible by the "borrower-ID" key; and an index file of the books lent to each borrower, accessible by the composite key ("borrower-ID book-ID").

This indexing scheme therefore comprises three index files. In terms of data volume and the additional integrity-maintenance checks involved, this constitutes a clearly undesirable overhead. Thus, for example, removing a given book from the book file requires first checking whether the book appears in the borrower-book index file.

The shortcomings of the known techniques with respect to the dependency of data records have been discussed so far. Their cumbersome representation and mode of operation become even more significant when so-called multidimensional data records are to be implemented.

Returning to the example, the "book" and "borrower" tables are now regarded as a multidimensional table, that is, one that can be viewed from several perspectives. Thus, in addition to the borrower → book perspective described above (the books lent to a borrower, obtained through an index on the borrower-book composite key), the database should support the alternative perspective of the borrowers to whom a given book (in several copies) has been lent, which of course requires an alternative composite key (book-borrower).

Under the B-tree representation, this requires adding another index file, accessible by the composite key (book-ID borrower-ID), which results in four index files in total.

The associated drawbacks are self-explanatory and are aggravated for n-dimensional tables (n > 2).

There is accordingly a need in the art to reduce the shortcomings of data processing systems that employ hitherto known database file management systems. In particular, there is a need in the art to provide a data processing system that exhibits improved database performance by using an efficient database file management system.

There is a further need in the art to provide a database file management system that uses an index which is inherently not susceptible to unbalanced structure in the manner described above.

There is also a need in the art to provide an index that inherently supports the representation of multiple data types, as well as the dependency and/or multidimensionality of data records.

For clarity of explanation, the following is a glossary of terms frequently used in this specification and in the appended claims; some of the terms are conventional and others are coined.

Block - a storage unit that can be accessed by a single I/O operation. A block may contain data arranged in any desired manner, for example nodes arranged as a tree, possibly also linked to actual data records. A block may reside in main (also called internal) memory or in secondary (also called external) memory.

Tree - a data structure that is either empty or consists of a root node linked by d (d ≥ 0) pointers (or links) to d disjoint trees, called the subtrees of the root. The roots of the subtrees are called the children of the root node of the tree, and the nodes of the subtrees are descendants of the root. A node all of whose subtrees are empty is called a leaf node. A node in the tree that is not a leaf is called an internal node.

In the context of the invention, leaf nodes are also the nodes that are associated with data records.

Nodes and trees should be interpreted broadly. Thus the definition of a tree also covers a tree of blocks, in which each node constitutes a block. In the same way, the descendant blocks of a block are all the blocks that can be accessed from that block. For a detailed definition of "tree", see Cormen, Leiserson and Rivest, or Lewis and Denenberg, "Data Structures and Their Algorithms".

Note that the association (e.g. link) between a leaf node and a data record covers any implementation that allows the data record to be accessed from the leaf node. Thus, for example, the data record may be accessed from the leaf node directly (i.e. through a pointer). As another non-limiting example, the leaf node may point to a data structure (e.g. a table), which in turn allows access to the data record. Other variants are of course also feasible.

Depth of an index - defined as the maximum number of blocks from the root block to a block associated with a data record.

Balanced index - an index is balanced if there exists a constant c such that the number of accesses needed to reach any data record is at most c log n, where n is the number of records in the structure.

Obtaining a balanced tree includes applying a balancing technique to an unbalanced structure so as to produce a balanced result, or, if desired, applying a balancing technique on the fly so as to keep the structure balanced.

Access should be construed as a process within the index in which one usually moves from node to node, within a block or from one block to another, in order to reach the sought data record (although this is not necessarily so).

Navigation is usually construed as access to a group of data records (although not necessarily so), so as to collect these data records in an order determined by their keys.

Search scheme - the algorithm used to access, by key, a given data record associated with the index. The search scheme within a block is the algorithm used within the block to access a given data record or another block. The data record need not reside in that block.

Common key of a block - the longest prefix shared by all the keys of the data records that can be accessed from the block by the relevant search scheme. If desired, part or all of the common key may be kept explicitly in the block.
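The common key can be computed as a longest common prefix. The sketch below assumes that keys are plain strings (or bit strings written as text) and that the keys reachable from the block are available as a collection; the function name `common_key` is hypothetical.

```python
def common_key(keys):
    """Longest prefix shared by all keys reachable from a block."""
    keys = list(keys)
    if not keys:
        return ""
    # The prefix shared by the smallest and largest key (in lexicographic
    # order) is shared by every key that sorts between them.
    lo, hi = min(keys), max(keys)
    i = 0
    while i < len(lo) and i < len(hi) and lo[i] == hi[i]:
        i += 1
    return lo[:i]

print(common_key(["10110", "10111", "1010"]))
```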

Update transaction - a transaction that consists of inserting a new data record, deleting an existing data record, or modifying an existing data record or part of one.

Vertically oriented trie structure - the conventional orientation of a digital tree, from the root to the leaves. As explained below, not all the links between nodes and/or between blocks need always be maintained in a vertical trie. As explained in more detail below, in the index of the invention the trie that is susceptible to unbalanced structure constitutes a vertical tree. As explained below, in some specific embodiments the index built over the keys of the data records constitutes a vertically oriented trie.

Horizontally oriented trie structure - a vertically oriented trie structure having h levels, where the first level usually represents the top and the h-th level represents the lowest level, which is associated with the data records (and constitutes the trie that is susceptible to unbalanced structure). The structure allows moving from a block in level i to a block in level i+1 according to the common key value of the block. In various embodiments of the invention, and as explained in more detail below, the upper levels constitute a representative index over the common keys of the blocks of the lowest-level tree.

Storage medium - any medium that can be used to store data, including internal memory, external memory, or both. External memory may be one or more of the following: magnetic tape, magnetic disk, optical disk, or any other physical medium used to store data. Internal memory includes any known main memory, including cache and any other physical storage medium that serves as internal memory.

Short link - (also called near link) a link, labelled k, from a node a having value r to a node b in the same block, such that the keys of all the data records whose access paths include node b have the value k at key position r.

Long link - (also called far link) a link from a node v in block B of level i to a block B' of level i-1 or to a data record. If v has value r and the link is labelled k, then the common key of block B', or the key of the data record, has the value k at position r.

The label of a short link or a far link is also called the value or direction of the link.

Split link - if a block overflows and a split is performed, and if node a is linked to node b and, after the split, node b and its descendants reside in a different block, block B, then the link between node a and node b is a split link. After the split, the split link is a link between node a and block B (which contains the receiving node b). A split link is a labelled link.

In several implementations, such as PAIF, the split link from node a to block B is maintained, while the existence of node b is optional because block B can be accessed through the layered index.

Direct link - a link from a node v in block B of level i to a block B' of level i-1 that contains a node v', provided that nodes v and v' have the same value. If a search path to a data record having key k includes node v but none of its near links or far links, then the path should include the direct link to block B'. A direct link is unlabelled.

The following explanation relates to the terms duplicate node and copy node, which are used in the block split process.

If a node v' has value k, then the keys of all the data records accessible from v' and its labelled links agree on positions 0, ..., k-1.

If a node v is created such that its value equals the value of node v', and the data records accessible from v and its labelled links are also accessible from node v' and its labelled links, then v is regarded as a duplicate node of v'. A duplicate node keeps a direct link to the block containing node v'. (A duplicate node is also called a copy node.)

All other terms and processes used in the specification and in the claims are discussed below in the context of the invention.

A data record, as usual, consists of several fields, some of which are called keys. Records are sometimes ordered by one of the keys, called the primary key. An index (or indexing scheme) over the keys of the data records, or over representative keys (defined below), is a data structure that facilitates search by one or more keys. Examples of indexes are any of the various multi-way tree indexing schemes. An index according to the invention may be built using more than one indexing scheme.

An index may be stored in one or more files that reside partly or wholly in internal memory or in external memory.

According to the invention there is provided an index that comprises a partitioned index: a dynamic data structure that allows search by key and is divided into blocks, each of which has a representative key. The representative key should suffice to find the block associated with the record whose key equals the search key (if such a record exists). Once that block has been located, the data record is easily retrieved. The representative key need not be physically stored in the block. Examples of partitioned indexes are:

1. A sequence of the blocks of a file ordered by increasing value of the primary key. The index guides the search to the block containing the key. To allow search by a key that is not the primary key, a partitioned index is constructed that contains, for each record, its key and a link to it. These pairs are ordered by non-decreasing key value. The index leads to the block that contains the address of the desired record.

2. A trie arranged in a group of blocks.

3. Any other type of indexing scheme that satisfies the definition of a partitioned index.

A partitioned index over the keys of the data records is called the basic partitioned index and is denoted by the index level I_0.

The basic partitioned index may become unbalanced, which produces some long search paths.

To search the partitioned index efficiently, an additional index level I_1 (an index level is also simply called an index) is built over the representative keys of I_0. If I_1 is also a partitioned index, a further index I_2 can be built over the representative keys of the blocks of I_1. The process can be repeated until an index I_h is built that is preferably contained entirely in a single block (hereinafter the root index). The root index I_h need not be a partitioned index. The layered index (which itself constitutes an index) is the set I_0, ..., I_h. The levels I_1, ..., I_h constitute the so-called representative index.
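The construction of the levels can be sketched for the simplest kind of partitioned index, a sequence of blocks of sorted keys. The sketch makes several assumptions that are not taken from the text: the representative key of a block is its smallest key, every block holds at most `capacity` keys, and the names `partition` and `build_layered_index` are hypothetical.

```python
def partition(keys, capacity):
    """Cut a sorted list of keys into consecutive blocks of at most `capacity` keys."""
    return [keys[i:i + capacity] for i in range(0, len(keys), capacity)]

def build_layered_index(sorted_keys, capacity):
    """Return [I_0, I_1, ..., I_h]; each level is a list of blocks."""
    if capacity < 2:
        raise ValueError("capacity must be at least 2 so that each level shrinks")
    levels = [partition(list(sorted_keys), capacity)]      # I_0 over the record keys
    while len(levels[-1]) > 1:                              # stop at a single root block
        reps = [block[0] for block in levels[-1]]           # assumed representative keys
        levels.append(partition(reps, capacity))            # I_{i+1} over those keys
    return levels

levels = build_layered_index(range(1, 28), capacity=3)
for i, level in enumerate(levels):
    print(f"I_{i}: {len(level)} block(s)")
```

With 27 keys and blocks of three keys, the sketch yields nine blocks in I_0, three in I_1 and a single root block in I_2.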

To search for a record by key k, k is searched for in I_h (and in some cases in data records of I_{h-1} to I_1) in order to find the block B of I_{h-1} that leads to k. This process is repeated until I_0 is reached, together with the block associated with the record having key k (if such a record exists).
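The top-down search can be sketched on a small hand-built layered index. The layout below is an assumption made for illustration only: each level is a list of blocks of sorted keys, the representative key of a block is its smallest key, and entry j of level i+1 (counted across its blocks) stands for block j of level i. The names `LEVELS` and `search` are hypothetical.

```python
from bisect import bisect_right

LEVELS = [
    [[1, 4], [7, 9], [12, 15], [20, 22]],   # I_0: keys of the data records
    [[1, 7], [12, 20]],                      # I_1: representatives of the I_0 blocks
    [[1, 12]],                               # I_2: root index, a single block
]

def search(levels, k):
    """Return (found, number of the I_0 block reached, number of blocks accessed)."""
    block_no, accesses = 0, 0
    for depth in range(len(levels) - 1, 0, -1):
        block = levels[depth][block_no]
        accesses += 1
        # last representative that is <= k (or the first one if k is smaller than all)
        pos = max(bisect_right(block, k) - 1, 0)
        # position of that entry within the whole level = block number one level down
        block_no = sum(len(b) for b in levels[depth][:block_no]) + pos
    accesses += 1
    return k in levels[0][block_no], block_no, accesses

print(search(LEVELS, 15))
print(search(LEVELS, 8))
```

Each level contributes exactly one block access, so the cost of a search in this sketch equals the number of levels.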

To insert a new record r having key k, block B is found by searching as described above. Once B has been found in I_0, r is added to B.

If B (in I_0) overflows, it is split into two (or more) blocks, and the representative of B in I_1 is replaced by the representatives of the new blocks. Overflow of a block B_1 in I_1 causes B_1 to be split, and the representative of B_1 in I_2 is replaced by the representatives of the new blocks. If the block of I_h overflows, a further level I_{h+1} is created and added to the layered index. Note that the "overflow" condition may be determined according to the specific application and need not be triggered only when a block is full. Thus, by way of example, in one embodiment overflow occurs when a block is at least half full.
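The propagation of splits from I_0 upward can be sketched as follows. The sketch rests on assumptions that are not part of the text: blocks are linked as a tree of `Block` objects, the representative key of a block is its smallest key, a block overflows when it holds more than `capacity` keys, and an overflowing block is cut in half. All names are hypothetical.

```python
from bisect import bisect_right, insort

class Block:
    def __init__(self, keys, children=None):
        self.keys = keys            # record keys (I_0) or representative keys (I_1..I_h)
        self.children = children    # None in I_0, else child blocks aligned with keys

def _insert(block, k, capacity):
    """Insert k below `block`; return the one or two blocks that now replace it."""
    if block.children is None:                       # a block of I_0
        insort(block.keys, k)
    else:
        pos = max(bisect_right(block.keys, k) - 1, 0)
        parts = _insert(block.children[pos], k, capacity)
        # replace the representative of the child by those of the new blocks
        block.keys[pos:pos + 1] = [p.keys[0] for p in parts]
        block.children[pos:pos + 1] = parts
    if len(block.keys) <= capacity:
        return [block]
    mid = len(block.keys) // 2                       # overflow: split into two blocks
    kids = block.children
    right = Block(block.keys[mid:], None if kids is None else kids[mid:])
    block.keys = block.keys[:mid]
    if kids is not None:
        block.children = kids[:mid]
    return [block, right]

def insert(root, k, capacity):
    parts = _insert(root, k, capacity)
    if len(parts) == 1:
        return root
    return Block([p.keys[0] for p in parts], parts)  # root overflowed: new level I_{h+1}

def leaves(block):
    if block.children is None:
        return [block.keys]
    return [ks for child in block.children for ks in leaves(child)]

root = Block([])
for key in [5, 1, 9, 3, 7, 2, 8]:
    root = insert(root, key, capacity=3)
print(root.keys, leaves(root))
```

Deferred updates and the merging needed for deletion are not covered by this sketch.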

Deletion is similar to insertion and may involve merging, i.e. the opposite of splitting. Updates or splits need not be performed in real time but may be deferred (i.e. carried out later).

Note that the layered index is preferably built so that it remains a balanced index.

Note that in certain embodiments a balanced index is sufficient, and that where the size of the layered index (without I_0) is relatively small (for example, where most or all of it can be held in internal memory) the "balanced structure" requirement may be waived.

According to a first aspect of the invention, it has been found that the inherent limitation of a basic partitioned index (e.g. a trie) that is susceptible to unbalanced structure can be overcome by providing an index, and more specifically a layered index of the kind specified.

For example, a comparison between the layered index and the basic partitioned index (for example a trie) shows that accessing a selected data record through the layered index is, as a rule, much more efficient than accessing the same data record through the trie.

In the context of the invention, "more efficient" means that an update transaction on a data record (for example insert, delete or modify), or an access to a data record, requires fewer accesses to the storage medium through the layered index than through the basic partitioned index. Accesses are counted as follows: each access handles (for example loads or processes) one block from the storage medium.

There may be cases in which this definition of "more efficient" does not apply, for example a very small file with only a few blocks, where accessing a data record through the basic partitioned index may take as many operations as, or fewer operations than, accessing it through the layered index.

Building a layered index over a basic partitioned index that is itself a trie, or trie-like, calls for some further considerations.

Each key is therefore regarded as a character string or a bit string. Furthermore, if a single block cannot hold the trie, the trie is partitioned into several blocks such that each block contains a single subtrie. The representative key of a block is the string associated with the root node of the subtrie in that block, that is, the sequence of labels on the path from the root of the trie of Iᵢ to the root of the subtrie of the block. As in the general layered index scheme, the representative keys of Iᵢ are the keys of Iᵢ₊₁. To search for key k in Iᵢ₊₁, the longest prefix of k among the keys held in the blocks of Iᵢ₊₁ is found, and the search then moves to the corresponding block of Iᵢ.
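The longest-prefix step can be illustrated with a small sketch. It assumes, for simplicity, that the representative keys of one level are held in a plain dictionary mapping each representative key to a block identifier of the level below; a real implementation would walk a trie instead. The function name and the block identifiers `B0` to `B3` are hypothetical.

```python
def longest_prefix_block(rep_keys, k):
    """Return the block whose representative key is the longest prefix of k.

    rep_keys maps a representative key (the label path from the trie root to
    the root of the block's subtrie) to an identifier of that block.
    """
    best = None
    for rep in rep_keys:
        if k.startswith(rep) and (best is None or len(rep) > len(best)):
            best = rep
    return None if best is None else rep_keys[best]

# The block holding the trie root has the empty string as representative key.
reps = {"": "B0", "ca": "B1", "car": "B2", "do": "B3"}
for key in ("cart", "cab", "dog", "emu"):
    print(key, longest_prefix_block(reps, key))
```

A key with no longer matching representative falls through to the block that holds the trie root, since the empty string is a prefix of every key.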

Inserting a record requires adding its key to I₀, that is, adding a value to the trie of I₀. If this causes a block to overflow, the block is split, typically into two blocks (more in some implementations), such that each block contains one (connected) trie. To do so, the link between some node u and its child v is cut, and the subtrie rooted at v is moved to another block. The representative key of the new block is added to I₁. As in the general layered index scheme, this process continues through I₁, ..., Iₕ.

If the basic partitioned index is a compressed trie such as Patricia or PAIF, part of each key is left out of the index, which saves index space. This saving, however, affects the way a search is carried out. In such a compressed trie, usually only nodes of degree 2 or more are kept. If the search key k does not belong to the compressed trie, the search may still terminate at some record r, and it must then be checked whether k equals the key of r. If the two keys differ, the trie does not contain a record with key k.
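The need for the final key comparison can be seen in a small sketch of a compressed trie. This is an illustrative simplification rather than Patricia or PAIF as specified elsewhere: each internal node stores only the offset of the character it branches on, leaves stand for records (here just their keys), and the names `build`, `descend` and `search` are hypothetical.

```python
def build(keys):
    """Build a compressed trie: only branching nodes are kept."""
    keys = sorted(set(keys))
    if len(keys) == 1:
        return keys[0]                     # leaf: the record (here, its key)
    offset = 0                             # first offset at which the keys differ
    while len({k[offset:offset + 1] for k in keys}) == 1:
        offset += 1
    groups = {}
    for k in keys:
        groups.setdefault(k[offset:offset + 1], []).append(k)
    return {"offset": offset,
            "links": {ch: build(group) for ch, group in groups.items()}}

def descend(node, k):
    """Follow k through the trie, looking only at the stored offsets."""
    while isinstance(node, dict):
        node = node["links"].get(k[node["offset"]:node["offset"] + 1])
        if node is None:
            return None
    return node                            # some record r, not necessarily k's

def search(node, k):
    """A record is a hit only if its full key equals k."""
    r = descend(node, k)
    return r if r == k else None

trie = build(["FIAT", "FORD", "FOCUS"])
print(descend(trie, "XORD"), search(trie, "XORD"), search(trie, "FORD"))
```

The key `XORD` is not in the trie, yet the descent ends at the record for `FORD`, because the skipped first character is never examined. Only the comparison in `search` rejects it.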

The effect of this strategy on the layered index is that a prefix of k may not appear in the index. To search in this case, direct links are introduced from nodes of the blocks of Iᵢ to blocks of Iᵢ₋₁. These links carry no direction; a direct link is used when the relevant part of the search key matches none of the directions leaving the node.

Suppose the search reaches a block Bᵢ₋₁ of Iᵢ₋₁ whose representative key kᵢ₋₁ is not a prefix of k. (If kᵢ₋₁ is not recorded explicitly in Bᵢ₋₁, any data record r accessible from Bᵢ₋₁ can be reached and kᵢ₋₁ determined from the key of r.) To continue the search, k and kᵢ₋₁ are compared to find the position j of the first character in which they differ, and the trie of block Bᵢ is searched until a node v is found that has a direct link and a value less than or equal to j. The search continues in the block of Iᵢ₋₁ to which this direct link points. (If there is no such node, the search moves to the first block of index Iᵢ₋₁.) Thus, in the worst case, each level may require one extra access. Nevertheless, as explained below, 3 levels suffice to address a billion records, and 2 of those levels can usually be kept in the computer's internal memory. This makes it possible to reach the block associated with a data record with no more than two I/O accesses to the external storage medium.
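The claim that three levels can address a billion records is easy to check with back-of-the-envelope arithmetic. The sketch below assumes a fan-out of 1000 entries per block and a block size of 8192 bytes; neither figure comes from the text, and both are chosen only to illustrate the order of magnitude.

```python
def levels_needed(records, fanout):
    """Smallest number of levels whose combined fan-out reaches `records`."""
    levels, reach = 1, fanout
    while reach < records:
        levels += 1
        reach *= fanout
    return levels

def upper_level_bytes(records, fanout, block_bytes):
    """Bytes taken by every level above I0 (the levels that could stay in memory)."""
    blocks, total = -(-records // fanout), 0   # number of I0 blocks (ceiling)
    while blocks > 1:
        blocks = -(-blocks // fanout)          # blocks of the next level up
        total += blocks * block_bytes
    return total

# Assumed figures, not taken from the text: fan-out 1000, 8192-byte blocks.
print(levels_needed(10**9, 1000))
print(upper_level_bytes(10**9, 1000, 8192))
```

Under these assumptions the two upper levels occupy 1001 blocks, roughly 8 MB, so only the I₀ block has to be fetched from external storage in the common case.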

The split procedure must also maintain the direct links. Suppose the access path of block Bᵢ₋₁ of Iᵢ₋₁ includes block Bᵢ of level Iᵢ, and Bᵢ₋₁ overflows and is split into Bᵢ₋₁ and Bᵢ₋₁'. Block Bᵢ must now contain links to all of its successor blocks in Iᵢ₋₁. This can be achieved by the following non-limiting technique:

Let kᵢ₋₁' be the representative key of Bᵢ₋₁'. This key is inserted into Tᵢ, the compressed trie of Bᵢ, such that a search for the key of a descendant of Bᵢ₋₁' reaches Bᵢ₋₁', and a search for the key of a descendant of Bᵢ₋₁ reaches Bᵢ₋₁.

One non-limiting way of carrying out the split is as follows:

1. Delete at least one of the short links (here, the split link) of some node in the block (here, the split node), so that at least two tries exist in the block.

2. Move each subtrie to a separate block.

3. If block Bᵢ does not exist, create Bᵢ and create in Bᵢ a copy node of the split node.

4. If block Bᵢ exists but contains no copy node of the split node, create a copy node of the split node in Bᵢ and connect it to the trie of Bᵢ such that, once the split is complete, Bᵢ₋₁' can be accessed along a search path that includes the root node of Bᵢ and this copy node and that follows the labeled links according to the representative key of Bᵢ₋₁'.

5. If the copy node has no direct link, add a direct link from the copy node to block Bᵢ₋₁.

6. Add a far link from the copy node to block Bᵢ₋₁'; alternatively, if the copy node has a short link to a child node in the direction of that far link, the far link may be replaced by a direct link from that child node to block Bᵢ₋₁'.

In the implementation above, blocks of Iₖ (k > 0) are split such that each split link (of Iₖ) is a link between split nodes that reside in different blocks.

Accordingly, according to one aspect of the invention, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

a layered index arranged in blocks; the layered index includes a basic partitioned index associated with the data records; the basic partitioned index enables the data records to be accessed or updated by key or keys and is susceptible to an unbalanced structure of blocks;

said layered index enables the data records to be accessed or updated by key or keys and constitutes a balanced structure of blocks.

The invention further provides, in a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

an index arranged in blocks and constructed over the keys of the data records; the index includes a basic partitioned index associated with the data records; the basic partitioned index enables the data records to be accessed or updated by key or keys and is susceptible to an unbalanced structure of blocks;

said index enables the data records to be accessed or updated by key or keys and constitutes a balanced structure of blocks.

The invention further provides, in a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

an index arranged in blocks and constructed over the keys of the data records; the index includes a trie associated with the data records; the trie enables the data records to be accessed or updated by key or keys and is susceptible to an unbalanced structure of blocks;

said index enables the data records to be accessed or updated by key or keys and constitutes a balanced structure of blocks.

The invention further provides a method in a database file management system for accessing data records, executed on a data processing system; the data records are associated with a basic partitioned index that is arranged in blocks and stored in a storage medium; the basic partitioned index enables the data records to be accessed by key or keys and is susceptible to an unbalanced structure of blocks; the method constructs a layered index arranged in blocks and comprises the steps of:

(a) providing said basic partitioned index;

(b) constructing a representative index over the representative keys of said basic partitioned index;

said layered index enables the data records to be accessed or updated by key or keys and constitutes a balanced structure of blocks.

The invention further provides a method in a database file management system for accessing data records, executed on a data processing system; the data records are associated with a basic partitioned index that is arranged in blocks and stored in a storage medium; the basic partitioned index enables the data records to be accessed or updated by key or keys and is susceptible to an unbalanced structure of blocks; the method constructs an index over the keys of the data records, arranged in blocks, and comprises the steps of:

(a) providing said basic partitioned index;

(b) constructing an index over the representative keys of said basic partitioned index, said index enabling the data records to be accessed or updated by key or keys and constituting a balanced structure of blocks.

According to the invention there is further provided a method in a database file management system for accessing data records, executed on a data processing system; the data records are associated with a trie that is arranged in blocks and stored in a storage medium; the trie enables the data records to be accessed or updated by key or keys and is susceptible to an unbalanced structure of blocks; the method constructs an index over the keys of the data records, arranged in blocks, and comprises the steps of:

(a) providing a trie;

(b) constructing an index over the representative keys of said trie, said index enabling the data records to be accessed or updated by key or keys and constituting a balanced structure of blocks.

According to the invention, preferably but not necessarily, the index is constructed using one or more indexing schemes selected from a number of specified indexing schemes. A typical, but not exclusive, example of a multiway tree index is the B-tree indexing scheme.

According to one embodiment, said basic partitioned index is a trie constructed as a digital tree of the kind disclosed in U.S. Patent 5,495,609.

According to another embodiment, said trie is constructed using the so-called probabilistic access indexing file (PAIF).

Thus, according to a specific embodiment, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes at least a probabilistic access indexing file (PAIF) having a plurality of nodes and a plurality of links;

the leaf nodes of said PAIF are each associated with at least one data record accessible to a user application, at least part of said data record constituting at least one search key;

selected nodes of said PAIF each represent a given offset of a search-key portion within said at least one search key; the links originating from each given node among said selected nodes each represent a unique value of said search-key portion;

the PAIF has at least two sub-PAIFs, each arranged in a block;

said database file management system is further capable of arranging said blocks in a balanced structure of blocks.

In the context of PAIF, it should be noted that although each of said selected nodes preferably includes only the given offset, this need not always be the case. Thus, where appropriate, one or more of said nodes may include other information, such as a portion of the key and/or further information as required.

According to a modified embodiment, the trie is of the PAIF type, and the indexing scheme is constructed using a search scheme essentially the same as that of the PAIF trie.

Before proceeding, it should be noted that, purely for convenience of explanation, the invention is described mainly with reference to a trie as the basic partitioned index. Those skilled in the art will readily appreciate that the invention is by no means limited to tries and that any basic partitioned index can be used.

In addition to improved performance compared with hitherto known techniques, a database file management system employing the layered index of the invention is particularly beneficial in the following respects. Data is inherently kept in sorted form according to the search key; that is, one can navigate the tree in the order of the keys of the data records. The layered index inherently supports sequential operations such as "get next" and "get previous". In this respect the proposed layered index is superior to, for example, hashing schemes and some digital tree implementations. Keeping the index balanced does not require knowing the contents of the database in advance. Because the layered index is kept balanced, its depth is relatively small, which minimizes the number of accesses (normally slow I/O operations) required for an update transaction or for access to a data record. According to one embodiment, accessing a given data record out of a billion data records requires practically one I/O operation (and no more than two), constituting one or two accesses.

The invention thus further provides a data structure in a computer system having a storage medium, wherein said storage medium comprises at least an internal memory with a capacity of 10 to 20 Mbytes or more, and an external memory; the data structure includes an index over the keys of the data records; the index is arranged in blocks; such that, for one billion data records, accessing the block associated with any one of said billion data records requires practically no more than two accesses to said external memory, irrespective of the length of the keys of said data records.

The invention further provides a data structure in a computer system having a storage medium, wherein said storage medium comprises at least an internal memory with a capacity of 10 to 20 Mbytes or more, and an external memory; the data structure includes an index over the keys of the data records; the index is arranged in blocks; such that, for one billion data records and irrespective of the length of the keys of said data records, practically all the blocks of the index are held in said internal memory.

The invention further provides a data structure in a computer system having a storage medium, the data structure including an index over the keys of the data records; the index is arranged in a balanced structure of blocks and supports sequential operations on said data records; the size of the index is essentially unaffected by the length of said keys.

Note that the data records may reside in the blocks of the layered index or in one or more separate data files. In the latter embodiment, the data records must of course be associated with the corresponding layered index. As will be explained further with reference to specific embodiments, a given data record may hold more than one search key.

The following discussion relates to a second aspect of the invention.

Data normally consists of records of several types (for example the books and borrowers of the example above). The type of a record determines its fields (attributes) and its keys. In conventional systems, for example systems that use B-tree indexes, the type of a key is not kept in the record, nor is it treated as part of the key. The program "knows" the type of a record and hence the fields of the data record and their structure.

A different approach is proposed according to the second aspect of the invention. Each key type is assigned an identifier (a designator), that is, a bit string such as one or more characters, which is usually, but not necessarily, attached as a prefix to every key of that type. A designated key is a key together with its identifier. The identifier is treated as part of the key (for search and update purposes) and is therefore part of the index.

The identifier makes it possible to obtain the properties of a data record as a function of its type. By examining a key, one obtains its identifier and hence derives the type of the record, so the record type no longer needs to be known in advance. A data record whose keys are designated is called a designated data record. A designated index is an index that enables searching over designated data records.

The following example illustrates the use of identifiers according to the invention. Consider a class C, and let all data records of this class have one key field (or several fields) k₁ and possibly several other non-key fields. Let R be a data record of class C with R.k₁ = FIAT, and let the identifier of k₁ be A. Adding the identifier yields the key AFIAT. To access a record with R.k₁ = FIAT, the designated index is searched for the key AFIAT.
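The FIAT example can be written out in a few lines. The sketch assumes that identifiers are prefix-free strings and that the metadata is a simple dictionary from identifier to a description of the key type; `METADATA`, `designate` and `key_type` are hypothetical names.

```python
# Hypothetical metadata table: identifier -> description of the key type.
METADATA = {"A": "class C, key field k1"}

def designate(identifier, key):
    """Form the designated key by prefixing the key with its identifier."""
    return identifier + key

def key_type(designated_key, metadata=METADATA):
    """Recover (identifier, key type, plain key) from a designated key."""
    for identifier, description in metadata.items():
        if designated_key.startswith(identifier):
            return identifier, description, designated_key[len(identifier):]
    raise KeyError("unknown identifier in " + designated_key)

print(designate("A", "FIAT"))
print(key_type("AFIAT"))
```

Because the identifier travels with the key, the record type is derived from the key itself rather than being known to the program beforehand.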

Having described the identifier feature, we now describe another feature according to the second aspect of the invention: subordination of data records. Consider a record R1 with key K1 and a record R2 with a composite key formed from the ordered pair of keys K1, K2. (In this case the designated key of R2 is the composite key K1'K2', where K2' is formed by prefixing key K2 with identifier D2; D2 is the identifier of R2.) In the designated index, R1 can be selected by searching for key K1', that is, key K1 with its identifier D1, and R2 can be selected by searching the same index for key K1'K2', where K1'K2' is the concatenation of K1' and K2', and K2' is key K2 with its identifier D2. In this case K2 is subordinate to K1.

The subordination relation is also extended to records. If K2 is subordinate to K1 and the identifier of K2 is D2, then the identifier of R2 is also D2 (in effect the pair D1, D2). If R2 is subordinate to R1, the key of R2 is formed by concatenating K2' to the key K1'. Note that in K2' the prefix D2 is added to K2.

In the ERD model, the type of record R1 and the type of record R2 may stand in a one-to-many relationship, meaning that several records of type R2 can be related to a single record of type R1. This relationship can be realized through subordination: several records of type R2 are subordinate to a single record of type R1 (for example, the same person can borrow several books). In particular, if the relationship is one-to-one (for example, where each borrower may borrow only one book), the key K1'D2 (where D2 is the identifier of R2) suffices to locate R2. In the designated index, the search path to K1' is contained in the search path to K1'K2'. (This does not rule out reaching record R2 by other paths.) This feature exhibits another important property of the second aspect of the invention: data integrity is maintained inherently. A record whose key is K1'K2' (or K1'D2) can be inserted only if a record whose key is K1' exists. In the earlier example, lending a book (book_Id=2222) to a borrower (borrower_Id=111111) results in the insertion of a record whose designated key is A111111B2222 only if that particular borrower (data record R1 with K1=111111) exists (in this example, the identifier of the borrower is A and the identifier of the subordinate borrower-book data record is B). Data integrity is achieved at little cost, because the path to the borrower-book record in the index carries enough information to determine whether the borrower exists. If the borrower does not exist, the path to the composite key will not pass through the borrower. This is detected automatically during insertion. By contrast, in the prior art, records of different types are associated with different index files. Before a new data record (with a composite key) is inserted into the borrower-book index file, a separate check must be made in the borrower index file to determine whether the particular borrower (record R1, key K1) exists, which incurs excessive overhead.
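The integrity check that falls out of the shared search path can be sketched with a plain character trie. This is an illustration only: the trie is uncompressed, `DesignatedIndex` and `parent_len` are hypothetical names, and the length of the parent key is passed in explicitly where a real system would derive it from the metadata.

```python
RECORD = object()   # marker under which a trie node stores its record

class DesignatedIndex:
    """Character trie over designated keys (uncompressed, for illustration)."""

    def __init__(self):
        self.root = {}

    def insert(self, key, record, parent_len=0):
        """Insert `record`; key[:parent_len] must already identify a record."""
        if parent_len and parent_len >= len(key):
            raise ValueError("composite key must extend the parent key")
        node = self.root
        for i, ch in enumerate(key):
            if i < parent_len:
                if ch not in node:             # path to the parent is missing
                    raise LookupError("parent record does not exist")
                node = node[ch]
                continue
            if parent_len and i == parent_len and RECORD not in node:
                raise LookupError("parent record does not exist")
            node = node.setdefault(ch, {})
        node[RECORD] = record

    def get(self, key):
        node = self.root
        for ch in key:
            node = node.get(ch)
            if node is None:
                return None
        return node.get(RECORD)

idx = DesignatedIndex()
idx.insert("A111111", {"borrower_Id": "111111"})
idx.insert("A111111B2222", {"book_Id": "2222"}, parent_len=7)
try:
    idx.insert("A999999B2222", {"book_Id": "2222"}, parent_len=7)
except LookupError as err:
    print("rejected:", err)
```

The existence of the borrower is verified during the same walk that locates the insertion point, so no second index has to be consulted.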

Note that subordination is not limited to two levels: a subordinate record may itself have records subordinate to it, so that n levels of subordination can be reached. For example, consider a banking database in which account records are subordinate to branch records and deposit records are subordinate to accounts.

We now turn to the multidimensional feature according to the second aspect of the invention. Let R be a record identified by either of two keys K1 and K2. The designated index should then contain two search paths to R, one through designated key K1' and the other through designated key K2'. R thus constitutes a multidimensional record. A multidimensional index comprises the designated index and the multidimensional data record(s).

Consider a first embodiment in which the multidimensional index is applied to data records that are not subordinate. For example, consider a class C whose data records all have two key fields, k₁ (the car model) and k₂ (its license plate number), and possibly several non-key fields. Let R be a data record of class C with R.k₁ = FIAT and R.k₂ = 127. Let the identifier of k₁ be A and that of k₂ be B. Adding the identifiers yields the keys AFIAT and B127. These extended keys are inserted into a single designated index. To access the record with R.k₁ = FIAT, the designated index is searched by key AFIAT; to select the record with R.k₂ = 127, the same designated index is searched by B127.
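A minimal sketch of the car example follows. A Python dictionary stands in for the designated index (the text describes a trie-based index, which is not reproduced here), and `insert_record`, `key_fields` and the non-key field `color` are hypothetical.

```python
def insert_record(index, record, key_fields):
    """Insert one record under every designated key it has.

    key_fields maps an identifier to the name of the key field it designates.
    """
    for identifier, field in key_fields.items():
        index[identifier + str(record[field])] = record

designated_index = {}                        # stand-in for the designated index
car = {"k1": "FIAT", "k2": "127", "color": "red"}
insert_record(designated_index, car, {"A": "k1", "B": "k2"})

# Both designated keys lead to the very same record.
print(designated_index["AFIAT"] is designated_index["B127"])
```

Both search paths live in one index, so a single structure serves lookups by model and lookups by license plate number.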

The discussion and example above concern a multidimensional index in which the data records need not exhibit subordination. A multidimensional index can optionally also be applied to subordinate data records. For example, consider a banking database in which deposits are subordinate to accounts and to depositors. A single designated index provides access to accounts (by designated key k'₁, the account number), access to depositors (by designated key k'₂, the depositor name), and access to deposits by k'₁k'₂ and by k'₂k'₁. (It is of course possible to use a different identifier for k₁ when k₁ is subordinate to k₂, and a different identifier for k₂ when k₂ is subordinate to k₁.)

The identifier of a multidimensional record depends on the identifier of the key used to search for or update the record. Thus the identifier of the record Car(FIAT, 127) is A when the record is searched or updated by key AFIAT, and B when it is accessed by license plate number through key B127.

In addition to data records, metadata must also be maintained. Metadata holds information about the different records as a function of their type. Thus the identifiers must be recognized and, through them, information about the records obtained, for example descriptions of the fields, the keys, subordination, record length and so on. The search scheme in the designated index is oblivious to the metadata. The metadata is used to identify records, to attach identifiers (for example by prefixing the identifier to the record) and to construct (composite) designated keys.

Thus, according to the second aspect of the invention, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

an index over the keys of the data records; the data records are of at least two types, wherein data records of the second type are subordinate to data records of the first type.

According to this second aspect there is further provided, in a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

a designated index over the designated keys of the data records; the data records, which are designated data records, are of at least two types; wherein designated data records of the second type are subordinate to designated data records of the first type.

Various advantages can be achieved according to this second aspect, including:

• A data structure comprising a designated index and designated data can maintain the relationships between different data items.

• A data structure comprising a designated index and designated data can link logically related items.

• A data structure comprising a designated index and designated data can support several data models simultaneously and efficiently.

• A data structure comprising a designated index and designated data can maintain data integrity efficiently.

• A data structure comprising a designated index and designated data can retrieve related data efficiently.

The various advantages offered by the database file management system of the invention are discussed in detail below with reference to specific embodiments.

Note that the data records may form part of the PAIF or may reside in one or more separate data files. In the latter embodiment, each data record must of course be linked to the corresponding PAIF. As will be explained further with reference to specific embodiments, a given data record may hold more than one search key.

It will also be shown how complex data structures and data relationships can be supported by a new, unified and simple technique.

It will also be shown how an index structure can be of minimal size, irrespective of the length of the keys.

The invention inherently supports all the advantages mentioned above without any prior knowledge of the data (that is, the key range is unknown, the number of records is unknown, the physical location of the data records is assumed to be arbitrary, and so on).

According to another aspect of the invention, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

an index constructed over the keys of the data records stored in the storage medium, the index being arranged in blocks; the leaf blocks of the index are linked to the data records by links;

said index is characterized in that at least one of said links is shared by at least two data records stored in the same block.

According to one embodiment, the index is constructed as a trie.

The invention further provides, in a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

said index is characterized in that at least one of said links is shared by at least two data records stored in the same block;

said index is constituted by a layered index according to claim 1, and the blocks of said basic partitioned index are linked to said data records.

For a better understanding of the invention, and to see how it may be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

Fig. 1 shows a generalized block diagram of a system employing a database file management system;

Fig. 2 shows a sample database structure represented as an entity relationship diagram (ERD), for illustrative purposes;

Fig. 3 shows the database of Fig. 2 represented as tables according to the relational data model, each table holding a few data occurrences;

Fig. 4 shows the "Client" table of Fig. 3 according to a file management system employing a conventional B+ tree indexing scheme;

Fig. 5 shows the "Client" table of Fig. 3 according to a file management system employing a conventional trie indexing scheme;

Figs. 6A-6C show the "Client" table of Fig. 3 according to a file management system employing the PAIF indexing scheme;

Figs. 7A-7H schematically illustrate the construction of a layered index according to one embodiment of the invention;

Figs. 8A-8B schematically illustrate the construction of a layered index according to another embodiment of the invention;

Figs. 9A-9G schematically illustrate the construction of a layered index according to another embodiment of the invention;

Figs. 10A-10B schematically illustrate the construction of a layered index according to another embodiment of the invention;

Fig. 11 schematically illustrates the construction of a layered index according to yet another embodiment of the invention;

Fig. 12 schematically illustrates the use of identifiers in a designated index according to one embodiment of the invention;

Figs. 13A-13E illustrate, in five examples, the subordination of data records in a designated index according to one embodiment of the invention;

Fig. 14 schematically illustrates a designated index exemplifying multidimensional records according to one embodiment of the invention;

Fig. 15 shows a designated index according to one embodiment of the invention;

Fig. 16 schematically illustrates the relationships between data records provided according to one embodiment of the invention;

Figs. 17A-17B show a compressed representation of the links to data records according to one embodiment of the invention;

Figs. 18A-18D show four benchmark graphs demonstrating the improvement in response time and file size of a database employing the file management system of the invention compared with a commercial C-tree-based database; and

Figs. 19A-19D show four benchmark graphs demonstrating the improvement in response time and file size of a database employing the file management system of the invention compared with a commercial B-tree-based database.

Attention is first directed to Fig. 1, a generalized block diagram of a system employing the database file management system of the invention. A computer 1, for example a personal computer (P.C.) with a Pentium microprocessor 3 commercially available from Intel Corp., U.S.A., has an operating system module 5, for example Windows NT commercially available from Microsoft, which communicates with processor 3 and controls the overall operation of computer 1.

P.C. 1 also accommodates a plurality of user application programs, of which only three are shown, designated 7, 9 and 11. The user applications are executed by processor 3 under the control of operating system 5 in a manner known per se, and respond to user input fed through keyboard 13 by way of I/O port 15 and the intermediary of the operating system. By way of I/O port 17 and the operating system, the user applications also communicate with monitor 16 in order to display data. The user applications can access the data stored in a database through database management system module 20. A generalized database management system, as shown schematically in Fig. 1, includes a high-level management system 22, which views the underlying data in a "logical" manner and responds to user applications through data definition and data manipulation languages (DDL and DML) known per se, such as SQL. The database management system typically utilizes, in a manner known per se, a data dictionary 24 that contains metadata, i.e. information about the underlying data.

The underlying structure of the data is managed by database file management system 26, which is associated with the indexing scheme and with the actual data records 28. The "high-level" logical commands (e.g. SQL commands) received and processed by the high-level management system 22 are translated into "low-level" commands that access or update the data records stored in the database file(s); the database file management system therefore deals with the actual structure and organization of the data records. The "high-level" and "low-level" portions of the database management system may communicate through an application programming interface (API) known per se, for example the Open Database Connectivity (ODBC) interface commercially available from Microsoft. Using ODBC allows the "high-level" modules or applications to communicate transparently with different database file management systems that support the ODBC standard. As used herein, the term accessing or updating data records encompasses all types of data manipulation, including "find", "insert", "delete" and "modify" of data records, as well as the corresponding DDL commands that provide for the construction, modification and deletion of the database.

Fig. 1 also shows storage media in the form of an internal memory module 29 (e.g. 16 megabytes, possibly employing a cache sub-module) and an external memory module 29' (e.g. 1 gigabyte). Typically, external memory 29' is accessed through an external, relatively slow communication bus (not shown), whereas internal memory is normally accessed through a faster internal bus (not shown). Because internal memory is relatively small, normally only those applications (or portions thereof) that are currently being executed are loaded from external memory into internal memory. For the same reason, internal memory cannot hold an entire large database, most of which is stored in external memory. Thus, in response to an application-generated query that seeks one or more data records in the database, the database management system uses operating system services (i.e. I/O operations) to load one or more blocks of data from external memory into internal memory through the external communication bus. If the sought data records are not found in the loaded blocks, further I/O operations are needed until the sought records are hit.

Note that, for simplicity of presentation, modules 5, 7, 9, 11 and 20 are shown separately from internal memory 29 and external memory 29'. Clearly, although not shown, each module (operating system, DBMS and each user application) is normally stored in external memory, and the portions currently being executed are loaded into internal memory.

Computer 1 may serve as a workstation forming part of a local area network (LAN, not shown) that employs a server having essentially the same structure as that of Fig. 1. Where the workstation and server operate under a client/server protocol, most of the described modules (including the database records 28 themselves) reside on the server.

Those versed in the art will readily appreciate that the embodiments described above with reference to Fig. 1 are only two of many possible variants. Thus, by way of non-limiting example, the database may be an on-line database residing on an Internet Web site. Nor is the invention bound by the specific division into a small internal memory and a large external memory. Thus, for example, one modified embodiment employs large internal and external memories, and another modified embodiment uses internal memory only.

It should also be noted that, for clarity of explanation, system 1 is shown in a simplified and generalized form. A more detailed discussion of database file management systems, and in particular of the various components they typically include, can be found in Chapter 7 of the above-mentioned "Database System Concepts".

Having described the general structure of a system of the invention, attention is now directed to a sample database structure represented as an entity relationship diagram (ERD), used for illustrative purposes. The ERD of Fig. 2 consists of the entities "Client" 32 and "Account" 34 and an "n to m" relationship "Deposit" 36, signifying that a given client may have more than one account and, likewise, that a given account may be owned by more than one client.

As shown, the entity "Client" has the following attributes (fields): "Client_Id" 38, the key attribute that uniquely identifies each client; "Name" 39, the client's name; and "Address" 40, the client's address. The entity "Account" has the following attributes (fields): "Account number" 42, the key attribute that uniquely identifies each account; and "Balance" 43, which holds the balance of the account. The relationship "Deposit" consists of pairs of keys of the "Client" and "Account" entities, each pair representing a specific account owned by a specific client.

Turning now to Fig. 3, there is shown a database represented by three tables 50, 51 and 52 which, under the relational data model, correspond respectively to items 32, 34 and 36 of Fig. 2; each table holds a few data occurrences for illustrative purposes. Note that the key field of the "Client" table ("Client_ID") is 5 digits long, and the key field of the "Account" table ("Acc_ID") is 6 digits long. The Client table holds 5 data occurrences 55-59, the Account table holds 2 data occurrences 65 and 66, and the Deposit table holds 3 data occurrences 70-72.

According to the prior art, as a rule, each table has a separate index file that uses the primary key. Thus Fig. 4 shows a primary index file for the "Client" table of Fig. 3 under a file management system employing a conventional B-tree indexing scheme. As shown, index file 80 consists of three blocks 80a-80c, representing a root block and two leaf blocks respectively. The data records are organized in random order in a separate file 81 that holds five data records 83-87. Each block consists of pairs of fields (e.g. 82a-82b and 83a-83b in block 80a). In each pair, the first field holds a search-key value and the second field holds a link: either a number identifying the next block to be searched or, in the case of a leaf block, a link to a data record, e.g. a number identifying that data record. The latter implementation constitutes one non-limiting example of associating data records with blocks. In the specific example of Fig. 4, a search for a record whose key is equal to or less than 12355 passes from root block 80a to block 80b.

Thus, a search for the record whose key is 12355 starts at root block 80a (search key 82a) and proceeds via link 82b to block 80b, where search key 12355 (86a) is associated with link 86b, which gives the address in data file 81 of the data record identified by this search key. The data record identified by search key "12355" (57 in Fig. 3) is stored at the fourth position in data file 81, which is not its position in the table of Fig. 3.
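The two-level lookup just described can be sketched as follows. This is a minimal illustration only, not the layout of Fig. 4: apart from the 12355 entries (root pair 82a-82b, leaf pair 86a-86b, fourth position in file 81), the block contents, the file order and the names `BLOCKS`, `LEAVES`, `DATA_FILE` and `lookup` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical block contents: each block is a sorted list of (search key, link) pairs.
BLOCKS = {
    "80a": [(12355, "80b"), (12546, "80c")],      # root block: link = next block
    "80b": [(11346, 0), (12345, 4), (12355, 3)],  # leaf block: link = position in file 81
    "80c": [(12445, 1), (12546, 2)],              # leaf block
}
LEAVES = {"80b", "80c"}

# Data file 81: records in no particular key order (order assumed for illustration).
DATA_FILE = [
    {"client_id": 11346}, {"client_id": 12445}, {"client_id": 12546},
    {"client_id": 12355}, {"client_id": 12345},
]

def lookup(key):
    """Return the data record with the given key, or None if it is absent."""
    block = "80a"
    while True:
        pairs = BLOCKS[block]
        # First pair whose search key is greater than or equal to the sought key.
        i = bisect_left([k for k, _ in pairs], key)
        if i == len(pairs):
            return None
        k, link = pairs[i]
        if block in LEAVES:
            return DATA_FILE[link] if k == key else None
        block = link  # descend to the next block

record = lookup(12355)
```

Note that in this sketch the key 12355 is stored three times (root block, leaf block and data record), which is the duplication discussed in the text.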

The "Account" and "Deposit" tables are likewise arranged in two separate B-tree index files.

The B-tree index file of Fig. 4 exhibits one of the significant drawbacks of this approach, namely that keys (search keys) are duplicated: they are held both in the index blocks (i.e. in the indexing scheme) and in the data records associated with the B-tree index. Thus, for example, the search key of data record 57 (Fig. 3) is held not only as part of data record 86 in file 81 but also in block 80b (search key 86a), and sometimes also in a parent block such as 80a (search key 82a).

It is therefore readily seen that for large files (as is the case in many real-life scenarios) the duplication of search keys, particularly long ones, results in an inflated index that requires large storage and adversely affects performance.

Fig. 5 shows a different indexing scheme for the "Client" table of Fig. 3, under a file management system employing a known trie indexing scheme. Trie index file 90 consists of a plurality of nodes and links, wherein each node represents an offset position and each link represents a value at that offset. Table 91 has four columns. The first column indicates which digit position is to be used, and the second column gives the value of that digit. The digit value partitions the keys into two subsets. The third and fourth columns point to the next step of the search.

To locate a given search key, e.g. 12355, the digit at the position indicated by the root (position "5", indicated by node 90a and also given in the first column of the first row of table 91) is compared with the value specified in the second column of the same row (value "5", also shown by link 90b in the trie). Since the digit at position 5 of the sought key 12355 is indeed 5, control passes to row 2 (as indicated by the third column of the first row of table 91). Next, the digit at position 3 of the sought key (90c in the tree, also the value in the first column of the second row of table 91) is compared with the value 3 (link 90d, also the value in the second column of the second row of table 91). Since a match occurs, control passes to row 3 of the table. At this step, the digit at position 4 of the sought key does not match the value specified in column 2 of row 3 ("5" versus "4"), and therefore, as indicated in column 4 ("not equal") of table 91, the link to the sought data record 57 (86 in Fig. 4) is obtained.

The "Account" and "Deposit" tables are arranged in two separate trie index files. Unlike the B-tree index file of Fig. 4, the index file of Fig. 5 need not duplicate search keys: the trie (90) holds only offsets and link values rather than entire keys. In this sense it offers an advantage over the B-tree technique.

However, as specified, the above trie is associated with several drawbacks: a balanced structure is obtained only at the cost of knowing the content of the database in advance and successively partitioning the keys so as to keep the data evenly distributed. Since databases of the type depicted in Fig. 2 are dynamic in nature (for the specific database of Fig. 2, for example, new clients open accounts, existing clients close accounts, new clients register as joint owners of existing accounts, and so forth), advance knowledge of the database contents is clearly undesirable, since it imposes an undue constraint.

Another drawback of the above tree is that it does not support sequential processing. Navigating the tree yields the records in the order 83, 86, 87, 84, 85 (Fig. 4) rather than accessing the data in key order.

Having described the known trie indexing scheme (with reference to Fig. 5), there follows a description of embodiments of an index of the invention, which comprises a basic partitioned index and which overcomes the various drawbacks of the hitherto known techniques described above. Specifically, a preferred embodiment of the index in the form of a layered index is shown, together with a preferred embodiment of the basic partitioned index in the form of a trie. These examples are not intended to be limiting.

Before turning to the description of the embodiments, a new trie indexing scheme, named PAIF, is first described with reference to Figs. 6A-6C. As shown below, the PAIF is not bound to a tree structure. On the basis of the PAIF, various embodiments of the layered index are described with reference to Figs. 7-9, including a representative index constructed over the representative keys of the PAIF. In each of the embodiments of Figs. 7 to 9, the indexing scheme of the representative index and that of the basic partitioned index are essentially the same PAIF.

Another embodiment of the layered index, with a different trie, is described with reference to Fig. 10. As will be shown, in the embodiment of Fig. 10 the representative index and the trie are also essentially the same. This is not obligatory, however, and an example in which the trie differs from the representative index is also shown with reference to Fig. 10.

Turning now to Figs. 6A-6C, there is shown a series of illustrations of the "Client" table of Fig. 3 under a file management system employing PAIF. The terms "transaction" and "operation" are used interchangeably.

The following description reviews the basic commands that enable data manipulation in a PAIF, namely inserting a new data record into the PAIF, finding a data record in the PAIF, and deleting an existing record. Those versed in the art will of course appreciate that more complex data manipulation operations (e.g. "merge") can be realized on the basis of these primitives.

Turning now to Fig. 6A, there is shown a user data record 103 (56 in the Client table of Fig. 3) having search key "12345" (i.e. a 5-byte-long search key). The PAIF 100 of Fig. 6A is of course trivial: it consists of a single node 101 (serving as both root node and leaf node) that is linked by long link 102 to data record 103.

Node 101 represents offset 0 in the search key, and link 102 represents the value "1" of the search-key segment (1 byte long in this specific embodiment) at that offset.

As Fig. 6A clearly shows, record 103 is associated with a search path consisting of a single unit. The unit formed by node 101 and link 102 defines an offset and the associated search-key-segment value, and this value agrees with the value of the corresponding segment at that offset within the search key of the data record. More specifically, the value of the one-byte search-key segment at offset 0 in search key "12345" is indeed "1".

Turning now to Fig. 6B-1, there is shown the PAIF after completion of a subsequent transaction that inserts the record with Client_Id number "12445", 107 (data occurrence 58 in the Client table of Fig. 3). The search keys of data records 103 and 107 differ only in the third byte (offset 2), which is "3" and "4" respectively.

The unit defined by root node 101 and link 102 does not suffice to distinguish data records 103 and 107, since the value of the 1-byte search-key segment at offset 0 is "1" for both records. Node 104 therefore indicates the smallest offset that distinguishes the two records, and links 106 and 105 carry the respective 1-byte search-key segments "3" and "4" at offset 2. It should be noted that the realization of the PAIF is not bound to the specific example shown; different realizations may be adopted depending on the particular application. Thus, for example, Figs. 6B-2 and 6B-3 show two alternative realizations of Fig. 6B-1. In Fig. 6B-2 the entire key is presented in the PAIF (e.g. the links on the path that starts at the root node and ends at the data record specify all the digits of record 12445). Compared with the sparse realization of Fig. 6B-3, in which only the strictly necessary nodes appear in the tree, the realization of Fig. 6B-2 is clearer but less space-efficient. Other variants may of course be used.

Before describing the procedure for inserting a new data record into an existing database, recall that the higher a node is in the PAIF trie, the smaller the offset it indicates (for example, in the PAIF of Fig. 6B, node 101 is higher than node 104, and the offset of the former is accordingly smaller: "0" versus "2").

In general, inserting a new data record into an existing PAIF involves the following steps:

ⅰ. Advance along a reference path that starts at the root node and ends at a data record linked to a leaf node (the "reference data record"). At each node on the reference path, if a link originating from the node carries a value equal to the 1-byte key segment at the offset specified by the node, advance along that link. If the offset specified by the node lies beyond the end of the key, or if no link carries that value, advance along any path to any reference data record;

ⅱ. Compare the search key of the reference data record with the search key of the new data record in order to determine the smallest offset at which their search-key segments differ (hereinafter the "distinguishing offset");

ⅲ. Depending on the value of the distinguishing offset, proceed with one of the following steps (ⅲ.0 to ⅲ.3):

ⅲ.0 If the data records are equal, terminate; or

ⅲ.1 If the distinguishing offset matches the offset indicated by one of the nodes on the reference path, add another link originating from that node and assign to it the value of the search-key segment at the distinguishing offset, taken from the search key of the new data record; or

ⅲ.2 If the distinguishing offset is greater than the offset indicated by the leaf node to which the reference data record is linked:

ⅲ.2.1 Disconnect the link from the reference data record (which temporarily remains "loose") and move the link to a new node; assign the value of the distinguishing offset to the new node;

ⅲ.2.2 Connect the reference data record to the new node (which now becomes a leaf node) by a link (long link), and assign to this link the value of the search-key segment at the distinguishing offset, taken from the search key of the reference data record;

ⅲ.2.3 Connect the new data record to the new node by a link (long link), and assign to this link the value of the search-key segment at the distinguishing offset, taken from the search key of the new data record; or

ⅲ.3 If the conditions of ⅲ.0, ⅲ.1 and ⅲ.2 are not met, then either the reference path contains a father node and a child node such that the distinguishing offset is greater than the offset assigned to the father node and smaller than the offset assigned to the child node (case A), or all the nodes on the reference path have offsets greater than the distinguishing offset (case B). The following sub-steps are applied accordingly:

ⅲ.3.1 For cases A and B, create a new node and assign the value of the distinguishing offset to it. For case A only, disconnect the link from the father node to the child node and move it to the new internal node (i.e. the child node temporarily remains "loose");

ⅲ.3.2 For cases A and B, connect the new data record to the new internal node by a link (long link); the value assigned to this link is the search-key segment at the distinguishing offset, taken from the search key of the new data record;

ⅲ.3.3 For cases A and B, connect the new node by a new link to the child node in case A or to the root node in case B (i.e. the new node becomes the new father node in case A and the new root node in case B); the value assigned to this link is the search-key segment, taken from the search key of the reference data record, at the offset indicated by the new node.

Note that different reference paths may result in different PAIFs.
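The insertion steps above can be sketched in code as follows. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: keys are fixed-length strings with one character per 1-byte segment, a data record is represented by its bare key, blocks and long links are ignored, and the names `Node`, `insert` and `build` are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    offset: int                                # key offset tested at this node
    links: dict = field(default_factory=dict)  # segment value -> Node, or record key (str)

def insert(root, key):
    """Insert `key` and return the (possibly new) root node."""
    if root is None:                           # empty index: trivial PAIF as in Fig. 6A
        return Node(0, {key[0]: key})
    # Step i: walk the reference path, remembering (node, link value followed).
    path, cur = [], root
    while isinstance(cur, Node):
        seg = key[cur.offset] if cur.offset < len(key) else None
        if seg not in cur.links:               # no matching link: follow any link
            seg = next(iter(cur.links))
        path.append((cur, seg))
        cur = cur.links[seg]
    ref = cur                                  # the reference data record
    # Step ii: smallest offset at which the two keys differ.
    d = next((i for i, (a, b) in enumerate(zip(key, ref)) if a != b), None)
    if d is None:                              # step iii.0: record already present
        return root
    for i, (node, seg) in enumerate(path):
        if node.offset == d:                   # step iii.1: add a link to this node
            node.links[key[d]] = key
            return root
        if node.offset > d:                    # step iii.3: new node above `node`
            new = Node(d, {key[d]: key, ref[d]: node})
            if i == 0:                         # case B: new node becomes the root
                return new
            father, fseg = path[i - 1]         # case A: father's link moves to new node
            father.links[fseg] = new
            return root
    # Step iii.2: the distinguishing offset lies below the leaf node.
    leaf, seg = path[-1]
    leaf.links[seg] = Node(d, {ref[d]: ref, key[d]: key})
    return root

def build(keys):
    """Insert the keys one by one into an initially empty index."""
    root = None
    for k in keys:
        root = insert(root, k)
    return root

fig_6c1 = insert(build(["12345", "12445"]), "12546")   # step iii.1
fig_6c2 = insert(build(["12345", "12445"]), "12355")   # step iii.2
fig_6c3 = insert(build(["12345", "12445"]), "11346")   # step iii.3, case A
```

Under these assumptions, `build(["12345", "12445"])` has the shape of the PAIF of Fig. 6B-1, and the three calls at the end correspond to the three worked cases of steps ⅲ.1 to ⅲ.3.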

For a better understanding, the above "insert data record" operation is applied to the specific PAIF of Fig. 6B several times, each time with a different data record, so as to exemplify the three different cases specified in steps ⅲ.1-ⅲ.3 above, yielding the three PAIFs shown in Figs. 6C-1 to 6C-3 respectively.

In the first example, the client data record with Client_Id (or search key) "12546" (59 in the Client table of Fig. 3) is inserted into the PAIF of Fig. 6B. As specified in step (ⅰ), the reference path is traversed from root 101 to data record 103, which serves as the "reference data record". This proceeds as follows: from node 101, link 102 is followed (the value of the 1-byte digit at offset "0" of the data record to be inserted is "1"); then, since the values of links 105 and 106 (4 and 3 respectively) at offset "2" (specified by node 104) both differ from the value ("5") at offset 2 of the key to be inserted, an arbitrary path (in this specific example, via link 106) is followed to reference data record 103.

The comparison specified in step (ⅱ) shows that the search key of the new data record and the search key of the reference data record (103) differ at offset 2 ("5" versus "3") and at offset 4 ("6" versus "5"). The smallest such offset (the "distinguishing offset") is therefore 2.
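The comparison of step (ⅱ) can be checked with a short sketch. The function names `differing_offsets` and `distinguishing_offset` are hypothetical, and keys are assumed to be equal-length strings with one character per 1-byte segment.

```python
def differing_offsets(new_key, ref_key):
    """All offsets at which the two equal-length keys differ."""
    return [i for i, (a, b) in enumerate(zip(new_key, ref_key)) if a != b]

def distinguishing_offset(new_key, ref_key):
    """Smallest differing offset, or None if the keys are equal."""
    diffs = differing_offsets(new_key, ref_key)
    return diffs[0] if diffs else None

offsets = differing_offsets("12546", "12345")        # offsets 2 and 4
smallest = distinguishing_offset("12546", "12345")   # offset 2
```

The same helper applied to "12355" and "11346" against reference key "12345" gives the distinguishing offsets used in the second and third examples.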

Turning now to step (ⅲ), the condition of step ⅲ.1 is met, since the distinguishing offset equals the offset assigned to node 104. Accordingly, and as shown in Fig. 6C-1, a new link 111 connects node 104 to the new data record 112. The value assigned to link 111 is 5, in accordance with the byte value at position 2 in the search key of new data record 112. PAIF 110 of Fig. 6C-1 is thus the result of inserting data record 112 into the PAIF of Fig. 6B-1.

Turning now to the second example, the data record with Client_Id (or search key) "12355" (57 in the Client table of Fig. 3) is inserted into the PAIF of Fig. 6B-1. Steps ⅰ and ⅱ specified above yield a reference path that starts at node 101 and ends at data record 103, and a distinguishing offset of 3.

Turning now to step (ⅲ), the condition of step ⅲ.2 is met, since distinguishing offset 3 is greater than offset 2 of leaf node 104 on the reference path. Accordingly, following step ⅲ.2.1 and as shown in the resulting PAIF 120 of Fig. 6C-2, link 106 is disconnected from reference data record 103 and connected to a new node 121. The new node is assigned distinguishing offset 3. Then, following step ⅲ.2.2, reference data record 103 is connected to new node 121 by a new link. This new link is assigned the value 4 (the digit at distinguishing offset 3 taken from search key "12345" of reference data record 103). Finally, as specified in step ⅲ.2.3, new data record 123 is connected to node 121 by link 124, which is assigned the value "5" (the digit at distinguishing offset 3 taken from search key "12355" of new data record 123). PAIF 120 of Fig. 6C-2 is thus the result of inserting data record 123 into PAIF 108 of Fig. 6B-1.

The third and last example concerns inserting the client data record with Client_Id (or search key) "11346" (55 in the Client table of Fig. 3) into the PAIF of Fig. 6B-1. Applying steps ⅰ and ⅱ above results in advancing from node 101 to data record 103 (Fig. 6B) and determining that the distinguishing offset is 1.

In step ⅲ, the condition of step ⅲ.3 is thus met. Accordingly, following step ⅲ.3.1 and as shown in the resulting PAIF 130 of Fig. 6C-3, link 102 is moved to a new internal node 131. New internal node 131 is assigned the value 1 (i.e. the distinguishing offset). As specified in step ⅲ.3.2, the new data record is connected directly to node 131 by new link 133. The value assigned to link 133 is 1 (the digit at distinguishing offset 1 taken from search key "11346" of new data record 132). Finally, following step ⅲ.3.3, new internal node 131 is linked to node 104 by link 134, which is assigned the value 2 (the digit at distinguishing offset 1 taken from search key "12345" of reference data record 103).

Although the PAIF described above with reference to Figs. 6A-6C may be contained in a block, it is preferable to separate "nodes" from "data records", so that the data records are gathered in one or more separate files. Applying this approach to the PAIF of Fig. 6C-3 yields a data file that holds records 132, 103 and 107. Links 133, 106 and 105 then of course become long links.

Clearly, if an insertion procedure finds that the data record to be inserted already exists in the PAIF, a suitable error message is returned to the process that invoked the insert command.

Note that in these examples the entire PAIF is assumed to reside in a single block. Clearly, as more data records are inserted by following the above "insert procedure", a block overflow may occur, which necessitates invoking a "split block" procedure (discussed in greater detail below); the sought block must then be entered and the insertion procedure performed in the manner specified above.

Having described a typical "insert" transaction, the "find (or retrieve) data record" transaction is now described. To find in an existing PAIF a data record having a given search key (hereinafter the "sought data record"), the following steps are performed:

ⅰ. Advance along a search path that starts at the root node and ends at a data record linked to a leaf node, performing the following sub-steps for each node on the search path (hereinafter the "current node"):

ⅰ.1 For each link originating from the current node, compare the value of the search-key segment of the sought data record at the offset specified by the current node with the value assigned to the link; in case of a match, advance along that link and return to step ⅰ.1;

ⅰ.2 If none of the links originating from the current node matches the search-key segment of the sought data record, return "not found" and terminate the find procedure;

ⅰ.3 If a data record is reached (hereinafter the "reference data record"), compare the entire search key of the sought data record with the key of the reference data record;

ⅰ.3.1 In case of a match, return "found" (and, in the case of "retrieve", also return the entire data record) and terminate the find procedure; or

ⅰ.3.2 In case of a mismatch, return "not found" and terminate the find procedure.
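The find procedure above can be sketched as follows. As before, this is only an illustration under assumptions: keys are strings with one character per 1-byte segment, a data record is represented by its bare key, and the names `Node`, `find` and `fig_6c3` are hypothetical. The index is built by hand in the shape of the PAIF of Fig. 6C-3.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    offset: int                                # key offset tested at this node
    links: dict = field(default_factory=dict)  # segment value -> Node, or record key (str)

def find(root, key):
    """Return the record whose key equals `key`, or None ("not found")."""
    cur = root
    while isinstance(cur, Node):
        if cur.offset >= len(key):             # offset beyond the end of the sought key
            return None
        seg = key[cur.offset]
        if seg not in cur.links:               # step i.2: no link matches
            return None
        cur = cur.links[seg]                   # step i.1: follow the matching link
    # Step i.3: a record was reached; the whole key must still be compared,
    # because the path tests only some of the offsets.
    return cur if cur == key else None

# Hand-built index in the shape of Fig. 6C-3 (nodes 101, 131 and 104).
fig_6c3 = Node(0, {"1": Node(1, {"1": "11346",
                                 "2": Node(2, {"3": "12345", "4": "12445"})})})

hit = find(fig_6c3, "12445")    # reaches record 107 and the full keys match
miss = find(fig_6c3, "12463")   # reaches record 107 but the full keys differ
```

The final full-key comparison is what separates the two results: both keys follow the same links, since offsets 3 and 4 are never tested on the path.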

For a better understanding of the "find" procedure, it is applied twice to the specific PAIF of Fig. 6C-3, yielding a "found" and a "not found" result respectively.

Thereby, consider to seek data records (being searched data recording) to call in the following text by search key " 12445 ".According to step ⅰ .1, the value of being distributed of the value of the numeral " 1 " that the root node (skew 0) of being searched data recording distributes in this skew place and link 102 (unique link of sending from node 101) compares.Because coupling occurs, control moves on to node 131.Once more according to step ⅰ .1, distribute to the digital value under this skew (" 2 ") of the node 131 (skew 1) of being searched data recording and distribute to and link 134 value and compare.Also obtain coupling herein, thereby control moves on to node 104.Then, according to step ⅰ .1, distribute to the node 104 (skew 2) of being searched data recording and compare in the digital value " 4 " of this skew place with from each link that node 104 sends.Relatively produce and link 105 coupling, thereby control moves on to data recording 107.

According to step ⅰ .3, searched the search key of data recording and the key word of data recording 107 and compared, " find " result (step ⅰ .3.1) owing to obtain mating loopback.

Now forward second example to, consider that being searched data recording has search key " 12463 situation.The process that example is described in the recycling, but the relatively generation of being searched in step ⅰ .3 between data recording and the data recording 107 do not match, and the result " is not found " in loopback according to step ⅰ .3.2.
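By way of non-limiting illustration only, the search steps and the two worked examples may be sketched as follows. The class names `Node` and `Record`, the function `search`, and the exact shape given to the PAIF of Fig. 6C-3 (in particular the link leading to record "12345") are assumptions made for this sketch and are not taken from the figures.

```python
from dataclasses import dataclass, field
from typing import Dict, Union


@dataclass
class Record:
    key: str  # full search key stored with the data record


@dataclass
class Node:
    offset: int  # position in the search key examined at this node
    links: Dict[str, Union["Node", Record]] = field(default_factory=dict)


def search(root: Node, key: str) -> str:
    """Descend by search-key segments, then verify the full key (steps i.1 to i.3)."""
    current: Union[Node, Record] = root
    while isinstance(current, Node):
        segment = key[current.offset] if current.offset < len(key) else None
        if segment not in current.links:      # step i.2: no link matches
            return "not found"
        current = current.links[segment]      # step i.1: follow the matching link
    # step i.3: a data record was reached, so compare the whole key
    return "found" if current.key == key else "not found"


# Assumed shape of the PAIF of Fig. 6C-3 (records 11346, 12345 and 12445).
node_104 = Node(offset=2, links={"3": Record("12345"), "4": Record("12445")})
node_131 = Node(offset=1, links={"1": Record("11346"), "2": node_104})
root_101 = Node(offset=0, links={"1": node_131})

print(search(root_101, "12445"))  # found
print(search(root_101, "12463"))  # not found
```

The second call reaches the record with key "12445" because only the digits at offsets 0, 1 and 2 are inspected on the way down; the final full-key comparison is what turns the result into "not found".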

The common "delete data record" transaction is now described. As a first stage, a "search data record" transaction is performed on the PAIF. In the "not found" case, an appropriate error message is returned to the process that invoked the "delete" command. Otherwise, the sought data record has been found. For clarity of the description of the "delete" procedure, the following terms are introduced.

The leaf node linked to the sought data record is called the "target node". The parent of the target node is called the "predecessor target node". The link connecting the predecessor target node to the target node is called the "predecessor link", and the link connecting the target node to its child node (or to a data record other than the sought one) is called the "target link". With these terms in mind, the following steps are carried out:

ⅰ. delete the sought data record and the link connecting it to the target node;

ⅱ. if two or more links remain on the target node, terminate the delete procedure;

ⅲ. otherwise, if exactly one link (i.e. a target link) remains on the target node, then:

ⅲ.1 "bypass" the target node by reconnecting the predecessor link from the predecessor target node directly to said child node (or to said data record); and

ⅲ.2 delete the target node and the target link; terminate the delete procedure.

Note that step ⅲ serves more economical memory management: it frees the space occupied by the target node and the target link, so that this space can be allocated to other nodes and links in the block. Note also that step ⅲ is optional.

For a better understanding, the above "delete data record" procedure is now applied to the specific PAIF of Fig. 6C-3.

Thus, in response to the command "delete the record with search key = 11346", the record is searched for in this PAIF according to the procedure described above. Data record 132 is found and, in accordance with step ⅰ above, this data record and link 133 leading to it are deleted. Since after this deletion target node 131 is left with a single target link, 134, steps ⅲ and ⅲ.1 apply: link 102 bypasses target node 131 and is connected directly to the latter's child node 104. Then, in accordance with step ⅲ.2, target node 131 and target link 134 are deleted, yielding the PAIF shown in Fig. 6B-1. Another example is given with reference to the PAIF of Fig. 6C-1. In response to the command "delete the record with search key = 12546", the record is searched for in this PAIF according to the procedure described above. Data record 112 is found and, in accordance with step ⅰ above, this data record and link 111 leading to it are deleted. Since two links (namely links 105 and 106) remain on target node 104, the delete procedure terminates as stipulated in step ⅱ. The resulting PAIF is again the PAIF shown in Fig. 6B-1.
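The delete procedure, including the optional bypass of step ⅲ, may be sketched as follows, purely as an illustration. Nodes are modelled as dictionaries and data records as plain key strings; the function names `find_path` and `delete`, the decision never to bypass the root, and the assumed shape of the PAIF of Fig. 6C-3 are choices made for this sketch only.

```python
def find_path(root, key):
    """Return the list of (node, segment) pairs leading to the record, or None."""
    path, current = [], root
    while isinstance(current, dict):          # a node; records are plain key strings
        offset = current["offset"]
        segment = key[offset] if offset < len(key) else None
        if segment not in current["links"]:
            return None
        path.append((current, segment))
        current = current["links"][segment]
    return path if current == key else None


def delete(root, key):
    path = find_path(root, key)               # first stage: "search data record"
    if path is None:
        return "not found"
    target, segment = path[-1]
    del target["links"][segment]              # step i: drop the record and its link
    if len(target["links"]) >= 2 or len(path) < 2:
        return "deleted"                      # step ii (the root is never bypassed here)
    if len(target["links"]) == 1:             # step iii: one target link remains
        predecessor, predecessor_segment = path[-2]
        (child,) = target["links"].values()
        predecessor["links"][predecessor_segment] = child  # step iii.1: bypass
        target["links"].clear()                            # step iii.2: free the node
    return "deleted"


# Assumed shape of the PAIF of Fig. 6C-3.
node_104 = {"offset": 2, "links": {"3": "12345", "4": "12445"}}
node_131 = {"offset": 1, "links": {"1": "11346", "2": node_104}}
root_101 = {"offset": 0, "links": {"1": node_131}}

print(delete(root_101, "11346"))             # deleted
print(root_101["links"]["1"] is node_104)    # True: node 131 has been bypassed
```

After the call, the link with value "1" issuing from the root leads directly to node 104, which corresponds to the bypass described for link 102.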

Another common basic operation is "modify an existing data record", for example changing the home address of an existing customer. A "modify" operation is normally implemented by selectively applying the basic operations described above. To carry out a "modify" command, the following cases should be distinguished:

1. The modification applies to a field other than the search key (for example, changing the address of the customer whose customer Id = "xxxxx"). In this case the modify procedure simply invokes the "search" operation (for the data record with customer Id = "xxxxx"). Once the sought data record is obtained, the old address is replaced by the new one.

2. The modification applies to the search key field (for example, changing the account number from "xxxxxx" to "yyyyyy"). This command is implemented as a sequence of two other basic operations, namely deleting the data record with account number = "xxxxxx" and then inserting a data record with account number = "yyyyyy", or vice versa. Clearly, a single modify transaction may involve both cases.

In each of the above examples every search key is represented as a string of bytes, so the search is carried out by dividing the search key into search-key segments each consisting of at least one byte. Those skilled in the art will readily appreciate that bytes are not the only possible representation of a search key. For example, a search key may be represented in binary form, i.e. as a string of 1s and 0s, in which case the search is carried out by dividing the search key into segments of l bits each, where l may be one bit (l = 1), a byte (l = 8) or more. In some cases the nodes of a PAIF need not all have the same l value.

Note also that search-key segments of different lengths may be assigned to different links in a given PAIF, provided that the segment length is known for the corresponding node.
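As a simple illustration of dividing a search key into segments of l bits, the following sketch splits a byte string for several values of l. The function name `key_segments` and the sample key are hypothetical, and a single l value is assumed for the whole key.

```python
from typing import List


def key_segments(key: bytes, l: int) -> List[int]:
    """Split a key into consecutive l-bit segments, most significant bits first."""
    bits = "".join(f"{byte:08b}" for byte in key)
    if len(bits) % l:
        raise ValueError("key length in bits must be a multiple of l")
    return [int(bits[i:i + l], 2) for i in range(0, len(bits), l)]


print(key_segments(b"A0", 8))  # [65, 48]
print(key_segments(b"A0", 4))  # [4, 1, 3, 0]
print(key_segments(b"A", 1))   # [0, 1, 0, 0, 0, 0, 0, 1]
```

With l = 4 each segment is a nibble, which is the unit used for the vertical tree of Fig. 8A; with l = 1 each segment is a single bit, as in the example of Figs. 9A-9G.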

As is clear from the PAIFs of Figs. 6A-6C, the data records are maintained in sorted order according to the search key. For example, traversing Fig. 6C-3 (from right to left) yields the ordered sequence "11346", "12345" and "12445". This feature constitutes another advantage, namely that data manipulation is easier than with the tree of Fig. 5, in which the data records are unsorted. As stipulated above, a node in a PAIF need not be of a single type. For example, in PAIF 120 of Fig. 6C-2, node 104 is at the same time a leaf node (linked to data record 107 by long link 105) and an internal node (linked to node 121 by short link 106).

Those skilled in the art will readily appreciate that the "insert", "delete", "search" and "modify" procedures described herein are only one of many possible variants for realizing these procedures, and may be modified as required by the particular implementation.

The insert, delete and search transactions described above apply to so-called intra-block transactions. As explained in more detail below, applying these transactions in an inter-block environment requires dealing with a few situations that concern inter-block operation.

Having explained the structure of the PAIF trie, there follows a description of various embodiments of the invention, namely a layered index that comprises a PAIF tree (as the basic partitioned index) and is based on the PAIF search scheme described above.

Now forward Fig. 7 A-7H to, wherein illustrate the hierarchical index that under a series of explant operations, constructs according to a kind of embodiment of the present invention.Consider the piece 140 (cutting apart in the index substantially) among Fig. 7 A, this piece overflows on storage space.This is a situation of calling " explant " process, and this produces the hierarchical index 142 of Fig. 7 B, and it is made up of root piece 144 and replica node A ' (155), replica node A ' by

direct link

145 and 146 links of leaf piece and by long-chain connect 147 and leaf piece 148 link.

In this particular example the split point is chosen to be link 149 (Fig. 7A) (hereinafter the "split link"), so nodes A, B, E, D and F are moved to new block 146 and nodes C, G, I, J, K, L and H are moved to block 148. Preferably, the split link is chosen so that the nodes and links are distributed roughly evenly between the new blocks (for example, so that the sub-PAIFs residing in blocks 148 and 146 are of roughly the same size). If no parent block exists, parent block 144 (constituting I₁) is created with a duplicate node A' (155) of split node A (156). If a duplicate of the split node, from which the split link issues, does not yet reside in parent block 144, the node is copied into the parent block (marked A'), and node A' (155) is connected by said direct link 145 to the block in which A resides. Split link 149 (originally a short link between A and C) is replaced by long link 147 between A' and the block in which C resides. Optionally, the split link, marked by dashed line 150, may also be retained to link nodes A and C (156 and 153, respectively).
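One possible way of choosing a split link that distributes the nodes roughly evenly is sketched below, for illustration only. The parent/child arrangement in `TREE` is hypothetical (only the node names and the two resulting node sets are taken from the description of Figs. 7A-7B), and the function names and the tie-breaking rule (the first link met in a top-down walk wins) are assumptions of this sketch.

```python
# Hypothetical arrangement of the twelve nodes of block 140 (Fig. 7A).
TREE = {
    "A": ["B", "C"],
    "B": ["D", "E"],
    "D": ["F"],
    "C": ["G", "H"],
    "G": ["I", "J"],
    "I": ["K", "L"],
}


def subtree_size(tree, node):
    """Number of nodes in the sub-tree rooted at node."""
    return 1 + sum(subtree_size(tree, child) for child in tree.get(node, []))


def choose_split_link(tree, root):
    """Pick the (parent, child) link whose sub-tree is closest to half the block."""
    total = subtree_size(tree, root)
    best, best_gap = None, None
    stack = [root]
    while stack:
        parent = stack.pop()
        children = tree.get(parent, [])
        for child in children:
            gap = abs(total - 2 * subtree_size(tree, child))
            if best_gap is None or gap < best_gap:   # strict: the earliest link wins ties
                best, best_gap = (parent, child), gap
        stack.extend(reversed(children))
    return best


print(choose_split_link(TREE, "A"))  # ('A', 'C')
print(subtree_size(TREE, "C"))       # 7
```

With this assumed arrangement the chosen link is the one between A and C: seven nodes (C, G, H, I, J, K, L) move to one block and five (A, B, D, E, F) stay in the other, matching the distribution described for blocks 148 and 146.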

The net effect is that Fig. 7B provides a layered index in which block 144 constitutes the representative index and blocks 146 and 148 are the blocks of the trie. Those skilled in the art will readily appreciate that data records can now be accessed or updated either through the trie (i.e. from node A, 156) or through the layered index (i.e. from node A', 155). In this connection, note that link 147 and link 150 have the same value, which is the value of original link 149 of Fig. 7A.

Consider now block 148, which may undergo a similar block-split procedure, producing the layered index 151 of Fig. 7C. In this example the split link is short link 152 of Fig. 7B; accordingly, nodes C and H reside in block 148A of Fig. 7C and nodes G, I, K, L and J reside in block 148B. The node from which the split link issues (node C, 153, of Fig. 7B) is duplicated (producing duplicate node 153a of Fig. 7C), and the duplicate, marked C', is placed in block 140. As before, direct link 154 connects duplicate node C' (153a) to block 148A, which holds original split node 153, and link 155 is a far link to split block 148B whose value equals the original value of link 152 between nodes C and G before (and after) the split.

In Fig. 7C, layered index 151 consists of the trie, comprising blocks 141, 148A and 148B (constituting I₀), and block 140, which constitutes the representative index over the common keys of the trie.

Note that in Fig. 7C the link between node A in block 141 and node C in block 148A is optionally disconnected; similarly, the link between node C of block 148A and node G of block 148B is optionally disconnected. As clearly shown, nodes A' and C' are connected in block 140 so as to form a (connected) trie, and it is possible to access block 141 through node A' and direct link 156; block 148A through nodes A', C' and direct link 154; and block 148B through nodes A', C' and link 155. Note that the value of the link (in block 140) between nodes A' and C' is the same as the original value between nodes A and C (see link 149 of Fig. 7A).

As clearly shown in Fig. 7C, the resulting layered index constitutes a balanced structure of blocks, so the depth of the index is kept to a minimum and, correspondingly, the number of accesses (normally, although not necessarily, I/O operations) required to search for, insert or delete a given data record is kept to a minimum. Bearing in mind that the number of accesses needed to reach a data record through the layered index is essentially a logarithmic function of the number of records, the layered index is more efficient than the trie in terms of the number of I/O operations required to access a given data record. For example, accessing the data record associated with node J through the layered index requires accessing first block 140, then block 148B and then the sought data record (i.e. three I/O operations). By comparison, accessing the same data record through the trie takes four I/O accesses, namely block 141, block 148A, block 148B and data record 159. As shown, there are a few specific cases in which the trie is more efficient (for example, accessing the data record associated with node A); however, the larger the trie (i.e. the more blocks it consists of), the more efficient access through the layered index becomes.

In the specific embodiment of Fig. 7, the representative index and the trie (one embodiment of the basic partitioned index) follow essentially the same search scheme, namely PAIF. "Essentially" the same scheme means that some differences exist, as explained below with reference to Fig. 9G.

Further examples, described with reference to Figs. 7D to 7H, illustrate considerations for duplicating nodes into higher levels Iⱼ of the layered index. Consider the layered index of Fig. 7D, in which a block is split at link 400. The resulting layered index is shown in Fig. 7E: block 402 is created, node 401 is copied to higher-level block 402 (which forms part of the layered index), and the original link between nodes B and E is optionally retained (dashed link 403). The two blocks (405 and 406) of the trie can now be accessed through B', by links 407 and 408, respectively.

Next, suppose that block 405 needs to be split at link 409. The resulting structure is reflected in block 402 of Fig. 7F, in which nodes A and I of block 405 have been copied as A' and I' (410 and 411) into block 402. Node I' is evidently the duplicate of the split node in block 405. Node A is copied as well, because nodes B (whose duplicate B' already resides in block 402) and I (whose duplicate I' is now copied to block 402) are both descendants of A. Node A is the lowest common ancestor of nodes B and I, so a (connected) trie is formed in block 402. Moreover, the value of short link 414 (between nodes A' and B' in block 402) is the same as the value of link 412 (between A and B in block 405). The value of link 415 (between nodes A' and I') in block 402 is the same as the value of link 413, which issues from node A in the direction required to reach node I. The internal structure of block 402 is one that allows a representative search of blocks 405, 406 and 407.

Direct links 416 and 417 of nodes 422 and 411 are optional, because block 405 can be reached along direct link 418 and node 410 can be seen to lie on the access path to nodes 422 and 411.

Fig. 7 G illustrates the hierarchical index as a result behind the piece 407 (in link 420) of separation graph 7F, and Fig. 7 H illustrates the hierarchical index as a result behind the explant 402 (chaining between node I ' and the N ').The hierarchical index as a result of Fig. 7 H has three layers as shown, and ground floor is made up of piece 430, and the second layer is made up of

piece

402 and 408, and clue is made up of piece 405,407,426 and 406.

Those skilled in the art will readily appreciate that the manner of splitting blocks is of course not limited to the examples of Figs. 7D to 7H.

Having described an embodiment in which the layered index is built by a split process (see Fig. 7) triggered by a series of insert transactions, it will be appreciated that the opposite process, namely "delete block", can be activated when deletion of data records leaves a block holding only nodes that are not associated with any data record.

Those skilled in the art will readily appreciate that the layered index described with reference to Fig. 7 is only one of many possible variants for realizing a layered index in which the representative index and the basic partitioned index are essentially the same.

Using PAIF in the manner specified offers an advantage over hitherto known tries in the following sense: although the trie may be inherently unbalanced, the resulting layered index has a balanced structure.

Turning now to Figs. 8A-8B, there are shown two illustrations of the technique of the invention as applied to another embodiment of the invention.

Like this, Fig. 8 A illustrate a kind of have vertical orientated (promptly make up vertical tree) give the word line Cable Structure, as shown, it is unbalanced, i.e. three dark (260,261 and 262) two pieces wide (260 and 264) of piece.The search graph of this specific vertical tree is not explained in following explanation, and only emphasizes to obtain the required various aspects of balance-type hierarchical index.Yet should notice that node in the clue structure 260 represents skew under nibble length.(nodal value is represented with the hexadecimal representation formula of the data recording (a-k) shown in Fig. 8 A.)

Note that once extra I/O operation, promptly as three access block for visit data record k (or three I/O operate) of comparing with block access visit data record b (or an I/O operation) described in Fig. 8 A, can regard equilibrium as.In some situations of actual life, needn't use technology of the present invention to cause the identical I/O operation of quantity.Certainly, insert data recording more and may produce higher " unbalanced " degree, if need not technology of the present invention will as top going through (with reference to prior art), cause performance decline (because unbalanced construction) to this processing.

Fig. 8 B illustrate of the present invention one may embodiment, as shown, construct one and (form I by a piece 270 ₁) the representative index, this causes the horizontal equalization formula tree that obtains having root piece 270, operates all pieces that addressable low layer is vertically set (latter constitutes unbalanced clue) from root piece 270 by an I/O.

As shown, the public-key value by each piece obtains the first vertical actual access of the piece in the tree (it is a clue).Elder generation is with reference to Fig. 8 illustration term public-key before proceeding.

The common key of block 260 (in hexadecimal, in nibble units) is 0x4, 0x1 and 0x3, where 0x4 is the most significant nibble of the byte of character A, 0x1 is the least significant nibble of character A, and 0x3 is the most significant nibble of the character residing at offset 2 of the data records.

Note that all the data records accessible through block 266 share the common key prefix specified above. In the same manner, the following table summarizes the common key of each block:

Block   Common key
260     0x4, 0x1, 0x3
261     0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3
269     0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3
264     0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x4, 0x3

Note that block 261 could hold a root node whose value is 8, in which case the common key of this block (hereinafter k) would change to 0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, i.e. it would consist of 8 units. In that case, the representation of block 261 in I₁ should change accordingly. In a different implementation, block 261 is represented by k even if no root node with value 8 exists.

The index over the common keys is realized in the representative index (consisting of block 270), which is built as a trie that addresses each common key of the vertical tree. Now, for example, in order to search for data record g, node 292 is reached via node 290 and link 291. Then direct link 293 leads to block 261, which is associated with data record g. The resulting layered index is balanced.

As stipulated above, in the special case of a trie the representative key of a block is the common key. In general, the common key of a block is the longest prefix common to the keys of all data records accessible from that block by the relevant search scheme. For PAIF, the prefix length (counted in units of l bits) is the value of the root node of the block (recall that it holds an offset value). If the prefix length is expressed in bits, it is computed by multiplying the offset value by l.
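The notion of a common key as the longest shared prefix, and the conversion of a prefix length from units of l bits into bits, may be illustrated by the following sketch. The record keys "A01", "A12" and "A39" are hypothetical and were chosen only so that the result coincides with the common key stated for block 260; the function names are likewise assumptions.

```python
def nibbles(text):
    """Return the key as a list of 4-bit units (l = 4), high nibble first."""
    units = []
    for byte in text.encode("ascii"):
        units += [byte >> 4, byte & 0xF]
    return units


def common_key(keys):
    """Longest prefix shared by every key; each key is a sequence of units."""
    prefix = []
    for units in zip(*keys):
        if any(unit != units[0] for unit in units):
            break
        prefix.append(units[0])
    return prefix


def prefix_length_in_bits(offset_value, l):
    """Prefix length in bits = offset value of the block's root node times l."""
    return offset_value * l


keys = [nibbles(k) for k in ("A01", "A12", "A39")]  # hypothetical records
print([hex(unit) for unit in common_key(keys)])      # ['0x4', '0x1', '0x3']
print(prefix_length_in_bits(len(common_key(keys)), 4))  # 12
```

Here the three keys first differ in their fourth nibble, so the common key has three units, i.e. 12 bits when l = 4.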

Another embodiment of constructing a layered index according to the invention is described below with reference to Figs. 9A-9G.

Attention is accordingly directed to Figs. 9A-9G, which show a series of modify (insert) transactions applied to a PAIF tree (constituting a trie that is susceptible to an unbalanced structure), thereby obtaining a layered index. For convenience of presentation, the data records are shown as forming part of the trie. As noted, the actual manner in which data records are associated with the trie may vary according to the particular application.

In the figures below, the following unsorted data records A-F are inserted one after another (for convenience of presentation they are shown as part of the blocks). The data are represented as bit strings, where l is one bit:

A = 001000011
B = 110011100
C = 011011111
D = 011011011
E = 101010101
F = 111111111

In the first step (Fig. 9A), record A is inserted into block 300, which comprises node 301 with offset 0; node 301 is associated with the first record, A, by link 302 whose value is 0. At this point the tree, consisting of block 300, has only one node. The search scheme dictates that the search path to data record A is determined by the value 0 at offset 0, as depicted on link 302 and node 301, respectively.

Next (Fig. 9B), data record B is inserted. As can clearly be seen, unlike data record A its key value at offset 0 is 1, so a link leading to data record B is added and assigned the value 1.

(Fig. 9 C) then inserts data recording C, and the value in the skew 1 is used for and writes down A and

distinguish.Link

303 and 304 is connected respectively to node 305 (representative skew 1) on illustrated the data recording C and A.Because piece 300 holds

node

301 and 305, also needn't separate this piece now.

Next, data record D is inserted; the block structure following this insert operation is shown in Fig. 9D. However, since a block cannot hold more than two nodes (an overflow occurs), block 300 now needs to be split. Fig. 9E shows the tree structure after the split. Link 306 is the split link; it is chosen so that about half of the block's contents remain in block 300 and the remaining half move to another block, 310. Other links could of course equally be chosen as the split link.

As a first step, block 300 in I₀ is replaced by two blocks, 300 and 310. Nodes 0 and 1 (marked 311 and 313, respectively) and data records A and B remain in split block 300, while node 6 and data records D and C (representing, in this particular example, the remaining nodes) move to block 310. Thus the basic partitioned index of Fig. 9E now consists of two blocks, 300 and 310 (in effect, this constitutes an unbalanced trie).

Next, since block B1 does not yet exist, it is created: block 312 is set up. Split node 313 is copied to block 312 to form duplicate node 314. Duplicate node 314 is then connected to block 300 by direct link 316 and linked to block 310 by far link 318. This far link replaces original split link 306, which is marked by a dashed line in Fig. 9E. The value of far link 318 equals the value of the split link. Thus the representative index (consisting of block 312) allows searching by the common keys of the basic partitioned index.

Note that there is no restriction as to whether the split link is deleted or retained. As shown, the horizontal tree thus obtained, which constitutes the layered index (consisting of blocks 312, 300 and 310, where block 312 belongs to the representative index), is balanced.

Next, data record E is inserted. In this case it is not possible to advance in the horizontal tree (one form of layered index) from block 312 through far link 318 of node 314 (which has value 1), because that link represents direction 1 from node 314, whereas a link in direction 0 is needed. The search therefore advances to block 300 through direct link 316. In this way the block to be associated with the new data record is found. Data record F is inserted in the same manner, producing the tree structure shown in Fig. 9F.

Next, if a split is carried out between nodes 320 and 321 of block 300, node 320 is copied to block 312 (marked 323 in Fig. 9G). Since it cannot be linked to node 314 of block 312 (as this would not maintain correct intra-block links), node 311 of block 300 is also copied to block 312 (marked 322 in Fig. 9G), so that this search scheme establishes a (connected) trie by which blocks 300, 326 and 310 can be searched according to their common keys.

Note also that, instead of having a direct link from each of duplicate nodes 314, 322 and 323 of block 312 in Fig. 9G, it is sufficient to have a single direct link from duplicate node 322 to block 300. A far link from node 323 to block 326 is set in the direction of the link before the split (the direction of link 315 of Fig. 9F). Clearly, if another split is carried out in block 326, it should be reflected in block 312 by a node and a far link to a block Bᵢ₋₁', where said node is connected by a link in direction 1 from the node that has the direct link to Bᵢ₋₁.

Figs. 9A-9G and 8A-8B illustrate two of many possible ways of realizing a block-split mechanism that maintains a balanced structure by building the layered index of the invention. The flexibility to adopt other, non-limiting variants is shown for example in Fig. 8B, where near link 271 is represented by direct link 272 together with far link 273, which has the same direction as link 271 (marked by a dashed line); this makes node 276 redundant.

In many embodiments, the balancing technique of the invention gives the resulting balanced, horizontally oriented digital tree (one form of layered index structure) a so-called "random access" characteristic. This means that a search for a given input data record (data record A in the example) may lead to a different data record, or to a node that has no link in the direction dictated by the search scheme, so that a "correction" is needed in order to finally reach the sought data record.

For a better understanding, consider Fig. 9E, and suppose that a search transaction for data record L = 111011110 is applied to the layered index of Fig. 9E. The search path follows node 314 and link 318 (offset 1 and value 1, respectively) and then, at offset "6" (the root node of block 310), reaches data record C through link 319 (value "1"). This example illustrates the random-search characteristic of the resulting layered index.

To resolve the failure just illustrated, the length of the common prefix of the key of the sought data record and the key of the reached data record is computed; the common key of block 310 is a prefix of the key of actual data record C. Here the length of the common prefix is zero. The search then moves back up the tree along the search path to a node that has a direct link and whose value is equal to or less than the common prefix length. If this requirement cannot be met, i.e. every such node has a value greater than the computed prefix length, the first node having a direct link is taken (it should point to the first block of index Iᵢ₋₁). The search then moves through direct link 316 to node 311 in the lower-level, vertically oriented tree (i.e. to level Iᵢ₋₁) and continues as dictated by the search scheme.

In another scenario, suppose the search scheme dictates advancing in a certain direction but no link exists in the required direction. The search path then follows the direct link issuing from the node with the largest value on the search path that holds a direct link. When advancing from one block to another, a comparison with the common key (if available) or with a data record associated with a node can determine whether to advance according to the search scheme or to return to a node that has a direct link. Note that the common key need not be physically attached to the data records.

Returning to the above example (the search for data record L, which led to data record C) and to Fig. 9E, there is no need to access data record C if the common key of block 310 (which is 011011) is kept in the block. Since the common prefix between the key of L and the common key of this block has length 0, it is possible to return to node 314 and link 316 without accessing record C. Avoiding access to the data record in the manner described naturally improves performance. The criterion for determining that the sought data record does not reside in this sub-tree is that the common prefix between the key of the sought data record and the common key of the block is shorter than the value of the split node.

In this example the value of the split node is 1 (the value of node 313), so block 310 is not the block that would hold record L (if such a record exists). The search for record L therefore continues from node 314 and link 316. This procedure applies to all modify transactions.
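The check applied in this example may be sketched as follows, for illustration only. The sketch reads the criterion in the way the worked example uses it, namely that a block is ruled out when the common prefix of the search key and the block's common key is shorter than the value of the split node; this reading, and the names `common_prefix_length` and `block_may_hold`, are assumptions of the sketch.

```python
def common_prefix_length(a: str, b: str) -> int:
    """Number of leading positions at which the two bit strings agree."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def block_may_hold(search_key: str, block_common_key: str, split_node_value: int) -> bool:
    """Assumed test: the block stays a candidate only if the shared prefix
    reaches at least the offset held by the split node."""
    return common_prefix_length(search_key, block_common_key) >= split_node_value


L_KEY = "111011110"          # sought data record L
D_KEY = "011011011"          # data record D, which does reside in block 310
BLOCK_310_COMMON_KEY = "011011"
SPLIT_NODE_VALUE = 1         # value of node 313

print(common_prefix_length(L_KEY, BLOCK_310_COMMON_KEY))              # 0
print(block_may_hold(L_KEY, BLOCK_310_COMMON_KEY, SPLIT_NODE_VALUE))  # False
print(block_may_hold(D_KEY, BLOCK_310_COMMON_KEY, SPLIT_NODE_VALUE))  # True
```

Because the test uses only the common key kept in the block, record C itself need not be read before the search returns to node 314 and direct link 316.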

As regards an insert transaction, block 300 is found in the manner described above, and it is the block relevant to new data record L.

This example refers to a specific instance of a layered index. Those skilled in the art will readily appreciate that this random-access characteristic applies to the various variants that employ other types of layered index over a basic partitioned index.

The reason the random-search characteristic leads to "errors" is that the full common key of a block in level Iₕ₋₁ may not be known from the values of the nodes on the search path to that block. Thus, to verify whether the search path to a given block matches the search path according to the key of the sought data record, the common key of the block in Iₕ₋₁ needs to be known. If common keys are not kept in the blocks, it may be necessary to advance in the index to a data record in order to learn the common key value.

The inherent error-prone characteristic of the layered index, and the manner in which it is handled, were illustrated above with reference to the example of Fig. 9 and are explained more generally as follows: in order to search by key k, k is searched for in Iₕ (and in some cases in Iₕ₋₁ to Iᵢ, or down to a data record) so as to find the block B of Iₕ₋₁ to which k leads. This process is repeated until the block of I₀ associated with the data record having key k (if it exists) is reached.

The description of Figs. 7 to 9 exemplifies a layered index that uses a PAIF-based search scheme for both the basic partitioned index and the representative index. Those skilled in the art will readily appreciate that the layered index of the invention is not limited to PAIF. For example, U.S. Patent No. 5,495,609 describes a different trie. Consider the trie of Fig. 10A according to the '609 patent, and suppose that it comprises one block holding nodes 11, 12, 13 and 14. Suppose now that this block needs to be split so that new nodes can subsequently be inserted into the tree. One possible prior-art way of splitting the block is, for example, to break the link between nodes 12 and 14, yielding two blocks: one holds nodes 11, 12 and 13, and the other holds node 14 (and later new nodes). Assume that the first block resides in internal memory; if record 26 is to be reached, only one I/O operation is needed. If, on the other hand, record 20 is of interest, a first I/O operation is needed to access the new block (i.e. the block holding node 14), and another (i.e. second) I/O operation is needed to access record 20 from that block. It can thus be seen that splitting the block leads to an unbalanced tree. Successive insert transactions aggravate the imbalance of the tree, so that multiple I/O accesses are needed, which is clearly undesirable.

The technique of the invention overcomes the drawback of the unbalanced tree and yields the layered index shown in Fig. 10B, in which a representative index (formed by block 159A) is built over the representative keys of the trie (formed by blocks 159B and 159C). Here too the link between nodes 12 and 14 is regarded as the split link, and a new node 159D (a duplicate of node 12) is copied into the new block designated 159A. Now accessing record 20 and accessing record 26 require the same number of I/O operations, in this particular case two. The larger the trie, the more efficient access through the layered index becomes.

The layered index of Fig. 10B thus results in a balanced tree of blocks, which ensures that the same number of I/O operations is needed to reach every data record in the tree. Those versed in the art will readily appreciate that the number of I/O operations preferably depends on a logarithmic function of the number of data records and of the number of links issuing from a block. Thus, for example, with 1000 far links issuing from each block, a layered index having three layers allows access to 1,000,000,000 data records.
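
By way of non-limiting illustration only, the logarithmic relationship stated above may be sketched in Python as follows. The function names are hypothetical and do not appear in the specification; the sketch assumes that every block issues the same number of far links.

```python
def reachable_records(far_links_per_block: int, layers: int) -> int:
    """Records addressable when each block fans out to the same number of far links."""
    return far_links_per_block ** layers


def layers_needed(records: int, far_links_per_block: int) -> int:
    """Smallest number of layers whose capacity covers the given number of records.

    Integer arithmetic is used instead of math.log to avoid rounding errors.
    """
    layers = 1
    capacity = far_links_per_block
    while capacity < records:
        capacity *= far_links_per_block
        layers += 1
    return layers


# Figures taken from the text: 1000 far links per block, three layers.
print(reachable_records(1000, 3))          # 1000000000
print(layers_needed(1_000_000_000, 1000))  # 3
```

Because the block tree is balanced, the number of layers is also an upper bound on the number of block reads needed to reach any record.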

For a better understanding, some numerical examples follow. Assume that each block has 1000 far links and that each far link is 4 bytes in size; the space required to represent these far links is then readily seen to be 4000 bytes. Assume further that the nodes and near links in the block occupy another 4000 bytes; the resulting block size is less than 10,000 bytes. For the sake of discussion, each block is taken to be 20,000 bytes in size.

Consider that now a hierarchical index of being made up of a piece (for example piece 144 among Fig. 7 B) is as index level I ₁, and suppose that it is linked to layer I one by one ₀In piece (two

pieces

146 and 148 only are shown among Fig. 7 B wherein), this hierarchical index amounts to and is used to add up to 1001 the piece that respectively is of a size of 20,000 bytes.Thereby being used to of should distributing keeps the gross space of each piece of this hierarchical index to be about 20 megabyte.The space of this order of magnitude can easily for example hold in the internal storage of personal computer.Now suppose I ₀In each piece related with 1,000 data record, the clean effect (according to present embodiment) that then adopts hierarchical index of the present invention be in internal storage, integrally hold 1,000,000 can be in the data recording that need not to visit under the I/O index.
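
The arithmetic of this numerical example may be reproduced, purely as an illustrative sketch, by the following Python fragment. The function name and parameter names are hypothetical; the numeric values are those given in the text.

```python
def index_footprint(far_links: int, block_size: int, records_per_i0_block: int):
    """Return (total blocks, total bytes, addressed records) for a two-layer index.

    Layer I1 is a single block; it links to `far_links` blocks of layer I0.
    """
    i1_blocks = 1
    i0_blocks = far_links
    total_blocks = i1_blocks + i0_blocks
    total_bytes = total_blocks * block_size
    records = i0_blocks * records_per_i0_block
    return total_blocks, total_bytes, records


blocks, size_bytes, records = index_footprint(1000, 20_000, 1000)
print(blocks)      # 1001
print(size_bytes)  # 20020000, i.e. about 20 megabytes
print(records)     # 1000000
```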

Similarly, accessing a billion records may require an index with one more layer, which may require one additional I/O operation.

For a better understanding, consider, for example, the implementation of the layered index of Fig. 6B-1 or 6B-3 (PAIF search scheme). Assume that the keys of data records 103 and 107 were longer (e.g. 100 bytes long); this would not change the size of the PAIF. Another non-limiting example can be seen in Fig. 8B: if the keys of the data records a-k addressed by this index were 200 bytes long, neither the size nor the structure of the layered index would change. It can also be seen that it is possible to navigate in the index and retrieve the data records a-k in key order. This exemplifies one form of sequential operation.

As shown, the resulting layered index of Fig. 10B comprises two vertically oriented trees: the first tree consists of blocks 159B and 159C (being one form of the basic partitioned index I₀), and the second tree has one block, 159A (being one form of index layer I₁).

The tree of blocks so obtained (being one form of layered index) is balanced, i.e. all the links to every data record can be reached through 159A with one I/O. The insertion of more data records, which may cause further splitting within the group of blocks of I₀, will of course require layer I₁ of the layered index to be updated. When the number of nodes in block 159A of I₁ exceeds a given quantity, block 159A is split according to the splitting mechanism.

The trie index contemplated by the technique of the invention is not limited to the trie disclosed in the '609 patent but, as stated above, encompasses other types of tree.

Note that the structure within a block need not be balanced, i.e. the nodes in a block need not be arranged in a balanced structure. Although this may appear to be a drawback, those versed in the art will readily appreciate that its effect on overall database performance is in practice negligible, because the search within a block is normally carried out in the fast internal memory of the computer system. Unlike the search within a block, the arrangement of blocks in the layered index is kept in a balanced structure, so that the number of blocks on a search path is a logarithmic function of the number of data records. That number reflects the number of I/O accesses to external memory (inherently slow operations) needed to load the required blocks into internal memory.

In this connection, those versed in the art will readily appreciate that the invention is by no means limited to any given physical implementation. Thus, for example, with regard to the search scheme, although the search scheme is retained within the blocks after the technique of the invention has been applied, this holds at the logical, conceptual level, e.g. advancing in the layered index according to offsets and offset values. This general concept can be realized in the various ways encompassed by the technique of the invention. For example, the size (number of bits) of the offset held in each node may vary, null pointers (i.e. pointers that point to nothing, the node having no children) may be implemented, and so on. This flexibility of physical implementation also applies to the parts within a block.

The layered index described with reference to Figs. 7 to 10 keeps essentially the same search scheme for both the trie and the representative index (except for error handling, which may be encountered when data records are accessed, as explained in detail above with reference to Fig. 10G).

As illustrated with reference to Fig. 11, it is not mandatory to keep the same search scheme for both the trie and the representative index.

Fig. 11 illustrates another method of balancing the unbalanced tree of Fig. 8A (i.e. of constructing a layered index), in which a conventional B-tree serves as the representative index over the representative keys of the unbalanced tree. The resulting horizontally oriented balanced tree (layered index) comprises block 272 at the top layer (index layer I₂), blocks 270 and 271 at the lower layer (index layer I₁), and the original blocks (blocks 260, 261, 262 and 264) of the unbalanced vertically oriented tree of Fig. 8A at the lowest layer (index layer I₀). Fig. 4 illustrates that the search scheme of the representative index need not be identical to the search scheme of the original unbalanced trie. If desired, the B-tree as a whole (which constitutes a representative index) may be regarded as index layer I₁.

The database file management system of the invention not only overcomes the drawbacks of conventional trie index files but also offers other conveniences and benefits that improve the way user applications access data.

The fact that a balanced structure of blocks is maintained ensures that, on average, the number of slow I/O operations is kept essentially optimal, i.e. more efficient results are obtained, particularly for large files consisting of many blocks.

Those versed in the art will readily appreciate that, although the layered index is preferably constructed with slow I/O operations in mind, e.g. so as to minimize the number of accesses to a slow external storage medium, the invention is by no means limited to the storage media illustrated. Thus, for example, the storage medium used with the invention may also be internal memory. This is particularly relevant in view of the ever-increasing capacity of internal memory, which, although faster than external memory, also requires the efficient access control achieved according to the invention.

The following describes a second aspect of the present invention.

For ease of explanation, the second aspect of the invention is described with reference to a PAIF index (constructing a designated index). This aspect is by no means limited to this specific example.

As described above, the database file management system of the invention can address data records of different types with a single index.

To better distinguish the data records of different types addressed by the same PAIF index, each data record belonging to a given type is associated with a given identifier. The identifier forms part of the key of the data record, so as to build a designated key. The identifier is unique for each data type. Thus, for example, the keys of data records belonging to the entity "borrower" are prefixed with identifier "A", and the keys of all data records belonging to the entity "book" are prefixed with identifier "B". The new key of a data record belonging to "borrower" now becomes a designated key formed by concatenating "A" with the original key of "borrower"; likewise, the new designated key of a data record belonging to "book" is formed by concatenating "B" with the original key of "book".
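
By way of non-limiting illustration, the construction of designated keys by prefixing may be sketched in Python as follows. The function name is hypothetical, and the sorted list merely stands in for the single index; the original key 12355 and the book keys 11111 and 22222 are used as sample values.

```python
def designated_key(identifier: str, original_key: str) -> str:
    """Concatenate the type identifier with the original key of the record."""
    return identifier + original_key


# Keys of both record types are kept in one ordered collection,
# standing in here for the single designated index.
single_index = sorted([
    designated_key("B", "22222"),  # a "book" record
    designated_key("A", "12355"),  # a "borrower" record
    designated_key("B", "11111"),  # a "book" record
])

print(single_index)  # ['A12355', 'B11111', 'B22222']
```

Because the identifier is the leading part of the key, records of one type occupy a contiguous range of the ordered key space.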

Having discussed the so-called "identifier" feature of the second aspect of the invention, the so-called metadata is described next.

According to one aspect of the invention, a data dictionary holds metadata information, which provides information about a data record as a function of its record type. Thus, apart from the need for a data record to hold an identifier by which it is recognized, the metadata information is used to identify or construct the designated key and to provide other information, such as the record size. The search scheme of the index is unaware of the metadata; it locates a record from the designated (or compound) key without using the metadata. The metadata is needed for building the (compound) designated key and, once data has been retrieved, for determining the nature of the record. Thus, for example, if the data record of a book, marked with identifier B, is retrieved, information about records designated with B can be obtained from the metadata, for example the size of the record, its fields, and which of the fields are key fields.
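
A minimal Python sketch of such a data dictionary is given below, by way of example only. The class name, the field names, the record sizes and the single-character identifier are assumptions made for illustration and are not taken from the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RecordType:
    """Metadata kept per identifier (all values below are hypothetical)."""
    name: str
    key_length: int     # length of the original key, without the identifier
    record_size: int    # size of a stored record in bytes
    fields: tuple       # names of the fields of the record
    key_fields: tuple   # the subset of fields that form the key


DATA_DICTIONARY = {
    "A": RecordType("borrower", 5, 64, ("borrower_id", "name"), ("borrower_id",)),
    "B": RecordType("book", 5, 128, ("book_id", "title"), ("book_id",)),
}


def describe(key: str) -> dict:
    """Interpret a designated key with the help of the metadata.

    The index itself never calls this; only the layer that builds keys
    and interprets retrieved records consults the data dictionary.
    """
    identifier = key[0]
    meta = DATA_DICTIONARY[identifier]
    return {
        "type": meta.name,
        "original_key": key[1:1 + meta.key_length],
        "record_size": meta.record_size,
        "key_fields": meta.key_fields,
    }


info = describe("B11111")
print(info["type"], info["original_key"])  # book 11111
```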

The use of designated data records is not limited to a single type; rather (and preferably), more than one type is handled by a designated index, as explained below with regard to the subordination relationship.

Thus, whereas according to hitherto known solutions data of different types is typically kept in several files (and addressed by several index files), the database file management system employing a designated index according to the invention can address data records of different types with the same index. It should be noted that the keys of data records of different types (addressed by the same designated index) need not have the same length. Thus, for example, a layered index over a trie-based designated index is also a layered index of the type described in Fig. 8A, the designated index serving as its basic partitioned index. The key of each record belonging to the "borrower" entity is 6 bytes long, and the key of each record belonging to the "book" entity is 5 bytes long. Inserting the books having designated keys B11111 and B22222 into the designated index of Fig. 8A yields the data structure of Fig. 12, which comprises a designated index addressing two types of data record, namely data records a-k, which are assigned identifier A, and data records w-x, which are assigned identifier B. In the following description, the term "record of type X" or "record designated with X" is used to describe a record that has a designated key whose identifier is X.

Although this example shows one way of realizing designated data, namely by prefixing the keys of data records (the prefix being a leading character, a character string or any number of bits), those versed in the art will readily appreciate that it is only one of many possible variants. In fact, the proposed identifier may be realized in any known manner that distinguishes different data records, provided that the identifier is treated as part of the key and forms part of the search.

The foregoing holds irrespective of how the identifier itself is kept. The identifier may: (i) form part of the data record (or of a key field); (ii) be stored elsewhere (e.g. in a different data structure); or (iii) be defined elsewhere, or even be defined in some other way. One example is a trie structure associated with data records that are all of the same type (e.g. all designated with the character A). In this example there is clearly no need to attach the identifier physically to each data record instance, since the identifier is common to all the records. Nevertheless, when a data record is accessed, its identifier should be recognized and added to the key. Another possible solution is to add the identifier as a prefix of the data record, so that the identifier can be obtained when the data record is accessed. For example, in Fig. 12, data record d is accessed from node 266 through link 270. The first character of data record d is A, i.e. the identifier.

For a better understanding of the subordination relationship, attention is directed to Figs. 13A-13E. Fig. 13A shows a designated index 800 (in PAIF form) associated with four data records 802, 804, 806 and 808 (of which only the designated keys are shown). As is readily seen from the identifier "A" at the front of each data record, these data records are all of one type.

Now forward Figure 13 B to, PAIFS00 shown in it, it has the new data records (812) that composite key is A12355B940201333333 (identifier of record 81 is B).This new data records is subordinated to the data recording 806 that key word is A12355.According to this PAIF index, node 814 indication distinguish skew be 6 and the value that is linked to data recording 812 be B (6 places have value B in skew).Can find out that being recorded in skew does not have value in 6 places, distribute virtual value (for example empty) so that determine the skew of distinguishing to it, and correspondingly set and be labeled as empty direction linking 818 to another record in this skew place.

Figure 13 C is illustrated in the PAIF800 that wherein inserts another data recording 820.(data recording 820 of the example of 806 category-B type data recording is inserted into PAIF to represent another to be subordinated to category-A type data recording.It distinguishes that skew is 11 (values of new node 822), and is respectively " 0 " and " 1 " to the link value of data recording 812 and 820.

Figure 13 D illustrates PAIF800, and wherein dissimilar records is subordinated to record 806.The data recording (824) of type " D " of data recording that is subordinated to type " A " is from link 823 links of node 814 by having value D.Recall the data recording that PAIF has represented to use the B mark, wherein the latter is subordinated to the data recording with the A mark.An example that is subordinated to " B " type of " A " type is the article of being preserved by supplier (" A ") (" B "), and the example that is subordinated to " D " type of " A " is the client (" D ") by supplier (" A ") service.

Now forward Figure 13 E to, shown in it with Figure 13 D in another slightly different embodiment of PAIF.Particularly, in data file,, promptly omit under the prefix form key word A12355 expression and preservation subordinate data recording 812,820 and 824 with under the key word prefix (this prefix is the identifier key word of record 806) that does not have them.When visit, for example, the information that obtains from metadata according to identifier B, data recording 812 allows to extract following information: (ⅰ) pick out and lack a part of key word (ⅱ) data recording 812 and be subordinated to certain record with the A mark, the latter can be a links and accesses of sky by value from the node (814) with value 6.

It is thus possible to access data record 806 and build the complete key of record 812. If PAIF 800 is a layered index, nodes 814 and 822 may reside in different blocks, and the access path to the block associated with record 812 may not include node 814. In that case, data record 806 can be accessed, and the key constructed, through the links from the subordinate records to record 806 (links 826, 828 and 830). The above implementation eliminates the need to duplicate the designated key of the parent record in each subordinate data record (in the specific example of Fig. 13D, the prefix A12355 is duplicated three times, for records 812, 820 and 824). Replacing the key prefix with a link saves space (if the prefix is larger than the representation of the link) and allows the record to which a subordinate record is related to be accessed without a separate search.
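
The idea of storing a subordinate record without its key prefix and rebuilding the complete key through a link to the parent record may be sketched, by way of example only, as follows. The class and attribute names are hypothetical; a Python object reference stands in for links such as link 826.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    stored_key: str                    # key as physically stored (prefix omitted)
    parent: Optional["Record"] = None  # link to the record it is subordinate to

    def full_key(self) -> str:
        """Rebuild the complete designated key by following the parent links."""
        prefix = self.parent.full_key() if self.parent is not None else ""
        return prefix + self.stored_key


record_806 = Record("A12355")
record_812 = Record("B940201333333", parent=record_806)

print(record_812.full_key())  # A12355B940201333333
```

No search by key is needed to reach the parent record; following the link suffices, and the same mechanism extends to records that are themselves subordinate to subordinate records.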

The subordination feature of the invention shown in Figs. 13D and 13E is not limited to any specific implementation.

Thus, in the sense that a single index is aware of multiple data types and multiple subordination relationships (for which the prior art would use separate index files), the subordination relationship of the invention enables a more efficient low-level implementation of the data than hitherto known techniques. Nevertheless, an application may of course use more than one index file according to the invention.

Clearly, each of the subordinate records 812, 820 and 824 may itself have a group of records subordinate to it.

In addition, the technique proposed by the invention yields further advantages, such as the maintenance of data integrity. For example, consider applying to PAIF 800 of Fig. 13E a transaction that inserts a data record designated B, subordinate to data record 806 (which has designated key A12355), whose compound key is A12355B930101123456. The search leads to node 822. The value of the key of the inserted data record at offset 11 is 0, so record 812 is accessed. The search key of record 812 must be built (by accessing record 806 through link 826), after which the insertion of the new data record can be completed. Note that the link to record 806 makes it unnecessary to perform a separate search for record 806 by its key in order to confirm that it exists. Data integrity is thus maintained more efficiently.

Performing the same data-integrity check with the B-tree indexes already described entails significant overhead, since it requires a two-stage operation. First, the index of the data records of type "A" is searched in order to find the data record whose key is 12355. Only once that record has been found can the record of type B be inserted (and a separate index file updated).

When data is searched, the data structure of Fig. 13E illustrates another advantage arising from the fact that subordinate data records are linked to their "parent" record. For example, if records of type A are customers and records of type B are invoices, the customer details usually need to be accessed together with the invoice details. The link from the invoice to the customer saves a separate search for the customer details.

The designated index of the invention yields another significant advantage when navigating in the index in order to perform sequential operations.

For example, consider the PAIF of Fig. 13E when all the data records are to be retrieved in ascending order. It is possible to navigate in this PAIF (this is also termed a sequential operation) and retrieve data records 802, 804, 806, 812, 820, 824 and 808 in the order of their designated keys. If only records of a certain type are needed, e.g. records of type A, one can navigate in the index in the same way while avoiding access to irrelevant nodes and records. Thus, data record 806 is accessed from node 814, and it can be predicted that the data records reached from node 814 through its other links and descendant nodes are subordinate to record 806, so that links 833 and 823 need not be followed. In this example only records 802, 804, 806 and 808 are retrieved. In the same way, if only records of types A and B are needed, moving along link 823 should be avoided, because it can be predicted that the link with value D issuing from the node with value 6 that addresses record 806 leads to subordinate data records designated D.
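
The type-selective sequential operation described above may be sketched in Python as follows. This is an illustrative sketch only: a sorted list of designated keys stands in for the PAIF, jumping with the standard bisect module stands in for not following a link, and all keys other than A12355 and A12355B940201333333 are hypothetical. It is assumed that top-level keys have a fixed length and that no key contains the character U+FFFF.

```python
import bisect

# Sorted designated keys standing in for the index; 6-character keys are type A.
KEYS = [
    "A12340",
    "A12350",
    "A12355",
    "A12355B940201333333",
    "A12355B940315444444",
    "A12355D000777",
    "A12399",
]


def navigate(sorted_keys, parent_key_length, wanted_identifiers):
    """Return keys in ascending order, skipping unwanted subordinate types.

    Top-level records are always returned. A whole run of subordinate
    records whose identifier is not wanted is skipped in a single jump.
    """
    result = []
    i = 0
    while i < len(sorted_keys):
        key = sorted_keys[i]
        if len(key) > parent_key_length:
            identifier = key[parent_key_length]
            if identifier not in wanted_identifiers:
                # Jump past every key sharing the parent key and this identifier.
                skip_prefix = key[:parent_key_length + 1]
                i = bisect.bisect_left(sorted_keys, skip_prefix + "\uffff", i + 1)
                continue
        result.append(key)
        i += 1
    return result


print(navigate(KEYS, 6, {"A"}))
# ['A12340', 'A12350', 'A12355', 'A12399']
```

Calling navigate(KEYS, 6, {"A", "B"}) additionally returns the two type-B keys while still jumping over the type-D key.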

If the PAIF index is a layered index and node 814 is assumed to reside in a different block from node 822, one can move from node 814 to node 822 through the split link. Where the split link is not available, as for example in Fig. 7F, moving from node B (423) to node E (424) through link 400 requires the use of link 421 of node B' (422).

Having exemplified the subordination relationship with reference to the specific embodiment of Fig. 13, the related multidimensional feature according to the second aspect of the invention is described next.

Turning now to Fig. 14, a designated index according to one embodiment of the invention is shown. This index includes two search paths leading to one designated data record (a "deposit" data record), so that the deposit can be accessed by either of two compound keys: the first designated key comprises the key fields account number, date and customer number, and the second designated key comprises the key fields customer number, date and account number. As in the preceding example, the account data record has the designated key "A133333" (201). Updating the deposits of an account (which are subordinate to the account) can be realized by means of a designated record 203 that is subordinate to designated record 201. The PAIF allows records 201 and 203 to be accessed from node 207 through links 206. Likewise, data record 204 represents the deposit of a certain customer. The key of record 202 is B11346. Updating the deposit 204 of customer 202 can be realized through index 200 and the link (208) from node 209 to data record 204. The key of data record 203 is "A133333C01019811346" (k₁). The key of record 204 is "B11346D010198133333" (k₂).

As shown, the customer field and the account field (and other information, such as the date and the sum) are duplicated in records 203 and 204, which is a distinct drawback that leads to undesired inflation of the file.

This drawback can be overcome by representing a single "deposit" record as a multidimensional record 210.

Data record 210 (Fig. 14) is a multidimensional record that is accessed and updated through designated index 200 either by designated key k₁ (identifier C) or by designated key k₂ (identifier D). (Note that when a data record is a multidimensional record, the identifier of the record depends on the key being used.) The path through the index by k₁ leads to node 207 and, from that node, to identifier C of record 210. The relevant structure, e.g. a data structure that includes key k₁, can be built from the metadata information for identifier C: records 201 and 202 are accessed through links 213 and 214, and all the key fields are built together with the date field of record 210. The path through the index by k₂ leads to node 209 and, from that node, to identifier D of record 210. The relevant structure, e.g. a data structure that includes key k₂, can be built from the metadata information for identifier D.

As shown, the search path defined by the search key of record 203 leads to the first field 212, which has the value "C" (the identifier according to search key k₁). The third field points to data record 201. The search path defined by the search key of record 204 gives access to the second field 215 of the same data structure 210 (which has the value "D", the identifier according to search key k₂). The fourth field holds a link to the actual data record 202. In this way the "deposit" record represents subordination to both the account and the customer, and duplication of the account, customer, date and sum fields is avoided. Note that the account and customer data elements can be accessed through the links to the original data records (201 and 202), whereas the remaining data (date and sum) exists only once, in data element 210. Obviously, data record 210 may include other fields.

The invention is by no means limited to any given implementation, and accordingly the implementation of data record 210 depicted in Fig. 14 is only one of many possible variants. The number of search paths is unlimited. As explained above with reference to Fig. 13, if the sought data record is Axxxxx (i.e. account record 201 itself), one simply moves in the index by search key "Axxxxx" to any record subordinate to the type-A record and accesses the type-A record through the link from that subordinate record to the type-A record, for example link 213 of Fig. 14. Other implementations are of course feasible as needed and appropriate (e.g. keeping in the index a link to record A). The above detailed description of providing two (and, in the general case, at least two) search paths to one physical occurrence of a data record thus constructs a multidimensional data structure, which comprises a designated index and data records (termed multidimensional data) reached by at least two search paths.
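
A multidimensional record reached by two search paths may be sketched in Python as follows, by way of non-limiting example. The class names, the sum of 500 and the use of a dictionary in place of the designated index are assumptions made for illustration; the account key A133333, the date 010198 and the two deposit keys k₁ and k₂ are taken from the text, from which the customer key B11346 is derived.

```python
from dataclasses import dataclass


@dataclass
class Account:
    key: str  # designated key, e.g. "A133333"


@dataclass
class Customer:
    key: str  # designated key, e.g. "B11346"


@dataclass
class Deposit:
    """One physical record; account and customer are links, not copies."""
    account: Account
    customer: Customer
    date: str
    amount: int  # hypothetical value

    def key_by_account(self) -> str:
        # k1: account key, identifier C, date, customer number
        return self.account.key + "C" + self.date + self.customer.key[1:]

    def key_by_customer(self) -> str:
        # k2: customer key, identifier D, date, account number
        return self.customer.key + "D" + self.date + self.account.key[1:]


deposit = Deposit(Account("A133333"), Customer("B11346"), "010198", 500)

# Both search paths of the (stand-in) index lead to the same object.
index = {deposit.key_by_account(): deposit, deposit.key_by_customer(): deposit}

print(deposit.key_by_account())   # A133333C01019811346
print(deposit.key_by_customer())  # B11346D010198133333
print(index["A133333C01019811346"] is index["B11346D010198133333"])  # True
```

The date and the sum are stored once, and each key is rebuilt on demand from the linked account and customer records.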

Relationships between data elements

Fig. 15 illustrates another feature of the invention, namely the data relationship feature. Here, data record A (a book data record) has data records C, F, J, K and L subordinate to it. The realization of this hierarchy has been described above. One-to-one and one-to-many relationships are readily realized according to this relationship feature. Consider, for example, that a book has several categories (L), i.e. one-to-many, whereas it has only one abstract (K), i.e. one-to-one.

According to the proposed feature, a one-to-one data relationship is realized by a designated (compound) key with two members: the first member is the designated key of the record to which the record is subordinate, and the second member is the identifier of the record itself (since for a one-to-one relationship the key fields of the subordinate record need not be used). A one-to-many relationship is realized by a designated (compound) key in which the first member is the designated key of the master record and the second member consists of the identifier and the key of the subordinate record.

In this example, the one-to-one relationship between a book and its abstract is maintained by defining the key of K as AxxxK, where Axxx is the designated key of A and K is the identifier of record K. The one-to-many relationship between a book and its categories is maintained by defining the key of L as AxxxLyyy, where Axxx is the designated key of A, L is the identifier of the key, and yyy is the key field (or fields) of record L.
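
The two key shapes may be sketched in Python as follows, for illustration only. The function names, the book key A0042 and the category keys are hypothetical, as is the use of the letter K for the abstract; a dictionary stands in for the index, so that a repeated key overwrites rather than adds an entry.

```python
def one_to_one_key(master_key: str, identifier: str) -> str:
    """Key of a subordinate record that can occur at most once per master."""
    return master_key + identifier


def one_to_many_key(master_key: str, identifier: str, own_key: str) -> str:
    """Key of a subordinate record that can occur many times per master."""
    return master_key + identifier + own_key


index = {}
book = "A0042"  # hypothetical designated key of a book

# One abstract per book: a second abstract maps to the same key.
index[one_to_one_key(book, "K")] = "first abstract"
index[one_to_one_key(book, "K")] = "replacement abstract"

# Several categories per book: each has its own key field.
index[one_to_many_key(book, "L", "001")] = "databases"
index[one_to_many_key(book, "L", "002")] = "indexing"

print(sorted(index))  # ['A0042K', 'A0042L001', 'A0042L002']
```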

The following description relates to another feature according to the second aspect of the invention, namely multi-model representation. According to this feature, and as explained in more detail below, one or more of the following (and possibly other) models can be represented by the designated index as defined.

Representing relational tables by the multi-model designated index

The relational model regards all data as being in the form of tables. Each table consists of records (called tuples) of the same structure. Assume that each tuple consists of fields F1, F2 and F3, each of which is a key. If key F2 is subordinate to key F1, and key F3 is subordinate to key F2, the table is easily built: to retrieve its tuples, follow the identifier of key F1 and obtain each value of F1, then follow the identifier of F2, and continue in the same way for F3. Each such sequence of three steps defines one tuple of the table. Some projections are simpler: to find in the table all pairs of values (F1, F2) for which some value of F3 exists, the search ends once (F1, F2) has been processed. Performing the projection (F2, F3) may be time-consuming, since all the values of F1 must be searched first. Hence, if this operation is common, the designated index should also keep the search path (F2, F3, F1). That is, new designated compound keys F2'F3'F1' are built with a new set of identifiers, and these additional paths are inserted into the designated index. Each record can then be reached by either of two paths, which constitutes a multidimensional record.

Other models on the multi-model designated index

Other data models can be represented in the designated index, including relational databases, object-oriented systems and hierarchical databases, essentially without duplicating data.

Realizing object orientation (related data structures) by the multi-model designated index

The object-oriented approach regards all data as objects. Each object belongs to a class, and the class determines the structure of the object and the methods (operations) that can be applied to it. Classes are organized in a hierarchy in which structure and methods can be inherited. In the object-oriented approach objects are transient: an object exists only while the program that created it is running. Objects that need long-term support are defined as persistent. Such objects are stored on disk and can be obtained by other (authorized) programs. The multi-model designated index readily supports such objects. Since their structure is uniformly encoded with the help of identifiers, later incarnations of the program, as well as other programs, can access these persistent objects. Note also that persistent objects can be part of relational tables as well. No data need be duplicated.

For example, consider the data structure of Fig. 16. Data records 223, 224, 225 and 226 are subordinate to data record 221 and, together with it, are treated as one object. Using a key prefix equal to the designated key of record 221, all of these data records can be searched efficiently in the index (partial key search) and the whole object retrieved. If only part of the data of the object is needed, e.g. the type-A record and its subordinate type-B records, these data records can likewise be obtained by a partial key search, using a key prefix equal to the designated key of the type-A record (e.g. 221) followed by identifier B as the next key field.

Realizing the object-relational model by the multi-model designated index

Unlike the object-oriented approach, the relational approach regards all data as tables. It is therefore difficult to integrate SQL queries into object-oriented programming languages (C++ or Java). The object-relational approach provides an interface for converting tables into objects. This interface requires the user to specify the relationship between the attributes of the object and those of the table. If some attributes are themselves tables, relational algebra operations must also be allowed on those tables. These conversions are performed by the application program, so the database cannot optimize the query. The designated index handles data in a uniform manner and thus provides an ideal interface between object-oriented applications and the data structure. The application expresses its queries in terms of designated keys, so the database can optimize the query strategy. The database returns designated keys, which can readily be processed by the object-oriented application using object-oriented methods. The sequence of identifiers on the search path to an object determines the class of the object, and the identifiers of the different fields reached allow the object-oriented program to resolve polymorphic calls.

The designated index addresses all related data. For example, suppose Figure 16 depicts the data structure of an insurance company, where the records of type A are clients, the records of type B are clients' claims, and the records of type C are clients' payments. As clearly shown, all data records are addressed by a single index structure.

All object instances can now be accessed efficiently, because the index allows navigation from a client to its related data: claims and payments. By navigating this index structure it is also possible to produce efficiently the client table (the set of type A records), the client claims table (the set of type A and type B records) and the client payments table (the set of type A and type C records). Because this data structure does not impose physical clustering of the data, data shared between different objects can be accessed efficiently from the viewpoint of each object, and such data records are therefore multidimensional records. In this example, a given claim can be accessed efficiently from either the client object or the policy object, both of which derive from a type constructed as in Figure 16 (structure 210).
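The partial key search over designated keys can be illustrated with a minimal sketch. It assumes, purely for illustration, that a designated key is the concatenation of a type identifier ('A' client, 'B' claim, 'C' payment) and a key value, and that a subordinate record's key starts with its parent's key. A sorted Python list stands in for the index; it is not the trie structure described in this document, and the key values and the function name `partial_key_search` are hypothetical.

```python
from bisect import bisect_left

# Hypothetical designated keys. A parent's key is a prefix of the keys of
# all its subordinate records: 'A' = client, 'B' = claim, 'C' = payment.
index = sorted([
    "A0001",
    "A0001B01",
    "A0001B02",
    "A0001C01",
    "A0002",
    "A0002C01",
])

def partial_key_search(keys, prefix):
    """Return every key in the sorted list `keys` that starts with `prefix`."""
    start = bisect_left(keys, prefix)   # first key >= prefix
    matches = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break                       # sorted order: no later key can match
        matches.append(key)
    return matches

# The whole object rooted at client A0001 (client, claims and payments).
whole_object = partial_key_search(index, "A0001")
# Only the claims (type B records) of client A0001.
claims_only = partial_key_search(index, "A0001B")
```

Because keys sharing a prefix are adjacent in sorted order, one positioning step followed by a sequential scan returns the object or any sub-part of it.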

The OO approach allows the user to add user-defined types (UDT) and user-defined functions (UDF). For example, accident photographs can be added to the insurance company's database. In this example, a new designated data record subordinate to the type A data record is defined. When the details of a claim are searched, the accident photograph is accessed and sent to a photo printing application. Through the designated index, the relation between the photograph data and the claim is handled in the same way as the relations built into the classes. A new UDT can be based on, or related to (through subordination), any other data type. Through the designated index, the application can now navigate from the defined classes to the new UDT, where the new UDT of these defined classes may in essence be a method or another characteristic. In this example, navigating the index leads to a given claim, and from that claim the photograph and any other part of the claim data can be reached.

Network model and hierarchical model: implementing the network model and the hierarchical model with a multi-model designated index

The network model and the hierarchical model have been superseded by the relational model. Although these models have fallen out of use, they have some advantages (and many drawbacks) compared with table-oriented implementations. Once a given record has been retrieved, the addresses of its related records are easily obtained.

For example, consider a bank that has clients and loans. Each client has an address and several loans, and each loan is granted to one or more clients. Under the network model, each client is represented by a node that contains a link to the client's address and links to the nodes representing the loans the client has obtained. Similarly, a node representing a given loan is linked to the nodes of the clients who obtained that loan. Thus, given a loan, it is easy to reach each client who borrowed it and to obtain their home addresses.
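The linked-node arrangement of the network model can be sketched as follows. The class names, the helper `link` and the sample data are all hypothetical; the sketch only shows that, once a loan node is reached, the borrowers and their addresses are obtained by following links rather than by further index searches.

```python
class Customer:
    """Node for a client: holds the address and links to loan nodes."""
    def __init__(self, name, address):
        self.name = name
        self.address = address
        self.loans = []

class Loan:
    """Node for a loan: holds links to the nodes of its borrowers."""
    def __init__(self, number):
        self.number = number
        self.customers = []

def link(customer, loan):
    """Create the two-way link between a client node and a loan node."""
    customer.loans.append(loan)
    loan.customers.append(customer)

alice = Customer("Alice", "1 Elm St")
bob = Customer("Bob", "2 Oak Ave")
loan = Loan("L-17")
link(alice, loan)
link(bob, loan)

# Given the loan, follow its links to reach each borrower's address.
addresses = [customer.address for customer in loan.customers]
```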

A B-tree implementation requires two trees to be maintained: the first is a tree of clients and home addresses, the second a tree of loans and clients. After the data of a given loan has been retrieved, the names of the clients who borrowed it can be obtained. To find their addresses, a separate B-tree search must be performed for each client.

In the multi-model designated index proposed here (for example Figure 16), once the node representing the loan has been reached, the search can continue to the identifier that marks each client who borrowed the loan (for example a record of type B). In general, at most one disk access is needed per client. The proposed multidimensional designated index has the advantages of the network model without its drawbacks. The network model treats each node independently and is sensitive to long search paths, whereas the multi-model designated index handles all data uniformly and can therefore process a search path in a number of steps that is logarithmic in its length, the base of the logarithm being the block size. In practice, therefore, a search needs a single disk access.

Implementing an object-oriented client-server model on the basis of the designated index

The client/server model implements the relational model efficiently. Under this model, all data reside on a central computer (called the server), and applications run on other computers (called clients). When an application needs data, it formulates an SQL query, which the client sends to the server. The server evaluates the query and returns a result table to the client.

The interface between client and server therefore consists of SQL queries. The server knows nothing of the internal data structures or the code used, and client and server need agree only on the names of tables and attributes.

This model breaks down under the OO approach. Since each data item is an object, the server must know its internal structure. The problem becomes worse when polymorphism is involved: the server must know the structure and details of the whole class hierarchy.

The designated index allows the client-server approach to be used with the OO model and the object-relational model. For example, to reach a given attribute, the application sends the server the path of keys and link identifiers that leads to the required node. From this data the server can satisfy the request without any knowledge of the application's data structures.

Client and server need to agree on the names of the fields and their identifiers. The server does not need to know the data type of each field or its semantic content.
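A minimal sketch of such an exchange follows, assuming (as an illustration only) that the path is a list of (identifier, key) pairs and that the server concatenates them into one designated key. The names `build_designated_key`, `Server` and `fetch`, and the stored byte strings, are hypothetical and not part of the described system.

```python
def build_designated_key(path):
    """Concatenate (identifier, key) pairs into a single designated key."""
    return "".join(identifier + key for identifier, key in path)

class Server:
    """Stores opaque record bytes under designated keys.

    The server never interprets the record contents; it relies only on
    the field identifiers and keys agreed with the client.
    """
    def __init__(self, records):
        self._records = dict(records)

    def fetch(self, path):
        """Return the raw record reached by the path, or None if absent."""
        return self._records.get(build_designated_key(path))

server = Server({
    "A0001": b"client-bytes",
    "A0001B01": b"claim-bytes",
})

# The client asks for claim 01 of client 0001 by sending only the path.
reply = server.fetch([("A", "0001"), ("B", "01")])
```

The client alone decodes the returned bytes, which is why the server needs no knowledge of the class hierarchy.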

According to another aspect of the invention, the representation of the index is compressed further to make it more efficient. To this end, the space required by the trie is estimated and methods of reducing the space requirement are evaluated.

If the trie is a layered index, the analysis of the trie index structure should concentrate on the last layer (I₀).

Storage requirements of the primary trie index

One of the most important properties of a trie-based data structure is the moderate size of its representation. The PAIF, for example, stays smaller than a conventional trie because of its compressed representation.

The last layer of the PAIF index comprises a trie that has links pointing to other trie nodes in the same block and links pointing to records. Let N be the number of records in the database. The index contains exactly N pointers to these records. If each pointer needs 4 bytes, the space needed for these pointers is 4N bytes. In addition, each pointer has a direction (1 byte), giving 5N bytes in total.

Now consider the space that a PAIF trie needs. Since N pointers leave the index and each trie node has at least two children, there are at most n ≤ N-1 trie nodes. Let d denote the average number of children of a trie node; then n ≤ N/(d-1). Since in practice d ≫ 2, n ≪ N. Each trie node has a level number (1 byte). Since each trie node has at least one incoming trie link, there are at most n-1 trie links. Each trie link has a label (1 byte, for a single character) and an interblock pointer (1 byte), giving 3n bytes altogether, so that 3n + 4N ≤ 7N bytes are needed in the worst case. In practice the figure lies between 4N and 6N bytes.
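The estimate above can be checked with a short calculation. The sketch below uses only the figures stated in the text (3 bytes per trie node and its link, 4 bytes per record pointer) and takes n at its bound N/(d-1); the function name and the sample values of N and d are illustrative assumptions.

```python
def trie_space_bytes(num_records, avg_children, pointer_bytes=4, node_bytes=3):
    """Estimate the last-layer size as 3n + 4N, with n taken as N/(d-1).

    num_records  -- N, the number of data records
    avg_children -- d, the average number of children per trie node
    """
    max_nodes = num_records // (avg_children - 1)   # bound on n
    return node_bytes * max_nodes + pointer_bytes * num_records

# Worst case d = 2: the bound on n equals N, so the total reaches 7N.
worst = trie_space_bytes(1_000_000, 2)
# A bushier trie, d = 11: n is at most N/10, so the total is 4.3N.
typical = trie_space_bytes(1_000_000, 11)
```

With d well above 2 the node overhead becomes small and the record pointers dominate, which is what motivates sharing pointers between links.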

A similar analysis can be made from another angle. Consider two pointers p₁ and p₂ leaving a node v at level k. Let x₁ be a key reachable from p₁ and x₂ a key reachable from p₂. x₁ and x₂ share their first k-1 characters. In the PAIF structure, each of these characters appears at most once. In a B-tree representation, these first k characters must appear explicitly in each key.

The saving in the PAIF is twofold: first, each character is stored at most once per level, and second, not all characters need to be represented.

Further compression of the index

In the discussion above, the pointers to the records take up most of the space. A method of saving pointer space is now proposed. The method is based on letting several links to several records share the same pointer. Assume first that the records have a fixed size. If the first two records reside in the same block, it is possible to keep a single full-size pointer to the block for the first node and, for every other external link to that block, to keep no pointer but to compute its offset instead. For example, if the first two records reside in block number 2000 and the third record resides in block 7000, the structure 2000 (e, f) 7000 (h) can be kept. The saving is significant when a large number of external links point to the same block. If k such links point to a block, the 4-byte pointer is shared among all k, so the space needed to address each record drops to 4/k bytes plus the space for the direction (1 byte). This means that for k ≥ 4 each record needs no more than 2 bytes in the index.

For records of variable size, the offsets can be stored in the block, for example: 2000 (e:de, f:df) 7000 (h:dh). Instead of a full pointer, an offset that fits in a single byte is stored. Each record then needs 1 byte for its share of the pointer, 1 byte for the direction and 1 byte for the offset: 3 bytes per record altogether.
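The per-record cost stated above can be reproduced with a small calculation. The sketch assumes the figures given in the text: a 4-byte block pointer shared by k links, 1 byte for the direction and, for variable-size records, 1 further byte for the stored offset. The function name is hypothetical.

```python
def bytes_per_record(links_per_block, variable_size=False):
    """Index space per record when k links share one 4-byte block pointer."""
    shared_pointer = 4 / links_per_block    # each record's share of the pointer
    direction = 1                           # direction byte, kept per record
    offset = 1 if variable_size else 0      # stored offset for variable sizes
    return shared_pointer + direction + offset

fixed_k4 = bytes_per_record(4)                          # 4/4 + 1 = 2 bytes
unshared = bytes_per_record(1)                          # 4 + 1 = 5 bytes
variable_k4 = bytes_per_record(4, variable_size=True)   # 1 + 1 + 1 = 3 bytes
```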

Turning to the example of Figure 17, Figure 17A shows a node 2000 of a trie with three links 2010, 2011 and 2012, which respectively address data records 2002, 2004 and 2006 (with values 5, 9 and A respectively) at disk addresses 3000, 5000 and 7000. The space needed to represent each link value (1 byte per link) and each pointer to the data (4 bytes) is 15 bytes.

Turning now to Figure 17B, node 2000 keeps one shared link (2010) to the three data records (2002, 2004, 2006). The information carried by this link is the address of block 2020 (4 bytes) and the link value of each of the data records 2002, 2004 and 2006 residing in that block (1 byte per link value). The space needed to represent the pointer to the data block and the values of these links is only 7 bytes: (3000: 5, 9, A).

To access data record 2004, its address can now be computed as the address of the data block plus an offset that depends on the record size (assuming that all records in the data block are of equal size).
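The space figures of Figures 17A and 17B and the address computation can be sketched as follows. The block address 3000 and the link values 5, 9 and A come from the text; the record size of 100 bytes and all function names are assumptions made only for this illustration.

```python
POINTER_BYTES = 4       # full pointer to a record or to a data block
LINK_VALUE_BYTES = 1    # one character per link value

def separate_links_size(num_links):
    """Figure 17A layout: every link carries its own value and pointer."""
    return num_links * (LINK_VALUE_BYTES + POINTER_BYTES)

def shared_link_size(num_links):
    """Figure 17B layout: one block pointer plus one value per link."""
    return POINTER_BYTES + num_links * LINK_VALUE_BYTES

def record_address(block_address, link_values, wanted, record_size):
    """Block address plus an offset given by the record's position."""
    position = link_values.index(wanted)
    return block_address + position * record_size

block_address, link_values = 3000, ["5", "9", "A"]

size_17a = separate_links_size(3)   # 3 * (1 + 4) = 15 bytes
size_17b = shared_link_size(3)      # 4 + 3 * 1 = 7 bytes
# Record 2004 has link value 9, the second value in the shared link.
address_2004 = record_address(block_address, link_values, "9", record_size=100)
```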

As explained, node 2000 can include many links to other data records or to other data blocks (for example link 2024 to data block 2022, which holds data record 2008).

The database file management system of the invention should preferably be combined with known concurrency and/or distribution capabilities, so that a plurality of users can access the database virtually simultaneously. The database may be located centrally or distributed among two or more remote locations.

Turning now to Figures 18A-D, four benchmark graphs are shown that demonstrate the improvement, in response time and in database file size, of a database using the file management system of the invention over a commercial C-tree based database. The insertions were performed by a Uniface application running under Windows (for Workgroups).

The benchmark of Figure 18A measures the time (in minutes) required to insert an ever-increasing number of pre-sorted data records (0 to 1,000,000) into a file. As shown in Figure 18A, the larger the number of insertions, the greater the improvement in response time of the database file management system of the invention. Inserting 1,000,000 records takes about 669 minutes in the C-tree based database, compared with only 65 minutes in the system of the invention. Moreover, as the number of records grows, the response time of the file management system of the invention increases only slightly, whereas the response time of the corresponding prior-art system increases markedly.

The benchmark of Figure 18B shows the file size (in megabytes) as a function of the number of data records in the file (0 to 1,000,000). As shown in Figure 18B, the larger the number of records, the greater the improvement in file size of the database file management system of the invention. For 1,000,000 records, the C-tree based file is about 151 megabytes, compared with only 22 megabytes under the database file management system of the invention.

Figures 18C and 18D are similar to Figures 18A and 18B, the difference being that in the former two (18C and 18D) the data records are inserted in random order, whereas in the latter two (18A and 18B) the data records are pre-sorted by search key. As shown, the results are as before: the system of the invention is more efficient in both response time and file size.

Figures 19A-D show benchmark graphs comparing the system of the invention (operating under the DOS operating system) with a commercial B-tree based database system. The results are as before: the system of the invention is more efficient in both response time and file size.

Those skilled in the art will appreciate that the alphabetic characters and Roman numerals used to designate the steps in the claims serve only for convenience of explanation. They should in no way be construed as imposing an order on the steps, nor as indicating how many times a step is to be performed relative to the other steps of the method.

The invention has been described with a certain degree of particularity, but those skilled in the art will appreciate that various modifications and alterations may be made without departing from the scope and spirit of the following claims.

Claims

1. In a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

a layered index arranged in blocks; the layered index includes a basic partitioned index associated with the data records; the basic partitioned index enables accessing and updating the data records by key or keys and is susceptible to an unbalanced structure of blocks;

said layered index enables accessing and updating the data records by key or keys and constitutes a balanced structure of blocks.

2. The layered index of claim 1, wherein said basic partitioned index is a trie.

3. In a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

an index arranged in blocks and constructed over the keys of the data records; the index includes a basic partitioned index associated with the data records; the basic partitioned index enables accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks;

said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

4. In a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

an index arranged in blocks and constructed over the keys of the data records; the index includes a trie associated with the data records; the trie enables accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks;

5. The layered index of claim 1, wherein said storage medium is an external memory.

6. The layered index of claim 5, wherein said storage medium is also an internal memory.

7. The layered index of claim 1, wherein said storage medium is an internal memory.

8. The layered index of claim 2, wherein said trie is a PAIF trie.

9. The layered index of claim 1, wherein the basic partitioned index and the representative index of said layered index are essentially the same indexing scheme.

10. The layered index of claim 1, wherein the basic partitioned index and the representative index of said layered index are different indexing schemes.

11. The layered index according to claim 8, wherein the representative index of said layered index is a B-tree indexing scheme.

12. The layered index according to claim 10, wherein the representative index is a B-tree indexing scheme.

13. The layered index according to claim 8, wherein the representative index of said layered index is essentially a PAIF indexing scheme.

14. The layered index according to claim 9, wherein the representative index is essentially a PAIF indexing scheme.

15. The layered index according to claim 1, capable of supporting the ODBC standard.

16. The layered index I₀, ..., I_h according to claim 1, comprising:

a representative index I₁, ..., I_h, constructed such that any I_j is built over the representative keys of I_{j-1}.

17. The layered index I₀, ..., I_h according to claim 16, wherein I_h is wholly contained in one block.

18. The layered index of claim 3, wherein said storage medium is an external memory.

19. The layered index of claim 18, wherein said storage medium is also an internal memory.

20. The layered index of claim 3, wherein said storage medium is an internal memory.

21. The layered index of claim 3, capable of supporting the ODBC standard.

22. The layered index of claim 4, wherein said storage medium is an external memory.

23. The layered index of claim 22, wherein said storage medium is also an internal memory.

24. The layered index of claim 4, wherein said storage medium is an internal memory.

25. The layered index of claim 4, capable of supporting the ODBC standard.

26. In a database file management system for accessing data records and executed on a data processing system, a method, wherein the data records are associated with a basic partitioned index arranged in blocks and stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; the method is for constructing a layered index arranged in blocks and comprises the steps of:

(a) providing said basic partitioned index;

(b) constructing a representative index over the representative keys of said basic partitioned index; said layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

27. The layered index of claim 26, wherein said basic partitioned index is a trie.

28. In a database file management system for accessing data records and executed on a data processing system, a method, wherein the data records are associated with a basic partitioned index arranged in blocks and stored in a storage medium; the basic partitioned index enables accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; the method is for constructing an index arranged in blocks over the keys of the data records, and comprises the steps of:

(a) providing said basic partitioned index;

(b) constructing an index over the representative keys of said basic partitioned index; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure.

29. In a database file management system for accessing data records and executed on a data processing system, a method, wherein the data records are associated with a trie arranged in blocks and stored in a storage medium; the trie enables accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; the method is for constructing an index arranged in blocks over the keys of the data records, and comprises the steps of:

(a) providing a trie;

(b) constructing an index over the representative keys of said trie; said index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

30. The method of claim 26, wherein said storage medium is an external memory.

31. The method of claim 30, wherein said storage medium is also an internal memory.

32. The method of claim 26, wherein said storage medium is an internal memory.

33. The method of claim 27, wherein said trie is a PAIF trie.

34. The method of claim 26, wherein the basic partitioned index and the representative index are essentially the same indexing scheme.

35. The method of claim 26, wherein the basic partitioned index and the representative index are different indexing schemes.

36. The method of claim 33, wherein the representative index is a B-tree indexing scheme.

37. The method of claim 35, wherein the representative index is a B-tree indexing scheme.

38. The layered index according to claim 33, wherein the representative index is a PAIF indexing scheme.

39. The layered index according to claim 34, wherein the representative index is a PAIF indexing scheme.

40. The method of claim 26, capable of supporting the ODBC standard.

41. The method of claim 28, wherein said storage medium is an external memory.

42. The method of claim 41, wherein said storage medium is also an internal memory.

43. The method of claim 28, wherein said storage medium is an internal memory.

44. The method of claim 28, capable of supporting the ODBC standard.

45. The method of claim 26, wherein said index supports sequential operations.

46. The method of claim 28, wherein said index supports sequential operations.

47. The method of claim 29, wherein said index supports sequential operations.

48. A method for accessing a data record r sought by key k in the layered index of claim 1, comprising:

(a) searching for k in I_h to I_k, where h ≥ k ≥ 0, without searching the keys held in the data records, in order to find the block of I_{h-1} to which k leads;

(b) repeating step (a) until reaching the block of I₀ associated with the data record having key k, if such a record exists.

49. A method for inserting a data record r by key k into the layered index of claim 1, comprising:

(a) searching for k in I_h to I_k, where h ≥ k ≥ 0, without searching the keys held in the data records, in order to find the block of I_{h-1} to which k leads;

(b) repeating step (a) until reaching the block B of I₀ associated with the data record having key k (if such a record exists);

(c) associating r with B.

50. A method for deleting a data record r by key k from the layered index of claim 1, comprising:

(b) repeating step (a) until reaching the block B of I₀ associated with the data record having key k (if such a record exists);

(c) disconnecting r from B.

51. A method for accessing a data record r sought by key k in the layered index of claim 3, comprising:

(a) searching for k in I_h to I_k, where h ≥ k ≥ 0, without searching the keys held in the data records, in order to find the block of I_{h-1} to which k leads;

(b) repeating step (a) until reaching the block of I₀ associated with the data record having key k (if such a record exists);

52. A method for inserting a data record r by key k into the layered index of claim 3, comprising:

(c) associating r with B.

53. A method for deleting a data record r by key k from the layered index of claim 3, comprising:

(c) disconnecting r from B.

54. The method of claim 26, wherein said constructing step (b) comprises:

(a) if a block B (in I_{h-1}) overflows, splitting it into two (or more) blocks and replacing the representative of B in I_h with a representative of each new block;

(b) if a block of I_h overflows, creating an additional layer I_{h+1} and adding it to the layered index.

55. according to the method for claim 54, it occurs carrying out down.

56., carrying out afterwards according to the method for claim 54.

57. The method of claim 28, wherein said constructing step (b) comprises:

58., under occurring, carry out according to the method for claim 57.

59., carrying out afterwards according to the method for claim 54.

60. The method according to claim 26, wherein the constructing step (b) comprises:

(a) deleting at least one short link (called a split link) from among the short links of a node (called the split node) of a block (B_{i-1}), such that at least two tries exist in that block;

(b) moving each subtree into a separate block;

(c) if no block B_i exists, creating B_i and creating in B_i a duplicate node of the split node;

(d) if block B_i exists but contains no duplicate node of the split node, creating in B_i a duplicate node of the split node and connecting it to the trie of B_i, such that B_{i-1}' (after the split is completed) can be accessed along a labelled search path that includes the root node of B_i and the duplicate node, in accordance with the representative key of B_{i-1}';

(e) if the duplicate node has no direct link, adding a direct link from the duplicate node to block B_{i-1};

(f) adding a far link from the duplicate node to block B_{i-1}', or, if the duplicate node has a short link to a child node in the direction of that far link, replacing the far link with a direct link from that child node to block B_{i-1}'.

61. In a storage medium used by a database file management system executed on a data processing system, a data structure comprising at least one random access index file (PAIF) having a plurality of nodes and a plurality of links;

the leaf nodes of said PAIF are each associated with at least one data record accessible to said user application, wherein at least a portion of said data record constitutes at least one search key;

selected nodes of said PAIF each represent a given offset of a search key segment within said search key; the links leaving each given node among said selected nodes each represent a unique value of said search key segment;

the PAIF has at least two sub-PAIFs, each arranged in a block;

said database file management system is further capable of arranging said blocks in a balanced structure of blocks.

62. The data processing system according to claim 61, wherein at least a portion of the data records associated with said leaf nodes is held in at least one separate file.

63. The data processing system according to claim 61, wherein at least one leaf is associated with more than one data record.

64. A method for inserting a new data record into an existing PAIF according to claim 61, comprising performing the following steps:

ⅰ. starting from the root node and advancing along a reference path that ends at a data record associated with a leaf node (called the "reference data record"); at each node on the reference path, advancing along a link if the value represented by that link equals the value of the 1-byte key segment at the offset specified by said node; where the offset specified in the node exceeds every corresponding key segment of the key, or where there is no link with said value, advancing along any path to any reference data record;

ⅱ. comparing the search key of the reference data record with the search key of the new data record, to determine the smallest offset of a search key segment that distinguishes the two (hereinafter the distinguishing offset);

ⅲ. depending on the value of the distinguishing offset, continuing with one of the following steps (ⅲ.0-ⅲ.3):

ⅲ.0 if the data records are equal, terminating; or

ⅲ.2 if the distinguishing offset is found to be greater than the offset indicated by the leaf node that links to the reference data record:

ⅲ.2.1 disconnecting the link from the reference data record (that is, the record temporarily remains "loose") and moving this link to a new node; assigning the new node the value of the distinguishing offset;

ⅲ.2.2 connecting the reference data record to this new node (which now becomes a leaf node) and assigning this link (a long link) the value of the search key segment taken from the search key of the reference data record at the distinguishing offset.

ⅲ.3.1 for cases A and B, creating a new node and assigning this node the value of said distinguishing offset,

for case A only: disconnecting the link from the father node to the child node and moving this link to the new internal node (that is, the child node temporarily remains "loose");

65. A method for obtaining a balanced PAIF index, the PAIF comprising a plurality of blocks, each accommodating a plurality of nodes and a plurality of links leaving said nodes; the leaf nodes among said nodes are associated with data records; the method comprises performing the following steps as many times as required:

(ⅰ) replacing a block, which constitutes a replaced block, with at least two split blocks, such that some of the split nodes are accommodated in one of said split blocks and the remaining split nodes are accommodated in the other split blocks;

(ⅱ) duplicating at least one of the replaced nodes into a block, whereby said at least two split blocks become its child blocks.

66. In a computer system having a storage medium that comprises at least an internal memory in the range of 10 to 20 Mbytes and an external memory;

a data structure comprising an index over the keys of the data records; the index is arranged in blocks, such that for one billion data records, accessing a block associated with any one of said one billion data records requires essentially no more than two accesses to said external memory, irrespective of the length of the keys of said data records.

67. In a computer system having a storage medium that comprises at least an internal memory in the range of 10 to 20 Mbytes and an external memory;

a data structure comprising an index over the keys of the data records; the index is arranged in blocks, such that for one billion data records, and irrespective of the length of the keys of said data records, essentially all the blocks of the index are accommodated in said internal memory.

68. In a computer system having a storage medium,

a data structure comprising an index over the keys of the data records; the index being arranged in a balanced structure of blocks and enabling sequential operations on said data records; the size of the index being substantially unaffected by the length of said keys.

69. In a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

70. In a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

71. The storage medium according to claim 69, wherein said index constitutes a layered index.

72. The storage medium according to claim 70, wherein said representative index constitutes a layered index.

73. The storage medium according to claim 70, wherein said representative index constitutes a multi-dimensional index.

74. The storage medium according to claim 72, wherein said representative index constitutes a multi-dimensional index.

75. The storage medium according to claim 70, wherein said representative index constitutes a multi-model index.

76. The storage medium according to claim 72, wherein said representative index constitutes a multi-model index.

77. The storage medium according to claim 74, wherein said representative index constitutes a multi-model index.

78. The storage medium according to claim 69, wherein a data record of a first type and a subordinate data record of a second type constitute a one-to-one relationship.

79. The storage medium according to claim 70, wherein a data record of a first type and a subordinate data record of a second type constitute a many-to-one relationship.

80. The storage medium according to claim 71, wherein a data record of a first type and a subordinate data record of a second type constitute a one-to-one relationship.

81. The storage medium according to claim 73, wherein a data record of a first type and a subordinate data record of a second type constitute a many-to-one relationship.

82. The storage medium according to claim 69, wherein said index comprises a trie.

83. The storage medium according to claim 70, wherein said index comprises a trie.

84. The storage medium according to claim 71, wherein the basic partitioned index of said layered index is a trie.

85. The storage medium according to claim 69, wherein, for a transaction that accesses or updates a subordinate data record having a composite key K1 ... Kn, there exists in the index a subordinate search path that leads to the subordinate data record according to the composite key K1 ... Kn; the subordinate search path includes a search path to the data record having the key K1 ... Kn-1.

86. The storage medium according to claim 70, wherein, for a transaction that accesses or updates a subordinate data record having a composite key K1 ... Kn, there exists in the index a subordinate search path that leads to the subordinate data record according to the composite key K1 ... Kn; the subordinate search path includes a search path to the data record having the key K1 ... Kn-1.
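
The property recited in claims 85 and 86, that the search path to a subordinate record keyed by K1 ... Kn contains the search path to the record keyed by K1 ... Kn-1, can be sketched with a small trie over key components. The names `TrieNode`, `insert` and `search_path`, the sample keys, and the one-node-per-key-component layout are assumptions for illustration, not the structure defined in the claims.

```python
from typing import Dict, List, Optional, Tuple


class TrieNode:
    """Hypothetical trie node: one child per key component."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.record: Optional[str] = None


def insert(root: TrieNode, key: Tuple[str, ...], record: str) -> None:
    """Store `record` at the node reached by following each key component."""
    node = root
    for part in key:
        node = node.children.setdefault(part, TrieNode())
    node.record = record


def search_path(root: TrieNode, key: Tuple[str, ...]) -> List[TrieNode]:
    """Return the nodes visited from the root to the node for `key`.

    Raises KeyError if the key is not present in the trie.
    """
    path = [root]
    node = root
    for part in key:
        node = node.children[part]
        path.append(node)
    return path


root = TrieNode()
insert(root, ("cust7",), "customer record")        # key K1
insert(root, ("cust7", "order3"), "order record")  # composite key K1 K2

parent_path = search_path(root, ("cust7",))
child_path = search_path(root, ("cust7", "order3"))
```

In this sketch `child_path` begins with exactly the nodes of `parent_path`, so a lookup of the subordinate record passes through the node of the record it depends on.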

87. The storage medium according to claim 75, wherein said multi-model includes a relational model.

88. The storage medium according to claim 75, wherein said multi-model includes an object-oriented model.

89. The storage medium according to claim 75, wherein said multi-model includes an object-relational model.

90. The storage medium according to claim 75, wherein said multi-model includes a client/server model.

91. The storage medium according to claim 76, wherein said multi-model includes a relational model.

92. The storage medium according to claim 76, wherein said multi-model includes an object-oriented model.

93. The storage medium according to claim 76, wherein said multi-model includes an object-relational model.

94. The storage medium according to claim 76, wherein said multi-model includes a client/server model.

95. In a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

an index stored in the storage medium, constructed over the keys of said data records and arranged in blocks; the index being arranged in blocks such that its leaf blocks are linked to the data records by link means;

96. The storage medium according to claim 95, wherein said index is constituted by a trie.

97. In a storage medium used by a database file management system executed on a data processing system, a data structure comprising:

an index stored in the storage medium, constructed over the keys of said data records and arranged in blocks; the index being arranged in blocks such that its leaf blocks are linked to the data records by link means;

98. The storage medium according to claim 97, wherein said basic partitioned index is constituted by a trie.