MXPA00007026A - Database apparatus - Google Patents

Database apparatus

Info

Publication number
MXPA00007026A
Authority
MX
Mexico
Prior art keywords
index
data
key
node
block
Prior art date
Application number
MXPA/A/2000/007026A
Other languages
Spanish (es)
Inventor
Moshe Shadmon
Original Assignee
Ori Software Development Ltd
Moshe Shadmon
Priority date
Filing date
Publication date
Application filed by Ori Software Development Ltd, Moshe Shadmon

Abstract

A database file management system for accessing data records is executed on a data processing system. The data records are linked to a trie index that is arranged in blocks (402, 405, 406 and 407) and stored in a storage medium. The trie index (A, B and I, element 402) enables accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks. There is provided a method for constructing a layered index arranged in blocks, which includes the steps of providing the trie index and constructing a representative index over the representative keys of the trie index. The layered index enables accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

Description

DATABASE DEVICE

DESCRIPTION OF THE INVENTION

This invention relates to databases and database management systems.

As is known, a database system is a collection of interrelated data files, indexes and a set of programs that allow one or more users to add, retrieve and modify the data stored in these files. The fundamental concept of a database system is to provide users with a so-called "abstract", simplified view of the data (also known as a data model or conceptual structure) that spares a conventional user from having to deal with details such as how the data is physically organized and accessed.

Some well-known data models (i.e., the "Hierarchical Model", "Network Model", "Relational Model" and "Object-Relational Model") will now be briefly discussed. A more detailed discussion can be found, for example, in: Henry F. Korth, Abraham Silberschatz, "Database System Concepts", McGraw-Hill International Editions, 1986 (or the 3rd edition (1997)), Chapters 3-5, pp. 45-172.

Generally speaking, all the models discussed below share the property that they represent each "entity" as a "record" having one or more "fields", each indicating a given attribute of the entity (for example, a record of a given book may have the fields "BOOK ID", "BOOK NAME", "TITLE"). Normally one or more attributes constitute a "key", that is, they identify the record. In this example, "BOOK ID" serves as the key. The models are distinguished from each other, inter alia, in the way that the records are organized into a more complex structure:

Relational Model - The relational model, introduced by Codd, is a landmark in the history of database development. In relational databases, an abstract concept has been introduced, according to which the data is represented by tables (referred to as "relations"), in which the columns represent the fields and the rows represent the records.
The association between tables is only conceptual.
It is not a part of the database definition. Two tables can be implicitly associated by the fact that they have one or more columns whose values are taken from the same set of values (called a "domain"). Other concepts introduced by the relational model are high-level operators that operate on tables (that is, both their parameters and their results are tables) and data languages (now called fourth-generation languages) in which one specifies which results are required instead of how these results should be produced. Such non-procedural languages (e.g., SQL, Structured Query Language) have become an industry standard. In addition, the relational model suggests a very high level of data independence: programs written in these languages should not be affected by changes in the way the data is organized, stored, ordered and indexed. Relational models have become a de facto standard for data analysts.

Network Model - In the relational model, the data (and the relationships between the data) are viewed as a collection of tables. In contrast, the data in the network model is represented as a collection of records, while the relationships between the records (data) are represented as links. A network model record is similar to an "entity" in the sense that it is a collection of fields, each containing one type of data. The links can be viewed (although not necessarily) as pointers. A collection of records and the relationships between them constitutes a collection of graphs.

Hierarchical Model - The hierarchical model resembles the network model in the way data and relationships between data are treated, that is, as records and links. However, unlike the network model, the records and the relationships between them constitute a collection of trees instead of arbitrary graphs.
The structure of the hierarchical model is simple and direct, particularly when the data to be organized in a database is of an inherently hierarchical nature. The hierarchical model has some inherent disadvantages. For example, in many real-life cases data cannot easily be arranged hierarchically. Moreover, even if the data can be organized hierarchically, it may require larger volumes compared to other database models. Consider, for example, a basic entity "Employee" with the subordinate attributes "Employee_Salary" and "Employee_Attendance". The latter can also have subordinate attributes such as "Employee_Entries" and "Employee_Exits". In this case, the data is inherently hierarchical in nature and therefore should preferably be organized in a hierarchical model. Consider now a case in which an "Employee" is assigned to several "Projects" and the time he or she spends on each project ("Time_Spent") is an attribute included both in the "Employee" entity and in the "Project" entity. Such a data arrangement cannot easily be organized in the hierarchical model, and a possible solution is to duplicate the item "Time_Spent" and keep it separately in the "Employee" and "Project" hierarchies. This approach is cumbersome and error-prone in the sense that one must now ensure that the two instances of "Time_Spent" remain identical at all times.

Object-Oriented Model - A comprehensive explanation can be found in "Object-Oriented Modeling and Design", James Rumbaugh, Michael Blaha, William Premerlani, Frederick Eddy and William Lorensen. The object-oriented approach views all entities as objects. Each object belongs to a class, and each class has associated methods and fields. To allow encapsulation, some of the fields are private, accessible only to methods of the class, while others are public, accessible to all. In this way "Joe Smith" belongs to the class of persons.
For that class, the private field age can be defined. Applying the class method update_age() to the object Joe changes his age. The methodology allows the definition of subclasses that inherit all the methods and fields of the superclass. In this way, for example, the employee class can be defined as a subclass of the person class. In addition, one can define additional fields and methods for the subclass. In this way, the employee class can support a salary field and the get_raise() method.

Object-Relational Model - Allows an object view of relationally organized data. In this way, one is able to operate on the data as if it were organized as objects and, at the same time, support the relational approach.

As mentioned above, the data models relate to the logical or conceptual level of data representation and "hide" details such as how the data is physically arranged and accessed. These latter characteristics are normally handled by the so-called database file management system. The database file management system maps the logical structure (in terms of the database model) onto a data structure, relevant operations and possibly other data. The data structure includes data records and an index. The index allows data records to be accessed or updated using a key. In the context of search, the term search key is used.

The database file management system should preferably operate on data records so as to achieve improved performance in terms of time (i.e., from the user's point of view, a quick response time of the database) and space (i.e., minimizing the storage volume allocated for the database files). As is well known in the art, there is usually a trade-off between the requirements of time and space. The performance of the database depends on the efficiency of the data structures used to represent the data and on how efficiently the system can operate on these data.
A detailed discussion of conventional file and management systems is given, for example, in Chapters 7 (file system structure) and 8 (indexing) of "Database System Concepts", ibid.

Known database file management systems typically use indexing schemes that fall into the following main categories: multi-way tree indexes and others.

Multi-way Tree Indexes - These techniques can be used to create one or more access paths (also called search paths) to the same data record. The search paths form a multi-way tree. Their main disadvantages are that they require space (usually all the keys of the records plus some pointers) and maintenance (addition and/or deletion of keys when an update transaction occurs (see definition below), that is, when a record is added and/or deleted).

Normally, the nature of the indexing scheme as well as the volume of data held in the files determine the number of accesses required to find or update (update includes insert, delete and modify) a given data record. In the case that the storage medium under consideration is an external memory, the number of accesses is effectively the number of I/O accesses. As will be explained later, on each access to the storage medium a block of data is loaded into memory.

Different types of tree indexing schemes have been developed, but normally a tree index implementation is more expensive than direct-access indexing techniques. On the other hand, tree indexing allows sub-range and sequential processing. One of the most widely used indexing schemes is the B-tree (under various commercial product names and implementation variants, such as the B+ tree), in which the keys are maintained in a balanced tree structure and the lowest level points to the data itself. A detailed explanation of the B-tree indexing scheme can be found in "Database System Concepts", ibid., pp. 275-282.
The number of I/O accesses follows the expression $\log_K N + 1$, where K is an implementation-dependent constant and N is the total number of records. This means that performance degrades logarithmically as the number of records increases. It is possible, of course, to use a combination of the above techniques or other techniques, for example, an indexing scheme that is implemented in accordance with two or more of the above techniques.

One of the significant disadvantages of the popular B-tree indexing scheme mentioned above is that the keys are held not only as part of the data records but also as part of the index. This results, of course, in undesired inflation of the index size, and this disadvantage is further aggravated when using large keys (i.e., when a relatively large number of bits is required to represent the key).

One possible approach to dealing with this problem is to exploit the two-dimensional matrix indexing scheme. An example of the latter is the trie discussed in G. Wiederhold, "File Organization for Database Design", McGraw-Hill, 1987, pp. 272, 273, or in D.E. Knuth, "The Art of Computer Programming", Addison-Wesley Publishing Company, 1973, pp. 481-505, 681-687.

Generally speaking, the trie indexing scheme allows a quick search while avoiding the duplication of keys exhibited, for example, by the B-tree technique. The trie indexing scheme has the general structure of a tree, where the search is based on partitioning the search according to portions of the search key (for example, a bit or digit of the search key). In this way, for example, each node in the trie index file represents an offset in the search key, and the link to any of its children represents the character value at that offset.
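The node/offset/link description above can be pictured with a minimal sketch. It is only an illustration under stated assumptions: the names `Node`, `insert` and `search` are hypothetical and not part of the patent, keys are character strings, and each level examines the next consecutive key position.

```python
class Node:
    """One trie node: it examines a single position (offset) of the search key."""

    def __init__(self, offset):
        self.offset = offset   # position of the search key examined at this node
        self.children = {}     # character value at that position -> child node
        self.record = None     # set when a complete key ends at this node


def insert(root, key, record):
    """Add a key, creating one node per character position as needed."""
    node = root
    for ch in key:
        if ch not in node.children:
            node.children[ch] = Node(node.offset + 1)
        node = node.children[ch]
    node.record = record


def search(root, key):
    """Follow the link labeled with the key's character at each node's offset."""
    node = root
    while node.offset < len(key):
        node = node.children.get(key[node.offset])
        if node is None:
            return None        # no link carries this character value
    return node.record


root = Node(0)
for record_id, key in enumerate(["car", "cat", "dog"], start=1):
    insert(root, key, record_id)
```

Note that no node stores a whole key: a key is implied by the labels of the links on its path, which is the space saving contrasted here with the B-tree.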
The trie structure affords an efficient data structure in terms of the memory space allocated to it since, as already specified above, the search key is not held in its entirety in the internal nodes, and in this way the duplication exhibited, for example, by the B-tree indexing technique is avoided.

In a specific variant of the trie, such as the trie described in "File Organization for Database Design", ibid., in order to achieve improved performance in terms of response time, a trie index file should be constructed by selecting the digits (or bits) of the search key such that the best possible partition of the search space is obtained, in other words, so as to achieve a tree that is as balanced as possible. This, however, requires a priori knowledge of the data records of the trie and comes at the cost of obtaining unsorted data, which in many real-life cases is unacceptable. It should be noted that if sorted data is required, a balanced structure cannot be guaranteed even if there is sufficient a priori knowledge of the data records of the trie. It should also be noted that the specified trie does not support sequential sub-range processing.

When considering a large amount of data, it is of particular importance to maintain a so-called balanced structure of the trie index in order to avoid long paths from the root node to the leaf node that is associated with the data record being sought.

The specified B-tree indexing scheme constitutes an inherently balanced tree structure, even after the tree has been subjected to update transactions. The inherently balanced structure is achieved, however, as explained above, at the cost of inflating the content of the blocks in the tree and, consequently, unduly increasing the size of the file holding the index, particularly for large trees that contain large data records.
The large file volume adversely affects the performance of the database management system in terms of the number of accesses (and consequently in terms of access time) to the storage medium required to reach the sought data record, which is obviously undesirable.

Turning now to indexing schemes of the "other" category, these include, for example, the so-called skip list index. A skip list is a randomized data structure consisting of levels. The lowest level, level zero, consists of a list of all records in non-decreasing order. Each node of level i (i = 0, ..., h) chooses, with probability p, whether or not to be a representative at level i+1. The representatives of level i constitute the nodes of level i+1. These representatives, too, are organized as an ordered list. Level h+1 is the first empty level.

Having discussed the main disadvantages of the indexing schemes known hitherto, i.e., inflated data volumes (e.g., the B-tree and its variants) and susceptibility to an unbalanced structure (e.g., the trie), there follows a discussion of another aspect, namely the subordination of data records and multidimensional characteristics.
Thus, consider, for example, two types of data records represented as two entities (tables), namely books and borrowers, each associated with a respective unique key: a borrower is identified by Borrower_Id and a book is identified by Book_Id. In a real-life case, such as a public library, one is interested in seeing, for example, all the books borrowed by a given borrower. This transaction exemplifies the subordination of data records, where "books" are subordinated to "borrower". To answer such a query, one must apply two queries: one for the borrower's information and another for the books borrowed by the borrower (in accordance with the composite key (Borrower_Id, Book_Id)). So far, with tree indexing schemes, supporting the subordination of data in the specified manner requires several separate index files, as follows:
• Book index file, accessible by means of the Book_Id key;
• Borrower index file, accessible by means of the Borrower_Id key;
• Borrower-book index file, accessible by means of the composite key (Borrower_Id, Book_Id).
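The three separate index files listed above can be pictured with a small sketch. Everything in it is illustrative: the sample identifiers and titles are invented, two dictionaries and a sorted list of composite keys stand in for the index files, and the function name `books_borrowed_by` is hypothetical.

```python
from bisect import bisect_left

# Index file 1: Book_Id -> book record (invented sample data).
books = {"B1": "Dune", "B2": "Emma", "B3": "Ulysses"}
# Index file 2: Borrower_Id -> borrower record.
borrowers = {"R1": "Ann", "R2": "Bob"}
# Index file 3: composite keys (Borrower_Id, Book_Id), kept sorted.
loans = sorted([("R1", "B3"), ("R2", "B1"), ("R1", "B2")])


def books_borrowed_by(borrower_id):
    """Query 1 checks the borrower; query 2 scans the composite-key index."""
    if borrower_id not in borrowers:
        return []
    titles = []
    # Jump to the first composite key that starts with this borrower.
    start = bisect_left(loans, (borrower_id, ""))
    for owner, book_id in loans[start:]:
        if owner != borrower_id:
            break
        titles.append(books[book_id])
    return titles
```

Each mapping repeats key values that already live in the records, which is the duplication of volume and the extra integrity maintenance that this discussion points to.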
Accordingly, the indexing scheme here includes three index files. This obviously entails an undesired increase in data volume as well as additional maintenance and integrity checks. Thus, for example, removing a given book from the book file requires a preliminary test of whether it exists in the borrower-book index file.

Having discussed the disadvantages of the hitherto known techniques with respect to the subordination of data records, their cumbersome representation and operation become even more apparent when considering implementations of so-called multidimensional data records. Returning to the last example, the book and borrower tables are now regarded as multidimensional tables, which can be reached from different views. In this way, in addition to the aforementioned borrower -> book view (books borrowed by borrowers), which is implemented by an index on the composite key borrower-book, the database should support the alternative view of the borrowers who have borrowed a given book or books, which requires, of course, the alternative composite key (book-borrower). In the B-tree representation, therefore, it is necessary to add another index file accessible by means of the composite key (Book_Id, Borrower_Id), giving a total of four index files. The relevant disadvantages are self-explanatory and become even more significant for n-dimensional tables (n > 2).

Therefore, there is a need in the art to reduce the disadvantages of data processing systems that use the hitherto known database file management systems. Specifically, there is a need in the art to provide a data processing system that exhibits improved database performance by using an efficient database file management system.

There is yet another need in the art to provide a database file management system using an index which is inherently not susceptible to an unbalanced structure in the manner specified above.
There is yet another need in the art to provide an index which inherently supports the representation of multiple types of data, subordination of data records and/or multidimensionality.

GLOSSARY OF TERMS

For clarity of explanation, below is a glossary of terms frequently used throughout the description and appended claims. Some of the terms are conventional and others have been coined:

Block - a storage unit that can be accessed by a single I/O operation. A block may contain data arranged in any desired way, for example, nodes arranged in the form of a tree and possibly also links to the actual data records. A block can reside in main (also called internal) or secondary (also called external) storage.

Tree - a data structure which is either empty or consists of a root node linked by means of d (d > 0) pointers (or links) to d separate trees called the subtrees of the root. The roots of the subtrees are called child nodes of the root node of the tree, and the nodes of the subtrees are descendant nodes of the root. A node all of whose subtrees are empty is called a leaf node. The nodes in the tree that are not leaves are designated internal nodes. In the context of the invention, the leaf nodes are also the nodes that are associated with the data records.

Nodes and trees should be interpreted in a broad sense. In this way, the tree definition also covers a tree of blocks where each node constitutes a block. In the same way, the descendant blocks of such a block are all the blocks that can be accessed from it. For a detailed definition of "tree", refer also to the book by Cormen, Leiserson and Rivest, or Lewis and Denenberg, "Data Structures and Their Algorithms".

It should be noted that the association (e.g., link) between the leaf node and the data record encompasses any embodiment that allows the data records to be accessed from the leaf nodes. In this way, by way of example, a data record can be accessed directly (that is, through a pointer) from the leaf node.
By way of another non-limiting example, the leaf node points to a data structure (for example, a table) which, in turn, allows the data records to be accessed. Of course, other variants are also possible.

Depth of an index - defined as the maximum number of blocks from a root block to a block associated with a data record.

Balanced index - An index is balanced if there is a constant c such that the number of accesses needed to reach any data record is at most $c \log n$, where n is the number of records in the structure. Obtaining a balanced tree includes applying a balancing technique post factum (to an unbalanced structure) so as to bring about a balanced structure or, if desired, applying the balancing technique on the fly so as to maintain a balanced structure.

Accessing an index - regarded as a process of moving from one node to another node within a block or to another block, usually, although not necessarily, in order to reach the sought data records.

Browsing - regarded as accessing data records, usually (though not necessarily) in order to collect them in a manner ordered by their key.

Search scheme - the algorithm associated with an index that is used to access a data record given its key.

Intra-block search scheme - the algorithm that is used within the block to access a given data record or another block. The data record does not necessarily reside within the block.

Common key of a block - the longest prefix shared by all the keys of the data records that can be accessed from the block by means of the relevant search scheme. If desired, part or all of the common key can be explicitly held in the block.

Update transactions - transactions that consist of inserting a new data record, deleting an existing data record, or modifying an existing data record or a portion of it.

Vertically oriented trie structure - the conventional orientation of the digital tree, from root to leaves.
As will be exemplified below, it is not always obligatory to maintain all the links between the nodes and the blocks in the vertical trie. As will be explained in detail later, in an index of the invention, a trie that is susceptible to an unbalanced structure constitutes a vertical trie. As will be described later, in some specific embodiments, the indexes constructed over the keys of the data records are vertically oriented tries.

Horizontally oriented trie structure - has h levels of vertically oriented trie structures, with the first level representing the highest level and the h-th level representing the lowest level (constituting the trie that is susceptible to an unbalanced structure), which is normally associated with the data records; it allows moving from a block at the i-th level to a block at the (i+1)-th level according to the common key value of the block. In various embodiments of the invention, as will be explained in detail below, the upper levels constitute a representative index over the common keys of the blocks of the lowest-level trie.
Storage medium - any medium that can be used to store data, including an internal memory, an external memory, or both. The external memory may be one or more of the following: magnetic tape, magnetic disk, optical disk, or any other physical medium used to store data. The internal memory includes any known main memory, including a cache, as well as any other physical storage medium that serves as internal memory.

Short link - (also referred to as a near link) a link labeled k from a node a having the value r to a node b in the same block, such that the keys of the data records that include node b in their access path have the value k at key position r.

Long link - (also referred to as a far link) a link from a node v in block B of level i to a block B' of level i-1 or to a data record. If v has the value r and the label of the link is k, then the common key of block B', or the key of the data record, has the value k at position r. The label of a short link or a long link is also referred to as the value or address of the link.

Split link - If a block overflows and a split is performed such that a node a is linked to a node b, and node b together with its descendant nodes is moved to a different block B, then the link between node a and node b is a split link. After the split, the split link is the link between node a and block B (i.e., the block accommodating node b). A split link is a labeled link. In several implementations, such as PAIF, maintaining the split link from node a to the block B where node b resides is optional, since block B can be accessed through the layered index.

Direct link - a link from a node v in block B of level i to a block B' of level i-1 that includes a node v' such that nodes v and v' have the same value. If a search path to the data record with key k includes the node v but does not include any of its near and far links, then it must contain the direct link to block B'.
A direct link has no label.

Below is a description of the terms duplicate node and copied node, which are used in the block split procedure. If a node v' has the value k, then all the keys of the data records accessible from v' and its labeled links agree in positions 0, ..., k-1. If a node v is created so that it has the same value as node v', and all the data records accessible from v and its labeled links are also accessible from node v' and its labeled links, v is considered a duplicate node of v'. A duplicate node maintains a direct link to the block that includes node v' (a duplicate node is also called a copied node).

Following is a discussion of several additional terms and procedures that are used in the description and claims in the context of the present invention.

Data record - consists of several fields, some of which are designated as keys. Sometimes the records are sorted by one of the keys, called the primary key.

An index (or indexing scheme) over the keys of the data records or over representative keys (for the definition of the latter see below) is a data structure that facilitates search by one or more of the keys. Examples of indexes are any of the specified multi-way tree indexing schemes. An index according to the invention can be constructed using more than one indexing scheme. The index can be stored in a file or files that reside partially or entirely in internal memory or in external memory.

According to the invention, there is provided an index that includes a fractional index, a dynamic data structure that allows search by key and is divided into blocks, each of which has a representative key. The representative keys should suffice to find the block associated with a record whose key matches the search key (if any). Once the block has been located, the data record can easily be retrieved. The representative keys are not necessarily stored physically in the block.
Examples of fractional indexes are:
1. The sequence of blocks of a file sorted by increasing values of the primary key. The index directs the search to the block that contains the key. To allow searches by a key that is not the primary key, a fractional index is constructed so that for each record the fractional index contains its key and its link. These pairs are ordered by non-decreasing key values. The index leads to the block that contains the address of the desired record.
2. A trie arranged in blocks.
3. Other types of indexing schemes that comply with the definition of the fractional index.

A fractional index over the keys of the data records is called a basic fractional index and is denoted $I_0$. This fractional index can become unbalanced, thus giving rise to some long search paths. To search the fractional index efficiently, an additional index layer $I_1$ (an index layer is also referred to, for short, as an index) is built over the representative keys of $I_0$. If $I_1$ is also a fractional index, then an additional index $I_2$ can be built over the representative keys of the blocks of $I_1$. This process can be repeated until an index $I_h$ (hereinafter the root index) is created, which is preferably contained completely within a single block. The root index $I_h$ is not necessarily a fractional index. The layered index (which also constitutes an index) is the collection $I_0, I_1, \ldots, I_h$. The layers $I_1, \ldots, I_h$ also constitute a so-called representative index.

To search for a record by key k, the latter is searched in $I_h$ (and in some cases also in $I_{h-1}, \ldots, I_0$ and in the data records) in order to find the block of $I_{h-1}$ leading to k. This process is repeated until reaching the block of $I_0$ that is associated with the record with key k (if any).

To insert a new record r with key k, a search is performed as above to find the block of k. Having found block B in $I_0$, r is added to B.
If B (in $I_0$) overflows, it is split into two (or more) blocks, and the representative of B in $I_1$ is replaced by the representatives of the new blocks. An overflow of block $B_1$ in $I_1$ leads to a split of $B_1$, and the representative of $B_1$ in $I_2$ is replaced by the representatives of the new blocks, and so on. If the block of $I_h$ overflows, an additional layer $I_{h+1}$ is created and added to the layered index. It should be noted that an "overflow" state can be determined according to the particular application and is not necessarily triggered only when the block is full. In this way, for example, in one embodiment, overflow occurs when a block is filled to at least half of its capacity.

Deletion is similar to insertion and may involve merging blocks, the reverse of the split process. The update or the split does not necessarily need to be done on the fly, but can be delayed (that is, done post factum).

It should be noted that the construction of the index layers preferably retains a balanced index. It should also be noted that in some embodiments a balanced index is sufficient, and that in some cases where the layered index (without $I_0$) is of a relatively small volume (for example, it can be accommodated mostly or almost entirely in internal memory), the "balanced structure" requirement can be waived.

According to a first aspect of the invention, it has been found that the inherent limitations of a basic fractional index (e.g., a trie) that is susceptible to an unbalanced structure can be dealt with by providing an index and, more specifically, a layered index in the manner specified. Comparing, for example, the layered index with the basic fractional index (e.g., a trie), it readily emerges that accessing selected data records through the layered index is substantially more efficient than accessing the same data records through the trie.
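The search, insertion and overflow handling of the layered index described above can be sketched as follows. This is only an illustration under stated assumptions, not the patented implementation: blocks hold sorted keys instead of a trie (the first kind of fractional index listed earlier), the representative key of a block is taken to be its smallest key at the time the block is created, a block overflows when it holds more than a hypothetical `CAPACITY` of entries, and all class and method names are invented.

```python
from bisect import bisect_right, insort

CAPACITY = 4  # hypothetical block capacity, chosen small for illustration


class Block:
    def __init__(self, keys, children=None):
        self.keys = keys          # record keys in I_0, representative keys above
        self.children = children  # None in I_0, else child blocks parallel to keys


class LayeredIndex:
    def __init__(self):
        self.root = Block([])     # initially I_0 is a single block and also the root
        self.height = 0           # h, the number of layers above I_0

    def search(self, key):
        """Descend one block per layer, from the root layer down to I_0."""
        block = self.root
        while block.children is not None:
            i = max(bisect_right(block.keys, key) - 1, 0)
            block = block.children[i]
        return key in block.keys

    def _insert(self, block, key):
        """Insert below block; return the new sibling block if block was split."""
        if block.children is None:
            insort(block.keys, key)
        else:
            i = max(bisect_right(block.keys, key) - 1, 0)
            new = self._insert(block.children[i], key)
            if new is not None:
                # The child split: add the representative of the new block.
                block.keys.insert(i + 1, new.keys[0])
                block.children.insert(i + 1, new)
        if len(block.keys) <= CAPACITY:
            return None
        mid = len(block.keys) // 2  # overflow: split the block in two
        right = Block(block.keys[mid:],
                      block.children[mid:] if block.children else None)
        del block.keys[mid:]
        if block.children:
            del block.children[mid:]
        return right

    def insert(self, key):
        new = self._insert(self.root, key)
        if new is not None:
            # The root block overflowed: create an additional layer on top.
            self.root = Block([self.root.keys[0], new.keys[0]],
                              [self.root, new])
            self.height += 1
```

Because a new layer is only ever added above the root, every search in this sketch touches exactly `height + 1` blocks, which illustrates why the layered structure stays balanced even when keys arrive in an unfavorable order.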
In the context of the invention, "more efficient" means that the number of accesses to the storage medium required to perform an update transaction on a data record (for example, insert, delete or modify) or to access a data record is smaller through the layered index than through the basic fractional index. The number of accesses is to be counted so that each block brought in (for example, loaded or processed) from the storage medium counts as one access. There may be exceptional cases where this "more efficient" property does not hold, for example a very small file with only a few blocks, where accessing a data record through the basic fractional index may require the same number of operations as, or even fewer than, through the layered index.

Constructing a layered index from a basic fractional index that is a trie requires some additional considerations. Each key is regarded as a character string or a binary string. If the trie cannot be accommodated in a single block, it is divided into blocks so that each block contains a single subtree of the trie. The representative key of a block is the string associated with the root node of the trie in the block, that is, the sequence of labels on the path from the root of the trie of I_i to the root of the trie of the block. As in the general layered index scheme, the representative keys of I_i are the keys of I_{i+1}. To search for a key k in I_{i+1}, one looks for the longest prefix of k in the blocks of I_{i+1} and from there moves to the appropriate block of I_i.

The insertion of a record leads to the addition of its key to I_0, that is, to adding a value to the trie of I_0. If a block overflows as a result, the block is split into typically two (in some implementations more) blocks, so that each block contains a (connected) trie.
To achieve this, a link between a node u and its child v is cut, and the subtree rooted at v is moved to another block. The representative key of the new block is added to I_1. As in the general layered index scheme, this process continues through I_1, ..., I_h.

If the basic fractional index is a compressed trie such as Patricia or PAIF, only part of each key is stored, which saves index space. However, this saving affects the way the search is performed. In such compressed tries usually only nodes of degree greater than or equal to 2 are maintained. Even if the search key k does not belong to a compressed trie, the search can end at some record r, and one must check whether k is equal to the key of r. If the keys differ, the trie does not contain a record with key k.

The effect of this strategy on the layered index scheme is that the prefix of k may not be represented in the index. To allow searching in such cases, direct links from the nodes of the blocks of I_i to blocks of I_{i-1} are introduced. These links carry no label and are taken when the character at the relevant position of the search key matches none of the labels of the node. Suppose that the search reaches block B_{i-1} of I_{i-1}, whose representative key k_{i-1} is not a prefix of k. (If k_{i-1} is not explicitly recorded in B_{i-1}, any data record r accessible from B_{i-1} can be reached, and k_{i-1} is determined from the key of r.) To continue the search, compare k and k_{i-1} to find the position j of the first character at which they differ, then search the trie of block B_i until a node v with a direct link and a value less than or equal to j is found. The search continues from the block of I_{i-1} indicated by that direct link. (If no such node exists, the search goes to the first block of the index I_{i-1}.) Thus, in the worst case, each layer may require one additional access.
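The need for the final key comparison in a compressed trie can be seen in the following minimal sketch. It covers a single in-memory trie only and does not model blocks, layers or direct links. It assumes string keys that do not contain the terminator character, and it keeps in each internal node only the character position to test, so the skipped positions are never inspected during the descent. The names `END`, `build` and `search` are hypothetical.

```python
END = "\0"  # assumed terminator, so that no key is a proper prefix of another


def build(keys):
    """Build a compressed trie from distinct, END-terminated keys.

    A leaf is the full key of a record; an internal node is a pair
    (position, {character: subtree}) and always has at least two children.
    """
    if len(keys) == 1:
        return keys[0]
    pos = 0
    while len({k[pos] for k in keys}) == 1:   # skip positions where all agree
        pos += 1
    groups = {}
    for k in keys:
        groups.setdefault(k[pos], []).append(k)
    return (pos, {c: build(group) for c, group in groups.items()})


def search(trie, key):
    """Return True only if the trie holds a record whose key equals `key`."""
    key += END
    node = trie
    while isinstance(node, tuple):
        pos, children = node
        if pos >= len(key) or key[pos] not in children:
            return False                      # no link matches this position
        node = children[key[pos]]
    # The descent tested only a few positions, so the record that was
    # reached must still be compared with the search key.
    return node == key


trie = build([k + END for k in ["romane", "romanus", "romulus"]])
found = search(trie, "romane")
false_match_rejected = not search(trie, "rXmane")
```

Here the key "rXmane" follows the same links as "romane" because position 1 is never tested; only the comparison with the key of the record reached shows that it is absent.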
As will be explained later, three layers are sufficient to address billions of records, and usually two layers can be kept in the internal memory of a computer, so that no more than two I/O accesses to the external storage medium are needed to reach the block associated with a data record.

The split process also has to accommodate the direct links. Suppose that the access path to block B_{i-1} of I_{i-1} passes through block B_i of I_i, and that B_{i-1} overflows and is split into blocks B_{i-1} and B_{i-1}'. Block B_i now has to contain links to all its descendant blocks in I_{i-1}. This can be achieved by the following non-limiting technique: let k_{i-1}' be the representative key of B_{i-1}'; this key is inserted into the compressed trie of B_i so that searches for keys that descend from B_{i-1}' reach B_{i-1}', and searches for keys that descend from B_{i-1} reach B_{i-1}.

A non-limiting method for carrying out the split process is as follows:

1. At least one short link (hereinafter the split link) is removed from among the short links of a node (hereinafter the split node) in the block, so that there are at least two tries in the block.
2. Each of the subtries is moved to a separate block.
3. If block B_i does not exist, B_i is created and a copy of the split node (hereinafter the copied node) is created in B_i.
4. If block B_i exists and a copied node of the split node does not exist in B_i, then a copied node of the split node is created in B_i and connected to the trie of B_i so that B_{i-1}' (at the end of the split process) is accessible on the search path that includes the root node of B_i, the copied node, and its links labeled according to the representative key of B_{i-1}'.
5. If the copied node does not have a direct link, a direct link is added from the copied node to block B_{i-1}.
6. A far link is added from the copied node to block B_{i-1}'. Alternatively, if the copied node has a short link to a child node in the direction of the far link, the far link can be replaced by a direct link from the child node to block B_{i-1}'.

In the above implementation, a split of a block in I_k, k > 0, is carried out in such a way that the split links (of I_k) are links between copied nodes of split nodes residing in different blocks.

Therefore, according to one aspect of the invention, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a layered index arranged in blocks; the layered index includes a basic fractional index that is associated with the data records; the basic fractional index allows accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; the layered index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

The invention also provides, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index arranged in blocks and built over the keys of the data records; the index includes a basic fractional index that is associated with the data records; the basic fractional index allows accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; the index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
Additionally, the invention provides, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index arranged in blocks and built over the keys of the data records; the index includes a trie that is associated with the data records; the trie allows accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; the index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

Additionally, the invention provides, in a database file management system for accessing data records and executed on a data processing system, the data records being associated with a basic fractional index arranged in blocks and stored in a storage medium, the basic fractional index allowing access to or update of the data records by key or keys and being susceptible to an unbalanced structure of blocks, a method for constructing a layered index arranged in blocks, comprising the steps of: (a) providing the basic fractional index; (b) constructing a representative index over the representative keys of the basic fractional index; the layered index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
The invention further provides, in a database file management system for accessing data records and executed on a data processing system, the data records being associated with a basic fractional index arranged in blocks and stored in a storage medium, the basic fractional index allowing access to or update of the data records by key or keys and being susceptible to an unbalanced structure of blocks, a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of: (a) providing the basic fractional index; (b) constructing an index over the representative keys of the basic fractional index; the index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

The invention still further provides, in a database file management system for accessing data records and executed on a data processing system, the data records being associated with a trie arranged in blocks and stored in a storage medium, the trie allowing access to or update of the data records by key or keys and being susceptible to an unbalanced structure of blocks, a method for constructing an index over the keys of the data records, the index being arranged in blocks, comprising the steps of: (a) providing the trie; (b) constructing an index over the representative keys of the trie; the index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.

The index according to the invention is preferably, but not necessarily, constructed by one or more of the indexing schemes selected from the specific index schemes. Typical, though not exclusive, examples of multiway tree indexes are the B-tree indexing schemes. By one embodiment, the basic fractional search scheme is a trie constituted by a digital tree of the type described in US Pat. No. 5,495,609.
By another embodiment, the trie is constituted by a so-called Probabilistic Access Indexing File (PAIF). Thus, by a specific embodiment, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes at least one probabilistic access indexing file (PAIF) having a plurality of nodes and links; the leaf nodes of the PAIF are each associated with at least one data record accessible to a user application program, wherein at least a portion of the data record constitutes at least one search key; selected nodes in the PAIF each represent a given offset of a search key portion within the search key; the link or links originating from each given node among the selected nodes each represent a unique value of the search key portion; the PAIF has at least two sub-PAIFs, each arranged in a block; the database file management system is further capable of arranging the blocks as a balanced structure of blocks.

In the context of the PAIF, it should be noted that although the selected nodes preferably include only a given offset, this is not necessarily always the case. One or more of the nodes may include other information, such as portions of the keys, as required and appropriate.

According to a modified embodiment, the trie being of the PAIF type, the indexing scheme is constituted by a search scheme substantially identical to that of the PAIF trie.

Before proceeding, it should be noted that, for convenience of description, the invention is described primarily with reference to a trie as the basic fractional index. Those skilled in the art will readily appreciate that the invention is in no way limited to a trie, and therefore any basic fractional index is applicable.
Thus, a database file management system employing a layered index of the invention is advantageous in terms of improved performance compared to hitherto known techniques, owing, inter alia, to the following characteristics:

• Data is inherently held in sorted form according to the search key. That is, one can navigate in the tree in the order of the keys of the data records. The layered index inherently supports sequential operations such as "get next" and "get previous". In this regard, the proposed layered index has an advantage over, for example, hashing (calculated addressing) schemes and some implementations of digital trees.
• There is no requirement for advance knowledge of the contents of the database in order to keep the index balanced.
• A balanced layered index is retained and the depth of the index is relatively small, thus minimizing the number of accesses (usually slow I/O operations) required to perform update transactions or to access data records. According to one embodiment, practically one I/O operation (and no more than two I/O operations), constituting one or two accesses, is required to access a given data record from among billions of data records.

The invention thus provides, in a computing system having a storage medium of at least an internal memory occupying 10 to 20 M bits or more and an external memory, a data structure that includes an index over the keys of the data records; the index is arranged in blocks, so that for a billion data records substantially no more than two accesses to the external memory are required in order to access a block associated with any one of the billion data records, regardless of the size of the key of the data records.

Additionally, the invention provides, in a computing system having a storage medium of at least an internal memory occupying 10 to 20 M bits or more and an external memory, a data structure that includes an index over the keys of the data records; the index is arranged in blocks, so that for a million data records substantially all blocks of the index are accommodated in the internal memory, regardless of the size of the key of the data records.

The invention further provides, in a computing system having a storage medium, a data structure including an index over the keys of the data records; the index is arranged in a balanced structure of blocks and allows sequential operations to be performed on the data records; the size of the index is essentially unaffected by the size of the keys.

It should be noted that the data records can reside in the blocks of the layered index, or they can reside in one or more separate data files. In the latter embodiment, the data records should of course be associated with the corresponding layered index.
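The relation between the number of records, the number of layers and the number of external accesses can be illustrated with a rough calculation. The sketch below assumes that every block of every layer addresses the same number of entries (the fan-out) and that the topmost layers are held in internal memory. The fan-out of 1000 is an assumed figure for illustration only and is not stated in this description; the function names are hypothetical.

```python
def layers_needed(num_records, fanout):
    """Smallest number of layers (I_0 .. I_h) whose single root block can
    address num_records, assuming every block addresses `fanout` entries."""
    layers, capacity = 0, 1
    while capacity < num_records:
        capacity *= fanout
        layers += 1
    return layers


def external_accesses(num_records, fanout, layers_in_memory):
    """Index blocks that must be read from external memory to reach the
    I_0 block of a key, assuming the topmost layers are memory resident.
    A separately stored data record would cost one further access."""
    return max(layers_needed(num_records, fanout) - layers_in_memory, 0)


layers_for_a_billion = layers_needed(10**9, fanout=1000)
reads_for_a_billion = external_accesses(10**9, fanout=1000, layers_in_memory=2)
```

Under these assumptions a billion records need three layers, and with the two upper layers in internal memory a single external read reaches the block of I_0.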
As will be elucidated below with reference to the description of the specific embodiments, a given data record can accommodate more than one search key. The index according to the invention is preferably, although not necessarily, constructed by one or more of the indexing schemes selected from the specific index schemes.
Typical, though not exclusive, examples of multiway tree indexes are the B-tree indexing schemes.

Next, a discussion of the second aspect of the invention is provided. Data usually consist of records of various types (for example, books and borrowers in the previous example). The type of a record determines its fields (attributes) and its keys. In a conventional system, for example one that uses a B-tree index, the type of each key is not kept within the record and is not considered part of the key. The program "knows" the type of record and, from it, the fields of the data records and their structure.

According to the second aspect of the invention, a different approach is proposed. Each type of key is assigned a designator, a string of bits (for example, a series of one or more characters) which is usually, but not necessarily, prefixed to all keys of this type. A designated key is a key together with its designator. The designator is treated as part of the key (for search or update purposes) and is therefore part of the index scheme. The designator allows the properties of the data record to be obtained as a function of its type. Thus, by examining a designated key, one obtains the designator and from it can deduce the type of record; one does not need to know the type of record a priori. Data records whose keys are designated are called designated data records. A designated index is an index that allows searching in designated data records.

The following description exemplifies the use of designators according to the invention. Consider a class C such that all data records of this class have a key field (or fields) k1 and possibly some other non-key fields. Let R be a data record of class C, where R.k1 = FIAT. Let the designator of k1 be A. Adding the designator yields the key AFIAT. To access a record with R.k1 = FIAT, the designated index is searched by the key AFIAT.

Having described the designator feature, another feature according to the second aspect, subordination of data records, is described below. Consider a record R1 with a key K1, and a record R2 with a composite key consisting of the ordered pair of keys K1, K2. In this case, the designated key of R2 is the composite key K1'K2', where K2' consists of the key K2 prefixed by a designator D2 (D2 is considered the designator of R2). Using a designated index, one can select R1 by searching for the key K1' (the key K1 with its designator D1), and select R2 by searching the same index for the key K1'K2' (the concatenation of K1' and K2', where K2' is the key K2 with its designator D2). In this case K2 is subordinate to K1. The subordination relation also extends to the records. If K2 is subordinate to K1, the designator of K2' is D2 and the designator of R2 is also D2 (or D1, D2). If R2 is subordinate to R1, the key of R2 is composed by concatenating K2' to K1'. Note that in K2', D2 is prefixed to K2.

In the ERD model, the record types R1 and R2 can stand in a one-to-many relationship, which means that several records of type R2 may be related to a single record of type R1. The relationship can be implemented by the subordination relation: several records of type R2 will be subordinate to a single record of type R1 (for example, several books can be borrowed by the same borrower). In particular, if the relationship is one-to-one (for example, where only one book can be borrowed by each borrower), then the key K1'D2, where D2 is the designator of R2, is enough to locate R2. In a designated index the search path to K1'K2' includes the search path to K1'. (This does not rule out reaching the record R2 by another path.)
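The construction of designated keys and the subordination relation can be sketched as follows. This is an illustration under stated assumptions, not the index structure of the invention: a sorted list of designated keys stands in for the trie, designators are single characters, and key values do not contain designator characters. The class `DesignatedIndex`, the function `designate`, the designators A and B and the key values are hypothetical.

```python
import bisect


def designate(designator, key):
    """Designated key: the key prefixed by the designator of its type."""
    return designator + key


class DesignatedIndex:
    """Toy designated index: one sorted list of designated keys."""

    def __init__(self):
        self._keys = []
        self._records = {}

    def insert(self, designated_key, record, parent_key=None):
        # A subordinate record has key K1'K2'; because the search path to
        # K1'K2' includes the search path to K1', the existence of the
        # parent record can be checked on the way.
        if parent_key is not None:
            if not designated_key.startswith(parent_key):
                raise ValueError("subordinate key must extend the parent key")
            if parent_key not in self._records:
                raise KeyError("parent record does not exist")
        if designated_key not in self._records:
            bisect.insort(self._keys, designated_key)
        self._records[designated_key] = record

    def get(self, designated_key):
        return self._records.get(designated_key)

    def subordinates(self, parent_key, designator):
        """Designated keys of all records of the given type subordinate to
        the parent; they are contiguous in key order."""
        prefix = parent_key + designator
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append(key)
        return result


index = DesignatedIndex()
k1 = designate("A", "111111")                       # K1' of a type-A record
index.insert(k1, {"type": "R1"})
index.insert(k1 + designate("B", "2222"), {"type": "R2"}, parent_key=k1)
index.insert(k1 + designate("B", "3333"), {"type": "R2"}, parent_key=k1)
related = index.subordinates(k1, "B")
```

Because the subordinate keys share the prefix K1', a one-to-many relationship is retrieved by a single range scan of the same index.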
This latter property gives rise to another important feature according to the second aspect, namely inherent maintenance of data integrity. The insertion of a record whose key is K1'K2' (or K1'D2) can only be done if the record whose key is K1' exists. In the previous example, a transaction in which a borrower borrows a book (book_Id = 2222) should result in the insertion of an R2 record whose designated key is A111111B2222 (hereinafter the borrower-book record) only if the specific borrower (data record R1 with K1 = 111111) exists (in this example the designator of the borrower is A and the designator of the subordinate borrower-book data record is B). Data integrity is achieved with little overhead, since the path in the index to the borrower-book record includes enough information to determine whether the borrower exists. If the borrower does not exist, the path to the composite key will not pass through the borrower; this is detected automatically during the insertion process. By comparison, according to the prior art, records of different types are associated with different index files. Before inserting a new data record (with a composite key) into the Borrower-Book index file, a separate check must be made in the Borrower index file to ensure that the specific borrower (the record with key K1) exists, which incurs undue overhead.

Note that the subordination relationship is not limited to two levels: a subordinate record can itself have a record subordinate to it, so that n levels of subordination can be achieved. For example, consider a bank database in which account records are subordinate to branch records and deposit records are subordinate to account records.

Turning now to the multi-dimensional feature according to the second aspect of the invention, let R be a record that is identified by either of two keys K1 and K2.
Then the designated index should contain two search paths to R, one through the designated key K1' and the other through the designated key K2'. R therefore constitutes a multi-dimensional record. A multi-dimensional index includes the designated index and the multi-dimensional data records.

Consider a first embodiment in which the multi-dimensional index does not involve subordinate data records. For example, consider a class C such that all data records of this class have two key fields, k1, the car model, and k2, its license plate number, and possibly several non-key fields. Let R be a data record of class C, where R.k1 = FIAT and R.k2 = 127. Let the designator of k1 be A and that of k2 be B. Adding the designators yields the keys AFIAT and B127. These extended keys are inserted into a single designated index. To access a record with R.k1 = FIAT, the designated index is searched by the key AFIAT, and to select a record with R.k2 = 127, the same designated index is searched by the key B127.

The above discussion and example concern a multi-dimensional index in which the data records do not necessarily exhibit a subordination relationship. A multi-dimensional index can optionally also be applied to subordinate data records. For example, consider a banking database in which deposits are subordinate both to accounts and to depositors. A single designated index provides access to accounts (using the designated key k1', account number), to depositors (using the designated key k2', depositor name) and to deposits (using k1'k2' and k2'k1'). (It is possible, of course, to use different designators for k1 when it is subordinate to k2 and for k2 when it is subordinate to k1.) The designator of a multi-dimensional record depends on the designator of the key used to search or update the record.
Thus, the designator of the car record (FIAT, 127) is A when the record is searched or updated by the key AFIAT, and B when it is accessed by the license plate number key B127.

In addition to the data records, it is necessary to maintain metadata. The metadata include information about the different records as a function of their type. Thus, it is necessary to identify the designator, and as a result the information about the record becomes available, for example a description of its fields, keys, subordination, record size, etc. The search scheme in the designated index draws on the metadata: the record is located, its designator is identified (for example, the designator can be prefixed to the record), and the designated (composite) keys are constructed.

Thus, according to the second aspect of the invention, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure including: an index over the keys of the data records; the data records are of at least two types, wherein the data records of the second type are subordinate to the data records of the first type.
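The multi-dimensional use of a single designated index, together with the role of the metadata, can be sketched as follows. The sketch assumes single-character designators and uses a Python dictionary in place of the index structure. The `METADATA` table, the field names `model` and `plate`, and the functions `insert`, `lookup` and `describe` are hypothetical; only the designators A and B and the values FIAT and 127 come from the example above.

```python
# Assumed metadata: for each designator, the record type and the key field
# whose values carry that designator.
METADATA = {
    "A": {"record_type": "car", "key_field": "model"},
    "B": {"record_type": "car", "key_field": "plate"},
}


def designated_keys(record):
    """All designated keys under which a record must be reachable."""
    return [d + str(record[meta["key_field"]]) for d, meta in METADATA.items()]


def insert(index, record):
    # One index holds every dimension: the same record object is linked
    # from each of its designated keys.
    for key in designated_keys(record):
        index.setdefault(key, []).append(record)


def lookup(index, designator, value):
    return index.get(designator + value, [])


def describe(designated_key):
    """Deduce the record type and key field from the designator alone."""
    return METADATA[designated_key[0]]


index = {}
car = {"model": "FIAT", "plate": "127", "colour": "unspecified"}
insert(index, car)
by_model = lookup(index, "A", "FIAT")
by_plate = lookup(index, "B", "127")
```

Both lookups return the same record, reached through two different search paths of one index, and `describe("B127")` recovers from the designator B that the key is a license plate number.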
Still in accordance with the second aspect, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: a designated index over the designated keys of the data records; the data records constitute designated data records of at least two types, wherein the designated data records of the second type are subordinate to the designated data records of the first type.

According to the second aspect, several advantages are achieved, including:

• The data structure that includes the designated index and the designated data can maintain the relationships between the different data items.
• The data structure that includes the designated index and the designated data can link logically related items.
• The data structure that includes the designated index and the designated data can support several data models simultaneously and effectively.
• The data structure that includes the designated index and the designated data allows data integrity to be maintained with high efficiency.
• The data structure that includes the designated index and the designated data allows related data to be retrieved with high efficiency.

A detailed discussion of the different advantages offered by the database file management systems of the invention is provided below with reference to the specific embodiments. It should be noted that the data records may constitute a part of the PAIF, or may reside in one or more separate data files. In the latter embodiment the data records should of course be linked to the corresponding PAIF. As will be further elucidated with reference to the specific embodiments described below, a given data record may accommodate more than one search key. It will also be shown how complex data structures and data relationships can be supported by a new, simple and uniform technology, and how an index structure can be of minimal size, independent of the size of the keys. All the advantages mentioned above are supported inherently by the invention without any preliminary assumptions about the data (i.e., the key range is unknown, the number of records is unknown, a random physical location of the records is assumed, etc.).

By yet another aspect of the invention, there is provided, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index stored in the storage medium and built over the keys of data records that are stored in blocks; the index is arranged in blocks, with the leaf blocks linked to the data records by means of links; the index is characterized in that at least one of the links is shared by at least two data records stored in the same block. By one embodiment, the index is constituted by a trie.
Additionally, the invention provides, in a storage medium used by a database file management system executed on a data processing system, a data structure that includes: an index stored in a storage medium and built over the keys of data records that are stored in blocks; the index is arranged in blocks, with the leaf blocks linked to the data records by means of links; the index is characterized in that at least one of the links is shared by at least two data records stored in the same block; the index constitutes a layered index according to claim 1, and the blocks of the basic fractional index are linked to the data records.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to understand the invention and to see how it can be carried out in practice, a preferred embodiment will now be described, by way of non-limiting example only, with reference to the accompanying drawings, in which:

Figure 1 shows a generalized block diagram of a system that employs a database file management system;
Figure 2 shows an exemplary database structure represented as an Entity Relationship Diagram (ERD), serving for illustrative purposes;
Figure 3 shows the database of Figure 2 represented as tables according to the relational data model, with each table containing a few data instances;
Figure 4 shows the "CUSTOMER" table of Figure 3, according to a file management system that employs the conventional B+ tree index scheme;
Figure 5 shows the "CLIENT" table of Figure 3, according to a file management system that employs the conventional trie index scheme;
Figures 6A-6C show the "CUSTOMER" table of Figure 3, according to a file management system that employs the PAIF index scheme;
Figures 7A-7H show schematic illustrations exemplifying the construction of a layered index, according to one embodiment of the invention;
Figures 8A-B show schematic illustrations exemplifying the construction of a layered index, according to yet another embodiment of the invention;
Figures 9A-G show schematic illustrations exemplifying the construction of a layered index, according to yet another embodiment of the invention;
Figures 10A-B show schematic illustrations exemplifying the construction of a layered index, according to another embodiment of the invention;
Figure 11 shows a schematic illustration exemplifying the construction of a layered index, according to still another embodiment of the invention;
Figure 12 shows a schematic illustration exemplifying the use of designators in a designated index, according to yet another embodiment of the invention;
Figures 13A-E show five schematic illustrations exemplifying the data record subordination feature in a designated index, according to one embodiment of the invention;
Figure 14 shows a schematic illustration of a designated index exemplifying a multi-dimensional record, according to an embodiment of the invention;
Figure 15 shows a schematic illustration of a designated index, according to another embodiment of the invention;
Figure 16 shows a schematic illustration exemplifying the feature of relationships between data records, according to an embodiment of the invention;
Figures 17A-B show schematic illustrations of a compressed representation of links to data records, according to an embodiment of the invention;
Figures 18A-D show four benchmark graphs demonstrating the improved performance, in terms of response time and file size, of a database using a file management system of the invention versus a commercially available Ctree-based database; and
Figures 19A-D show four benchmark graphs demonstrating the improved performance, in terms of response time and file size, of a database using a file management system of the invention versus a commercially available Btrieve-based database.

Attention is first drawn to Figure 1, which shows a generalized block diagram of a system employing a database file management system of the invention.
Thus, a general-purpose computer 1, for example a personal computer (PC) employing a Pentium® microprocessor 3 commercially available from Intel Corp., U.S.A., has an operating system module 5, for example Windows NT® commercially available from Microsoft Inc., U.S.A., which communicates with the processor 3 and controls the overall operation of the computer 1.

The PC 1 also accommodates a plurality of user application programs, of which only three, 7, 9 and 11, are shown. The user application programs are executed by the processor 3 under the control of the operating system 5, in a manner known per se, and respond to user input fed through the keyboard 13 by way of the I/O port 15 and the operating system 5. The user application programs also communicate with the monitor 16 to display data by way of the I/O port 17 and the operating system 5. The user application programs can access data stored in a database by means of the database management system module 20.

The generalized database management system, as shown in Figure 1, includes a high-level management system 22 which, as a rule, views the underlying data in a "logical" manner and responds to the user application program by means known per se, such as the SQL data definition and data manipulation languages (DDL and DML). The database management system typically exploits, in a manner known per se, a data dictionary 24 that includes metadata holding information about the underlying data. The underlying structure of the data is governed by the database file management system 26, which is associated with the index scheme and the actual data records 28.
The "high-level" logical instructions (for example, SQL commands) received and processed by the high-level management system 22 are converted into "lower-level" commands that access or update the data records stored in the database files; to this end, the database file management system takes into account the actual structure and organization of the data records. The "high-level" and "low-level" portions of a database management system can communicate through an Application Programming Interface (API) known per se, for example the Open Database Connectivity (ODBC) interface commercially available from Microsoft. The use of ODBC allows the "high-level" modules of the database management system, or the application program, to communicate transparently with different database file management systems that support the ODBC standard. The terms access to, or update of, data records as used here cover all types of data manipulation, including "find", "insert", "delete" and "modify" operations on data records, as well as the relevant DDL commands that support the construction, modification and deletion of the database.

Figure 1 also shows, schematically, a storage medium in the form of an internal memory module 29 (for example, 16 megabytes, possibly employing an associated memory submodule) and an external memory module 29' (for example, 1 gigabyte). Typically, the external memory 29' is accessed through a relatively slow external communication bus (not shown), while the internal memory is normally accessed by means of a faster internal bus (not shown). Normally, by virtue of the relatively small size of the internal memory, only those applications (or portions thereof) that are currently running are loaded from the external memory into the internal memory. Likewise, for large databases that cannot be fully accommodated in the internal memory, a major portion of the database is stored in the external memory.
Thus, in response to a query generated by an application that searches for one or more data records in the database, the database management system uses the services of the operating system (i.e., an I/O operation) to load, via the external communication bus, one or more blocks of data from the external memory into the internal memory. If the searched data records are not found in the loaded blocks, additional I/O operations are required until the searched data records are found.

It should be noted that, for simplicity of presentation, the internal and external memory modules 29, 29' are shown separately from the various modules 5, 7, 9, 11 and 20. Clearly, although not shown, the various modules (operating system, DBMS and user application programs) are usually stored in the external memory, and their currently executing portions are loaded into the internal memory.

The computer 1 can serve as a workstation forming part of a Local Area Network (LAN) (not shown) that employs a server having essentially the same structure as that of Figure 1. To the extent that the workstations and the server use client-server protocols, a predominant portion of the modules (including the database records 28 themselves) reside on the server.

Those skilled in the art will readily appreciate that the foregoing embodiments described with reference to Figure 1 are only two out of many possible variants. Thus, by way of non-limiting example, the database may be an online database residing on an Internet Web site. The invention is, of course, not limited to the specific division into a small internal memory and a large external memory. Thus, for example, in one modified embodiment large internal and external memories are used, and in yet another modified embodiment only an internal memory is used. It should further be noted that, for clarity of explanation, the system 1 is illustrated in a simplified and generalized manner.
A more detailed discussion of database file management systems, and in particular of the various components normally accommodated in them, can be found, for example, in Chapter 7 of "Database System Concepts", ibid.

Having described the general structure of a system of the invention, attention is now directed to Figure 2, which shows an exemplary database structure represented as an Entity Relationship Diagram (ERD) and which serves for illustrative purposes. The ERD 30 of Figure 2 consists of the entities "CLIENT" 32 and "ACCOUNT" 34, as well as an "n"-to-"m" relation "DEPOSIT" 36, indicating that a given client can have more than one account and that, at the same time, a given account may belong to more than one client. As shown, the entity "CLIENT" has the following attributes (fields): "Client_Id" 38, a key attribute that uniquely identifies each client; "name" 39, standing for the client's name; and "Address" 40, standing for the client's address. The entity "ACCOUNT" has the following attributes (fields): "Acc_No" 42, a key attribute that uniquely identifies each account; and "Balance" 43, which holds the balance of the account. The "DEPOSIT" relation consists of pairs of keys of the "CLIENT" and "ACCOUNT" entities, such that each pair indicates that a particular account belongs to a specific client.

Turning now to Figure 3, the database of Figure 2 is shown represented, in the relational data model, as three tables 50, 51 and 52 corresponding to 32, 34 and 36 respectively, each table containing a few data instances for illustrative purposes. It should be noted that the key field ("Client_Id") of the "CLIENT" table is 5 digits long, whereas the key field ("Acc_No") of the "ACCOUNT" table is 6 digits long. The client table contains 5 data instances 55-59, the account table contains 2 data instances 65 and 66, and the deposit table contains 3 data instances 70-72.
According to prior art techniques there is, as a rule, a separate index file by primary key for each table. Thus, Figure 4 illustrates an indexing file underlying the "CLIENT" table of Figure 3, according to a file management system employing the conventional B-tree indexing scheme. As shown, the indexing file 80 consists of three blocks 80a-c, namely a root block and two leaf blocks. The data records are randomly organized in a separate file 81 that contains five data records 83-87. Each block consists of a succession of field pairs (for example, 82a-b and 83a-d in block 80a). In each pair, the first field holds a search key value and the second field holds a link, i.e. a number identifying the next block to search or, in the case of a leaf block, a number identifying the data record. The latter constitutes one non-limiting manner of associating a data record with a block.

In the specific embodiment of Figure 4, a search for records with a key equal to or smaller than 12355 is directed from the root block 80a to block 80b. Thus, the search for a record whose key is 12355 (82a) starts at the root block 80a and is directed by link 82b to block 80b. In block 80b, search key 12355 (86a) is associated with link 86b, which indicates the address, in data file 81, of the data record identified by this search key. Put differently, the data record identified by the search key "12355" (57 in Figure 3) is thereby located in the data file 81. The "ACCOUNT" and "DEPOSIT" tables are likewise accommodated in two separate B-tree indexing files, respectively.
The B-tree indexing file of Figure 4 exhibits one of the significant disadvantages of this approach, namely that the keys (i.e., search keys) are duplicated: they are held both in the internal blocks (i.e., in the indexing scheme) and in the data records associated with the B-tree index. Thus, for example, the search key of data record 57 (in Figure 3) is held not only as an integral part of data record 86 in file 81 but also in block 80b (search key 86a), and sometimes in ancestor blocks such as 80a (search key 82a). This being the case, one can readily see that for large files (which is the case in many real-life scenarios) the duplication of search keys, particularly long ones, results in an inflated index that needs a large storage volume, which also adversely affects performance.

Figure 5 illustrates a different indexing scheme for the "CLIENT" table of Figure 3, according to a file management system employing a known trie indexing scheme. The trie indexing file 90 includes a plurality of nodes and links, where each node denotes an offset (position) in the key and each link denotes a value at that offset. Table 91 has four columns. The first column indicates which digit position is to be tested and the second column the value of that digit. A digit value divides the keys into two subsets. Columns three and four direct the search procedure to the next step. To locate a given search key, for example 12355, the digit at the position indicated by the root (position "5", indicated by node 90a and also found in the first column of the first line of table 91) is compared with the value specified in the second column of the same line (value "5", also indicated by link 90b in the trie index). Since the digit at position 5 of the search key 12355 is indeed 5, control is transferred to line 2 (as indicated by the third column of line 1 of table 91).
Next, the digit at position 3 of the search key (90c in the trie, also the value of the first column of the second line of table 91) is compared with the value 3 (link 90d, also the second column of the second line of table 91). Since a match occurs, control is transferred to line 3 of the table. At this step, the digit at position 4 of the search key does not match the value specified in the second column of line three (i.e., "5" vs. "4"), and therefore, as indicated in the fourth column of table 91 ("not equal"), a link to the searched data record 57 (86 in Figure 4) is obtained. The "ACCOUNT" and "DEPOSIT" tables are likewise accommodated in two separate trie indexing files, respectively.

In comparison with the B-tree indexing file of Figure 4, the one shown in Figure 5 does not need to duplicate the search keys. Put differently, only the link values and offsets, and not the whole keys, are held in the trie 90. In this sense it constitutes an advantage over the B-tree technique. However, the foregoing trie is associated with some disadvantages. It achieves a uniform distribution of the data only at the cost of knowing the contents of the database a priori and partitioning the keys accordingly so as to obtain a balanced structure. Having to know the contents of the database a priori is obviously undesirable, since it poses an undue constraint: databases of the type described in Figure 2 are dynamic in nature. For the specific database of Figure 2, for example, new clients open accounts, existing clients close accounts, new clients are registered as co-owners of existing accounts, and so on. Another disadvantage of the foregoing trie is that it does not support sequential processing. When navigating the trie, the data would be accessed in the order 83, 86, 87, 84, 85 (Figure 4) and not in the order of the key.
Having shown a known trie index scheme (with reference to Figure 5), there follows a description of various embodiments of an index of the invention, which includes a basic fractional index and which addresses the disadvantages described above in connection with hitherto known techniques. Specifically, a preferred embodiment of the index will be demonstrated in the form of a layered index, and a preferred embodiment of the basic fractional index in the form of a trie. These examples are by no means limiting.

Before turning to the explanation of the various embodiments, a new trie index scheme, designated PAIF, is described with reference to Figures 6A-C. As will be shown later, the PAIF is not confined to a tree structure. Based on the PAIF, several layered index embodiments are described with reference to Figures 7-9, which include representative indexes constructed over representative keys of the PAIF. In the embodiments of Figures 7-9, the index scheme of the representative index and that of the basic fractional index are substantially the same PAIF. With reference to Figure 10, another embodiment of the layered index is described, with a different trie. As will be shown, in the embodiment of Figure 10 the representative index and the trie are likewise substantially the same. This, however, is not mandatory, as exemplified with reference to Figure 11, where the trie and the representative index are different.

Turning now to Figures 6A-C, there is shown a succession of schematic illustrations of the "CLIENT" table of Figure 3, according to a file management system employing the PAIF. The terms "transaction" and "operation" are used interchangeably. In the following description, the basic commands that allow the manipulation of data in the PAIF are reviewed, namely inserting a new data record into a PAIF, finding a data record in a PAIF, and deleting an existing data record.
Those skilled in the art will readily appreciate that, on the basis of these basic primitives, more complex data manipulation operations (e.g., "Join") can be performed.

Turning first to Figure 6A, the client data record 103 (56 in the client table of Figure 3) is shown, having the search key "12345" (i.e., a search key 5 bytes long). The PAIF 100 of Figure 6A is, of course, trivial and consists of a single node 101 (serving as both the root node and a leaf node) linked by a long link 102 to the record 103. The node 101 represents offset 0 in the search key, and the link 102 represents the value "1" of the search key portion (in this particular embodiment, 1 byte long) at the specified offset. As clearly shown in Figure 6A, the data record 103 is associated with a search path, here a single unit consisting of node 101 and link 102, which defines an offset and a search key portion value matching the search key portion at that offset within the search key of the specific data record. More specifically, the value of the one-byte search key portion at offset 0 within the search key "12345" is indeed "1".

Turning now to Figure 6B-1, a PAIF 108 is shown after the completion of a successive transaction in which the data record 107 having the Client_Id_No "12445" has been inserted (data instance 58 in the client table of Figure 3). The search keys of data records 103 and 107 differ only in the third byte (offset 2), i.e., "3" and "4" respectively. The unit defined by the root node 101 and the link 102 is not sufficient to distinguish between data records 103 and 107, since the value of the one-byte search key portion at offset 0 is "1" for both data records.
Thus, node 104 indicates the lowest offset that distinguishes between the two records, and links 106 and 105 indicate the respective one-byte search key portions "3" and "4" at offset 2. It should be noted that the implementation of the PAIF is not limited to the specific examples illustrated in the drawings, and different implementations may apply, depending on the particular application. Thus, for example, Figures 6B-2 and 6B-3 illustrate two other options for realizing the PAIF of Figure 6B-1. In Figure 6B-2 the whole key is represented in the PAIF (e.g., all digits of the record 12445 are specified on the links that start at the root node and end at the data record). This embodiment is more explicit and less space-efficient than the embodiment of Figure 6B-3, in which only the nodes that are absolutely necessary appear in the tree. Other variants can, of course, be applied.

Before turning to the procedure for inserting a new data record into an existing PAIF, it should be borne in mind that the higher a node is in the PAIF, the smaller the offset it indicates (for example, in the PAIF of Figure 6B node 101 is higher than node 104 and is therefore assigned a smaller offset, "0" versus "2"). Generally speaking, the preferred procedure for inserting a new data record into an existing PAIF includes the following steps:

i. advance along a reference path that starts at the root node and ends at a data record associated with a leaf node (the "reference data record"); at each node on the reference path, advance along a link originating from the node if the value represented by the link equals the value of the search key portion at the offset specified by the node; if the offset specified by the node lies beyond the end of the key, or if no link carries that value, advance along an arbitrary path to any reference data record;
ii. compare the search key of the reference data record with that of the new data record to determine the smallest offset at which their search key portions differ (hereinafter the "discernment offset");
iii. proceed to one of the following steps (iii.0 to iii.3), depending on the value of the discernment offset:
iii.0 if the data records are equal, terminate; or
iii.1 if the discernment offset equals the offset indicated by one of the nodes on the reference path, add another link originating from that node and assign to it the value of the search key portion at the discernment offset, taken from the search key of the new data record; or
iii.2 if the discernment offset is greater than the offset indicated by the leaf node that is linked to the reference data record:
iii.2.1 disconnect the link from the reference data record (which is thus left temporarily "free") and move the link to a new node; the new node is assigned the value of the discernment offset;
iii.2.2 connect the reference data record and the new node (which now becomes a leaf node) by a link (long link) and assign to it the value of the search key portion at the discernment offset, taken from the search key of the reference data record;
iii.2.3 connect the new data record and the new node by a link (long link) and assign to it the value of the search key portion at the discernment offset, taken from the search key of the new data record; or
iii.3 if the conditions of iii.0, iii.1 and iii.2 are not met, then either there exist on the reference path a parent node and a child node thereof such that the discernment offset is greater than the offset assigned to the parent node and smaller than the offset assigned to the child node (case A), or all nodes on the reference path have an offset greater than the discernment offset (case B); accordingly, apply the following sub-steps:
iii.3.1 for cases A and B, create a new node and assign to it the value of the discernment offset; for case A only, disconnect the link from the parent node to the child node and move the link to the new internal node (i.e., the child node is left temporarily "free");
iii.3.2 for cases A and B, connect the new data record and the new node by a link (long link); the value assigned to the link is that of the search key portion at the discernment offset, taken from the search key of the new data record;
iii.3.3 for cases A and B, connect by a new link the new node and, for case A, the child node or, for case B, the root node (i.e., the new node becomes a new parent node in case A and a new root node in case B); the value assigned to the link is the search key portion at the offset indicated by the new node, taken from the search key of the reference data record.

It should be noted that a different reference path may yield a different PAIF. For a better understanding, the above "insert data record" operation will now be applied successively to the specific PAIF of Figure 6B, each time with a different data record, so as to exemplify the three different cases stipulated in steps iii.1 to iii.3 above, resulting in the three PAIFs illustrated in Figures 6C-1 to 6C-3, respectively.
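The insertion steps i to iii above can be sketched in Python as follows. This is a minimal illustration rather than the implementation of the invention: it assumes fixed-length string keys with one-character key portions, keeps every node in memory (no blocks or block splitting), and stands in for each data record with its key. The names `Node`, `reference_path` and `insert` are hypothetical.

```python
class Node:
    """Index node: an offset into the key and links keyed by the character at that offset."""
    def __init__(self, offset):
        self.offset = offset
        self.links = {}   # key portion (one character) -> child Node or record key (str)


def reference_path(root, key):
    """Step i: walk from the root to a data record, following matching links where possible."""
    path, cur = [], root
    while isinstance(cur, Node):
        portion = key[cur.offset] if cur.offset < len(key) else None
        if portion not in cur.links:
            portion = next(iter(cur.links))      # no matching link: take an arbitrary one
        path.append((cur, portion))
        cur = cur.links[portion]
    return path, cur                             # cur is the reference data record


def insert(root, key):
    """Insert `key` and return the (possibly new) root node."""
    if root is None:                             # empty index: a single node, as in Figure 6A
        root = Node(0)
        root.links[key[0]] = key
        return root
    path, ref = reference_path(root, key)
    if ref == key:                               # step iii.0: record already present
        return root
    # Step ii: smallest offset at which the two keys differ (keys assumed equal in length).
    d = next(i for i, (a, b) in enumerate(zip(key, ref)) if a != b)
    for node, _ in path:                         # step iii.1: a node already tests offset d
        if node.offset == d:
            node.links[key[d]] = key
            return root
    new = Node(d)
    new.links[key[d]] = key                      # steps iii.2.3 / iii.3.2
    leaf, leaf_portion = path[-1]
    if d > leaf.offset:                          # step iii.2: new node goes below the leaf node
        leaf.links[leaf_portion] = new           # iii.2.1
        new.links[ref[d]] = ref                  # iii.2.2
        return root
    # Step iii.3: locate the first node on the path whose offset exceeds d.
    idx = next(i for i, (node, _) in enumerate(path) if node.offset > d)
    new.links[ref[d]] = path[idx][0]             # iii.3.3
    if idx == 0:                                 # case B: the new node becomes the root
        return new
    parent, parent_portion = path[idx - 1]       # case A: splice between parent and child
    parent.links[parent_portion] = new           # iii.3.1
    return root


# Build the PAIF of Figure 6B-1, then apply the insertion that leads to Figure 6C-3.
root = None
for k in ("12345", "12445", "11346"):
    root = insert(root, k)
```

In this sketch the "arbitrary path" of step i is simply the first link stored in the node; any other choice would serve equally well, although, as noted above, a different reference path may produce a different PAIF.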
In the first example, the CLIENT data record having the Client_Id (search key) "12546" (59 in the client table of Figure 3) is inserted into the PAIF of Figure 6B-1. As stipulated in step i, a move is made along a reference path starting at the root 101 and ending, for example, at data record 103, which serves as the "reference data record". This is done by advancing from node 101 along link 102 (since at offset "0" of the inserted data record the one-digit value is "1"); thereafter, since at offset "2" (as specified by node 104) neither of the values of links 105 and 106 ("4" and "3" respectively) matches the value of the inserted key at offset 2 ("5"), an arbitrary path is taken (in this particular example through link 106) to the reference data record 103. The comparison stipulated in step ii shows that the search key of the new data record differs from the search key of the reference data record 103 at offset 2 ("5" vs. "3") and at offset 4 ("6" vs. "5"). The smallest such offset (the "discernment offset") is therefore 2. Turning now to step iii, the condition of step iii.1 is met, since the discernment offset equals the offset assigned to node 104. Therefore, as shown in Figure 6C-1, a new link 111 connects node 104 to the new data record 112. The value assigned to link 111 is "5", the byte value at offset 2 in the search key of the new data record 112. PAIF 110 of Figure 6C-1 is therefore the result of inserting data record 112 into PAIF 108 of Figure 6B-1.

Turning now to the second example, the CLIENT data record having the Client_Id (search key) "12355" (57 in the client table of Figure 3) is inserted into the PAIF of Figure 6B-1. Steps i and ii, stipulated above, result in a reference path starting at node 101 and ending at data record 103.
Turning now to step iii, the condition of step iii.2 is satisfied, since the discernment offset 3 is greater than the offset 2 of the leaf node 104 on the reference path. Accordingly, in keeping with step iii.2.1 and as shown in the resulting PAIF 120 of Figure 6C-2, link 106 is disconnected from the reference data record 103 and connected to a new node 121. The new node is assigned the discernment offset 3. Next, according to step iii.2.2, the reference data record 103 and the new node 121 are connected by means of a new link 122. The new link is assigned the value "4" (the digit at the discernment offset 3, taken from the search key "12345" of the reference data record 103). Finally, as stipulated in step iii.2.3, the new data record 123 is connected to node 121 via link 124, which is assigned the value "5" (the digit at the discernment offset 3, taken from the search key "12355" of the new data record 123). The PAIF 120 of Figure 6C-2 is therefore the result of inserting data record 123 into the PAIF 108 of Figure 6B-1.

The third and final example concerns inserting the CLIENT data record having the Client_Id (search key) "11346" (55 in the client table of Figure 3) into the PAIF of Figure 6B-1. Applying steps i and ii results in advancing from node 101 to data record 103 (in Figure 6B-1) and establishes that the discernment offset is 1. Thus, in step iii, the condition of step iii.3 is met. Therefore, according to step iii.3.1 and as shown in the resulting PAIF 130 of Figure 6C-3, link 102 is moved to a new internal node 131. The new internal node 131 is assigned the value 1 (the discernment offset). As stipulated in step iii.3.2, the new data record 132 and node 131 are connected directly by means of a new link 133.
The value assigned to link 133 is "1" (the digit at the discernment offset 1, taken from the search key "11346" of the new data record 132). Finally, according to step iii.3.3, the new internal node 131 is linked to node 104 by means of link 134, which is assigned the value "2" (the digit at the discernment offset 1, taken from the search key "12345" of the reference data record 103).

Although the PAIF described above with reference to Figures 6A-6C can be accommodated within a block, it is nevertheless preferable to separate "nodes" from "data records" so that the data records are grouped in a separate file or files. Applying this approach to the PAIF of Figure 6C-3 results in a data record file containing the records 132, 103 and 107. The links 133, 106 and 105 then become, of course, long links. Obviously, if an insertion procedure finds that the data record being inserted already exists in the PAIF, an appropriate error message is returned to the procedure that invoked the Insert command. It should be noted that the foregoing examples assume that the entire PAIF resides in a single block.
Obviously, when further data records are inserted following the above insertion procedure, a block overflow may occur, which necessitates (as will be explained in detail below) invoking a "split block" procedure; thereafter it is necessary to advance to the sought block and perform the insertion procedure in the manner specified above.

Having described a typical "Insert" transaction, a "Find data record" (or "Retrieve") transaction will now be described. To find a data record by a given search key (hereinafter the "searched data record") in an existing PAIF, the following steps are executed:

i. move along a search path that starts at the root node and ends at a data record linked to a leaf node, and for each node on the search path (hereinafter the "current node") perform the following sub-steps:
i.1 for each link originating from the current node, compare the search key portion of the searched data record, at the offset defined by the value of the current node, with the value assigned to the link; in case of a match, move along the link and return to step i.1;
i.2 if none of the links originating from the current node matches the search key portion of the searched data record, return "NOT FOUND" and terminate the find procedure;
i.3 if a data record is reached (hereinafter the "reference data record"), compare the search key of the searched data record, as a whole, with the key of the reference data record:
i.3.1 in case of a match, return "FOUND" (and, in the case of "Retrieve", also return the entire data record) and terminate the find procedure; or
i.3.2 in case of a mismatch, return "NOT FOUND" and terminate the find procedure.

For a better understanding, the "find" procedure will be applied twice to the specific PAIF of Figure 6C-3, yielding the results "FOUND" and "NOT FOUND" respectively.
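The find steps can be sketched in Python as shown below. This is an illustrative sketch under stated assumptions, not the implementation of the invention: keys are fixed-length strings, each key portion is one character, the PAIF of Figure 6C-3 is built by hand in memory, and each data record is stood in for by its key. The names `Node`, `find`, `node101`, `node131` and `node104` are hypothetical (the numerals merely echo the figure).

```python
class Node:
    """Index node: an offset into the key and links keyed by the character at that offset."""
    def __init__(self, offset, links):
        self.offset = offset
        self.links = links   # key portion (one character) -> child Node or record key (str)


# The PAIF of Figure 6C-3, built by hand.
node104 = Node(2, {"3": "12345", "4": "12445"})
node131 = Node(1, {"1": "11346", "2": node104})
node101 = Node(0, {"1": node131})


def find(root, key):
    """Return True ("FOUND") or False ("NOT FOUND") for the given search key."""
    cur = root
    while isinstance(cur, Node):
        # Step i.1: look for a link whose value matches the key portion at this node's offset.
        portion = key[cur.offset] if cur.offset < len(key) else None
        if portion not in cur.links:
            return False                 # step i.2: no link matches
        cur = cur.links[portion]
    # Step i.3: a data record was reached; compare the keys as a whole.
    return cur == key
```

Under these assumptions `find(node101, "12445")` returns `True` and `find(node101, "12463")` returns `False`. The final whole-key comparison is essential because the nodes test only some of the offsets: "12463" follows exactly the same links as "12445" and is rejected only at step i.3.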
Thus, consider a data record to be found by the search key "12445" (hereinafter the searched data record). According to step i.1, the digit "1" at the offset assigned to the root node (offset 0) of the searched key is compared with the value assigned to link 102 (the only link originating from node 101). Since a match is found, control passes to node 131. Once again, according to step i.1, the digit "2" at the offset assigned to node 131 (offset 1) is compared with the value assigned to link 134. Here too a match is found, so control passes to node 104. Then, according to step i.1, the digit "4" at the offset assigned to node 104 (offset 2) is compared with the value of each link originating from node 104. The comparison yields a match for link 105, and consequently control passes to data record 107. According to step i.3, the search key of the searched data record and that of data record 107 are compared and, since they match, a "FOUND" result is returned (step i.3.1).

Turning now to the second example, consider the case where the searched data record has the search key "12463". The procedure described with reference to the previous example is repeated; in step i.3, however, the comparison between the searched data record and data record 107 yields a mismatch, and in accordance with step i.3.2 a "NOT FOUND" result is returned.

A general "Delete data record" transaction will now be described. As a first stage, a "find data record" transaction is applied to the PAIF. In case of "NOT FOUND", an appropriate error message is returned to the procedure that invoked the "Delete" command. Otherwise, the searched data record is found.
For clarity of explanation of the "Delete" procedure, the following nomenclature is introduced. The leaf node that is linked to the searched data record is referred to as the "target node". The parent of the target node is referred to as the "predecessor target node". The link that connects the predecessor target node to the target node is referred to as the "predecessor link". The link connecting the target node to a child node thereof (or to a data record other than the searched data record) is referred to as the "target link". Keeping this nomenclature in mind, the following steps are executed:

i. delete the searched data record and the link that links the target node to it;
ii. if the number of links remaining in the target node is greater than or equal to 2, the deletion procedure terminates;
iii. if, on the other hand, exactly 1 link remains in the target node (i.e., a target link), then:
iii.1 "bypass" the target node by connecting the predecessor link of the predecessor target node to the child node (or to the data record); and
iii.2 delete the target node and the target link, thereby ending the deletion procedure.

It should be noted that step iii is no more than a "prudent memory management" step that frees the space occupied by the target link and the target node, so that it can be allocated to other nodes and links in the block. It should further be noted that step iii is optional. For a better understanding, the above "delete data record" procedure will be applied to the specific PAIF of Figure 6C-3.
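The deletion steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation of the invention: keys are fixed-length strings with one-character key portions, the index lives entirely in memory, and each data record is stood in for by its key. The names `Node` and `delete` are hypothetical. As a further assumption, a root node left with a single link is kept in place (as in Figure 6A), since it has no predecessor link to redirect.

```python
class Node:
    """Index node: an offset into the key and links keyed by the character at that offset."""
    def __init__(self, offset, links):
        self.offset = offset
        self.links = links   # key portion (one character) -> child Node or record key (str)


def delete(root, key):
    """Delete the record with `key`; return True if it was deleted, False if not found."""
    predecessor, predecessor_portion = None, None
    target = root
    while True:                                   # "find" stage
        portion = key[target.offset] if target.offset < len(key) else None
        if portion not in target.links:
            return False
        child = target.links[portion]
        if not isinstance(child, Node):
            break                                 # `target` is the node linked to a record
        predecessor, predecessor_portion, target = target, portion, child
    if child != key:                              # reached a record with a different key
        return False
    del target.links[portion]                     # step i: drop the record and its link
    # Step ii: with two or more links left, nothing else is needed.
    if len(target.links) == 1 and predecessor is not None:
        (remaining,) = target.links.values()      # what the single target link points to
        predecessor.links[predecessor_portion] = remaining   # step iii.1: bypass the target node
        # Step iii.2: the target node and target link are now unreferenced.
    return True


# The PAIF of Figure 6C-3, built by hand.
node104 = Node(2, {"3": "12345", "4": "12445"})
node131 = Node(1, {"1": "11346", "2": node104})
node101 = Node(0, {"1": node131})
```

With this structure, calling `delete(node101, "11346")` removes the record and redirects the link from `node101` straight to `node104`, which corresponds to the PAIF of Figure 6B-1.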
Thus, in response to the command "delete the record having search key = '11346'", the record is first searched in the PAIF according to the procedure described above. Having found data record 132, and in accordance with step i above, the data record and the link 133 leading to it are deleted. Since, after this deletion, target node 131 remains with target link 134 alone, steps iii and iii.1 apply, and therefore predecessor link 102 bypasses target node 131 and is linked directly to its child node 104. Then, according to step iii.2, target node 131 and target link 134 are deleted, thereby obtaining the PAIF shown in Figure 6B-1.

Another example is provided with reference to the PAIF of Figure 6C-1. In response to the command "delete the record having search key = '12546'", the record is searched in the PAIF according to the procedure described above. Having found data record 112, and in accordance with step i above, the data record and the link 111 leading to it are deleted. Since, as stipulated in step ii, the number of links remaining in target node 104 is two (i.e., links 105 and 106), the deletion procedure terminates. The resulting PAIF is once again that shown in Figure 6B-1.

Another common primitive is "Modify an existing data record", for example changing the address of an existing client. The "Modify" primitive is normally realized using the aforementioned primitives. To execute a "Modify" command one must distinguish between the following cases:

1. "Modify" applies to fields other than the search key (for example, modifying the address of a client having Client_Id_No = "xxxxx"). In this case the modify procedure simply involves a "Find" operation (for the data record having Client_Id_No = "xxxxx"). Having found the searched data record, the previous address is replaced by the new one.
2. "Modify" applies to a search key field (for example, changing an account number from "xxxxxx" to "yyyyyy"). This command is performed as a sequence of two or three primitives, that is, deleting the data record having Account_No = "xxxxxx" and thereafter inserting the data record having Account_No = "yyyyyy", or vice versa.

Obviously, a single Modify transaction can involve both cases.

In the previous examples each search key is represented as a series of bytes, and therefore the search procedure is performed by splitting the search key into search key portions, each consisting of at least one byte. Those skilled in the art will readily appreciate that bytes are not the only possible representation of a search key. Thus, for example, a search key can be represented in binary form, i.e., as a series of 1's and 0's, and consequently the search procedure is performed by splitting the search key into portions each consisting of one bit (i.e., l = 1) or more, for example one byte (i.e., l = 8 bits). In certain cases the value l need not be identical for all the nodes in the PAIF. It should be further noted that different links in a given PAIF can be assigned search key portions of different lengths, as long as the length of the respective search key portion is known to the corresponding node.

As is clearly evident from the different PAIFs of Figures 6A-6C, the data records are held in sorted order according to the search key. Navigating the PAIF of Figure 6C-3 (right to left), for example, yields the ordered series "11346", "12345" and "12445". This is another advantage that facilitates data manipulation in comparison with the tree of Figure 5, where the data records are not stored in sorted order.

As specified above, a node in the PAIF is not necessarily uniquely classified. Thus, for example, in the PAIF 120 of Figure 6C-2, node 104 is at the same time a leaf node (linked by means of long link 105 to data record 107) and an internal node (linked by short link 106 to node 121).
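The "Delete" steps i to iii.2 described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the `Node` class, the `delete` function, the one-character key portions and the small example trie are hypothetical and are not taken from Figures 6A-6C; data records are represented simply by their key strings.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Union


@dataclass
class Node:
    offset: int  # position of the key portion examined at this node
    links: Dict[str, Union["Node", str]] = field(default_factory=dict)


def delete(root: Node, key: str) -> bool:
    """Delete the record with search key `key`; return False if absent."""
    pred: Optional[Node] = None        # predecessor target node
    pred_label: Optional[str] = None   # label of the predecessor link
    node = root
    while True:
        label = key[node.offset] if node.offset < len(key) else None
        child = node.links.get(label)
        if child is None or (isinstance(child, str) and child != key):
            return False               # no such record
        if isinstance(child, str):
            break                      # `node` is the target node
        pred, pred_label, node = node, label, child
    del node.links[label]              # step i: drop the record and its link
    if len(node.links) >= 2:           # step ii: nothing more to do
        return True
    if len(node.links) == 1 and pred is not None:
        (remaining,) = node.links.values()
        pred.links[pred_label] = remaining  # step iii.1: bypass the target node
        node.links.clear()                  # step iii.2: drop node and target link
    return True


# Hypothetical trie; records are stored as their own key strings.
n2 = Node(2, {"3": "12345", "4": "12445", "5": "12546"})
target = Node(1, {"1": "11346", "2": n2})
root = Node(0, {"1": target, "9": "90000"})
```

With this trie, deleting "12546" stops at step ii because two links remain in `n2`, whereas deleting "11346" leaves `target` with a single link, so the root is relinked directly to `n2`. The bypass is safe in this sketch because each node carries its own offset, so skipping a node does not change how later key portions are selected.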
Those skilled in the art will readily understand that the "Insert", "Delete", "Find" and "Modify" procedures described herein are only one of many possible variants for performing these procedures and can be modified as required and appropriate, depending on the particular implementation. The specific delete, find and insert transactions described so far apply to the so-called intra-block case. As will be explained in more detail below, applying these transactions in the inter-block context requires addressing several cases that are irrelevant to intra-block operation.

Having explained the structure of the PAIF trie, there follows a description of several embodiments according to the invention, in which a layered index is based on a PAIF index scheme including a PAIF tree (as the basic fractional index). Turning now to Figures 7A-7H, there are shown schematic illustrations of the layered index constructed in response to a succession of split block operations, according to one embodiment of the invention. Consider, for example, block 140 of Figure 7A (in the basic fractional index), which overflows in terms of memory space. This being the case, a "split block" procedure is invoked, resulting in the layered index 142 of Figure 7B, which consists of a root block 144 holding a duplicate node A' (155) that is linked to leaf block 146 by means of a direct link 145 and to leaf block 148 by means of a long link 147. In this specific example, the split point was selected to be link 149 (Figure 7A) (hereinafter the "split link"), thereby moving nodes A, B, E, D and F to the new block 146 and nodes C, G, I, J, K, L and H to the new block 148. The split link is preferably selected so as to obtain an essentially even distribution of nodes and links between the new blocks (for example, the sub-PAIFs residing in blocks 148 and 146 are of essentially the same size).

In the event that a parent block does not exist, a parent block 144 (constituting I1) is created with a duplicate node A' (155) of the split node A (156). In the event that a duplicate of the split node, i.e., the node from which the split link originates, is not yet resident in parent block 144, the node is copied to that block (marked A'), and the connection between node A' (155) and the block (146) in which A resides is implemented by means of direct link 145. The split link 149 (originally a short link between A and C) is replaced by a long link 147 between A' and the block in which C resides. Optionally, nodes A and C (156 and 153, respectively) can remain linked by means of the split link, marked as dotted line 150. The net effect is the layered index of blocks shown in Figure 7B, in which the blocks of the trie are 146 and 148. Those skilled in the art will readily appreciate that it is now possible to access or update data records not through the trie (i.e., starting from node A 156) but rather through the layered index (i.e., starting from node A' 155). In this connection it should be noted that link 147 has the same value as link 150, which in turn has the value of the original link 149 of Figure 7A.

Consider now that block 148 overflows and undergoes a similar split block procedure, resulting in the layered index 151 of Figure 7C. In this example the split link is short link 152 of Figure 7B and, accordingly, nodes C and H reside in block 148A of Figure 7C while nodes G, I, K, L and J reside in block 148B. The node from which the split link originates (node C 153 of Figure 7B) is duplicated (producing duplicate node 153a of Figure 7C) and placed in block 140, where it is marked C. As before, direct link 154 connects the copied node C (153a) to block 148A, which holds the original split node 153, while link 155 is a far link to the split-off block 148B, and its value is the value of link 152 between nodes C and G before (and after) the split.
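The two successive block splits just described can be sketched as follows. This is a simplified model under stated assumptions: the function names, the block identifiers "b0", "b1" and "b2", and the parent/child shape of the tree are hypothetical (the text only states which nodes end up in which block), and the half that retains the split node keeps its block identifier rather than being renumbered as in the figures.

```python
from typing import Dict, List


def subtree(children: Dict[str, List[str]], top: str) -> List[str]:
    """Return `top` and all of its descendants."""
    out, stack = [], [top]
    while stack:
        n = stack.pop()
        out.append(n)
        stack.extend(children.get(n, []))
    return out


def split_block(children, block_of, rep_index, parent, child, new_block):
    """Split the block holding `parent` at the split link parent -> child."""
    old_block = block_of[parent]
    for n in subtree(children, child):
        if block_of[n] == old_block:      # move only nodes of the overflowing block
            block_of[n] = new_block
    # Duplicate of the split node in the parent block: a direct link to the
    # block retaining the split node, a far link to the new block.
    rep_index[parent + "'"] = {"direct": old_block, "far": new_block}


# Assumed parent/child shape; only the block membership follows the text.
children = {"A": ["B", "C"], "B": ["D", "E"], "E": ["F"],
            "C": ["G", "H"], "G": ["I", "J"], "I": ["K", "L"]}
block_of = {n: "b0" for n in "ABCDEFGHIJKL"}
rep_index: Dict[str, Dict[str, str]] = {}

split_block(children, block_of, rep_index, "A", "C", "b1")  # split link A -> C
split_block(children, block_of, rep_index, "C", "G", "b2")  # split link C -> G
```

After both calls, nodes A, B, D, E and F remain in the first block, C and H share the second, and G, I, J, K and L occupy the third, mirroring the membership described for blocks 141, 148A and 148B. The sketch does not model the near link between the duplicate nodes inside the parent block.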
In Figure 7C, the layered index 151 is constituted by the trie, which includes blocks 141, 148A and 148B forming I0, and by block 140, which forms a representative index over the common keys of the trie. It should be noted that in Figure 7C node A in block 141 and node C in block 148A are optionally disconnected, and likewise node C of block 148A and node G of block 148B are optionally disconnected. As clearly shown, nodes A' and C are connected within block 140 so as to form a (connected) trie, and it is therefore possible to access block 141 through node A' and direct link 156; block 148A through nodes A' and C and direct link 154; and block 148B through nodes A' and C and far link 155. It should be noted that the value of the link between nodes A' and C (in block 140) is identical to the original value of the link between nodes A and C (see link 149 in Figure 7A).

As can be clearly seen in Figure 7C, the resulting layered index constitutes a balanced structure of blocks, thereby keeping the depth of the index to a minimum and consequently minimizing the number of accesses (normally, but not necessarily, I/O operations) required in order to find, insert or delete a given data record. Since the number of accesses needed to reach a data record through the layered index is substantially a logarithmic function of the number of records, the layered index is more efficient, in terms of the number of I/O operations required to access a given data record, than access through the trie. Thus, for example, to access the data record associated with node J through the layered index, it is required to access block 140, then block 148B and then the searched data record (that is, three I/O operations). In comparison, accessing the same data record through the trie requires four I/O accesses, namely block 141, block 148A, block 148B and data record 159.
As shown, there are a few particular cases in which the trie is more efficient (for example, accessing a data record associated with node A); however, the larger the trie (i.e., the more blocks it consists of), the more efficient the access through the layered index becomes. In the particular embodiment of Figure 7, the representative index and the trie (being one embodiment of the basic fractional index) comply with substantially the same index scheme, i.e., the PAIF. By "substantially" the same scheme it is meant that there are some differences, as explained with reference to Figure 9G below.

The considerations concerning the duplication of nodes into the higher layers Ij of the layered index are further illustrated by the additional examples depicted in Figures 7D to 7H. Thus, consider the layered index of Figure 7D, where the block split is performed at link 400. The resulting layered index is illustrated in Figure 7E, where block 402 is created, node 401 is copied to the higher level block 402 (forming part of the layered index) and the original link between nodes B and E is optionally retained (via dotted link 104). Through node B it is now possible to access the two blocks of the trie (405 and 406) by means of links 407 and 408, respectively.

Next, if it is required to split block 405 at, say, link 409, the resulting structure appears in block 402 of Figure 7F, where nodes A and I of block 405 are duplicated as A' and I' (410 and 411) in block 402. Node I' is, obviously, a duplicate of the newly split node I in block 405. However, node A is also copied, considering that both node B (whose counterpart B' is already resident in block 402) and node I (whose duplicate I' is now copied to block 402) are descendants of A. Node A is the lowest common ancestor of nodes B and I, and thus a (connected) trie is formed in block 402.
The value associated with short link 414 (between nodes A' and B' in block 402) is the same as that of link 412 (between A and B in block 405). The value of link 415 (between nodes A' and I' in block 402) is the same as that of link 413, which originates from node A in the direction needed to access node I. The internal structure of block 402 is such that it allows a search over the representatives of blocks 405, 406 and 407. The direct links 416 and 417 of nodes 422 and 411 are optionally retained, since it is possible to move along direct link 418 to block 405, seeing that node 410 is maintained in the access path to nodes 422 and 411.

Figure 7G shows the layered index resulting from splitting block 407 of Figure 7F (at link 420), and Figure 7H shows the layered index resulting from splitting block 402 (at the link between nodes I' and N'). The resulting layered index of Figure 7H has, as shown, three layers: the first consisting of block 430, the second consisting of blocks 402 and 408, and the trie consisting of blocks 405, 407, 426 and 406. Those skilled in the art will readily appreciate that the manner of splitting a block is, of course, not limited to the examples of Figures 7D to 7H.

Having described a method of constructing a layered index through the split procedures resulting from a succession of insert transactions (with reference to Figure 7), it will be appreciated that the converse procedure, i.e., "Clear Block", is invoked when the deletion of a data record leaves a block with only one node that has no data records associated with it. Those skilled in the art will readily understand that the layered index described with reference to Figure 7 is only one of many possible variants for realizing a layered index in which the representative index and the basic fractional index are substantially the same.
The use of a PAIF in the manner specified constitutes an advantage over some of the hitherto known tries, in the sense that the layered index thus realized has a balanced structure of blocks despite the fact that the trie itself may be unbalanced.

Attention is now directed to Figures 8A-8B, which show two respective illustrations exemplifying the application of the technique of the invention in accordance with another embodiment. Thus, Figure 8A illustrates a given trie structure having a vertical orientation (i.e., constituting a vertical tree) which, as shown, is unbalanced: a depth of three blocks (260, 261 and 262) as against a depth of two blocks (260 and 264). The following description does not seek to explain the search scheme of the specified vertical tree but emphasizes only those aspects that are required to obtain a balanced layered index. It should be noted, however, that the nodes in the trie structure of block 260 denote offsets in half-byte units (node values are shown in hexadecimal representation) into the data records (a-k) shown in Figure 8A.

It should be noted that the extra I/O operations of Figure 8A, i.e., accessing three blocks in order to reach data record k as compared with one block (one I/O operation) in order to reach data record b, may still be regarded as balanced; in some real-life cases such a difference does not necessarily call for applying the technique of the invention in order to obtain exactly the same number of I/O operations. Of course, additional insertions of data records can generate a higher degree of "unbalance" which, if not handled by the technique of the invention, will result in degraded performance (due to the unbalanced structure), as discussed in detail above (with reference to the prior art). Figure 8B illustrates a possible embodiment of the invention.
As shown, a representative index consisting of block 270 (which forms I1) is constructed, with the result that a balanced horizontal tree is obtained having a root block 270 from which every block of the lower level vertical tree (the latter being the unbalanced trie) is accessed with one I/O operation. As shown, the actual access to the blocks of the first vertical tree (being the trie) is achieved by means of the common key value of each block. Before continuing, the term common key will be exemplified with reference to Figure 8.

The common key of block 260 (in a hexadecimal representation of half-byte units) is 0x4, 0x1 and 0x3, where 0x4 represents the most significant bits of the byte of the character A, 0x1 represents the least significant bits of the character A, and 0x3 represents the most significant bits of the characters residing at offset two of the data records. It should be noted that all the records that can be accessed through block 260 share the common key prefix specified above. In the same way, the following table summarizes the common key of each block:

Block No.   Common key
260         0x4, 0x1, 0x3
261         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3
262         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3
264         0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x4, 0x3

It should be noted that block 261 may accommodate a root node with a value of 8, in which case the common key of the block (hereinafter k) becomes 0x4, 0x1, 0x3, 0x3, 0x3, 0x3, 0x3, 0x3, that is, it consists of 8 units. In this case, the representative of block 261 in I1 should be changed accordingly. In a different implementation, the representative of block 261 is k even if the root node with the value 8 does not exist. The index over the common keys is realized in the representative index (which consists of block 270) by constructing a trie that addresses the common keys of the first vertical tree.
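The notion of a common key expressed in half-byte units can be sketched as follows. This is only an illustration: the helper names `nibbles` and `common_key` and the example keys are hypothetical, chosen so that every key starts with the character 'A' (0x41) followed by a digit character (0x30 to 0x39), as in the description of block 260.

```python
from typing import List, Sequence


def nibbles(key: bytes) -> List[int]:
    """Split a key into half-byte units, most significant half first."""
    out = []
    for byte in key:
        out.extend((byte >> 4, byte & 0x0F))
    return out


def common_key(keys: Sequence[bytes]) -> List[int]:
    """Longest prefix, in half-byte units, shared by all keys."""
    units = [nibbles(k) for k in keys]
    prefix = []
    for column in zip(*units):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return prefix


# Hypothetical keys: the character 'A' (0x41) followed by a digit (0x3_).
block_keys = [b"A1", b"A7", b"A9"]
ck = common_key(block_keys)
print([hex(u) for u in ck], "prefix size in bits:", 4 * len(ck))
```

For these keys the common key is 0x4, 0x1, 0x3, i.e., three half-byte units or 12 bits: the two halves of 'A' plus the shared upper half of the digit character.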
Now, for example, in order to find data record g, one follows node 290 and link 291 to node 292. Next, one proceeds along link 293 directly to block 261, which is associated with data record g. The resulting layered index is balanced.

As specified above, for the specific case of the trie, the representative key of a block is a common key. Generally speaking, the common key of a block is the longest prefix shared by all the keys of the data records that can be accessed from the block by means of the relevant index scan. For the PAIF, the specified prefix size (calculated in units of l bits) is equal to the value of the root node in the block (which holds the offset value). If the prefix size is to be expressed as a number of bits, it is calculated as the offset value multiplied by l.

A description of yet another embodiment of the construction of a layered index of the invention is given below with reference to Figures 9A-9G. Accordingly, attention is now directed to Figures 9A-9G, which show a succession of modify (insert) transactions in a PAIF tree (which constitutes a trie that is susceptible to an unbalanced structure) and the layered index thus obtained. For convenience of presentation, the data records are shown as part of the trie. As specified above, the actual manner in which the data records are associated with the trie may vary depending on the particular application. In the following Figures, a layered index is constructed by successively inserting the following unsorted data records A-F (which, for convenience of representation, form part of the blocks). Each key is presented as a series of bits, where each portion is one bit long (l = 1):

A = 001000011
B = 110011100
C = 011011111
D = 011011011
E = 101010101

In the first step (Figure 9A), record A is inserted, after which block 300 includes a node 301 having offset 0 and associated with the first record A through link 302, which has the value 0. At this stage, the tree consists of block 300 having only one node. The index scheme dictates that the search path to data record A is determined according to the value 0 at offset 0, as represented by link 302 and node 301, respectively. After this (Figure 9B), data record B is inserted. As can clearly be seen, it is distinguished from data record A at offset 0, where its key value is 1, and consequently link 302 leads to data record B and is assigned the value 1. Next (Figure 9C), data record C is inserted, and its value at offset 1 serves to distinguish it from record A. Links 303 and 304 connect node 305 (standing for offset 1) to data records C and A, respectively. Since block 300 can accommodate nodes 301 and 305, it is not yet required to split the block.

Next, data record D is inserted, and the structure of the block following the insert operation is shown in Figure 9D. However, since the block cannot accommodate more than two nodes (an overflow occurs), it is now required to split block 300. Figure 9E illustrates the tree structure after the split. Here, link 306 is the split link, the motivation being that approximately half of the block content will be retained in block 300 and the remaining half will be moved to another block 310. Of course, other links can likewise be selected to be the split link. As a first step, block 300 of I0 is replaced by two blocks, 300 and 310.
Nodes 0 and 1 (designated 311 and 313, respectively) and data records A and B are retained in the split block 300, while node 6 and data records D and C (representing, in this particular embodiment, the remaining nodes) are moved to block 310. Accordingly, the basic fractional index of Figure 9A now consists of two blocks, 300 and 310 (which in fact constitute the unbalanced trie). After this, since a block of I1 does not yet exist, it is created, and block 312 is therefore provided. The split node (313) is copied to block 312, thereby constituting a duplicate node (314). Next, duplicate node 314 is connected via direct link 316 to block 300 and, by way of a far link 318, to block 310. This far link replaces the original split link 306, which is marked in Figure 9E by a dotted line. The value of far link 318 is the same as the value of the split link. In this way, the representative index (constituted by block 312) allows the basic fractional index to be searched according to the common keys. It should be noted that there are no limitations as to whether the split link should be deleted or retained. As shown, the horizontal tree thus obtained, which constitutes the layered index (here consisting of blocks 312, 300 and 310, of which 312 belongs to the representative index), is balanced.

Next, data record E is inserted. In this case, advancing in the horizontal tree (being a form of layered index) from the first node 314 of block 312 (having a value of 1) is not possible by means of far link 318, since that link represents direction 1 at the offset given by node 314 (whose value is 1), whereas a link in direction 0 is required. One therefore advances via direct link 316 to block 300. Thus, the block that needs to be associated with the new data record is found. In the same manner, data record F is inserted, resulting in the tree structure shown in Figure 9F.
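The node offsets that arise in the insert sequence of Figures 9A to 9D can be checked with a short sketch. The helper names below are hypothetical; the keys are those listed for records A to D, with one-bit portions (l = 1). Each new node stands for the first offset at which the inserted key differs from the key it is compared with.

```python
from typing import Optional

KEYS = {"A": "001000011", "B": "110011100", "C": "011011111", "D": "011011011"}


def first_difference(a: str, b: str) -> Optional[int]:
    """Offset of the first bit at which two keys differ, or None if none does."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return None


def common_prefix(a: str, b: str) -> str:
    """Longest common prefix of two bit-string keys."""
    d = first_difference(a, b)
    return a[:min(len(a), len(b))] if d is None else a[:d]


print(first_difference(KEYS["A"], KEYS["B"]))  # node for offset 0
print(first_difference(KEYS["A"], KEYS["C"]))  # node for offset 1
print(first_difference(KEYS["C"], KEYS["D"]))  # node for offset 6
print(common_prefix(KEYS["C"], KEYS["D"]))     # common key of the block holding C and D
```

The last line yields a six-bit prefix, which agrees with the statement that, for the PAIF, the size of a block's common key equals the value of the block's root node (here the node for offset 6).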
Next, if a split is performed between node 320 and node 321 of block 300, node 320 is copied to block 312 (designated 323 in Figure 9G). Since it cannot be linked to node 314 of block 312 (as this would not preserve the correct intra-block links between the nodes), node 311 of block 300 is also copied to block 312 (designated 322 in Figure 9G), in order to create a (connected) trie that allows blocks 300, 326 and 310 to be searched, by the search scheme, according to the common keys of the blocks. It should also be noted that instead of having direct links from all the copied nodes 314, 322 and 323 of block 312 in Figure 9G, it is sufficient to have a direct link from the copied node 322 to block 300. A far link 324 is established from node 323 to block 326 in the direction of the link before the split (the direction of link 315 of Figure 9). Obviously, if another split is made in block 326, it will be represented in block 312 by a node connected to node 323 via the link in direction 1, having a direct link to the block that retains the split node and a far link to the newly created block.

Figures 9A-9G and 8A-8B illustrate two of many possible ways of realizing the split block mechanism that maintains the balanced structure of the invention by constructing a layered index. The flexibility in adopting other, non-limiting variants is shown, for example, in Figure 8B, where near link 271 and direct link 272 can be replaced by far link 273 (marked by a dotted line) having the same direction as link 271, thereby rendering node 276 redundant. In many embodiments, the balancing technique of the invention confers on the balanced horizontally oriented digital tree thus obtained (a form of layered index structure) so-called "probabilistic access" characteristics.
This means that a search for a given data record (for example, a search for data record A) can reach a different data record, or a node that has no link in the direction prescribed by the index scheme, and may require a "correction" to be applied in order eventually to access the searched data record. For a better understanding of the above, consider, for example, that a search transaction is applied to the layered index of Figure 9E with the searched data record L = 111011110. The search path will follow node 314 and link 318 (offset 1 and value 1, respectively) and then, at offset 6 (the root node of block 310), link 319 (value 1) to data record C. This example illustrates the probabilistic search characteristics of the layered index thus obtained.
In order to resolve this specific error, the size of the common prefix of the key of the searched data record and the key of the reached data record is calculated. The common key of block 310 is a prefix portion of the key of the reached data record C. Here, the size of the common prefix is 0. Next, one climbs the tree to the node in the access path that has a value equal to or smaller than the size of the common prefix and that has a direct link. If this requirement cannot be met, that is, all the nodes have a value greater than the calculated prefix size, one proceeds from the first node in the access path that has a direct link (which should point to the first block of index layer Ih-1). Now, one moves by means of direct link 316 to node 311 of the lower level vertically oriented tree (i.e., layer Ih-1) and from there continues the search path as prescribed by the index scheme.

According to another case, if the index scheme prescribes going in a given direction and there is no link in the desired direction, the search path follows the direct link from the node with the largest value in the search path (that maintains a direct link). When moving from block to block, a comparison with the common key (if available) or with the data records associated with the nodes (if available) may lead to a decision as to whether to proceed according to the index scheme or to return to a node with a direct link. It should be noted that the common key is not necessarily physically attached to the data records.

Returning to the previous example (searched data record L and reached data record C of Figure 9E), if the common key of block 310 (being 011011) is kept in the block, there is no need to access data record C. Since the size of the common prefix of the key of L and the common key of the block is 0, one can return to node 314 and link 316 without accessing record C. Avoiding the need to access the data record has, of course, the advantage of improving performance.
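The probabilistic search and its correction for the layered index of Figure 9E can be sketched as follows. This is a minimal model under stated assumptions: the data layout (nested tuples for the intra-block tries, a dictionary for node 314) and all function names are hypothetical, the common key of block 310 is assumed to be kept in the block, and a destination is accepted only if the common prefix is longer than the value of the split node, which is how the example above behaves.

```python
from typing import List, Optional, Tuple

KEYS = {"A": "001000011", "B": "110011100", "C": "011011111", "D": "011011011"}

# Intra-block tries of Figure 9E as (offset, {bit: child}); a child is either
# another such tuple or the name of a data record.
TRIE_BLOCKS = {
    "300": (0, {"1": "B", "0": (1, {"0": "A"})}),
    "310": (6, {"0": "D", "1": "C"}),
}
COMMON_KEY = {"310": "011011"}  # assumed to be stored in block 310

# Representative block 312: node 314 (value 1) has a far link in direction 1
# to block 310 and a direct link to block 300.
NODE_314 = {"offset": 1, "far": {"1": "310"}, "direct": "300"}


def common_prefix_len(a: str, b: str) -> int:
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def search_block(block_id: str, key: str) -> Optional[str]:
    """Follow the intra-block trie, then verify the record that was reached."""
    node = TRIE_BLOCKS[block_id]
    while not isinstance(node, str):
        offset, links = node
        node = links.get(key[offset])
        if node is None:
            return None
    return node if KEYS[node] == key else None


def search(key: str) -> Tuple[Optional[str], List[str]]:
    """Return (record name or None, list of blocks visited)."""
    visited = ["312"]
    target = NODE_314["far"].get(key[NODE_314["offset"]])
    if target is not None:
        visited.append(target)
        if common_prefix_len(key, COMMON_KEY[target]) > NODE_314["offset"]:
            return search_block(target, key), visited
        # Mismatch with the block's common key: apply the correction below.
    visited.append(NODE_314["direct"])
    return search_block(NODE_314["direct"], key), visited
```

Searching for L = 111011110 first follows the far link to block 310, finds a common prefix of size 0 with that block's common key, and falls back through the direct link to block 300, where no record with key L is found. Searching for C succeeds in block 310 without any correction.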
The criterion for knowing that the searched data record may reside in the block is that the size of the common prefix of the key of the searched data record and the common key of the block is greater than the value of the split node. In the last example, the prefix size is 0 and the value of the split node is 1 (node 313), so block 310 is not the block that accommodates record L (if the record exists). Therefore, the search for record L continues from node 314 and link 316. This procedure applies to all modify transactions. Insofar as insert transactions are concerned, block 300, found in the manner specified above, is associated with the new data record L.

The last example refers to a specific instance of the layered index. Those skilled in the art will readily appreciate that these probabilistic access features apply, mutatis mutandis, to other types of layered indexes that use a basic fractional index. The probabilistic search characteristics that lead to "errors" stem from the fact that the complete common key of a block in layer Ih-1 is not necessarily reflected by the values of the nodes that reside in the search path leading to that block. It is therefore necessary to know the common key of the block in Ih-1 in order to verify whether the search path to the specified block matches the search path dictated by the key of the searched data record. If the common key is not maintained in the block, it may be necessary to advance in the index down to a data record in order to learn the common key value.

The inherent error-prone characteristics of the layered index, and how to handle them, have been exemplified with reference to Figure 9 above and can be described more generally as follows: to search for a record by means of key k, the latter is searched in Ih (and in some cases in Ih-1 to I0, or in the data records) in order to find the block B of Ih-1 that leads to k. This process is repeated until reaching the block of I0 that is associated with the data record having key k (if it exists).

The description with reference to Figures 7 and 9 exemplifies a layered index that uses an index scheme based on the PAIF for both the basic fractional index and the representative index. Those skilled in the art will readily appreciate that the layered index of the invention is not limited to the PAIF. Thus, for example, US 5,495,609 illustrates a different trie. Consider, for example, the trie of Figure 10 according to the specified '609 patent, and assume that the trie consists of one block accommodating nodes 11, 12, 13 and 14. If it is now required to split the block subsequent to the insertion of new nodes in the tree, a possible approach to splitting the block according to the prior art would be, for example, to break the link between nodes 12 and 14 in order to obtain two blocks, one accommodating nodes 11, 12 and 13 and the other accommodating node 14 (hereinafter the new block). Assuming that the first block resides in internal memory, if it is now required to look for record 26, only one I/O operation is required. If, on the other hand, record 20 is of interest, a first I/O operation is required in order to access the new block (i.e., the block accommodating node 14), and a second I/O operation is required in order to access record 20. It can therefore be seen that splitting the block produced an unbalanced tree. Subsequent insert transactions can further aggravate the unbalanced nature of the tree, i.e., require multiple I/O operations, which is obviously undesirable. Applying the technique of the invention addresses the disadvantages of an unbalanced tree, and the resulting layered index is illustrated in Figure 10B, where the representative index is constituted by block 159A over the representative keys of the trie (constituted by blocks 159B and 159C).
Here also, the link between nodes 12 and 14 is taken as the split link, and the new node 159D (being a replica of node 12) is copied into a new block designated 159A. Now, in order to access record 20 and record 26, the same number of I/O operations is required, in this particular case two. As the size of the trie grows, access through the layered index becomes more efficient. The layered index of Figure 10B thus provides a balanced tree of blocks, ensuring that essentially the same number of I/O operations is required to reach each data record in the tree.

Those skilled in the art will readily appreciate that the number of I/O operations is preferably a logarithmic function of the number of data records and of the number of links originating from a block. Thus, for example, if a thousand links originate from each block, a layered index with three levels allows access to 1,000,000,000 data records. For a better understanding of the above, a numerical example is presented below. Assume that each block has 1000 far links. Assuming that the size of each far link is 4 bytes, it readily follows that the space needed to represent the far links is 4000 bytes. Assuming that the nodes and near links within a block occupy another 4000 bytes, the resulting block size is less than 10,000 bytes. For the sake of discussion, it is assumed that each block is 20,000 bytes in size. Now consider a layered index consisting of one block (for example, block 144 in Figure 7B) as index layer I1, and assume that it is linked to a thousand blocks in layer I0 (of which only two blocks, 146 and 148, are shown in Figure 7B). The layered index then amounts to a total of 1001 blocks, each having a size of 20,000 bytes. Therefore, the total space that must be allocated to hold the blocks of the layered index is approximately 20 megabytes. A size of this order can easily be accommodated in the internal memory of, say, a personal computer.
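The arithmetic of the numerical example above can be reproduced with a few lines. The function names are hypothetical; the figures (1000 far links per block, 4 bytes per far link, 20,000 bytes per block, one I1 block over a thousand I0 blocks) are those assumed in the text.

```python
def reachable_records(fan_out: int, levels: int) -> int:
    """Records reachable when every block at every level has `fan_out` links."""
    return fan_out ** levels


def far_link_bytes(links_per_block: int, bytes_per_link: int) -> int:
    """Space taken by the far links of a single block."""
    return links_per_block * bytes_per_link


def index_bytes(upper_blocks: int, leaf_blocks: int, block_size: int) -> int:
    """Total space of a layered index made of equally sized blocks."""
    return (upper_blocks + leaf_blocks) * block_size


print(reachable_records(1000, 3))      # three levels, a thousand links per block
print(far_link_bytes(1000, 4))         # far links of one block
print(index_bytes(1, 1000, 20_000))    # one I1 block plus a thousand I0 blocks
```

The three values are 1,000,000,000 records, 4000 bytes and 20,020,000 bytes respectively, the last being the "approximately 20 megabytes" quoted for the 1001 blocks.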
Taking into account now that each block in I0 is associated with a thousand data records, the net effect is that, when using a layered index of the invention (according to the last embodiment) that fits in its entirety in internal memory, one million data records can be accessed without any I/O operation on the index. Likewise, to access a billion records, one more index layer may be required, which may entail one additional I/O operation.

For a better understanding of the above, consider, for example, the layered index implementation of Figures 6B-1 or 6B-3 (PAIF index scheme). If the keys of records 103 and 107 were larger in size (for example, 100 bytes long), this would not change the size of the PAIF. Another non-limiting example is shown in Figure 8B: the size and structure of the layered index would not change if the keys of the data records a-k addressed by the index were 200 bytes long. As can be seen, it is also possible to navigate in the index and retrieve the data records a-k according to key order. This exemplifies a form of sequential operation.

As shown, the resulting layered index of Figure 10B includes two trees having a vertical orientation, i.e., a first tree structure consisting of blocks 159B and 159C (being a form of the basic fractional index I0) and a second tree having block 159A (being a form of the representative index I1). The horizontal tree of blocks thus obtained (being a form of layered index) is balanced, that is, from root block 159A a single I/O operation allows access to any block that is linked to data records. Additional insertions of data records will lead to further splits in the blocks of I0, which will, of course, require updating index layer I1. When the number of nodes in block 159A of I1 exceeds a given number, block 159A is split according to the split mechanism.
The trie index to which the technique of the invention relates is not confined to the search tree described in the '609 patent, and may include other types of trees, as explained above. It should be noted that the intra-block structure is not necessarily balanced; that is, the nodes within a block are not necessarily arranged in a balanced structure. While this may seem a disadvantage, those skilled in the art will readily appreciate that its effect on the overall performance of the database is virtually negligible, because the intra-block search is normally performed in the fast internal memory of the computer system. In contrast, the arrangement of blocks within a layered index is kept balanced, so that the number of blocks on a search path is a logarithmic function of the number of data records. That number reflects the number of I/O accesses to external memory (an inherently slow operation) needed to load a desired block into internal memory.
In this connection, those skilled in the art will readily appreciate that the present invention is in no way limited to a given physical embodiment. Thus, for example, regarding the search scheme, while the inter-block search scheme is retained after applying the technique of the invention, what is retained is the logical concept of, for example, advancing through the layered index according to offsets and offset values. This general concept can be realized in many ways, all of which are encompassed by the technique of the invention. Thus, for example, the size of the offset (in number of bits) held within each node can be altered, as can the way empty pointers (i.e., pointers to null, having no children) are represented, among others.
This flexibility of physical realization also applies to the inter-block portion. The layered index described with reference to Figures 7 to 10 retains essentially the same index scheme for both the trie and the representative index (except for the error handling that may be needed when a data record is accessed through the index, as explained in detail with reference to Figure 10G above). Retaining the same index scheme for both the trie and the representative index is not mandatory, as will be exemplified with reference to Figure 11.
Figure 11 illustrates another approach to balancing the unbalanced tree of Figure 8A (i.e., building a layered index), using a conventional B-tree as the representative index over the representative keys of the unbalanced trie. The horizontally oriented balanced tree thus obtained (the layered index) includes block 272 at the upper level (index layer J2), blocks 270 and 271 at a lower level (index layer J1), and the blocks of the original, vertically oriented unbalanced tree of Figure 8A in the lowest index layer J0 (blocks 260, 261, 262, 264). Figure 11 thus demonstrates that the index scheme of the representative index is not necessarily the same as that of the original unbalanced trie. If desired, the whole B-tree (which forms the representative index) can be regarded as a single index layer J1.
The database file management system of the invention not only addresses the disadvantages of the conventional trie index file but also offers other benefits that facilitate and improve data access by user application programs. Thus, the fact that a balanced block structure is maintained ensures that, on average, the number of slow I/O operations is kept essentially optimal; that is, more efficient access is obtained, particularly for large files consisting of a multitude of blocks.
Those skilled in the art will readily appreciate that, while the layered index construction is preferably applied to slow I/O operations, that is, to minimize the number of accesses to a slow external storage medium, it is not limited to them. Thus, for example, the storage medium to which the present invention is applied can also be internal memory. This is particularly relevant given the growing capacity of internal memories which, although faster than external memories, may also require efficient access control, which is achieved in accordance with the invention.
Next, the second aspect of the invention is described. For convenience of explanation, the second aspect is described with reference to the PAIF index (constituting a designated index). The invention is in no way limited to this specific example. As mentioned above, the database file management system of the invention allows data records of different types to be addressed by a single index. To distinguish between data records of different types that are addressed by the same PAIF index, each data record of a given type is associated with a given designator. The designator forms part of the key of the data record, which thereby constitutes a designated key. The designator is unique for each type of data. Thus, for example, the keys of data records belonging to the entity "Borrower" are prefixed with the designator 'A', while the keys of data records belonging to the entity "Books" are prefixed with the designator 'B'. The new key of a Borrower data record becomes a designated key consisting of the concatenation of 'A' and the original borrower key; likewise, the new designated key of a Book data record consists of the concatenation of 'B' and the original book key.
Having discussed the so-called "designator" feature of the second aspect of the invention, the so-called metadata is now described. According to one aspect of the invention, a data dictionary maintains metadata, which provides information about data records as a function of record type. Thus, in addition to the data records, it is necessary to maintain the designator, to be able to identify the designator and, using the metadata, to be able to identify or construct the designated key as well as other information such as record size. The index search scheme is independent of the metadata: it locates a record by its designated (or composite) key without using the metadata. The metadata is required to construct the designated (composite) key and, once the record has been retrieved, to determine the properties of the record. Thus, for example, having retrieved a book data record, the designator 'B' is identified, and the information about records designated B is available from the metadata, for example the size of the book record, its fields, and which fields are the key fields.
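The division of labour between designated keys and metadata can be illustrated with a minimal sketch. Everything named here is hypothetical: the `METADATA` dictionary, its field names and record sizes, and the helper functions are assumptions made for illustration, and a plain Python dictionary stands in for the index. Only the designators 'A' (Borrower) and 'B' (Books) come from the text.

```python
# Hypothetical data dictionary: designator -> metadata about that record type.
# Field names and record sizes are invented for this illustration.
METADATA = {
    "A": {"entity": "Borrower", "key_fields": ["borrower_id"], "record_size": 64},
    "B": {"entity": "Books", "key_fields": ["book_id"], "record_size": 128},
}


def make_designated_key(designator: str, original_key: str) -> str:
    """Build a designated key by prefixing the original key with its designator."""
    if designator not in METADATA:
        raise KeyError(f"unknown designator {designator!r}")
    return designator + original_key


def split_designated_key(designated_key: str) -> tuple:
    """Separate the one-character designator from the original key."""
    return designated_key[0], designated_key[1:]


def record_properties(designated_key: str) -> dict:
    """Look up the record type's properties after a record has been retrieved."""
    designator, _ = split_designated_key(designated_key)
    return METADATA[designator]


if __name__ == "__main__":
    # A dictionary stands in for the index; the lookup itself never consults METADATA.
    index = {make_designated_key("B", "11111"): "book record"}
    key = "B11111"
    print(index[key], record_properties(key)["entity"])
```

The lookup uses only the key; the metadata is consulted before the search (to build the key) and after it (to interpret the record), mirroring the description above.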
The use of designated data records is not limited to a single type; instead (and preferably), more than one type can be handled by the designated index, as will be explained below in connection with the subordination relationship. Thus, whereas according to previously known solutions data of different types are typically held in separate files (and addressed by separate index files), in a database management system that uses a designated index of the invention the data records of different types can be addressed from the same index. It should be noted that the keys of data records belonging to different types (and addressed by the same designated index) do not necessarily have the same length. Thus, for example, consider a layered index that is also a designated index, based on a trie of the type shown in Figure 8A as its basic partitioned index. The key of the records belonging to the "Borrower" entity is 6 bytes long, while the key of the records belonging to the "Books" entity is 5 bytes long. Inserting the books into the designated index of Figure 8A with the designated keys B11111 and B22222 results in the data structure of Figure 12, which includes a designated index that addresses two types of data records: data records a-k, which are assigned the designator A, and data records w-x, which are assigned the designator B. In the following description, the terms "record of type X" and "record designated X" are used to describe a record that has a designated key whose designator is X.
While the last example illustrated one way of realizing designated data (i.e., prepending a character, a string, or any number of bits as a prefix to the key of the data record), those skilled in the art will readily appreciate that this is only one of many possible variants.
In fact, the designator can be realized in any known manner, provided that it distinguishes between data records of different types, is treated as part of the key, and therefore takes part in the search. This holds regardless of whether the designator (i) is part of the data record (or of its key portion), (ii) is stored elsewhere (for example, in a different data structure), or (iii) is defined elsewhere or in some other way. An example of the last case is a trie structure associated with data records that are all of the same type (for example, all designated by the character A). Obviously, in this example it is not physically necessary to attach the designator to each data record, since the designator is common to all records. However, when a data record is accessed, it is necessary to identify the designator and add it to the key. Another possible solution is to prefix the data record with the designator, so that when the data record is accessed the designator is available. For example, in Figure 12, data record d is accessed from node 266 via link 270. The first character of data record d is A, the designator.
For a better understanding of the subordination relationship, attention is directed to Figures 13A-13E. Figure 13A illustrates a designated index 800 (in the form of a PAIF) with four associated data records 802, 804, 806 and 808 (of which only the designated keys are shown). The data records are all of the same type, as is evident from the designator 'A' prepended to each of them. Turning now to Figure 13B, the PAIF 800 is shown with a new data record 812 having the composite key A12355B940201333333 (the designator of record 812 is B). The new data record is subordinate to data record 806, whose key is A12355.
According to the PAIF index, node 814 indicates that the discriminating offset is 6 and that the value B links to data record 812 (which has the value B at offset 6). Since record 806 has no value at offset 6, it is assigned a virtual value (say, null) at this offset so that the discriminating offset with respect to the other record can be determined; accordingly, link 818 is labeled with the null value. Figure 13C illustrates the PAIF 800 after another data record 820 has been inserted. Data record 820, which represents another type B data instance subordinate to the type A data record 806, is inserted into the PAIF. The discriminating offset is 11 (the value of the new node 822), and its link values are '0' and '1' for data records 812 and 820, respectively. Figure 13D illustrates the PAIF 800 with records of different types subordinate to record 806. The type 'D' data record 824, which is subordinate to the type 'A' data record, is linked from node 814 through link 823 having the value D. As can be seen, the PAIF already represents the data record designated B, the latter being subordinate to the data record designated A. An example of type 'B' subordinate to type 'A' is items ('B') stocked by a supplier ('A'), and an example of type 'D' subordinate to type 'A' is customers ('D') served by the supplier ('A').
Turning now to Figure 13E, another embodiment of the PAIF of Figure 13D is shown, implemented slightly differently. In particular, the subordinate data records 812, 820 and 824 are represented and held in the data file without their key prefix, which is the designated key of record 806 (i.e., the prefix A12355 is omitted).
When accessing, for example, data record 812, the information available from the metadata for designator B makes it possible to determine (i) which part of the key is missing, and (ii) that record 812 is subordinate to a record designated A, which can be reached from the node with value 6 (814) through the link with the null value (818). In this way, it is possible to access data record 806 and build the complete key of record 812. If the PAIF 800 is a layered index, nodes 814 and 822 may reside in different blocks, and the access path to the block associated with record 812 may not include node 814. In that case, a link from each subordinate record to record 806 (links 826, 828 and 830) makes it possible to access data record 806 and build the key. The implementation described above avoids duplicating the designated key of data record 806 in each subordinate data record (in the particular example of Figure 13D, the prefix A12355 is duplicated three times, for records 812, 820 and 824). Replacing the key prefix with a link can save space (if the prefix is larger than the representation of the link) and allows access to the record to which the subordination relates without a separate search.
Figures 13D and 13E illustrate that the subordination relationship feature of the invention is not limited to any specific embodiment. The subordination relationship of the invention thus provides a more efficient low-level implementation of data than hitherto known techniques, in the sense that one index can be associated with several data types and subordination relationships, as opposed to the separate index files of the prior art. There can, of course, also be applications according to the invention in which more than one index file is used.
Obviously, each of the subordinate records 812, 820 and 824 may itself have records subordinate to it. There are further advantages to using the proposed technique of the invention, for example in maintaining data integrity. Consider, for example, an insert transaction applied to the PAIF 800 of Figure 13E for a data record designated B with the composite key A12355B930101123456, subordinate to data record 806 (which has the designated key A12355). The search leads to node 822. The value at offset 11 of the key of the inserted data record is 0, so record 812 is accessed. The search key of record 812 must be built (by accessing record 806 via link 826), and the insertion of the new data record can then be completed. It should be noted that the link to record 806 avoids the need for a separate search for record 806 by its key in order to confirm its existence. In this way, data integrity is maintained more efficiently. Performing the same integrity check with a B-tree index involves considerable overhead, since two-phase operations are required. First, a search is applied to the index of type 'A' data records to find the data record whose key is 12355. Only when it is found can the type B record be inserted (in a separate index file).
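The Figure 13E arrangement and the insert transaction just described can be sketched as follows. This is a simplified illustration under stated assumptions, not the PAIF itself: the `Record` class, `discriminating_offset` and `make_subordinate` are hypothetical names, and the trie nodes are omitted so that only the stored keys, the parent links and the offsets are modelled. The key values are those quoted in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Record:
    stored_key: str                    # key as stored; the parent's prefix is omitted
    parent: Optional["Record"] = None  # link to the record this one is subordinate to

    def full_key(self) -> str:
        """Rebuild the complete designated key by following parent links."""
        prefix = self.parent.full_key() if self.parent else ""
        return prefix + self.stored_key


def discriminating_offset(key_a: str, key_b: str) -> Optional[int]:
    """First offset at which two keys differ.

    A key that ends early is treated as holding a null value at that offset.
    Returns None for identical keys.
    """
    if key_a == key_b:
        return None
    for offset, (a, b) in enumerate(zip(key_a, key_b)):
        if a != b:
            return offset
    return min(len(key_a), len(key_b))


def make_subordinate(neighbour: Record, new_key: str) -> Record:
    """Create the record for `new_key`, confirming its parent through the
    neighbour's link rather than through a separate search."""
    parent = neighbour.parent
    if parent is None or not new_key.startswith(parent.full_key()):
        raise ValueError(f"no parent record found for {new_key}")
    return Record(new_key[len(parent.full_key()):], parent)


if __name__ == "__main__":
    r806 = Record("A12355")
    r812 = Record("B940201333333", parent=r806)
    new = make_subordinate(r812, "A12355B930101123456")
    print(r812.full_key(), new.stored_key)
    print(discriminating_offset(r806.full_key(), r812.full_key()))
```

The last call reports offset 6, matching the value of node 814 in the description; the existence of record 806 is confirmed through the link held by record 812.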
When searching for data, the data structure in Figure 20E exemplifies further advantages that result from the fact that subordinate data records are linked to their 'parent' record. For example, if the type A record is a customer and the type B record is an invoice, one usually needs to access the invoice details together with the customer details. The link from the invoice to the customer avoids a separate search for the customer's details.
The designated index of the invention provides another important advantage in navigating the index to perform sequential operations. Consider, for example, the PAIF of Figure 13E, where it is required to retrieve all data records in ascending order. It is possible to navigate the PAIF (this is also known as sequential operation), and data records 802, 804, 806, 812, 820, 824 and 808 are retrieved in order of their designated keys. If only records of a certain type are needed, for example type A records, one can navigate the index in the same way while avoiding access to nodes and records that are not relevant. Thus, data record 806 can be accessed from node 814, and it can be predicted that the data records reachable from node 814 through its other links and the downstream nodes are subordinate to record 806, so that links 833 and 823 need not be followed. In this example, only records 802, 804, 806 and 808 are retrieved. In the same way, if only records of types A and B are needed, one can avoid traversing link 823, since it can be predicted that a link with the value D from a node with the value 6 that leads to record 806 is a link to a subordinate data record designated D. If the PAIF index is a layered index and node 814 resides in a different block from node 822, the move from node 814 to node 822 may be made over the split link.
If the split link does not exist, as for example in Figure 7F, one needs to use link 421 of node B' (422) when it is necessary to advance through link 400 of node B (423) to node E (424).
Having exemplified the subordination relationship with reference to the specific embodiment of Figure 13, the multidimensional feature according to the second aspect of the invention is now described. Turning to Figure 14, a schematic illustration of a designated index according to one embodiment of the invention is shown. The index contains two search paths to a designated data record (the "DEPOSIT" data record), so that the deposit can be accessed by either of two composite keys: a first designated key comprising the key fields account number, date and customer number, and a second designated key comprising the key fields customer number, date and account number. Returning to the previous example, the account data record 201 has the designated key 'A133333'. Updating a deposit for an account (the deposit being subordinate to the account) can be implemented through the designated record 203, subordinate to the designated record 201. The PAIF allows access to records 201 and 203 from node 207 via link 206. Similarly, data record 204 represents a deposit of a client. The key of the client record 202 is 'B11346'. Updating a deposit 204 for a client 202 can be implemented through the index 200 and node 209, which is linked (208) to data record 204. The key of data record 203 is 'A133333C01019811346' (k1). The key of record 204 is 'B11346D010198133333' (k2). As shown, the Client and Account fields are duplicated in records 203 and 204 (as is additional information such as date and amount), which is an obvious disadvantage resulting in an unduly inflated file. This disadvantage can be overcome by representing a single DEPOSIT record as a multi-dimension record 210.
Data record 210 (Figure 14) is a multi-dimension record that is updated and accessed through the designated index 200 both by the designated key k1 (designator C) and by the designated key k2 (designator D) (note that when a data record is a multi-dimension record, the designator of the record depends on the key being used). The path in the index for k1 leads to node 207 and from that node to designator C of record 210. The information in the metadata for designator C allows the relevant structure to be constructed. For example, to build a data structure that includes the key k1, records 201 and 202 are accessed through links 213 and 214, and together with the date field of record 210 all the key fields are obtained. The path in the index for k2 leads to node 209 and from that node to designator D of record 210. The information in the metadata for designator D allows the relevant structure to be constructed, for example a data structure that includes the key k2.
As shown, the search path defined by the search key of record 203 leads to the first field 212, which has the value 'C' (the designator according to search key k1). The third field points to data record 201. The second field 215 (having the value 'D', the designator according to search key k2) of the same data structure 210 is reached by the search path defined by the search key of record 204. The fourth field holds a link to data record 202. In this way, the DEPOSIT record represents subordination to both the account and the client, while avoiding duplication of the account, customer, date and amount fields.
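A minimal sketch of such a multi-dimension record follows. It is an illustration under assumptions: the class names, the `amount` value and the use of a Python dictionary as the index are hypothetical, and the client key 'B11346' is taken from the prefix of k2. The two keys it produces are the k1 and k2 values quoted for Figure 14.

```python
from dataclasses import dataclass


@dataclass
class KeyedRecord:
    key: str  # designated key, e.g. 'A133333' for the account


@dataclass
class Deposit:
    account: KeyedRecord  # link to the account record (cf. record 201)
    client: KeyedRecord   # link to the client record (cf. record 202)
    date: str             # stored once, inside the deposit record
    amount: int           # stored once; the value used below is invented

    def key_for(self, designator: str) -> str:
        """Rebuild the designated key for one search path from the links."""
        if designator == "C":  # account number, date, customer number
            return self.account.key + "C" + self.date + self.client.key[1:]
        if designator == "D":  # customer number, date, account number
            return self.client.key + "D" + self.date + self.account.key[1:]
        raise ValueError(f"unknown designator {designator!r}")


def index_deposit(index: dict, deposit: Deposit) -> None:
    """Register both search paths; each leads to the same physical record."""
    for designator in ("C", "D"):
        index[deposit.key_for(designator)] = deposit


if __name__ == "__main__":
    index = {}
    deposit = Deposit(KeyedRecord("A133333"), KeyedRecord("B11346"), "010198", 500)
    index_deposit(index, deposit)
    print(sorted(index))
```

Both dictionary entries refer to one `Deposit` object, so the date and amount exist once while the account and client fields are obtained through the links.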
It should be noted that the account and customer data items are accessed through links to the original data records (201 and 202), and the rest of the data (date and amount) exists only once, within data record 210. Obviously, data record 210 may include other fields. The invention is in no way limited to a given embodiment, and the manner of realizing data record 210 depicted in Figure 14 is only one of many possible variants. The number of search paths is not limited. As explained above, also with reference to Figure 13E, if the data record sought is Axxxx (i.e., the account record 201 itself), one simply moves through the index with the search key 'Axxxx' to any of its subordinate records and accesses the type A record through the link from the subordinate record to the type A record, for example link 213 of Figure 14. Other implementations are, of course, possible (for example, maintaining a link in the index to record A), all as required and appropriate. The description above, which provides two search paths (and, in the general case, at least two) to one physical occurrence of a data record, constitutes the multi-dimensional data structure: a designated index that contains at least two search paths to a data record (called a multi-dimension record).
The relationship between data elements - Figure 15 illustrates another feature of the invention, i.e., the data relationship feature.
Data record A (a book data record) has data records C, F, J, K and L subordinate to it. The realization of this hierarchy was illustrated previously. With the present relationship feature, one-to-one and one-to-many relationships can easily be realized. Consider, for example, that a book has many categories (L), i.e., one to many, but only one abstract (K), i.e., one to one. According to the proposed feature, a one-to-one data relationship is implemented by a designated (composite) key of two components: the first is the designated key of the record to which the subordinate record is subordinate, and the second is the designator of the subordinate record (since the relationship is one to one, the key fields of the subordinate record need not be used). A one-to-many relationship is implemented by a designated (composite) key whose first component is again the designated key of the parent record and whose second component consists of the designator and the key of the subordinate record. In this example, the one-to-one relationship between a book and its abstract is maintained by defining the key of K to be AxxxK, where Axxx is the designated key of A and K is the designator of record K. The one-to-many relationship between a book and a category is maintained by defining the key of L as AxxxLyyy, where Axxx is the designated key of A, L is the designator, and yyy are the key fields of the L record.
There follows a description of another feature according to the second aspect of the invention, namely multiple-model representation. According to this feature, and as will be explained in detail below, one or more of the following (and possibly other) models can be represented by the designated index.
Representing relational tables through a multi-model designated index - The relational model considers all data as consisting of tables.
Each table consists of records of the same structure, called tuples. Suppose that the tuples consist of fields F1, F2 and F3. Each field is a key. If key F2 is subordinate to key F1, and key F3 is subordinate to key F2, the table can easily be constructed: to retrieve its tuples, follow the designator of key F1, and from there, for each value of F1, follow the designator of F2, and continue in the same way to F3. Each triple defines a tuple of the table. Some projections are even easier: to find all pairs of values of F1 and F2 for which there is an F3 value in the table, the search ends after processing (F1, F2). Performing the projection onto (F2, F3) can be expensive, since it requires first searching through all the values of F1. However, if this operation is common, the designated index should also maintain the search path (F2, F3, F1). That is, a new composite key F2' F3' F1' is constructed with new designators, and the additional paths are inserted into the designated index. Each record can then be reached through both paths and constitutes a multi-dimension record.
Additional models on the multi-model designated index - The designated index makes it possible to represent additional data models, including a relational database, an object-oriented system, and a hierarchical database, with substantially no duplication of data.
Implementing an object-oriented model (persistent data structure) through a multi-model designated index - The object-oriented approach considers all data as objects. Each object belongs to a class, which determines its structure and which methods (functions) can be applied to it. Classes are organized in a hierarchy, in which structure and methods can be inherited.
The object-oriented approach is ephemeral: an object exists only as long as the program that created it is active. Objects that need to be kept for a longer period are defined as persistent. These objects are stored on disk and are available to other (authorized) programs. The multi-model designated index can easily support such objects. Since their structure is uniformly encoded with the help of designators, other runs of the same program, as well as other programs, can access these persistent objects. Note that a persistent object can at the same time be part of a relational table; there is no need to duplicate the data. Consider, for example, the data structure 220 of Figure 16. Data records 223, 224, 225 and 226 are subordinate to data record 221 and, together with record 221, are considered an object. One can search the index for all data records whose key prefix equals the designated key of record 221 (partial key search) and retrieve the entire object. If only part of the object's data is needed, such as the type A record and the subordinate type B records, a partial key search is again made, this time for the data records whose key prefix equals the designated key of the type A record (for example 221) followed by designator B as the next key field.
Implementing an object-relational model through the multi-model designated index - In contrast to the object-oriented approach, the relational approach considers all data as tables. It is therefore difficult to integrate SQL queries into an object-oriented programming language (C++ or Java). The object-relational approach provides an interface for converting tables into objects. The interface requires the user to specify the relationship between the objects and the attributes of the table. If some attributes are themselves tables, relational algebra operations must be allowed on these tables as well. These conversions are made by the application program.
Consequently, the database is unable to optimize the queries. The designated index treats the data uniformly, thereby providing an ideal interface between the object-oriented application program and the data structures. The queries of the application program are formulated in terms of designated keys, so that the database can optimize the query strategy. The database returns designated keys, which the object-oriented application program can easily process using object-oriented methodology. The sequence of designators along the search path of an object determines its class, and the designators of the various fields allow the object-oriented program to resolve the polymorphism of method calls.
The designated index addresses all related data. For example, suppose Figure 16 describes a data structure of an insurance company in which type A records are clients, type B records are client claims and type C records are client payments. As is clearly shown, all data records are managed by a single index structure. One can efficiently access all instances of the object, since the index allows navigation from a client to its related data, namely claims and payments. At the same time, one can navigate the index structure efficiently and produce the client table (the collection of type A records), the client claims table (the collection of type A and B records) and the client payments table (the collection of type A and C records). Since the data structure does not impose a physical grouping of the data, data shared between different objects can be accessed efficiently through different object views, and the data record is then a multi-dimension record. In this example, a claim can be accessed efficiently both from the client object and from the policy object, being of a structured type as, for example, in Figure 16 (structure 210).
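The partial key search used to retrieve a whole object, or one table, from a single index can be sketched as follows. This is a stand-in under stated assumptions: a sorted key list searched with `bisect` replaces the trie or PAIF, the class and function names are hypothetical, and the key layout (a four-character client key such as 'A001' followed by a designator and a two-digit number) is invented for the illustration.

```python
import bisect


class DesignatedIndex:
    """Sorted-key stand-in for the designated index (not a trie or PAIF)."""

    def __init__(self):
        self._keys = []      # all designated keys, kept in ascending order
        self._records = {}   # designated key -> record payload

    def insert(self, key, record):
        if key in self._records:
            raise KeyError(f"duplicate key {key}")
        bisect.insort(self._keys, key)
        self._records[key] = record

    def prefix_scan(self, prefix):
        """Return (key, record) pairs whose key starts with `prefix`, in key order."""
        start = bisect.bisect_left(self._keys, prefix)
        result = []
        for key in self._keys[start:]:
            if not key.startswith(prefix):
                break
            result.append((key, self._records[key]))
        return result


CLIENT_KEY_LEN = 4  # assumed fixed length of a client key such as 'A001'


def client_object(index, client_key):
    """All keys making up one client object: the client plus its subordinates."""
    return [key for key, _ in index.prefix_scan(client_key)]


def subordinate_table(index, designator):
    """Keys of all records of one subordinate type, across all clients."""
    return [key for key, _ in index.prefix_scan("A")
            if len(key) > CLIENT_KEY_LEN and key[CLIENT_KEY_LEN] == designator]


if __name__ == "__main__":
    idx = DesignatedIndex()
    for k in ["A002", "A001C01", "A001", "A001B02", "A001B01", "A002B01"]:
        idx.insert(k, {"key": k})
    print(client_object(idx, "A001"))
    print(subordinate_table(idx, "B"))
```

A scan with the prefix 'A001' yields one client object, while `subordinate_table(idx, "B")` yields the claims of every client, so both the object view and the table view come from the same index.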
The object-oriented approach allows users to add user-defined types (UDTs) and user-defined functions (UDFs). For example, one can add accident photos to the insurance company's database. In the example, a new designated data record, subordinate to the type A data record, is defined. When the details of a claim are looked up, the accident photo is accessed and sent to the photo printing application. With a designated index, the relationship between the photo data and the claim is handled in the same way as built-in classes and relationships. The new UDT can be based on, or related (through subordination) to, any other data type. With the designated index, the application can navigate to the new UDT from the defined classes, from which the new UDT can inherit methods and other properties. In the example, when navigating the index, one can navigate to a claim, from which one can reach the photos as well as any other part of the claim data.
Hierarchical and network models: implementing hierarchical and network models using a multi-model designated index - Hierarchical and network models have been superseded by the relational model. However, although these models are obsolete, they have some advantages (as well as many disadvantages) over the table-oriented implementation. Once a record is retrieved, the addresses of the related records are readily available. Consider, for example, a bank with clients and loans. Each client has an address and several loans, while each loan is taken by one or more clients. In the network model, each client is represented by a node that contains a link to the client and links to the nodes representing the loans taken by the client. A node representing a loan is likewise linked to the nodes of the clients who took that loan. Thus, given a loan, one can easily access the clients who took it and obtain their addresses.
The B-tree implementation requires maintaining two trees: one of clients and addresses, and a second of loans and clients. Thus, having retrieved the data of a loan, the names of the clients who took the loan are available. To find their addresses, a separate B-tree search is required for each client. In the proposed multi-model index (as in Figure 16, for example), once the node representing the loan is reached, one can proceed through a designator that identifies the clients who took the loan (for example, the type B records). Normally, at most one disk access is required for each client. The proposed multi-dimensional index has the advantages of the network model without its disadvantages. While the network model treats each node separately and is susceptible to long search paths, the designated multi-model index treats all data uniformly, and the length of the search path is probabilistically logarithmic, with the block size as the base of the logarithm. Thus, in practice, the search requires a single disk access.
Implementing the client-server model with the object-oriented model based on a designated index: The client-server model allows efficient implementations of the relational model. According to this model, all the data reside in a central computer (called the server), and the application programs run on other computers (called clients). When an application needs data, it formulates an SQL query, which is sent by the client to the server. The server evaluates the query and returns the resulting table to the client. In this way, the interface between the client and the server consists of SQL queries; the server is not aware of the internal data structures or of the code of the application. The client and the server simply have to agree on the names of the tables and their attributes. In the object-oriented approach this model breaks down.
Since each data item is an object, the server must be aware of its internal structure. This problem is aggravated in the presence of polymorphic methods. The server must be aware of the structure and details of the entire class hierarchy.
The designated index allows the client-server approach to be applied to object-oriented and object-relational models. For example, to retrieve an attribute, the application program sends the server the path of keys and designators that leads to the desired node. Based on this data, the server can fulfil the request without any knowledge of the application's data structure. The client and the server need only agree on the names of the fields and their designators. The server does not need to be aware of the data type of each field or of its semantic content.
According to yet another aspect of the invention, it is further proposed to compress the representation of the index, thereby making it more efficient. An estimate of the space required by a trie, and methods for reducing the space requirements, are provided here. If the trie is a layered index, the analysis of the trie index structure concentrates on the last layer (I0).
Storage requirements for the primary key index of a trie: One of the most important characteristics of a trie-based data structure is the modest size of its representation. The PAIF, for example, is even smaller than a conventional trie owing to its compressed representation. The last level of the PAIF index contains a trie with links that point to other trie nodes in the block and links that point to the records. Let N be the number of records in the database. The index contains exactly N pointers to those records. If each pointer requires 4 bytes, the size needed for the pointers is 4N bytes. In addition, each pointer has a link value (1 byte), so the total is 5N bytes. Now consider the space required for a PAIF trie. Since N pointers emanate from the index and each trie node has at least 2 children, there are at most n = N - 1 trie nodes. Let d denote the average number of children of a trie node; then n = N / (d - 1). Since in practice d ≫ 2, n ≪ N. Each trie node has a level number (1 byte).
Since each trie node has at most one incoming trie link, there are at most n - 1 trie links. Each trie link has a label, which is a single character, and an intra-block pointer (1 byte); together with the level number, this gives a total of 3n bytes. Thus, in the worst case, 3n + 4N = 7N bytes are needed, and between 4N and 6N bytes in practice.
The same analysis can be made from another angle. Consider two pointers p1 and p2 emanating from a node v of level k. Let x1 be a key reachable from p1 and x2 a key reachable from p2. Then x1 and x2 share the first k - 1 characters. In a PAIF structure, each of these characters is represented at most once. In the B-tree representation, the first k characters of each key must be represented explicitly. The saving in the PAIF is twofold: first, a character is stored at most once in each level, and second, not all the characters need to be represented.
Additional index compression: In the previous discussion, most of the space is required for the pointers to the records. We now present a method that saves pointer space. The method is based on allowing several record links to share the same pointer. Suppose, first, that the records have a fixed size. If the first two records reside in the same block, then it is possible to keep a single full-size pointer for the block and, instead of keeping a pointer for each remaining outgoing link to that block, to calculate its offset. That is, if the first two records reside in block number 2000 and the third record in block 7000, it is possible to maintain the structure 2000 (e, f) 7000 (h). The savings are much more substantial if a larger number of outgoing links all point to the same block. If k links point to a block, then the 4 bytes of the pointer are divided among all k records, so the space needed to address each record is reduced to 4/k bytes plus the space for the link value (1 byte).
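The figures in this estimate can be checked numerically. The sketch below simply restates the per-item costs quoted in the text (4 bytes per record pointer, 1 byte per link value, 3 bytes per trie node, and a 4-byte block pointer shared by k links); the function names, the cap at N - 1 nodes and the sample fan-out d = 5 are assumptions for illustration. Note that the worst-case figure of 7N bytes in the text corresponds to the record pointers plus the trie nodes.

```python
def trie_index_space(num_records, avg_children=2.0):
    """Byte estimate for the last index layer, using the per-item costs quoted in the text."""
    if num_records < 1 or avg_children < 2:
        raise ValueError("need at least one record and an average fan-out of at least 2")
    # n = N / (d - 1) trie nodes, capped at the worst case of N - 1 nodes
    nodes = min(num_records - 1, num_records / (avg_children - 1))
    return {
        "record_pointers": 4 * num_records,  # 4-byte pointer per record
        "link_values": 1 * num_records,      # 1-byte link value per record pointer
        "trie_nodes": 3 * nodes,             # level number + label + intra-block pointer
    }


def shared_pointer_bytes_per_record(links_per_block):
    """4-byte block pointer split among k links, plus 1 byte for each link value."""
    return 4 / links_per_block + 1


worst = trie_index_space(1_000_000, avg_children=2)    # every trie node has 2 children
typical = trie_index_space(1_000_000, avg_children=5)  # d = 5 is an arbitrary example
```

With one million records, the worst case gives just under 3 million bytes of trie nodes on top of 4 million bytes of record pointers, while a fan-out of 5 cuts the trie nodes to 750,000 bytes.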
For k = 4 links sharing one block pointer, this means that each record requires two bytes in the index. For records of variable size it is possible to keep the offset within the block, for example: 2000 (e: de, f: df) 7000 (h: dh). Instead of keeping a full pointer, one can keep an offset that fits within 1 byte. Thus, for each record, 1 byte is needed for its share of the pointer, 1 byte for the link value, and 1 byte for the offset: a total of 3 bytes per record.
Returning to the example of Figure 17, Figure 17A shows a node 2000 of a trie with links 2010, 2011, 2012 (values 5, 9, A respectively) that point to 3 data records 2002, 2004, 2006 at disk addresses 3000, 5000, 7000 respectively. The size needed to represent the link values (1 byte for each link) and the pointers to the data (4 bytes each) is 15 bytes. Turning now to Figure 17B, node 2000 maintains a shared link (2010) to the three data records (2002, 2004, 2006). The information represented by the link is the address of block 2020 (4 bytes) and the link values of the data records 2002, 2004, 2006 that reside in the block (1 byte for each link value). The size needed to represent the pointer to the data block and the link values is only 7 bytes: (3000: 5, 9, A). To access data record 2004, one can calculate its address as the address of the data block plus an offset that depends on the record size, assuming that the records in the data block are all of the same size. As explained, node 2000 may include links to other data records or data blocks (such as link 2024 to data block 2022, which accommodates data record 2008).
Preferably, the database file management system of the invention should be associated with known per se concurrency and/or distribution capabilities, so as to allow a plurality of users to access the database virtually simultaneously. The database can be located in a central place, or distributed between two or more remote places.
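The Figure 17 comparison can be reproduced with a short sketch. It assumes fixed-size records stored consecutively in the data block in the order of their link values; the record size of 100 bytes and all function names are hypothetical, while the 4-byte pointer, 1-byte link value and the values 5, 9, A at block address 3000 come from the text.

```python
POINTER_BYTES = 4     # full-size pointer, as in the text
LINK_VALUE_BYTES = 1  # one byte per link value


def separate_size(links):
    """Figure 17A layout: every link keeps its own value and its own full pointer."""
    return len(links) * (POINTER_BYTES + LINK_VALUE_BYTES)


def shared_size(link_values_by_block):
    """Figure 17B layout: one pointer per data block plus one value per record in it."""
    return sum(POINTER_BYTES + LINK_VALUE_BYTES * len(values)
               for values in link_values_by_block.values())


def record_address(block_address, link_values, wanted, record_size):
    """Address of a fixed-size record: block address plus its position times the record size."""
    return block_address + link_values.index(wanted) * record_size


figure_17a = [("5", 3000), ("9", 5000), ("A", 7000)]  # (link value, record address)
figure_17b = {3000: ["5", "9", "A"]}                  # block address -> link values

size_a = separate_size(figure_17a)
size_b = shared_size(figure_17b)
# Record reached by link value 9, assuming 100-byte records (hypothetical size).
address_of_9 = record_address(3000, figure_17b[3000], "9", record_size=100)
```

The two sizes match the 15 bytes and 7 bytes quoted for Figures 17A and 17B; the address calculation only holds under the fixed-size assumption stated in the text.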
Turning now to Figures 18A-D, four benchmark graphs are shown that demonstrate improved performance, in terms of response time and database file size, of a file management system of the invention compared with a database based on a commercially available C-tree. The inserts are made through a Uniface application running on a Windows (for Workgroups) operating system. The benchmark of Figure 18A measures the time in minutes required to insert an increasing number of a priori sorted data records into a file (0-1,000,000). As shown in Figure 18A, the larger the number of inserts, the larger the improvement in response time of the database file management system of the invention. Thus, inserting 1 million records takes approximately 669 minutes in the C-tree based database compared to only 65 minutes in the system of the invention. Moreover, the response time of the file management system of the invention increases only slightly as the number of records grows, compared to the significant increase in response time of the counterpart prior-art system. The benchmark of Figure 18B illustrates the file size in megabytes as a function of the number of data records in the file (0-1,000,000). As shown in Figure 18B, the larger the number of records, the greater the improvement in file size in the database file management system of the invention. Thus, for 1 million records, the size of the C-tree based file is approximately 151 megabytes compared to only 22 megabytes in the database file management system of the invention. Figures 18C and 18D are similar to Figures 18A and 18B, except that in the former (18C and 18D) the data records are inserted randomly, while in the latter (18A and 18B) the data records are sorted a priori according to the search key.
As shown, the results are as before; that is, the system of the invention is more efficient in terms of both response time and file size. Figures 19A-D illustrate benchmark graphs of a system of the invention (operating under a DOS operating system) against a database system based on a commercially available B-tree. The results are as before; that is, the system of the invention is more efficient in terms of both response time and file size. Those skilled in the art will appreciate that the alphabetic characters and Roman numerals that designate the steps of the claims are used for convenience of explanation only and should in no way be taken as imposing an order on the steps, or on the number of times each step is executed vis-a-vis other steps of the method. The present invention has been described with a certain degree of particularity, but those skilled in the art will appreciate that various modifications and alterations may be implemented without departing from the scope and spirit of the following claims.

Claims (1)

CLAIMS
1. A storage medium used by a database file management system executed in a data processing system, characterized in that a data structure includes: a layered index arranged in blocks; the layered index includes a basic fractional index that is associated with the data records; the basic fractional index allows accessing or updating the data records by key or keys, and is susceptible to an unbalanced structure of blocks; the layered index allows accessing or updating the data records by key or keys, and constitutes a balanced structure of blocks.
2. The layered index according to claim 1, characterized in that the basic fractional index is a trie.
3. A storage medium used by a database file management system executed in a data processing system, characterized in that the data structure includes: an index arranged in blocks and built on the keys of data records; the index includes a basic fractional index that is associated with the data records; the basic fractional index allows accessing or updating the data records by key or keys, and is susceptible to an unbalanced structure of blocks; the index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
4. A storage medium used by a database file management system executed in a data processing system, characterized in that the data structure includes: an index arranged in blocks and built on the keys of the data records; the index includes a trie that is associated with the data records; the trie allows accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; the index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
5. The layered index according to claim 1, characterized in that the storage medium has an external memory.
6.
The layered index according to claim 5, characterized in that the storage medium is also an internal memory.
7. The layered index according to claim 1, characterized in that the storage medium has an internal memory.
8. The layered index according to claim 2, characterized in that the trie is a PAIF trie.
9. The layered index according to claim 1, characterized in that the basic fractional index and the representative index of the layered index are substantially the same index scheme.
10. The layered index according to claim 1, characterized in that the basic fractional index and the representative index of the layered index are different index schemes.
11. The layered index according to claim 8, characterized in that the representative index of the layered index is the B-tree index scheme.
12. The layered index according to claim 10, characterized in that the representative index is a B-tree index scheme.
13. The layered index according to claim 8, characterized in that the representative index of the layered index is substantially the PAIF index scheme.
14. The layered index according to claim 9, characterized in that the representative index is substantially the PAIF index scheme.
15. The layered index according to claim 1, characterized in that it is capable of supporting the ODBC standard.
16. The layered index I0, ..., Ih according to claim 1, characterized in that it comprises: representative indexes I1, ..., Ih constructed so that any Ij is built on the representative keys of Ij-1.
17. The layered index I0, ..., Ih according to claim 16, characterized in that Ih is totally contained in a block.
18. The layered index according to claim 3, characterized in that the storage medium has an external memory.
19. The layered index according to claim 18, characterized in that the storage medium is also an internal memory.
20. The layered index according to claim 3, characterized in that the storage medium is an internal memory.
21.
The layered index according to claim 3, characterized in that it is capable of supporting the ODBC standard.
22. The layered index according to claim 4, characterized in that the storage medium is an external memory.
23. The layered index according to claim 22, characterized in that the storage medium is also an internal memory.
24. The layered index according to claim 4, characterized in that the storage medium has an internal memory.
25. The layered index according to claim 4, characterized in that it is capable of supporting the ODBC standard.
26. In a database file management system for accessing data records, executed in a data processing system; the data records are associated with a basic fractional index arranged in blocks and stored in a storage medium; the basic fractional index allows accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; a method for constructing a layered index arranged in blocks, characterized in that it comprises the steps of: (a) providing the basic fractional index; (b) constructing a representative index over the representative keys of the basic fractional index; the layered index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
27. The layered index according to claim 26, characterized in that the basic fractional index is a trie.
28.
In a database file management system for accessing data records, executed in a data processing system; the data records are associated with a basic fractional index arranged in blocks and stored in a storage medium; the basic fractional index allows accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, characterized in that it comprises the steps of: (a) providing the basic fractional index; (b) constructing an index over the representative keys of the basic fractional index; the index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
29. In a database file management system for accessing data records, executed in a data processing system; the data records are associated with a trie arranged in blocks and stored in a storage medium; the trie allows accessing or updating the data records by key or keys and is susceptible to an unbalanced structure of blocks; a method for constructing an index over the keys of the data records, the index being arranged in blocks, characterized in that it comprises the steps of: (a) providing the trie; (b) constructing an index over the representative keys of the trie; the index allows accessing or updating the data records by key or keys and constitutes a balanced structure of blocks.
30. The method according to claim 26, characterized in that the storage medium has an external memory.
31. The method according to claim 30, characterized in that the storage medium is also an internal memory.
32. The method according to claim 3, characterized in that the storage medium is an internal memory.
33. The method according to claim 27, characterized in that the trie is a PAIF trie.
34.
The method according to claim 26, characterized in that the basic fractional index and the representative index are substantially the same index scheme.
35. The method according to claim 26, characterized in that the basic fractional index and the representative index are different index schemes.
36. The method according to claim 33, characterized in that the representative index is the B-tree index scheme.
37. The method according to claim 35, characterized in that the representative index is the B-tree index scheme.
38. The layered index according to claim 33, characterized in that the representative index is the PAIF index scheme.
39. The layered index according to claim 34, characterized in that the representative index is a PAIF index scheme.
40. The method according to claim 26, characterized in that it is capable of supporting the ODBC standard.
41. The method according to claim 28, characterized in that the storage medium is an external memory.
42. The method according to claim 41, characterized in that the storage medium is also an internal memory.
43. The method according to claim 28, characterized in that the storage medium is an internal memory.
44. The method according to claim 28, characterized in that it is capable of supporting the ODBC standard.
45. The method according to claim 26, characterized in that the index supports sequential operations.
46. The method according to claim 28, characterized in that the index supports sequential operations.
47. The method according to claim 29, characterized in that the index supports sequential operations.
48. A method for accessing a data record sought by the key k in the layered index according to claim 1, characterized in that it comprises: (a) searching for k in Ih to Ik, where h ≥ k ≥ 0, and, in case it is not found in the key of a data record, in order to find the block of Ih-1 that leads to k;
(b) repeating step (a) until reaching the block of I0 that is associated with the data record with the key k, if it exists.
49. A method for inserting a data record r by the key k into the layered index according to claim 1, characterized in that it comprises: (a) searching for k in Ih to Ik, where h ≥ k ≥ 0, and, in case it is not found in the key of a data record, in order to find the block of Ih-1 that leads to k; (b) repeating step (a) until reaching the block B of I0 that is associated with the data record with the key k, if it exists; (c) associating r with B.
50. A method for deleting a data record r by the key k in the layered index according to claim 1, characterized in that it comprises: (a) searching for k in Ih to Ik, where h ≥ k ≥ 0, and, in case it is not found in the key of a data record, in order to find the block of Ih-1 that leads to k; (b) repeating step (a) until reaching the block B of I0 that is associated with the data record with the key k, if it exists; (c) disconnecting r from B.
51. A method for accessing a data record r sought by the key k in the layered index according to claim 3, characterized in that it comprises: (a) searching for k in Ih to Ik, where h ≥ k ≥ 0, and, in case it is not found in the key of a data record, in order to find the block of Ih-1 that leads to k; (b) repeating step (a) until reaching the block of I0 that is associated with the data record with the key k, if it exists.
52. A method for inserting a data record r by the key k into the layered index according to claim 3, characterized in that it comprises: (a) searching for k in Ih to Ik, where h ≥ k ≥ 0, and, in case it is not found in the key of a data record, in order to find the block of Ih-1 that leads to k; (b) repeating step (a) until reaching the block B of I0 that is associated with the data record with the key k, if it exists; (c) associating r with B.
53.
A method for deleting a data record r by the key k in the layered index according to claim 3, characterized in that it comprises: (a) searching for k in Ih to Ik, where h ≥ k ≥ 0, and, in case it is not found in the key of a data record, in order to find the block of Ih-1 that leads to k; (b) repeating step (a) until reaching the block B of I0 that is associated with the data record with the key k, if it exists; (c) disconnecting r from B.
54. The method according to claim 26, characterized in that the construction step (b) includes: (a) if a block B (in Ih-1) overflows, it is split into two (or more) blocks and the representative of B in Ih is replaced by the representatives of the new blocks; (b) if the block of Ih overflows, an additional layer Ih+1 is created and added to the layered index.
55. The method according to claim 54, characterized in that it is performed in deployment.
56. The method according to claim 54, characterized in that it is performed post factum.
57. The method according to claim 28, characterized in that the construction step (b) includes: (a) if a block B (in Ih-1) overflows, it is split into two (or more) blocks and the representative of B in Ih is replaced by the representatives of the new blocks; (b) if the block of Ih overflows, an additional layer Ih+1 is created and added to the layered index.
58. The method according to claim 57, characterized in that it is performed in deployment.
59. The method according to claim 57, characterized in that it is performed post factum.
60. The method according to claim 26, characterized in that the construction step (b) includes: (a) at least one short link among the short links of a node (hereinafter the split node) in the block (from Bx-?) is deleted (hereinafter the split link), in such a way that there are at least two tries in the block ';
(b) each of the sub-tries is moved to a separate block; (c) if the B2 block does not exist, B is created and a node copied from the split node is created in B; (d) if the block of B exists and a node copied from the split node does not exist in Bl r, then the node copied from the split node is created in the trie of Ba so that B -? ' (at the end of the split process) is accessible through a search path that includes a root node in Ba and the copied node, with the links labeled according to the representative keys of B -? '; (e) if the copied node has no direct link, a direct link is added from the copied node to the Ba- ?; (f) a far link is added from the copied node to the Bx- ? block; if the copied node has a short link or a child node in the direction of the far link, the far link is replaced by the direct link from the child node to the block B? -? '
61. A storage medium used by a database file management system executed in a data processing system, the data structure being characterized in that it includes at least one probabilistic access index file (PAIF) having a plurality of nodes and links; the leaf nodes of the PAIF are each associated with at least one data record accessible to a user application program, wherein at least a portion of the data record constitutes at least one search key; selected nodes in the PAIF each represent a given offset of a search key portion within the search key; the links originating from each given node among the selected nodes each represent a unique value of the search key portion; the PAIF has at least two sub-PAIFs, each arranged in a block; the database file management system is also capable of arranging the blocks as a balanced structure of blocks.
62. The data processing system according to claim 61, characterized in that at least some data records that are associated with the leaf nodes are maintained in at least one separate file.
63.
The data processing system according to claim 61, characterized in that at least one leaf is associated with more than one data record.
64. A method for inserting a new data record into an existing PAIF according to claim 61, characterized in that it includes the execution of the following steps:
i. advancing along a reference path that starts at the root node and ends at the data record associated with a leaf node (referred to as the "reference data record"); at each node of the reference path, advancing along a link originating from the node if the value represented by the link equals the value of the key portion, 1 bit long, at the offset specified by the node; in case the offset specified by the node is beyond any corresponding key portion in the key, or if there is no link with said value, advancing along an arbitrary path to any reference data record;
ii. comparing the search key of the reference data record with that of the new data record in order to determine the smallest offset of the search key portion that discerns the two (hereinafter the discerning offset);
iii. proceeding to one of the following steps (iii.0 - iii.
3) depending on the value of the discerning offset:
iii.0 if the data records are equal, then terminate; or
iii.1 if the discerning offset matches the offset indicated by one of the nodes in the reference path, add another link originating from said node and assign to the link the value of the search key portion at the discerning offset, taken from the search key of the new data record; or
iii.2 if the discerning offset is greater than that indicated by the leaf node that is linked, by means of a link, to the reference data record:
iii.2.1 disconnect the link from the reference data record (that is, it remains temporarily "free") and move the link to a new node; the new node is assigned the value of the discerning offset;
iii.2.2 connect by means of a link the reference data record and the new node (which now becomes a leaf node) and assign to the link (long link) the value of the search key portion at the discerning offset, taken from the search key of the reference data record;
iii.2.3 connect the new data record and the new node by means of a link and assign to the link (long link) the value of the search key portion at the discerning offset, taken from the search key of the new data record; or
iii.3 if conditions iii.0, iii.1 and iii.2 are not met, there exist, in the reference path, a parent node and a child node thereof such that the discerning offset is, at the same time, greater than the offset assigned to the parent node and smaller than the offset assigned to the child node (considered as case A), or all nodes in the reference path have an offset greater than the discerning offset (considered as case B); therefore, apply the following sub-steps:
iii.3.1 for cases A and B, create a new node and assign to the node the value of the discerning offset; for case A only, disconnect the link from the parent node to the child node and move the link to the new internal node (that is, the child node remains temporarily "free");
iii.3.2 for cases A and B, connect by means of a link (long link) the new data record and the new internal node; the value assigned to the link is that of the search key portion at the discerning offset, taken from the search key of the new data record;
iii.3.3 for cases A and B, connect by means of a new link the new node and, for case A, the child node or, for case B, the root node (that is, the new node becomes, for case A, a new parent node and, for case B, a new root node); the value assigned to the link is the search key portion at the offset indicated by the new node, taken from the search key of the reference data record.
65. A method for obtaining a balanced PAIF index; the PAIF includes blocks, each accommodating a plurality of nodes and links originating from the nodes, the leaf nodes among the nodes being associated with data records; the method is characterized in that it comprises executing the following steps as many times as required: (i) replacing a block, constituting a replaced block, with at least two split blocks such that some of the nodes of the replaced block are accommodated within one of the split blocks and the remaining nodes of the replaced block are accommodated within the other split blocks; (ii) dealing with at least one node among the nodes of the replaced block in a mode block so that the at least two split blocks are child blocks thereof.
66. A computer system characterized in that it has a storage medium of at least an internal memory of between 10 and 20 megabytes or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks, so that for a trillion data records, substantially no more than two accesses to the external memory are required in order to access a block that is associated with any of the billions of data records, regardless of the size of the key of the data records.
67.
A computer system characterized in that it has a storage medium of at least an internal memory of between 10 and 20 megabytes or more, and an external memory; a data structure that includes an index over the keys of the data records; the index is arranged in blocks, so that for one million data records, substantially all blocks of the index are accommodated in the internal memory, regardless of the size of the key of the data records.
68. A computer system characterized in that it has a storage medium and a data structure that includes an index over the keys of the data records; the index is arranged in a balanced structure of blocks and allows sequential operations on the data records; the size of the index is essentially unaffected by the size of the keys.
69. A storage medium used by a database file management system executed in a data processing system, characterized in that a data structure includes: an index over the keys of the data records; the data records are of at least two types, where the data records of the second type are subordinated to the data records of the first type.
70. A storage medium used by a database file management system executed in a data processing system, a data structure characterized in that it includes: a designated index over the designated keys of the data records; the data records constitute designated data records of at least two types, where the designated data records of the second type are subordinated to the designated data records of the first type.
71. The storage medium according to claim 69, characterized in that the index constitutes a layered index.
72. The storage medium according to claim 70, characterized in that the designated index constitutes a layered index.
73. The storage medium according to claim 70, characterized in that the designated index constitutes a multidimensional index.
74.
The storage medium according to claim 72, characterized in that the designated index constitutes a multidimensional index. 75. The storage medium according to claim 70, characterized in that the designated index constitutes a multiple model index. 76. The storage medium according to claim 72, characterized in that the designated index constitutes a multiple model index. 77. The storage medium according to claim 74, characterized in that the designated index constitutes a multiple model index. 78. The storage medium according to claim 69, characterized in that the data record of the first type and the subordinate data record of the second type constitute a one-to-one relationship. 79. The storage medium according to claim 70, characterized in that the data record of the first type and the subordinate data record of the second type constitute a one-to-many relationship. 80. The storage medium according to claim 71, characterized in that the data record of the first type and the subordinate record of the second t or p constitute a one-to-one relationship. 81. The storage medium according to claim 73, characterized in that the data record of the first type and the subordinate record of the second type constitute a one-to-many relationship. 82. The storage medium according to claim 69, characterized in that the index includes a trie. . 83. The storage medium according to claim 70, characterized in that the index includes a trie. 84. The storage medium according to claim 71, characterized in that the basic fraction index of the layered index is a trie. 85. The storage medium according to claim 69, characterized in that to access or update a transaction with respect to the subordinate data record having a composite key Kl..Kn; there exists in the index a subordinate search path that leads to the registration of subordinate data according to the composite key Kl..Kn; the subordinate search path includes a search path to a data record that has a key Kl..kn-1. 86. 
The storage medium according to claim 70, characterized in that to access or update a transaction with respect to the subordinate data record having a composite key Kl..Kn, there exists in the index a subordinate search path leading to the record of subordinate data according to the composite key Kl..Kn; the subordinate search path includes a search path to a data record that has the key Kl..kn-1. 87. The storage medium according to claim 75, characterized in that the multiple model includes a relational model. 88. The storage medium according to claim 75, characterized in that the multiple model includes an object-oriented model. 89. The storage medium according to claim 75, characterized in that the multiple model includes a model relational to object. 90. The storage medium according to claim 75, characterized in that the multiple model matches a client server model. 91. The storage medium according to claim 76, characterized in that the multiple model includes a relational model. 92. The storage medium according to claim 76, characterized in that the multiple model includes an object-oriented model. 93. The storage medium according to claim 76, characterized in that the multiple model includes a model relational to object. 94. The storage medium according to claim 76, characterized in that the multiple model matches a client server model. 95. A storage medium used for a database management system executed in a data processing system, a data structure that includes: an index that is stored in the storage medium and constructed on the keys of data records that are stored in blocks; the index is accommodated in blocks as leaf blocks linked to the data records by means of links; the index is characterized in that at least one of the links is shared by at least two data records stored in the same block. 96 The storage medium according to claim 95, characterized in that the index is constituted by a trie. 97. 
The storage medium used by a database file management system executed in a data processing system, a data structure that includes: an index that is stored in a storage medium and built on the keys of the data records that are stored in blocks, the indexes are arranged in blocks with the blocks of sheets that are linked to data records through links; the index is characterized in that at least one of the links is shared by at least two registers stored in the same block; the index constitutes a layered index according to claim 1, and the basic fractional index blocks are linked to the data records. 98. The storage medium according to claim 97, characterized in that the basic fraction index is constituted by a trie.
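The subordinate search path of claims 85 and 86 can be illustrated with a short sketch. It assumes a plain in-memory character trie rather than the blocked, layered index of the claims, and it joins key components with a hypothetical separator character; the names `CompositeTrie`, `Node`, `SEP`, `path` and `subordinates` are illustrative and do not come from the patent. Under those assumptions, the path to a record with key K1..Kn passes through the node reached by the key K1..Kn-1.

```python
from dataclasses import dataclass, field
from typing import Optional

SEP = "\x00"  # hypothetical terminator; assumes key components never contain it


@dataclass
class Node:
    children: dict = field(default_factory=dict)
    record: Optional[dict] = None


class CompositeTrie:
    """Character trie over composite keys (illustrative, not the patented index)."""

    def __init__(self):
        self.root = Node()

    @staticmethod
    def _encode(key_parts):
        # Terminate every component so ("C4",) is not a prefix of ("C42",).
        return "".join(part + SEP for part in key_parts)

    def insert(self, key_parts, record):
        node = self.root
        for ch in self._encode(key_parts):
            node = node.children.setdefault(ch, Node())
        node.record = record

    def path(self, key_parts):
        """Return the nodes visited from the root to key_parts, or None."""
        node = self.root
        visited = [node]
        for ch in self._encode(key_parts):
            node = node.children.get(ch)
            if node is None:
                return None
            visited.append(node)
        return visited

    def subordinates(self, key_parts):
        """Yield every record stored strictly below the record for key_parts."""
        found = self.path(key_parts)
        if found is None:
            return
        stack = list(found[-1].children.values())
        while stack:
            node = stack.pop()
            if node.record is not None:
                yield node.record
            stack.extend(node.children.values())


index = CompositeTrie()
index.insert(("C4",), {"type": "customer", "name": "Other"})
index.insert(("C42",), {"type": "customer", "name": "Acme"})
index.insert(("C42", "O1"), {"type": "order", "total": 10})
index.insert(("C42", "O2"), {"type": "order", "total": 25})

parent_path = index.path(("C42",))
child_path = index.path(("C42", "O2"))
# The subordinate search path starts with the very same nodes as the parent path.
shares_prefix = all(a is b for a, b in zip(parent_path, child_path))
order_totals = sorted(r["total"] for r in index.subordinates(("C42",)))
```

In this sketch a one-to-many relationship (claims 79 and 81) falls out of the key layout: all orders of customer `C42` sit in the subtree below the customer record, so they can be enumerated without a separate index.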
MXPA/A/2000/007026A 1998-01-22 2000-07-18 Database apparatus MXPA00007026A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCPCT/IL1998/000029 1998-01-22

Publications (1)

Publication Number Publication Date
MXPA00007026A (en) 2002-03-05
