CN103500183A

CN103500183A - Storage structure based on multiple-relevant-field combined index and building, inquiring and maintaining method

Info

Publication number: CN103500183A
Application number: CN201310415712.5A
Authority: CN
Inventors: 王洋
Original assignee: National Computer Network and Information Security Management Center
Current assignee: National Computer Network and Information Security Management Center
Priority date: 2013-09-12
Filing date: 2013-09-12
Publication date: 2014-01-08

Abstract

The invention provides a storage structure based on a combined index over multiple related fields, together with corresponding methods for building, querying and maintaining that structure. The storage structure comprises a hash table, a data area and a delete-bitmap. The hash table stores, for the hash value of each field, the byte offset in the data area of the corresponding hash collision chain; the data area stores the row number (rowid) and the data, i.e. the actual values of all fields of the combined index; and the delete-bitmap records whether each row has been deleted. The index can be used for an equality query on any single one of the related fields, and for equality queries under OR or AND conditions on several or all of the related fields, with no restriction on the order of the conditions.

Description

A storage structure based on a combined index over multiple related fields, and methods for building, querying and maintaining it

Technical field

The invention belongs to the field of data processing, and in particular relates to a storage structure based on a combined index over multiple related fields and to methods for building, querying and maintaining it.

Background technology

Key-value data stores, represented by MemcacheDB, TokyoTyrant and MongoDB, are all built on a single field used as the key, and an index is created on this key in btree or hash mode. When the key is the query condition, query efficiency is very high; when any other, non-key field is the query condition, a full table scan is required and query efficiency is low. For an OR query involving several related fields, only the key field can use the index, and the other fields require a full column scan. In traditional systems that build a combined index over several fields, the index supports only AND queries and requires the fields to be used in order. For example, if the combined index is key (f1, f2, f3), only the following queries can use it:

(1) f1=xxx

(2) f1=xxx and f2=xxx

(3) f1=xxx and f2=xxx and f3=xxx
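The ordering constraint listed above can be sketched as a leading-prefix check. This is a minimal illustration of the rule as this document states it, not of any particular database engine; the function name can_use_prefix_index is hypothetical.

```python
def can_use_prefix_index(index_fields, query_fields):
    """Return True if the AND-combined equality fields form a leading prefix of the index.

    Hypothetical helper: it models only the rule described in the text,
    where key (f1, f2, f3) serves f1, f1+f2 and f1+f2+f3.
    """
    n = len(query_fields)
    return 0 < n <= len(index_fields) and set(query_fields) == set(index_fields[:n])


index = ["f1", "f2", "f3"]
print(can_use_prefix_index(index, ["f1"]))        # True
print(can_use_prefix_index(index, ["f1", "f2"]))  # True
print(can_use_prefix_index(index, ["f2"]))        # False: f1 is missing
print(can_use_prefix_index(index, ["f1", "f3"]))  # False: f2 is skipped
```

A query on f2 alone fails this check, which is the case the invention is designed to handle.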

Summary of the invention

The invention provides a storage structure based on a combined index over multiple related fields, and methods for building, querying and maintaining it. It is particularly suited to equality queries on any single one of the related fields, and to equality queries under OR or AND conditions on several or all of the related fields, with no restriction on the order of the conditions.

To solve the above technical problem, the invention adopts the following technical solution: a method for building storage based on a combined index over multiple related fields, comprising:

1) specifying each field of the combined index and computing the hash value corresponding to each field;

2) building a hash table that stores, for the hash value of each field, the byte offset in the data area of the corresponding hash collision chain;

3) building a data area that stores the row number (rowid) and the data, i.e. the actual values of the fields of the combined index;

4) building a delete-bitmap that indicates whether each row has been deleted.

According to another aspect of the invention, a storage structure based on a combined index over multiple related fields is also provided, comprising:

a hash table that stores, for the hash value of each field, the byte offset in the data area of the corresponding hash collision chain;

a data area that stores the row number (rowid) and the data, i.e. the actual values of the fields of the combined index;

a delete-bitmap that indicates whether each row has been deleted.

According to another aspect of the invention, a query method for a single-field condition is also provided, comprising:

1) computing the hash value of the field's query value, and looking up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

2) traversing all data pages, extracting the value of this field from the data item of each row in the data pages, matching it against the query value, and obtaining the rowids of all matches;

3) checking the delete-bitmap for all rowids obtained in the previous step, excluding the rowids of deleted rows, and obtaining the query result from the remaining rowids.

Further, the invention provides a query method for two fields queried with the same value under an OR condition, comprising:

1) computing the hash value of the shared query value, and looking up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

2) traversing all data pages, extracting the value of each of the two fields from the data item of each row, and matching each against the query value; a row matches if either value matches, which yields the rowids of all matches;

3) removing duplicates (distinct) from the rowids obtained in step 2, checking the delete-bitmap, excluding deleted rows, and obtaining the query result from the remaining rowids.

Further, a query method based on the storage structure, for two fields queried with different values under an OR condition, comprises:

1) designating any one field of the combination as the first field, computing the hash value of the first field's query value, and looking up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

2) traversing all data pages, extracting the value of the first field from the data item of each row, matching it against the first field's query value, and obtaining the rowids of all matches;

3) computing the hash value of the second field's query value, and looking up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

4) traversing all data pages, extracting the value of the second field from the data item of each row, matching it against the second field's query value, and obtaining the rowids of all matches;

5) removing duplicates (distinct) from the rowids obtained in steps 2 and 4, checking the delete-bitmap, removing deleted rows, and obtaining the query result from the remaining rowids.

Further, a query method based on the storage structure, for two fields queried with different values under an AND condition, comprises:

1) designating any one field of the combination as the first field, computing the hash value of the first field's query value, and looking up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

2) traversing all data pages; for each row, extracting the value of the first field from the data item and matching it against the first field's query value, then extracting the value of the second field and matching it against the second field's query value; if both values match, the row's rowid satisfies the condition;

3) checking the delete-bitmap for the rowids obtained, excluding deleted rows, and obtaining the query result from the remaining rowids.

Further, a query method based on the storage structure, for an AND condition on two fields in which one of the fields carries an inner OR condition with different values, comprises:

1) designating the field without an inner OR condition as the first field, computing the hash value of the first field's query value, and looking up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

2) traversing all data pages; for each row, extracting the value of the first field from the data item and matching it against the first field's query value, then extracting the value of the second field and matching it against either the first or the second value of the second field's OR condition; if both fields match, the row's rowid satisfies the condition;

3) checking the delete-bitmap for the rowids obtained, excluding deleted rows, and obtaining the query result from the count of the remaining rowids.

According to another aspect of the invention, a method for adding data to the storage structure based on a combined index over multiple related fields according to claim 2 is also provided, comprising:

1) computing the hash value of the current field of the combined fields, and looking up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

2) traversing all data pages to find a free page, and writing one data row into the free data page; the row consists of the rowid and the data, where the data holds the size and value of the first field followed by the size and value of each remaining field;

3) determining whether the current field is the last field of the combined fields; if so, the addition is complete; if not, returning to step 1 with the next field.

Further, the invention provides a general query method based on the storage structure, comprising:

1) determining from the query conditions on the related fields whether the overall query is an OR query or an AND query;

2) when the overall query is an OR query, processing each query condition in turn to obtain its result rowids, removing duplicates (distinct) across these result rowids to obtain the final rowids, and obtaining the query result from the final rowids;

3) when the overall query is an AND query, processing each query condition in turn to obtain its result rowids, taking the intersection of the result rowids of all conditions, removing duplicates (distinct) from the intersection to obtain the final rowids, and obtaining the query result from the final rowids.
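The combination step of this general query method can be sketched with set operations. The sketch below assumes each condition has already been evaluated to a list of rowids; the function name combine_rowids and the "or"/"and" mode strings are hypothetical, and the delete-bitmap check is left out.

```python
def combine_rowids(per_condition_rowids, mode):
    """Merge the result rowids of several conditions.

    mode "or"  -> union of all results (duplicates removed)
    mode "and" -> intersection of all results (duplicates removed)
    """
    sets = [set(rowids) for rowids in per_condition_rowids]
    if not sets:
        return []
    if mode == "or":
        merged = set().union(*sets)
    elif mode == "and":
        merged = set.intersection(*sets)
    else:
        raise ValueError("mode must be 'or' or 'and'")
    return sorted(merged)


results = [[1, 2, 2], [2, 3]]
print(combine_rowids(results, "or"))   # [1, 2, 3]
print(combine_rowids(results, "and"))  # [2]
```

Using sets makes the distinct step implicit in both branches.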

With the above technical solution, the index can be used for an equality query on any single one of the related fields, and for equality queries under OR or AND conditions on several or all of the related fields, with no restriction on the order of the conditions. In addition, combining several fields into one index incurs no extra consistency-maintenance overhead, and the performance advantage for OR queries is clear. The invention's greatest performance advantage lies in the optimization of OR queries, and its greatest flexibility advantage lies in placing no restriction on the order of the fields of the combined index.

Brief description of the drawings

Fig. 1 is a schematic diagram of the storage structure of the hash table and data area of the invention.

Fig. 2 is a schematic diagram of the storage structure of the delete-bitmap in an example of the invention.

Fig. 3 is a schematic diagram of the storage structure of the hash table and data area in an example of the invention.

Fig. 4 is a flow chart of adding data in an example of the invention.

Fig. 5 is a flow chart of deleting data in an example of the invention.

Embodiment

On the basis of existing key-value databases, we implement an advanced key-value database that uses multiple related fields as a combined key with a hash index. The bottom layer of this database stores key-value data, and the upper layer provides a SQL interface layer to support SQL. Its functions include creating the schema of a combined index over multiple related fields, storing the combined index, maintaining it under insert, delete and update operations, and using it in queries.

Creating the schema of the combined index over multiple related fields

Each field of the combined index and the index type must be specified. The index type in this example is hash; a btree implementation could also be used. The hash transformation locates the queried records quickly and accurately. A combined index is created as follows:

create table tablename (f1 varchar(100), f2 varchar(100), f3 varchar(100), f4 varchar(100), key (f1, f2, f3) using hash);

Building the storage for the combined index

Fig. 1 is a schematic diagram of the storage structure of the combined index over multiple related fields. In the figure, each entry of the hash table stores the byte offset in the data area of the hash collision chain for the corresponding hash value. The data area is the actual data storage structure and is managed in pages. Each hash collision chain consists of one or more pages; the page size is an integer multiple of 4k; the pages of all hash collision chains are stored contiguously; and the tail of each page stores a pointer to the next page. Each data page consists of multiple data rows, and each data row consists of a rowid and data: the rowid is the row number, and the data holds the actual value of each field of the combined index. Field values are stored as variable-length values, with the length stored first and the actual value after it. The structure of the data part is shown in Fig. 3.
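The row layout described above (a rowid followed by length-prefixed field values) can be sketched as follows. The text does not give field widths or byte order, so the 8-byte little-endian rowid, the 4-byte length prefix and the UTF-8 encoding are assumptions made only for this illustration, as are the names encode_row and decode_row.

```python
import struct

ROWID_FMT = "<Q"  # assumed: 8-byte unsigned rowid, little-endian
LEN_FMT = "<I"    # assumed: 4-byte unsigned length prefix


def encode_row(rowid, fields):
    """Serialize one data row: rowid, then (length, value) for each field."""
    out = struct.pack(ROWID_FMT, rowid)
    for value in fields:
        raw = value.encode("utf-8")
        out += struct.pack(LEN_FMT, len(raw)) + raw
    return out


def decode_row(buf, n_fields, offset=0):
    """Parse one data row; return (rowid, fields, offset just past the row)."""
    (rowid,) = struct.unpack_from(ROWID_FMT, buf, offset)
    offset += struct.calcsize(ROWID_FMT)
    fields = []
    for _ in range(n_fields):
        (size,) = struct.unpack_from(LEN_FMT, buf, offset)
        offset += struct.calcsize(LEN_FMT)
        fields.append(buf[offset:offset + size].decode("utf-8"))
        offset += size
    return rowid, fields, offset


row = encode_row(7, ["a", "bc", "def"])
print(len(row))            # 26 bytes: 8 + (4+1) + (4+2) + (4+3)
print(decode_row(row, 3))  # (7, ['a', 'bc', 'def'], 26)
```

Because decode_row returns the offset just past the row, a reader can walk the rows of a page one after another.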

In addition, an extra bitmap must be created and maintained for the table, as shown in Fig. 2. We call it the delete-bitmap. Each bit corresponds to one data row and has an initial value of 1. To delete a data row, locate the bit corresponding to that row in the bitmap, set it to 0 and save it.
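A delete-bitmap of this kind can be sketched in a few lines. The sketch assumes that rowids are zero-based and map directly to bit positions, which the text does not state; the class name DeleteBitmap and its methods are hypothetical.

```python
class DeleteBitmap:
    """One bit per row: 1 means the row is live, 0 means it has been deleted."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        # All bits start at 1, as described in the text.
        self.bits = bytearray(b"\xff" * ((n_rows + 7) // 8))

    def mark_deleted(self, rowid):
        # Clear the bit for this row (assumed: bit position == rowid).
        self.bits[rowid // 8] &= ~(1 << (rowid % 8)) & 0xFF

    def is_live(self, rowid):
        return bool(self.bits[rowid // 8] & (1 << (rowid % 8)))


bitmap = DeleteBitmap(10)
bitmap.mark_deleted(3)
print([r for r in range(10) if bitmap.is_live(r)])  # [0, 1, 2, 4, 5, 6, 7, 8, 9]
```

Deleting a row touches a single byte and leaves the data pages unchanged.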

The following describes in detail how queries are executed against the combined index in this example:

(1) Query with a single-field condition

Example SQL: select count(*) from TableName where f2=v1;

The execution flow is as follows:

Step 1: compute the hash value of v1, the query value for field f2, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 2: traverse all data pages, extract the value of field f2 from the data item of each row, match it against v1, and obtain the rowids of all matches;

Step 3: check the delete-bitmap for all rowids obtained in the previous step, exclude the rowids of deleted rows, and return the count of the remaining rowids.

As this example shows, a traditional system that builds a combined index over several fields cannot use its index here, because it requires the fields of the combined index to be used in order. The combined index over multiple related fields provided by the invention imposes no order on its fields, so a query on any one field of the combined index can use the index.
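The single-field flow of example (1) can be sketched with an in-memory model. Several simplifications are assumed here and are not taken from the text: a Python dict of lists stands in for the hash table and page chains, zlib.crc32 modulo 16 stands in for the hash function, and a set of deleted rowids stands in for the delete-bitmap. The names bucket_of, build and query_eq are hypothetical.

```python
import zlib

FIELDS = ("f1", "f2", "f3")  # fields of the combined index
N_BUCKETS = 16               # assumed hash table size


def bucket_of(value):
    # Stand-in hash function; the text does not name one.
    return zlib.crc32(value.encode("utf-8")) % N_BUCKETS


def build(rows):
    """rows: {rowid: (f1, f2, f3)} -> {bucket: [(rowid, values), ...]}"""
    buckets = {}
    for rowid, values in rows.items():
        # The full row is stored once in every distinct bucket its values hit.
        for b in {bucket_of(v) for v in values}:
            buckets.setdefault(b, []).append((rowid, values))
    return buckets


def query_eq(buckets, deleted, field, value):
    pos = FIELDS.index(field)
    # Steps 1 and 2: probe one bucket, compare the named field of each row.
    hits = [rowid for rowid, values in buckets.get(bucket_of(value), [])
            if values[pos] == value]
    # Step 3: drop rows marked as deleted.
    return [r for r in hits if r not in deleted]


rows = {0: ("a", "x", "p"), 1: ("b", "x", "q"), 2: ("x", "y", "r"), 3: ("c", "x", "s")}
buckets = build(rows)
deleted = {3}
count = len(query_eq(buckets, deleted, "f2", "x"))
print(count)  # 2: rows 0 and 1; row 2 has "x" in f1 only, row 3 is deleted
```

Row 2 sits in the same bucket because its f1 value is also "x", so the per-field comparison in step 2 is what keeps it out of the result.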

(2) OR query on two fields with the same query value

Example SQL: select count(*) from TableName where f2='xxx' or f3='xxx';

The execution flow is as follows:

Step 1: because the query values for f2 and f3 are the same, compute the hash value of 'xxx' once, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 2: traverse all data pages, extract the values of fields f2 and f3 from the data item of each row, and match each against 'xxx'; a row matches if either value matches, which yields the rowids of all matches;

Step 3: apply a distinct (duplicate removal) operation to the rowids obtained in step 2, check the delete-bitmap, exclude deleted rows, and return the count of the remaining rowids.

Because the query values for f2 and f3 are the same in this example, a further optimization applies: only the data pages in the collision chain of a single hash bucket need to be accessed. This runs much faster than a traditional key-value data store built on a single key field.

(3) OR query on two fields with different query values

Example SQL: select count(*) from TableName where f2=v1 or f3=v2;

The execution flow is as follows:

Step 1: compute the hash value of v1, the query value for field f2, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 2: traverse all data pages, extract the value of field f2 from the data item of each row, match it against v1, and obtain the rowids of all matches;

Step 3: compute the hash value of v2, the query value for field f3, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 4: traverse all data pages, extract the value of field f3 from the data item of each row, match it against v2, and obtain the rowids of all matches;

Step 5: remove duplicates (distinct) from the rowids obtained in steps 2 and 4, check the delete-bitmap, exclude deleted rows, and return the count of the remaining rowids.

As this example shows, a traditional system that builds a combined index over several fields cannot use its index here, because it requires the fields of the combined index to be used in order. The combined index over multiple related fields provided by the invention imposes no order on its fields, so a query on any one field of the combined index can use the index. Because the query values for f2 and f3 differ, the data pages in the collision chains of two hash buckets must be accessed, one more hash bucket than in example (2).
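The OR flows of examples (2) and (3) can be sketched together: each distinct bucket hit by a query value is probed once, so equal query values cost one probe and different values cost two. As assumptions for illustration only, a dict of lists stands in for the hash table and page chains, zlib.crc32 modulo 16 for the hash function, and a set of rowids for the delete-bitmap. The names bucket_of, build and query_or are hypothetical.

```python
import zlib

FIELDS = ("f1", "f2", "f3")
N_BUCKETS = 16  # assumed hash table size


def bucket_of(value):
    return zlib.crc32(value.encode("utf-8")) % N_BUCKETS


def build(rows):
    buckets = {}
    for rowid, values in rows.items():
        for b in {bucket_of(v) for v in values}:
            buckets.setdefault(b, []).append((rowid, values))
    return buckets


def query_or(buckets, deleted, conditions):
    """conditions: list of (field, value) pairs joined by OR."""
    matched = set()  # a set performs the distinct step
    probed = set()
    for _, value in conditions:
        b = bucket_of(value)
        if b in probed:  # same bucket as an earlier condition: one probe is enough
            continue
        probed.add(b)
        for rowid, values in buckets.get(b, []):
            if any(values[FIELDS.index(f)] == v for f, v in conditions):
                matched.add(rowid)
    return sorted(matched - deleted)  # exclude deleted rows


rows = {0: ("a", "x", "p"), 1: ("b", "y", "x"), 2: ("c", "x", "x"),
        3: ("d", "z", "q"), 4: ("e", "q", "z")}
buckets = build(rows)
print(query_or(buckets, {1}, [("f2", "x"), ("f3", "x")]))    # [0, 2]
print(query_or(buckets, set(), [("f2", "y"), ("f3", "q")]))  # [1, 3]
```

Row 4 holds "q" in f2 rather than f3, so it shares a bucket with a match but is correctly left out of the second result.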

(4) AND query on two fields with different query values

Example SQL: select count(*) from TableName where f2=v1 and f3=v2;

The execution flow is as follows:

Step 1: compute the hash value of v1, the query value for field f2, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 2: traverse all data pages; for each row, extract the value of field f2 from the data item and match it against v1, then extract the value of field f3 and match it against v2; if both values match, the row's rowid satisfies the condition;

Step 3: check the delete-bitmap for the rowids obtained, exclude deleted rows, and return the count of the remaining rowids.

With the combined index over multiple related fields provided by the invention, which imposes no order on its fields, a query on any field of the combined index can use the index. Because this is an AND query on f2 and f3, only the data pages in the collision chain of the hash bucket of either one of the fields (for example, f2) need to be accessed.

(5) Query on two fields combining AND and OR conditions

Example SQL: select count(*) from TableName where f2=v1 and (f3=v2 or f3=v3);

The execution flow is as follows:

Step 1: compute the hash value of v1, the query value for field f2, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 2: traverse all data pages; for each row, extract the value of field f2 from the data item and match it against v1, then extract the value of field f3 and match it against v2 or v3; if both fields match, the row's rowid satisfies the condition;

Step 3: check the delete-bitmap for the rowids obtained, exclude deleted rows, and return the count of the remaining rowids.

In this example, because one field has a single fixed query value and the query is an AND over f2 and f3, only the data pages in the collision chain of the hash bucket of that field (here, f2) need to be accessed.
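The AND flows of examples (4) and (5) can be sketched with one function: the bucket of the single-valued field is probed, and every other field is checked against a set of allowed values (one value for a plain AND, several for an inner OR). As assumptions for illustration only, a dict of lists stands in for the hash table and page chains, zlib.crc32 modulo 16 for the hash function, and a set of rowids for the delete-bitmap. The names bucket_of, build and query_and are hypothetical.

```python
import zlib

FIELDS = ("f1", "f2", "f3")
N_BUCKETS = 16  # assumed hash table size


def bucket_of(value):
    return zlib.crc32(value.encode("utf-8")) % N_BUCKETS


def build(rows):
    buckets = {}
    for rowid, values in rows.items():
        for b in {bucket_of(v) for v in values}:
            buckets.setdefault(b, []).append((rowid, values))
    return buckets


def query_and(buckets, deleted, anchor, others):
    """anchor: (field, value) used for the single bucket probe.
    others: {field: set of allowed values}, all of which must hold."""
    a_field, a_value = anchor
    a_pos = FIELDS.index(a_field)
    hits = []
    for rowid, values in buckets.get(bucket_of(a_value), []):
        if values[a_pos] != a_value:
            continue
        if all(values[FIELDS.index(f)] in allowed for f, allowed in others.items()):
            hits.append(rowid)
    return [r for r in hits if r not in deleted]  # exclude deleted rows


rows = {0: ("a", "v1", "v2"), 1: ("b", "v1", "v3"), 2: ("c", "v1", "v4"),
        3: ("d", "v9", "v2"), 4: ("e", "v1", "v2")}
buckets = build(rows)
deleted = {4}
print(query_and(buckets, deleted, ("f2", "v1"), {"f3": {"v2"}}))        # [0]
print(query_and(buckets, deleted, ("f2", "v1"), {"f3": {"v2", "v3"}}))  # [0, 1]
```

Only one bucket is read in either case, which matches the observation in the text that an AND query needs the collision chain of just one field.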

In routine use, data changes frequently, which inevitably requires corresponding maintenance. Maintenance mainly consists of adding, deleting and modifying data.

Taking the combined index key (f1, f2, f3) as an example, the flow for adding data and creating its index entries is: first hash the value of each index column separately; the row is appended to each hash bucket that is hit, and when several values produce the same hash value the row is not stored repeatedly. The hash index therefore contains redundancy (the same data appears in multiple hash buckets). The detailed flow is as follows:

Step 1: compute the hash value of the value of f1, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 2: traverse all data pages to find a free page, and write one data row into the free data page; the row consists of the rowid and the data, where the data holds the size and value of field f1, the size and value of field f2, and the size and value of field f3;

Step 3: compute the hash value of the value of f2, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 4: traverse all data pages to find a free page, and write one data row into the free data page; the row consists of the rowid and the data, where the data holds the size and value of field f1, the size and value of field f2, and the size and value of field f3;

Step 5: compute the hash value of the value of f3, and look up in the hash table, by that hash value, the first address of the data pages of the hash collision chain;

Step 6: traverse all data pages to find a free page, and write one data row into the free data page; the row consists of the rowid and the data, where the data holds the size and value of field f1, the size and value of field f2, and the size and value of field f3.
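The six steps above amount to a loop over the index fields. The sketch below is a simplified in-memory model under stated assumptions: appending to a Python list stands in for finding a free page in the chain, and zlib.crc32 modulo 16 stands in for the hash function. The names bucket_of and insert_row are hypothetical.

```python
import zlib

N_BUCKETS = 16  # assumed hash table size


def bucket_of(value):
    return zlib.crc32(value.encode("utf-8")) % N_BUCKETS


def insert_row(buckets, rowid, values):
    """Write the full row into the bucket of each field value; return copies written."""
    written = set()
    for value in values:      # f1, then f2, then f3
        b = bucket_of(value)
        if b in written:      # identical hash value: the row is not stored again
            continue
        buckets.setdefault(b, []).append((rowid, tuple(values)))
        written.add(b)
    return len(written)


buckets = {}
copies = insert_row(buckets, 0, ("a", "a", "b"))
print(copies)  # at most 2 here, because f1 and f2 share a value and a bucket
```

Every copy carries all field values, which is what lets a later query on any one field verify the other fields without a second lookup.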

For efficiency, a delete operation generally does not physically remove index data; instead it sets the bit for the row in the delete-bitmap to 0. The flow is as follows:

Step 1: obtain the rowids of the rows to delete according to the query condition (the query flows above describe how to obtain the rowids of rows that satisfy a query condition);

Step 2: from each rowid, compute the position of the row in the delete-bitmap and set the bit at that position to 0, as shown in Fig. 2.

A data modification is performed as a deletion followed by an addition. The details follow the deletion and addition flows above and are not repeated here.

One embodiment of the invention has been described in detail above, but the content described is only a preferred embodiment of the invention and shall not be regarded as limiting the scope of its implementation. All equivalent changes and improvements made within the scope of this application shall remain within the coverage of this patent.

Claims

1. A method for building storage based on a combined index over multiple related fields, comprising:

4) building a delete-bitmap that indicates whether each row has been deleted.

2. A storage structure based on a combined index over multiple related fields, generated by the method according to claim 1, comprising:

3. A query method for a single-field condition, based on the storage structure based on a combined index over multiple related fields according to claim 2, comprising:

4. A query method, based on the storage structure based on a combined index over multiple related fields according to claim 2, for two fields queried with the same value under an OR condition, comprising:

5. A query method, based on the storage structure based on a combined index over multiple related fields according to claim 2, for two fields queried with different values under an OR condition, comprising:

5) removing duplicates (distinct) from the rowids obtained in steps 2 and 4, checking the delete-bitmap, removing deleted rows, and obtaining the query result from the remaining rowids.

6. A query method, based on the storage structure based on a combined index over multiple related fields according to claim 2, for two fields queried with different values under an AND condition, comprising:

7. A query method, based on the storage structure based on a combined index over multiple related fields according to claim 2, for an AND condition on two fields in which one of the fields carries an inner OR condition with different values, comprising:

3) checking the delete-bitmap for the rowids obtained, excluding deleted rows, and obtaining the query result from the final rowids.

8. A method for adding data, based on the storage structure based on a combined index over multiple related fields according to claim 2, comprising:

9. A method for deleting data, based on the storage structure based on a combined index over multiple related fields according to claim 2, comprising:

1) obtaining the rowids of the rows to delete according to the query condition;

2) from each rowid, computing the position of the row in the delete-bitmap and setting the bit at that position to 0.

10. A query method based on the storage structure based on a combined index over multiple related fields according to claim 2, comprising: