CN102609490B - Column-storage-oriented B+ tree index method for DWMS (data warehouse management system) - Google Patents


Info

Publication number
CN102609490B
Authority
CN
China
Legal status
Expired - Fee Related
Application number
CN201210019935.5A
Other languages
Chinese (zh)
Other versions
CN102609490A (en)
Inventor
夏小玲
乐嘉锦
王梅
李晔锋
Current Assignee
Donghua University
Original Assignee
Donghua University
Priority date
2012-01-20
Filing date
2012-01-20
Publication date
2014-07-02
Application filed by Donghua University
Priority to CN201210019935.5A
Publication of CN102609490A
Application granted
Publication of CN102609490B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a column-storage-oriented B+ tree index method for a DWMS (data warehouse management system). The method comprises: a first step, generating column data; a second step, going to the fourth step to build the tree if the B+ tree key is the row number, or to the third step to sort if it is not; the third step, sorting the column value data using multiway merging combined with heapsort; the fourth step, initializing the B+ tree index; a fifth step, creating the leaf nodes; and a sixth step, generating the intermediate nodes in a bottom-up manner. The method is designed for column storage and has the following advantages: 1) the B+ tree has the minimum number of levels, which reduces the number of lookups; and 2) the traditional insertion-based method of building a B+ tree is abandoned in favor of bottom-up construction, so node-split operations are avoided and considerable overhead is saved.

Description

B+ tree indexing method for column-store DWMS
Technical field
The present invention relates to a B+ tree indexing technique for column-store DWMS (Data Warehouse Management Systems).
Background art
With the rapid growth of Internet applications, high throughput and large caches have become essential features of today's database products, and the demand for high performance in both transaction processing and query analysis is increasingly urgent. Traditional row-oriented databases, while well suited to transaction-processing applications, cannot provide efficient query analysis for business decision-making. Column-oriented database architectures have therefore been re-examined in recent years: for read-optimized, query-heavy workloads in data warehouses and analytical support applications, column stores perform markedly better than row stores. Because a relational table is still presented externally as logical rows, tuple reconstruction and multi-table joins are the key factors affecting query performance in a column store. Indexing is one of the most important techniques for improving search efficiency. A B+ tree index keeps data stored in order and supports search, sequential access, insertion, and deletion, which has made it the most popular index structure in transactional database systems.
In traditional database systems, frequent data insertions and modifications cause large changes to the B+ tree structure. To reduce the likelihood of node splits during insertion and modification, B+ tree nodes are deliberately left partly empty. In a data warehouse, however, there are almost no insert or update operations, so applying a traditional B+ tree to a data warehouse holding massive amounts of data wastes space. Moreover, because nodes are only partly filled, more nodes are needed to store the data, which increases the height of the B+ tree index and reduces search efficiency.
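The height argument above can be made concrete with a little arithmetic. The following Python sketch counts the levels and nodes of a B+ tree holding a given number of entries when every node is filled only to a fraction of its capacity. The fanout of 100, the fill factors, and the simplification that leaf and index nodes share one capacity are hypothetical choices for illustration, not figures from this patent.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-a // b)


def tree_shape(num_entries: int, fanout: int, fill_factor: float) -> tuple[int, int]:
    """Return (levels, total nodes) for a B+ tree whose nodes each hold
    int(fanout * fill_factor) entries; levels includes the leaf level.
    Assumes leaf and index nodes have the same capacity (a simplification)."""
    per_node = max(2, int(fanout * fill_factor))
    nodes = ceil_div(num_entries, per_node)  # leaf level
    levels, total = 1, nodes
    while nodes > 1:
        nodes = ceil_div(nodes, per_node)    # one more level of index nodes
        levels += 1
        total += nodes
    return levels, total
```

With 1,000,000 entries and a fanout of 100, fully packed nodes give 3 levels and 10,101 nodes, while half-full nodes give 4 levels and 20,409 nodes: roughly twice the space and one extra level per search.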
Summary of the invention
The object of the present invention is to provide a B+ tree index for column-store DWMS that overcomes the limitations of the traditional B+ tree index and improves data search efficiency.
To achieve this object, the technical solution of the present invention provides a B+ tree indexing method for column-store DWMS, characterized in that it comprises the following steps:
Step 1, column data generation: import the user data and partition the original row-stored data vertically into single columns; to each data item in each column, add the number of the row it belongs to (used later for tuple reconstruction), forming a pair (row number, column value); allocate data segments and save each newly produced column of data in a data segment;
Step 2: if the B+ tree key is the row number, go to Step 4 to build the tree; if the key is the column value, go to Step 3 to sort the column values first;
Step 3: sort the column value data using multiway merging combined with heapsort;
Step 4: initialize the B+ tree index;
Step 5, creation of B+ tree leaf nodes: allocate B+ tree leaf nodes and fill the data items directly into them to obtain data blocks, forming level 0 of the B+ tree;
Step 5 is characterized as follows:
Following the characteristics of column storage, the leaf blocks store the data itself rather than keys and pointers to data blocks, so during a search the data can be obtained directly from the leaf node, saving one I/O. When nodes are filled, the fill factor is no longer considered: every node in the tree is filled completely, following the principle that "the number of search keys equals the number of pointers", which improves space utilization.
Step 6, generation of intermediate nodes: iteratively build the intermediate levels of the B+ tree bottom-up until the whole tree has been created.
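Steps 1 and 2 can be sketched in Python as follows. This is a minimal in-memory illustration under stated assumptions: plain Python lists stand in for data segments, row numbers start at 0, and the built-in sorted() stands in for the Step 3 sort. The function names split_into_columns and prepare_keys are hypothetical.

```python
from typing import Any, Sequence


def split_into_columns(rows: Sequence[Sequence[Any]]) -> list[list[tuple[int, Any]]]:
    """Step 1 sketch: vertically partition row-stored tuples into one list of
    (row_number, column_value) pairs per column; each list stands in for a data segment."""
    if not rows:
        return []
    columns: list[list[tuple[int, Any]]] = [[] for _ in range(len(rows[0]))]
    for row_no, row in enumerate(rows):          # row numbers start at 0 (assumption)
        for col_idx, value in enumerate(row):
            columns[col_idx].append((row_no, value))
    return columns


def prepare_keys(column: list[tuple[int, Any]], key_is_row_number: bool) -> list[tuple[int, Any]]:
    """Step 2 sketch: row-number keys are already in order (go straight to Step 4);
    column-value keys are sorted first (sorted() stands in for Step 3)."""
    if key_is_row_number:
        return list(column)
    return sorted(column, key=lambda pair: (pair[1], pair[0]))
```

For example, splitting the rows ("bob", 30), ("amy", 25), ("cat", 30) yields a name column [(0, "bob"), (1, "amy"), (2, "cat")], and sorting the second column by value gives [(1, 25), (0, 30), (2, 30)].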
Preferably, Step 3 comprises:
Step 3.1, initialization: allocate a sort area in memory of size K, where K is an integer multiple of the block size. If the number of data blocks to be processed, Blk_num, exceeds K (Blk_num > K), a multiway-merge external sort is used. Compute M, the number of sublists to be merged, as:
[formula for M given as an image in the original; not reproduced]
Each sublist to be merged then contains D = Blk_num/M blocks.
Step 3.2: read the source data from the segment, extract the column values from the pairs in each data block, load them into a dataitem array, and place the array in the sort area. Sort the dataitem array with a heap-sort algorithm, passing the array as the input parameter. If Blk_num <= K, sorting is complete; otherwise, regroup the sorted data items into blocks and write them back to a temporary segment.
Step 3.3: merge-sort the M temporary segments.
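A minimal Python sketch of Step 3 follows, using the standard heapq module both for the in-memory heapsort of each run and for the multiway merge. It assumes K is measured in blocks and that M = ceil(Blk_num / K); the patent gives the formula for M only as an image, so this choice is an assumption. Lists stand in for temporary segments, and pairs are ordered by column value with the row number breaking ties (also an assumption).

```python
import heapq
from typing import Any

Pair = tuple[int, Any]  # (row number, column value)


def _order(pair: Pair) -> tuple:
    # Sort by column value, breaking ties by row number (assumed ordering).
    return (pair[1], pair[0])


def heap_sort(items: list[Pair]) -> list[Pair]:
    """Step 3.2: heapsort one sort-area load of pairs."""
    heap = [(_order(p), p) for p in items]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(len(heap))]


def external_sort(blocks: list[list[Pair]], k: int) -> list[Pair]:
    """Step 3 sketch: blocks are lists of pairs; k is the sort-area capacity in blocks."""
    if k < 1:
        raise ValueError("sort area must hold at least one block")
    blk_num = len(blocks)
    if blk_num <= k:                      # everything fits: one in-memory heapsort
        return heap_sort([p for b in blocks for p in b])
    m = -(-blk_num // k)                  # assumed M = ceil(Blk_num / K)
    d = -(-blk_num // m)                  # D = Blk_num / M, rounded up, so d <= k
    runs = []                             # stand-ins for the temporary segments
    for start in range(0, blk_num, d):
        chunk = [p for b in blocks[start:start + d] for p in b]
        runs.append(heap_sort(chunk))     # at most M sorted runs
    # Step 3.3: multiway merge of the sorted runs
    return list(heapq.merge(*runs, key=_order))
```

heapq.merge streams its inputs lazily, so in a disk-based system each run could be replaced by an iterator over the blocks of a temporary segment without changing the merge logic.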
Preferably, Step 6 comprises:
Step 6.1: generate index entries. Each index entry is a triple consisting of the first column value in a block of the level below, its corresponding row number, and that block's block number;
Step 6.2: determine whether the column value type is fixed-length or variable-length. For fixed-length column values, compute the index entry length; the number of index entries per index block is then index block space size / index entry length, and the number of index blocks in level 1 is the number of data blocks in level 0 / the number of index entries per node. Allocate the index block space and insert the index entries into the index blocks in batches. For variable-length column values, first allocate one index block and insert triples into it one by one, allocating a new index block only when the current one cannot hold the next triple;
Step 6.3: after one level has been created, return to Step 6.1 and create the next level of intermediate nodes up by the same process, until the whole B+ tree has been created.
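For fixed-length keys, Steps 5 and 6 amount to a bottom-up bulk load. The Python sketch below assumes the input pairs are already sorted (by Step 3, or by row number), that block numbers are assigned sequentially in creation order, and that index-block capacity is index_block_size // entry_len; the Block class and build_fixed_length are hypothetical names. Block counts are rounded up so that the last block of a level may be partly filled; this rounding is an assumption, since the text states the division without it.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Block:
    block_no: int
    level: int
    entries: list  # level 0: (row, value) items; level >= 1: (value, row, child_block_no)


def build_fixed_length(pairs: list[tuple[int, Any]], leaf_capacity: int,
                       index_block_size: int, entry_len: int) -> list[list[Block]]:
    """Bulk-load a fully packed B+ tree bottom-up; levels[-1][0] is the root."""
    per_index_block = index_block_size // entry_len  # Step 6.2: entries per index block
    if not pairs or leaf_capacity < 1 or per_index_block < 2:
        raise ValueError("need data, leaf_capacity >= 1 and room for 2 index entries")
    blocks_made = 0

    def new_block(level: int, entries: list) -> Block:
        nonlocal blocks_made
        blk = Block(blocks_made, level, entries)  # sequential block numbers (assumption)
        blocks_made += 1
        return blk

    # Step 5: leaves store the data items themselves, filled completely (level 0).
    current = [new_block(0, pairs[i:i + leaf_capacity])
               for i in range(0, len(pairs), leaf_capacity)]
    levels = [current]
    # Step 6.3: keep adding levels until one contains a single block, the root.
    while len(current) > 1:
        # Step 6.1: one (first value, its row number, child block number) triple per child.
        triples = []
        for child in current:
            if child.level == 0:
                row, value = child.entries[0]
            else:
                value, row, _ = child.entries[0]
            triples.append((value, row, child.block_no))
        # Step 6.2: ceil(len(triples) / per_index_block) blocks, filled in batches.
        parent_level = current[0].level + 1
        current = [new_block(parent_level, triples[i:i + per_index_block])
                   for i in range(0, len(triples), per_index_block)]
        levels.append(current)
    return levels
```

Because the input is already sorted and every block is written exactly once, no node is ever split, which is the overhead saving the method claims over insertion-based construction.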
The B+ tree index for column-store DWMS provided by the present invention has the following advantages:
1) It provides an improved B+ tree node structure: leaf nodes store the column data directly, and intermediate nodes ignore the fill factor and are filled completely, which increases space utilization. For the same amount of data, this structure keeps the B+ tree as shallow as possible, reducing the number of lookups;
2) The B+ tree is built bottom-up rather than by conventional insertion, so node splits never need to be handled, saving considerable overhead;
3) Separate B+ tree indexes are built on the row numbers and on the column values of the column-store system, keeping the data stored in order and facilitating tuple reconstruction and multi-table joins.
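To illustrate why leaf-resident data and full packing reduce search cost, the following Python sketch performs a point lookup on a tree laid out as described in this document: level-0 blocks hold (row number, column value) items and higher levels hold (first column value, row number, child block number) triples. It assumes distinct column values (duplicates spanning a leaf boundary would need a scan across adjacent leaves, which is not shown), represents blocks as a hypothetical dict from block number to (level, entries), and counts one block read per level visited.

```python
from bisect import bisect_left, bisect_right
from typing import Any, Optional

# Hypothetical block store: block number -> (level, entries).
Blocks = dict[int, tuple[int, list]]


def lookup(blocks: Blocks, root_no: int, key: Any) -> tuple[Optional[tuple[int, Any]], int]:
    """Point lookup on distinct column values; returns (data item or None, blocks read)."""
    reads, block_no = 0, root_no
    while True:
        level, entries = blocks[block_no]
        reads += 1                                    # one block read per level
        if level == 0:
            values = [value for _, value in entries]
            i = bisect_left(values, key)
            hit = entries[i] if i < len(values) and values[i] == key else None
            return hit, reads                         # data comes from the leaf itself
        firsts = [first for first, _, _ in entries]
        i = bisect_right(firsts, key) - 1             # last child starting at or below key
        if i < 0:
            return None, reads                        # key is smaller than every value
        block_no = entries[i][2]
```

On a three-level tree every successful lookup reads exactly three blocks and returns the data item from the leaf, with no further read of a separate data block.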
Brief description of the drawings
Fig. 1 shows the structure of a B+ tree internal node (index entries);
Fig. 2 shows the structure of a B+ tree leaf node (data page).
Embodiment
To make the present invention easier to understand, a preferred embodiment is described in detail below.
The present invention provides a B+ tree indexing method for column-store DWMS, comprising the following steps:
Step 1, column data generation: import the user data and partition the original row-stored data vertically into single columns; to each data item in each column, add the number of the row it belongs to (used later for tuple reconstruction), forming a pair (row number, column value); allocate data segments and save each newly produced column of data in a data segment;
Step 2: if the B+ tree key is the row number, go to Step 4 to build the tree; if the key is the column value, go to Step 3 to sort the column values first;
Step 3: sort the column value data by key using multiway merging combined with heapsort. This step comprises:
Step 3.1, initialization: allocate a sort area in memory of size K, where K is an integer multiple of the block size. If the number of data blocks to be processed, Blk_num, exceeds K (Blk_num > K), a multiway-merge external sort is used. Compute M, the number of sublists to be merged (the formula for M is not reproduced in this text); each sublist to be merged then contains D = Blk_num/M blocks.
Step 3.2: read the source data from the segment, extract the column values from the pairs in each data block, load them into a dataitem array, and place the array in the sort area. Sort the dataitem array with a heap-sort algorithm, passing the array as the input parameter. If Blk_num <= K, sorting is complete; otherwise, regroup the sorted data items into blocks and write them back to a temporary segment.
Step 3.3: merge-sort the M temporary segments.
Step 4: B+ tree initialization, which assigns values to the B+ tree descriptor. The fields to be assigned are:
(1) The type of the B+ tree key: whether it is fixed-length or variable-length, and whether it is numeric. Fixed-length key types include SMALLINT, INT, NUMBER, CHAR, DATE, and TIME; variable-length key types include VARCHAR.
(2) Column information, including the column name, column type, and column length; if the column is variable-length, its maximum length is specified.
(3) Space is allocated for the B+ tree root block.
(4) The level value of the B+ tree is set to 0.
Step 5, creation of leaf nodes: allocate B+ tree leaf nodes and fill the data items directly into them to obtain data blocks, forming level 0 of the B+ tree. Following the characteristics of column storage, this step stores the data itself in the leaf blocks rather than keys and pointers to data blocks, so during a search the data can be obtained directly from the leaf node, saving one I/O. In addition, the fill factor is no longer considered when nodes are filled: every node in the tree is filled completely, which improves space utilization, as shown in Figs. 1 and 2. In the figures, each KEY_i consists of two parts, <row number, column value>.
Step 6, generation of intermediate nodes: iteratively build the intermediate levels of the B+ tree bottom-up until the whole tree has been created. The steps are:
Step 6.1: generate index entries. Each index entry is a triple consisting of the first column value in a block of the level below, its corresponding row number, and that block's block number; compute the length of an index entry, denoted TL;
Step 6.2: allocate index block space. Using the initialization information from Step 4, determine whether the column values are fixed-length or variable-length. For fixed-length column values, compute the index entry length; the number of index entries per index block = index block space size / TL, and the number of index blocks in level 1 = the number of data blocks in level 0 / the number of index entries per index block. Allocate the index block space for this level and insert the index entries into the index blocks in batches. For variable-length column values, first allocate one index block and insert triples into it one by one, allocating a new index block only when the current one cannot hold the next triple.
Step 6.3: after one level has been created, set level = level + 1 and return to Step 6.1 to create the next level of intermediate nodes up by the same process. The only difference is that the number of index blocks in the current level = the number of index blocks in the level below / the number of index entries per index block. When the current level contains exactly one index block, the iteration ends, and that index block is the root node of the B+ tree.
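The following Python sketch pairs the Step 4 descriptor with the two branches of Step 6.2. The class name, the set of types treated as numeric, and the byte layout of a variable-length index entry (a 2-byte length prefix, the UTF-8 value, a 4-byte row number, and a 4-byte block number) are all assumptions made for illustration; the patent does not specify an on-disk format.

```python
from dataclasses import dataclass
from typing import Optional

FIXED_LENGTH_TYPES = {"SMALLINT", "INT", "NUMBER", "CHAR", "DATE", "TIME"}
NUMERIC_TYPES = {"SMALLINT", "INT", "NUMBER"}  # assumption: which types count as numeric


@dataclass
class BTreeDescriptor:
    """Step 4 sketch: fields assigned when the B+ tree is initialized."""
    column_name: str
    column_type: str
    column_length: int
    max_length: Optional[int] = None  # specified only for variable-length columns
    root_block: Optional[int] = None  # (3) root block space, allocated by the caller
    level: int = 0                    # (4) level value starts at 0

    @property
    def fixed_length(self) -> bool:
        return self.column_type in FIXED_LENGTH_TYPES

    @property
    def numeric(self) -> bool:
        return self.column_type in NUMERIC_TYPES


def var_entry_size(triple: tuple) -> int:
    """Hypothetical layout: 2-byte length + UTF-8 value + 4-byte row + 4-byte block number."""
    value, _, _ = triple
    return 2 + len(str(value).encode("utf-8")) + 4 + 4


def pack_index_entries(desc: BTreeDescriptor, triples: list, block_size: int,
                       fixed_entry_len: int = 0) -> list:
    """Step 6.2: split one level's index entries into index blocks."""
    if desc.fixed_length:
        if fixed_entry_len <= 0 or block_size // fixed_entry_len < 1:
            raise ValueError("invalid fixed entry length for this block size")
        per_block = block_size // fixed_entry_len  # entries per index block
        return [triples[i:i + per_block] for i in range(0, len(triples), per_block)]
    # Variable length: fill the current block until the next triple does not fit.
    blocks, current, used = [], [], 0
    for triple in triples:
        size = var_entry_size(triple)
        if current and used + size > block_size:
            blocks.append(current)
            current, used = [], 0
        current.append(triple)
        used += size
    if current:
        blocks.append(current)
    return blocks
```

The fixed-length branch can compute the block count up front and fill blocks in batches, whereas the variable-length branch only learns where a block ends when the next triple fails to fit, which is why the method treats the two cases separately.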

Claims (1)

1. A B+ tree indexing method for column-store DWMS, characterized in that it comprises the following steps:
Step 1, column data generation: import the user data and partition the original row-stored data vertically into single columns; to each data item in each column, add the number of the row it belongs to (used for tuple reconstruction), forming a pair (row number, column value); allocate data segments and save each newly produced column of data in a data segment;
Step 2: if the B+ tree key is the row number, go to Step 4 to build the tree; if the key is the column value, go to Step 3 to sort the column values first;
Step 3: sort the column value data using multiway merging combined with heapsort, wherein Step 3 comprises:
Step 3.1, initialization: allocate a sort area in memory of size K, where K is an integer multiple of the size of the data blocks to be sorted; if the number of data blocks to be processed Blk_num > K, use a multiway-merge external sort, the number of sublists to be merged being M, where
[formula for M given as an image in the original; not reproduced]
and each sublist to be merged contains D = Blk_num/M blocks; read the source data from the data segment, extract the column values from the pairs in each data block, load them into an array, place the array in the sort area, and sort it with a heap-sort algorithm, passing the array as the input parameter; if Blk_num <= K, sorting is complete; otherwise, regroup the sorted data items into data blocks and write them back to a temporary segment;
Step 3.2: merge-sort the M temporary segments;
Step 4: initialize the B+ tree index;
Step 5, creation of B+ tree leaf nodes: allocate B+ tree leaf nodes and fill the data items directly into them to obtain data blocks, forming level 0 of the B+ tree;
Step 6, generation of intermediate nodes: iteratively build the intermediate levels of the B+ tree bottom-up until the whole tree has been created, wherein:
Step 6.1: generate index entries, each index entry being a triple consisting of the first column value in an index block of the level below, its corresponding row number, and the block number of that index block;
Step 6.2: determine whether the column value type is fixed-length or variable-length; for fixed-length column values, compute the index entry length, from which the number of index entries per index block = index block space size / index entry length, and the number of index blocks in level 1 = the number of data blocks in level 0 / the number of index entries per node; allocate the index block space and insert the index entries into the index blocks in batches; for variable-length column values, first allocate one index block and insert triples into it one by one, allocating a new index block only when the current one cannot hold the next triple;
Step 6.3: after one level has been created, return to Step 6.1 and create the next level of intermediate nodes up by the same process, until the whole B+ tree has been created.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210019935.5A CN102609490B (en) 2012-01-20 2012-01-20 Column-storage-oriented B+ tree index method for DWMS (data warehouse management system)

Publications (2)

Publication Number Publication Date
CN102609490A CN102609490A (en) 2012-07-25
CN102609490B true CN102609490B (en) 2014-07-02

Family

ID=46526862

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210019935.5A Expired - Fee Related CN102609490B (en) 2012-01-20 2012-01-20 Column-storage-oriented B+ tree index method for DWMS (data warehouse management system)

Country Status (1)

Country Link
CN (1) CN102609490B (en)

Similar Documents

Publication Publication Date Title
CN102609490B (en) Column-storage-oriented B+ tree index method for DWMS (data warehouse management system)
CN101673307B (en) Space data index method and system
CN102737033B (en) Data processing equipment and data processing method thereof
CN102890722B (en) Indexing method applied to time sequence historical database
Hjaltason et al. Speeding up construction of PMR quadtree-based spatial indexes
Song et al. HaoLap: A Hadoop based OLAP system for big data
CN102722531B (en) Query method based on regional bitmap indexes in cloud environment
US20120197900A1 (en) Systems and methods for search time tree indexes
CN103123650B (en) A kind of XML data storehouse full-text index method mapped based on integer
CN103745008A (en) Sorting method for big data indexing
CN103577440A (en) Data processing method and device in non-relational database
CN105095520A (en) Distributed type in-memory database indexing method oriented to structural data
Wang et al. Distributed storage and index of vector spatial data based on HBase
CN104391908B (en) Multiple key indexing means based on local sensitivity Hash on a kind of figure
CN104112011B (en) The method and device that a kind of mass data is extracted
CN103390015A (en) Mass data united storage method based on unified indexing and search method
CN103678550A (en) Mass data real-time query method based on dynamic index structure
CN102737123B (en) A kind of multidimensional data distribution method
CN102867066A (en) Data summarization device and data summarization method
Wang et al. Massive remote sensing image data management based on HBase and GeoSOT
CN104268158A (en) Structural data distributed index and retrieval method
CN107273443B (en) Mixed indexing method based on metadata of big data model
CN101916260A (en) Method for establishing semantic mapping between disaster body and relational database
CN103631839B (en) A kind of page region weight model implementation method
CN104794237A (en) Web page information processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140702

Termination date: 20170120