CN103544300B - Method for implementing a scalable storage index structure in a cloud environment - Google Patents


Info

Publication number
CN103544300B
Authority
CN
China
Prior art keywords
partial indexes
index
node
server
indexes
Legal status
Expired - Fee Related
Application number
CN201310530188.6A
Other languages
Chinese (zh)
Other versions
CN103544300A (en
Inventor
周维
路劲
姚绍文
罗静
Current Assignee
Yunnan University YNU
Original Assignee
Yunnan University YNU
Application filed by Yunnan University YNU
Priority to CN201310530188.6A
Publication of CN103544300A
Application granted
Publication of CN103544300B
Expired - Fee Related


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for implementing a scalable storage index structure in a cloud environment. The SkipList data structure is used specifically to build the scalable storage index, so that both the data subsets and the data within each subset are stored in order; a range query can therefore be answered once the lower and upper bounds of the query key range are known. In addition, the upper-layer global index nodes consist only of metadata of lower-layer nodes, which reduces the memory cost of the upper-layer global index, allows more global index nodes to be stored, substantially increases the query speed of the whole cloud storage system and improves its real-time performance. Finally, the storage index structure of the invention can be adjusted dynamically and scales well.

Description

Method for implementing a scalable storage index structure in a cloud environment
Technical field
The invention belongs to the field of cloud storage technology and, more specifically, relates to a method for implementing a scalable storage index structure in a cloud environment.
Background art
With the development of computer network technology, cloud computing, as a practical, high-performance and low-cost distributed computing technology, has been widely used in network applications typified by big-data processing. Highly scalable and reliable cloud storage systems are increasingly becoming a preferred option for big-data processing; well-known existing systems include Google's GFS and MapReduce together with their open-source implementation Hadoop, Amazon's Dynamo, and Facebook's Cassandra. Compared with conventional data storage systems, cloud storage systems are more widely distributed and hold more data, which means that secondary index systems must change substantially in the cloud storage era.
Most current cloud storage systems adopt the key-value (Key-Value) model, mapping a query key and its value to a key-value pair (kv-pair) through which data are accessed. This model is simple and suits queries by primary key: it performs well for single-key queries but cannot support range queries efficiently. Practical experience shows that such Key-Value cloud storage systems still need improvement. For example, users of an online video-on-demand system often query with more than one key, or need to retrieve videos whose particular attribute falls within a given range. To meet such requirements, current solutions mainly run a background batch task (for example, a MapReduce job) that scans the entire data set to obtain the query result. However, this approach lacks timeliness: newly stored data cannot be queried promptly and become queryable only after the background batch task has completed a full scan. This analysis shows that current cloud storage systems support multi-dimensional and range queries poorly and with poor timeliness, so a storage index structure needs to be built for the cloud environment.
A few two-layer storage index structures based on different data structures have recently been proposed. These schemes make it easy to scale the cloud storage system and at the same time let it support large-scale queries. However, most of them implement parallel queries in the global index on top of a P2P-based overlay network; maintaining the P2P network is itself complicated and the network overhead during queries is relatively large, which degrades the query performance of the cloud storage system. Moreover, because existing cloud storage systems typically use a master-slave architecture, rebuilding a P2P network on their nodes has a negative impact on the original storage system.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art and provide a method for implementing a scalable storage index structure in a cloud environment, solving the problems that storage index structures in cloud storage systems do not support range queries and lack real-time performance.
To achieve this object, the method for implementing a scalable storage index structure in a cloud environment according to the invention comprises the following steps:
(1) Establishing the master-slave two-layer structure of the scalable storage index
The whole storage index structure is divided into an upper and a lower layer: the upper layer is the global index, managed by a global index server, and the lower layer consists of multiple partial indexes, each managed by one partial index server;
The data set to be indexed is partitioned evenly into data subsets containing equal amounts of data, the number of subsets being equal to the number of partial index servers. The subsets are then mapped one-to-one onto the lower-layer index servers, and each lower-layer index server builds a SkipList-based partial index, storing each data item of its subset in a node of the partial index; this completes the construction of the partial indexes;
Once the partial indexes are built, each partial index selects nodes and publishes them to the upper-layer global index as "representatives" of its index range. When publishing, the metadata of the published nodes (including the stake node) are extracted and sent to the upper-layer global index server; the metadata comprise the index key, the IP address of the partial index server and the physical disk block number on the partial index server, which reduces the memory cost of the upper-layer index and lets it store more nodes. After receiving the metadata published by every lower-layer partial index, the global index server organizes them, as global index nodes, into a global index in SkipList form, logically linking the independent lower-layer partial indexes and maintaining the global consistency of the index space;
(2) Adjusting the published metadata nodes
After each partial index has published the metadata of its stake node to the global index, it decides, based on the estimated benefit, whether to continue publishing the next level of the partial index:
If publishing the next level of the partial index gives a positive rate of change in the query speed of the cloud storage system that is greater than the rate of change in memory usage of the global index server, the metadata of the nodes in the next level of the partial index are published to the upper-layer global index; otherwise, the next level is not published;
(3) Querying
3.1) Single-key query
A1. The upper-layer global index server serves as the entry point for query processing and first looks up, in the global index, the global index node for the key being queried; A2. Using the global index node found, the query is routed to the specific lower-layer partial index server, which takes over processing; A3. The partial index server executes the query on the partial index it manages and, after finding the data for the key, returns them directly to the client that issued the query request;
3.2) Range query
B1. Using the lower bound of the query key range as the key, the corresponding node in a specific partial index is found in the same way as a single-key query; B2. Starting from that node, the ordered data are traversed forward and the results cached until the upper bound of the key range is reached, and all data found are then returned to the client that issued the query request.
The object of the invention is achieved as follows:
The method of the invention uses the SkipList data structure specifically to build a scalable storage index structure in a cloud environment, so that the data subsets, and the data within them, are stored in order; a range query can therefore be answered once the lower and upper bounds of the query key range are known. At the same time, the upper-layer global index nodes consist of metadata of lower-layer nodes, which reduces the memory cost of the upper-layer global index, allows more global index nodes to be stored, substantially increases the query speed of the whole cloud storage system and improves its real-time performance. In addition, the storage index structure of the invention can be adjusted dynamically and scales well.
Brief description of the drawings
Fig. 1 is a schematic diagram of the two-layer structure of the scalable storage index of the invention;
Fig. 2 is a schematic diagram of adjusting the published metadata nodes in the invention;
Fig. 3 shows how the acceleration factor and memory usage change when partial indexes are published to the global index;
Fig. 4 is a schematic diagram of single-key query processing;
Fig. 5 is a schematic diagram of partial index splitting;
Fig. 6 is a schematic diagram of constructing the scalable index structure;
Fig. 7 is a flow diagram of range query processing;
Fig. 8 is a schematic diagram of partial index splitting.
Detailed description of the embodiments
Specific embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note that in the following description, detailed descriptions of known functions and designs are omitted where they might obscure the main content of the invention.
Embodiment
Fig. 1 is a schematic diagram of the two-layer structure of the scalable storage index in the invention.
In this embodiment, as shown in Fig. 1, the master-slave two-layer structure of the scalable storage index is established as follows:
The whole storage index structure is divided into an upper and a lower layer. The lower layer consists of multiple partial indexes, each managed by one partial index server, and the indexed data are stored in the nodes of the lower-layer partial indexes. The upper-layer global index, managed by the global index server, serves to locate data and route queries.
When the storage index is built, the data set to be indexed is first partitioned evenly into data subsets containing equal amounts of data, the number of subsets being equal to the number of lower-layer partial index servers. The subsets are then mapped one-to-one onto the lower-layer index servers, and each lower-layer index server builds a SkipList-based partial index, storing each data item of its subset in a node of the partial index; this completes the construction of the partial indexes.
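The construction step can be illustrated with a minimal Python sketch, assuming the data set is a sorted list of integer keys held in memory. It splits the keys into contiguous subsets of (nearly) equal size, one per partial index server, and loads each subset into a small skip list. The names `partition`, `SkipList` and `build_partial_indexes`, the maximum height `MAX_LEVEL = 4` and the promotion probability `P = 0.5` are hypothetical choices; the sketch also uses a sentinel head node for convenience, whereas the text treats the first real node as the stake node.

```python
import random
from typing import List, Optional

MAX_LEVEL = 4  # hypothetical maximum node height
P = 0.5        # hypothetical promotion probability


class Node:
    def __init__(self, key: int, height: int) -> None:
        self.key = key
        # forward[j] is the successor of this node on level j (0 = bottom)
        self.forward: List[Optional["Node"]] = [None] * height


class SkipList:
    """Minimal ordered skip list standing in for one partial index."""

    def __init__(self, seed: int = 0) -> None:
        self.head = Node(-1, MAX_LEVEL)  # sentinel head; its key is never compared
        self.rng = random.Random(seed)   # seeded so runs are reproducible

    def _random_height(self) -> int:
        height = 1
        while height < MAX_LEVEL and self.rng.random() < P:
            height += 1
        return height

    def _predecessors(self, key: int) -> List[Node]:
        """Last node with a smaller key on every level (top-down search)."""
        update = [self.head] * MAX_LEVEL
        node = self.head
        for level in reversed(range(MAX_LEVEL)):
            while node.forward[level] is not None and node.forward[level].key < key:
                node = node.forward[level]
            update[level] = node
        return update

    def insert(self, key: int) -> None:
        update = self._predecessors(key)
        new = Node(key, self._random_height())
        for level in range(len(new.forward)):
            new.forward[level] = update[level].forward[level]
            update[level].forward[level] = new

    def search(self, key: int) -> bool:
        candidate = self._predecessors(key)[0].forward[0]
        return candidate is not None and candidate.key == key

    def keys(self) -> List[int]:
        out, node = [], self.head.forward[0]
        while node is not None:
            out.append(node.key)
            node = node.forward[0]
        return out


def partition(sorted_keys: List[int], n_servers: int) -> List[List[int]]:
    """Split sorted keys into n contiguous subsets of (nearly) equal size."""
    size, rem = divmod(len(sorted_keys), n_servers)
    subsets, start = [], 0
    for s in range(n_servers):
        end = start + size + (1 if s < rem else 0)
        subsets.append(sorted_keys[start:end])
        start = end
    return subsets


def build_partial_indexes(sorted_keys: List[int], n_servers: int) -> List[SkipList]:
    """One skip list per partial index server; subsets map one-to-one."""
    indexes = []
    for server_id, subset in enumerate(partition(sorted_keys, n_servers)):
        index = SkipList(seed=server_id)
        for key in subset:
            index.insert(key)
        indexes.append(index)
    return indexes
```

For example, `build_partial_indexes(list(range(1, 13)), 3)` yields three skip lists holding 1-4, 5-8 and 9-12, the same ranges as in the construction example. Node heights depend on the seed, but the key order within each skip list does not.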
Once the partial indexes are built, each partial index selects nodes and publishes them to the upper-layer global index as "representatives" of its index range; these must include the stake node of the partial index, shown as the black circles in Fig. 1. Publishing does not copy whole nodes of the lower-layer partial index into the upper-layer global index. Instead, only the metadata of the published nodes are extracted and sent to the global index server: the index key, the IP address of the partial index server and the physical disk block number on the partial index server. This reduces the memory cost of the upper-layer global index and lets it store more nodes. After receiving the metadata published by every lower-layer partial index, the global index server organizes them, as global index nodes, into a global index in SkipList form, logically linking the independent lower-layer partial indexes and maintaining the global consistency of the index space.
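The metadata published to the global index, and the global index built from it, can be sketched as follows. This is an illustration under assumptions: each partial index is described by a hypothetical list of `(key, height, block_no)` tuples, levels are numbered from 1 at the bottom (as L1 to L4 in Fig. 2), and the global index is kept as a key-sorted Python list maintained with `bisect.insort` rather than as a SkipList, only to keep the example short. `GlobalEntry`, `GlobalIndex` and `publish_level` are hypothetical names.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True, order=True)
class GlobalEntry:
    """Metadata published for one node: key, server IP, disk block number."""
    key: int
    server_ip: str
    block_no: int


class GlobalIndex:
    def __init__(self) -> None:
        self.entries: List[GlobalEntry] = []  # kept sorted by key

    def insert(self, entry: GlobalEntry) -> None:
        bisect.insort(self.entries, entry)

    def keys(self) -> List[int]:
        return [e.key for e in self.entries]


def publish_level(nodes: List[Tuple[int, int, int]], server_ip: str,
                  level: int, gindex: GlobalIndex) -> None:
    """Publish metadata of every node whose height reaches `level`.

    Only the metadata record is sent; the node payload stays on the
    partial index server.
    """
    for key, height, block_no in nodes:
        if height >= level:
            gindex.insert(GlobalEntry(key, server_ip, block_no))
```

For instance, if three servers with made-up addresses `10.0.0.1` to `10.0.0.3` hold keys 1-4, 5-8 and 9-12, and the heights within each range are 3, 1, 2, 1, publishing level 3 leaves the global index with keys `[1, 5, 9]`, each entry carrying only its server address and block number.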
Fig. 2 is a schematic diagram of adjusting the published metadata nodes in the invention.
After each partial index has published the metadata of its stake node to the global index, it decides, based on the estimated benefit, whether to continue publishing the next level of the partial index:
If publishing the next level of the partial index gives a positive rate of change in the query speed of the cloud storage system that is greater than the rate of change in memory usage of the global index server, the metadata of the nodes in the next level of the partial index are published to the upper-layer global index; otherwise, the next level is not published.
In this embodiment, as shown in Fig. 2, the invention introduces a dynamic publication adjustment algorithm. Based on the estimated benefit, each lower-layer partial index decides whether the next level, L2, needs to be published (L3 being the level currently published). Here, publishing L2 gives a positive rate of change in the query speed of the cloud storage system that is greater than the rate of change in memory usage of the global index server, so the metadata of the L2 nodes are published to the upper-layer global index.
In the index storage structure of the invention, the lower-layer partial indexes publish node metadata to the upper-layer global index, and the global index is built on the upper-layer global index server to maintain the integrity of the index structure. Publication proceeds top-down, one level at a time. First, each partial index publishes the metadata of the nodes in its highest level (level L4 in Fig. 2) to the global index; the properties of a SkipList guarantee that the highest level always contains the stake node. Each partial index then decides, based on the estimated benefit, whether to continue publishing the next level.
The estimates that a partial index uses when deciding whether to publish are based on the rate of change of the overall query speed of the index structure and cloud storage system and on the rate of change of memory usage on the global server. Fig. 2 shows how the nodes in the upper-layer global index change when a partial index extends publication from level L3 to level L2. Because of the SkipList structure, each level always contains all the nodes of the level above it, so extending publication to the next level only requires sending the metadata of the nodes not published before; that is, only the previously absent nodes are inserted into the global index.
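The incremental step can be sketched as a filter, assuming levels are numbered from 1 at the bottom and each node of a partial index is given as a hypothetical `(key, height)` pair. A node of height h appears on levels 1 to h, so the nodes that become visible when publication extends from level L down to level L-1 are exactly those of height L-1; every taller node was already sent. The function name is hypothetical.

```python
from typing import List, Tuple


def new_nodes_for_level(nodes: List[Tuple[int, int]], current_level: int) -> List[int]:
    """Keys whose metadata must be newly sent when publication extends from
    current_level down to current_level - 1.

    Nodes with height >= current_level are already in the global index,
    so only nodes whose height is exactly current_level - 1 are new.
    """
    target = current_level - 1
    return [key for key, height in nodes if height == target]
```

For a partial index holding keys 1-4 with heights 3, 1, 2, 1, `new_nodes_for_level(nodes, 3)` returns `[3]`, so extending from level 3 to level 2 sends only the metadata of key 3.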
In a SkipList, the indexed data are stored in the nodes of the bottom level, and each node is promoted to the next level up with probability p; the promoted copies serve as query-acceleration nodes. Consequently, the number of nodes per level grows exponentially from the top of a SkipList downward. Correspondingly, the memory cost of the upper-layer global index grows exponentially with the number of nodes published by the lower-layer partial indexes. For this reason, the lower-layer partial indexes cannot publish node metadata to the upper layer without limit; the decision to publish should be based on a combined assessment of the query speed of the index structure and the cloud storage system as a whole and of the memory cost.
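The exponential growth can be made concrete with the standard skip-list estimate: with promotion probability p, a node reaches level k (counting the bottom as level 1) with probability p^(k-1), so a partial index with n keys has about n·p^(k-1) nodes at level k, and publishing every level from the top down to level k sends that many metadata records. The helper below is a hypothetical illustration of this estimate, not a formula stated in the patent.

```python
def expected_published_nodes(n: int, p: float, k: int) -> float:
    """Expected number of distinct nodes published when a partial index with
    n keys publishes every level from the top down to level k (1 = bottom).

    A node reaches level k with probability p ** (k - 1), so this count, and
    the global-index memory it costs, grows by a factor 1/p per extra level.
    """
    return n * p ** (k - 1)
```

With n = 1024 and p = 0.5, the expected count is 128, 256, 512 and 1024 for k = 4, 3, 2 and 1, so each additional level doubles the metadata held by the global index server.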
Fig. 3 shows the overall query speed of the index structure and the memory usage of the global index server after partial indexes are published to the global index. As Fig. 3 shows, as the number of published levels increases, the improvement in overall query speed gradually slows, while the memory usage and cost of the global index server rise sharply. The invention therefore uses the changes in query speed and memory usage as the criterion and sets a threshold at publication time that bounds the number of published levels. Specifically, let $Q_{old}$ be the overall query speed of the index structure before a partial index publishes the metadata of its next level of nodes, and $Q_{new}$ the estimated overall query speed after publishing it. If the partial index publishes the next level, the rate of change of the overall query speed after publication is:
$$A_{query} = \frac{Q_{new} - Q_{old}}{Q_{old}}$$
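The rate of change of memory usage on the global server, $A_{mem\_load}$, is defined in the same way as $A_{query}$, i.e., $A_{mem\_load} = \frac{M_{new} - M_{old}}{M_{old}}$, where $M_{old}$ and $M_{new}$ are the memory usage of the global index server before and after publishing. The threshold condition under which a lower-layer partial index publishes the node metadata of its next level to the upper-layer global index then follows the rule of step (2): publish if $A_{query} > 0$ and $A_{query} > A_{mem\_load}$; otherwise, stop publishing.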
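The publication test of step (2) can be expressed as a small Python function. This is a sketch under assumptions: `q_old`/`q_new` are query speeds (larger is faster) and `m_old`/`m_new` are memory-usage figures of the global index server, all supplied by the caller; the text does not specify how `Q_new` is estimated, so that estimate is left outside the function. The function names are hypothetical.

```python
def rate_of_change(old: float, new: float) -> float:
    """Relative change (new - old) / old, as in the definition of A_query."""
    if old == 0:
        raise ValueError("baseline must be non-zero")
    return (new - old) / old


def should_publish_next_level(q_old: float, q_new: float,
                              m_old: float, m_new: float) -> bool:
    """Publish the next level only if the estimated query-speed gain is
    positive and exceeds the relative growth in global-server memory usage.
    """
    a_query = rate_of_change(q_old, q_new)
    a_mem_load = rate_of_change(m_old, m_new)
    return a_query > 0 and a_query > a_mem_load
```

With hypothetical figures, a speed-up from 100 to 130 queries per second (A_query = 0.3) against a memory increase from 1000 MB to 1100 MB (A_mem_load = 0.1) passes the test, while a speed-up to 105 (0.05) against an increase to 1200 MB (0.2) does not.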
The storage index of the invention uses a two-layer structure. Processing a typical single-key query, as shown in Fig. 4, takes 3 steps: a1. The upper-layer global index server serves as the query entry point and first looks up, in the global index, the global index node for the key being queried; a2. Using that global index node, the query is routed to the specific lower-layer partial index server, which takes over processing; a3. The partial index server executes the query on the partial index it manages and, after finding the data for the key, returns them directly to the client that issued the query request.
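Steps a1 to a3 can be sketched in Python as below, assuming (hypothetically) that the global index is a key-sorted list of `(published_key, server_id)` pairs and that each partial index server is modelled by a plain dictionary in place of its skip list. The sketch reads step a1 as finding the largest published key that does not exceed the query key; because each partial index covers one contiguous key range, that entry identifies the server responsible for the key. `point_query` is a hypothetical name.

```python
import bisect
from typing import Dict, List, Optional, Tuple


def point_query(global_index: List[Tuple[int, int]],
                partial_indexes: Dict[int, Dict[int, str]],
                key: int) -> Optional[str]:
    """Two-step lookup: global index first, then one partial index.

    global_index is sorted by key and holds (published_key, server_id) pairs.
    """
    published_keys = [k for k, _ in global_index]
    # a1: the governing global entry is the largest published key <= key
    pos = bisect.bisect_right(published_keys, key) - 1
    if pos < 0:
        return None  # key lies below every published key
    _, server_id = global_index[pos]
    # a2/a3: hand the query to that partial index server and search locally;
    # the return value models the answer sent straight back to the client
    return partial_indexes[server_id].get(key)
```

With the global index `[(1, 1), (5, 2), (9, 3)]` of the construction example, a query for key 7 is routed through entry 5 to server 2, which answers it directly.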
Besides ordinary single-key queries, the invention supports retrieving all values whose keys fall within a given key range, i.e., range queries. When the index structure is built, the data set to be indexed is divided in order into several disjoint subsets, each mapped to an independent partial index. Because partial indexes are stored as SkipLists, the subset data stored in each are also ordered. A range query on this index structure therefore takes 2 steps: b1. Using the lower bound of the query key range as the key, the corresponding node in a specific partial index is found in the same way as a single-key query; b2. Starting from that node, the ordered data are traversed forward and the results cached until the upper bound of the key range is reached, and all data found are then returned to the client that issued the query request.
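Inside a single partial index, steps b1 and b2 amount to a standard skip-list range scan: descend from the top level to the first node whose key is not below the lower bound, then follow the bottom-level pointers until the upper bound is passed. A minimal sketch, assuming a skip list built from hypothetical `(key, height)` pairs with a sentinel head; the hand-off to the next partial index is not modelled here, and `build`/`range_scan` are hypothetical names.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: int, height: int) -> None:
        self.key = key
        # forward[j] is the successor on level j (0 = bottom)
        self.forward: List[Optional["Node"]] = [None] * height


def build(pairs: List[Tuple[int, int]], max_level: int) -> Node:
    """Link sorted (key, height) pairs behind a sentinel head; returns the head."""
    head = Node(-1, max_level)     # sentinel; its key is never compared
    last = [head] * max_level      # last node linked on each level so far
    for key, height in pairs:
        node = Node(key, height)
        for level in range(height):
            last[level].forward[level] = node
            last[level] = node
    return head


def range_scan(head: Node, lo: int, hi: int) -> List[int]:
    """b1: descend to the first key >= lo; b2: walk the bottom level up to hi."""
    node = head
    for level in reversed(range(len(head.forward))):
        while node.forward[level] is not None and node.forward[level].key < lo:
            node = node.forward[level]
    result = []
    node = node.forward[0]
    while node is not None and node.key <= hi:
        result.append(node.key)
        node = node.forward[0]
    return result
```

For a partial index holding keys 5-8 with heights 3, 1, 2, 1, `range_scan(build(pairs, 3), 6, 7)` skips key 5 on the way down and returns `[6, 7]`.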
In addition, the invention uses a dynamic splitting algorithm to address hotspots on individual partial index servers and maintain load balance across the whole index structure.
In the invention, the whole index structure is divided into disjoint subsets, each maintained by an independent partial index. As the index is adjusted by dynamic insertions, deletions and similar operations, partial indexes may come to differ in size: some grow larger while others shrink. Such size changes can unbalance the load across partial indexes, because larger partial indexes are more likely to be accessed. A dynamic splitting algorithm is therefore needed to resolve hotspots that may arise in partial indexes.
The splitting algorithm for partial indexes relies on the function wall(x, i), which takes 2 parameters: x is the start node for computing the split position, usually set to the stake node of the partial index to be split; i is the splitting factor, which determines the level at which the split position is found. The larger this parameter, the more nodes remain in the first-half subset after the split, and vice versa.
The partial index splitting procedure is:
Taking the stake node of the partial index S1 to be split as the start node x, search rightward from level i for the first node whose height equals i+1; this node is wall(x, i);
Using node wall(x, i) as the boundary, set to NULL every successor pointer in the first half of S1 that points to node wall(x, i), i.e., remove the latter-half nodes from partial index S1;
Send the latter-half nodes, starting from node wall(x, i), to partial index S2 and insert them into S2 one by one; S2 is the partial index that receives the latter-half data split off from S1.
Thus wall(x, i) denotes the first node found by moving right from node x along level i whose height equals i+1. Fig. 5 gives an example of computing wall(5, 3) with node 5 as the start node; the result is node 24, shown in the wire frame.
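A minimal sketch of wall(x, i), under an assumed reading of the definition: levels are numbered from 1 at the bottom, a node of height h occupies levels 1 to h, and the search follows the level-i pointers rightward from x, returning the first node whose height is exactly i + 1. The `Node` layout, the `build` helper and the heights in the usage note are hypothetical; the heights were chosen to be consistent with the Fig. 8 walkthrough, where wall(5, 3) is node 24.

```python
from typing import List, Optional, Tuple


class Node:
    def __init__(self, key: int, height: int) -> None:
        self.key = key
        # forward[j] is the successor on level j + 1 (levels numbered from 1)
        self.forward: List[Optional["Node"]] = [None] * height

    @property
    def height(self) -> int:
        return len(self.forward)


def build(pairs: List[Tuple[int, int]]) -> Node:
    """Link sorted (key, height) pairs; the first pair is the stake node."""
    nodes = [Node(k, h) for k, h in pairs]
    last = [nodes[0]] * nodes[0].height  # the stake node is assumed tallest
    for node in nodes[1:]:
        for j in range(node.height):
            last[j].forward[j] = node
            last[j] = node
    return nodes[0]


def wall(x: Node, i: int) -> Optional[Node]:
    """First node right of x on level i whose height is exactly i + 1."""
    node = x.forward[i - 1]
    while node is not None and node.height != i + 1:
        node = node.forward[i - 1]
    return node
```

For keys 5, 8, 14, 17, 19, 24, 27, 30 with heights 5, 1, 3, 1, 2, 4, 1, 2, walking level 3 from node 5 passes node 14 (height 3) and stops at node 24 (height 4).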
Example
I. Construction of the scalable storage index structure
Current Internet applications generally use cloud storage systems to store massive amounts of business data, and these systems usually provide access through a distributed hash table (DHT). A DHT is a typical instance of the key-value model: when data are saved in a DHT, the hash value of the key is computed first, and the data are then mapped to the corresponding position of the logical space according to that hash value. Because a hash function scatters keys, data are stored in a random distribution, so storing data in a DHT cannot support range queries well. The invention proposes a secondary index implemented on top of the cloud storage system; its construction is shown in Fig. 6.
1. The storage index structure partitions the storage space of the cloud storage system, setting the range managed by each partial index according to an equal-size, ordered policy. In this example the storage space of the cloud storage system is 1-12, but because the DHT maps keys discretely with a hash function, the keys are stored out of order. Assuming 3 partial indexes are used to hold the data in the distributed storage, each partial index should store 4 data items under the equal-size principle. Therefore, from left to right, partial index No. 1 manages 1-4, partial index No. 2 manages 5-8, and partial index No. 3 manages 9-12.
2. Data in the distributed storage system are mapped to the corresponding partial indexes according to the ranges assigned in step 1. After mapping, the data are ordered within each partial index and also ordered across partial indexes.
3. Each lower-layer partial index publishes the nodes of its highest level, including its stake node, to the upper-layer global index, which builds the global index from the metadata published by the lower layer, linking the partial indexes into one complete index space. By the SkipList properties, the first node of a partial index always belongs to the highest level; we call it the stake node. In Fig. 6, from left to right, the stake node of partial index No. 1 is 1, that of No. 2 is 5, and that of No. 3 is 9. After these top-level stake nodes are published, the upper-layer global index contains the three nodes 1, 5 and 9.
4. Each lower-layer partial index then publishes nodes iteratively, level by level downward. Based on the estimated rate of change of query speed after publication and the rate of change of global-index memory usage after publication, it decides whether to continue publishing node metadata of the next level. In Fig. 6, from left to right, partial index No. 1 has published 1, No. 2 has published 5, and No. 3 has published 9. If continuing to publish downward yields a positive gain in overall query speed, partial index No. 1 additionally publishes 3, No. 2 publishes 7, and No. 3 publishes 11, so after this further publication the upper-layer global index contains the 6 keys 1, 3, 5, 7, 9 and 11. If the estimated rates of change of query speed and global-index memory usage indicate a negative gain in overall query speed, publication to the next level stops.
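The top-down iteration in step 4 can be sketched as a loop that starts from the top level, which always holds the stake node, and keeps extending publication one level down while the threshold test passes. `estimate` is a hypothetical callback returning `(q_old, q_new, m_old, m_new)` for a candidate level; the patent leaves the estimation method open, so any such estimator is an assumption, as is the name `choose_publish_level`.

```python
from typing import Callable, Tuple

Estimate = Callable[[int], Tuple[float, float, float, float]]


def choose_publish_level(max_level: int, estimate: Estimate) -> int:
    """Publish the top level, then extend one level down at a time while the
    estimated query-speed gain is positive and exceeds the memory growth.

    estimate(level) returns (q_old, q_new, m_old, m_new) for extending
    publication down to `level` (levels numbered from 1 at the bottom).
    Returns the lowest level that ends up published.
    """
    published = max_level
    for level in range(max_level - 1, 0, -1):
        q_old, q_new, m_old, m_new = estimate(level)
        a_query = (q_new - q_old) / q_old
        a_mem_load = (m_new - m_old) / m_old
        if not (a_query > 0 and a_query > a_mem_load):
            break  # negative or insufficient gain: stop descending
        published = level
    return published
```

With a hypothetical estimator reporting a 50% speed-up for 20% more memory at level 2, but about a 7% speed-up for 100% more memory at level 1, `choose_publish_level(3, ...)` stops at level 2.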
II. Query processing in the scalable index structure
The scalable index structure proposed by the invention uses two layers: the index data are actually stored in the partial indexes, while the upper-layer global index links the partial indexes and maintains the global consistency of the index space. When a query is executed, the upper-layer global index serves as the entry point; querying it determines which partial index actually contains the data sought. Query processing is then handed to that partial index, which, after finding the data, returns them directly to the initiator of the query request.
Fig. 7 shows the processing flow of a range query. The index space of the index structure is the same as in example I, namely 1-12, and the range to be queried is 1-6.
1. The range to be queried is sent to the upper-layer global server, which uses the lower bound of the range (key 1) as the entry key and searches the global index.
2. After the upper-layer global index locates the specific partial index from the lower-bound key, query processing is forwarded to the lower-layer partial index that published that key. In Fig. 7, key 1 in the global index was published by the leftmost lower-layer partial index, so query processing is handed to that partial index.
3. When a partial index receives the forwarded query, it traverses its own index according to the query range. Because each partial index is internally ordered, it only needs to keep traversing forward until the upper bound of the range is reached. If the query range extends beyond the range managed by a partial index, the query request must be handed on to that partial index's successor. Because the partial indexes are also ordered relative to one another, this hand-off guarantees that the query is complete and correct. In Fig. 7, the query range covers 6 data items while each partial index manages only 4, so the query must be handed from one partial index to the next. After partial index No. 1 reaches key 4 without meeting the upper bound of the range, it hands the query request to its successor on the right (partial index No. 2). Partial index No. 2 continues retrieving forward in order within its own range until it reaches key 6, the upper bound of the range. The range query then ends, and the result set is returned directly from partial index No. 2 to the client that issued the query.
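The range query with successor hand-off can be sketched in Python as follows. Assumptions: partial indexes are modelled as sorted key lists numbered from 0 in key order, so partial index `pid + 1` is the successor of `pid`; the global index is a key-sorted list of `(published_key, partial_id)` pairs; `range_query` and its parameters are hypothetical names. A single-key query is the special case `lo == hi`.

```python
import bisect
from typing import List, Tuple


def range_query(global_index: List[Tuple[int, int]],
                partials: List[List[int]], lo: int, hi: int) -> List[int]:
    """Range query over ordered, disjoint partial indexes.

    global_index: sorted (published_key, partial_id) pairs.
    partials[pid]: sorted keys of partial index pid; pid + 1 is its successor.
    """
    published_keys = [k for k, _ in global_index]
    # Step 1: route the lower bound through the global index
    pos = max(bisect.bisect_right(published_keys, lo) - 1, 0)
    pid = global_index[pos][1]
    result: List[int] = []
    # Steps 2-3: scan forward in order; when a partial index is exhausted
    # before the upper bound is reached, hand off to its successor
    while pid < len(partials):
        data = partials[pid]
        for key in data[bisect.bisect_left(data, lo):]:
            if key > hi:
                return result  # upper bound passed: the query is complete
            result.append(key)
        pid += 1
    return result
```

With the layout of the example (ranges 1-4, 5-8, 9-12 and global keys 1, 5, 9), `range_query(g, partials, 1, 6)` scans the first partial index completely, hands off to the second, stops after key 6 and returns `[1, 2, 3, 4, 5, 6]`.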
Range queries are one of the main features of this index structure. A single-key query is a special case of a range query (a range containing a single key), and its processing follows the same flow. The main difference is that a single-key query involves neither traversal inside a partial index nor hand-off between partial indexes: after the upper-layer global index hands the query to a lower-layer partial index, the data are found directly inside that partial index and returned.
III. The splitting process of partial indexes
Extensibility is a key property of the index structure proposed by the present invention: the index must support dynamic growth. The data of this index structure is stored in the partial indexes, and as data is inserted dynamically, some partial indexes may come to manage too much data, which makes them more likely to be accessed. To address this, the present invention proposes a partial index splitting algorithm: when a partial index stores too much data, it can be split in two, with the first half remaining on the original index server and the second half either stored on a new partial index server or migrated to an existing lightly loaded partial index server.
Fig. 8 gives an example of a partial index split. The split proceeds as follows:
First, the split position is determined by wall(x, i). In this example, the starting node chosen for the split computation is the first stake node of partial index S1 (node 5), which is the tallest node, and the splitting factor i is 3. By definition, the split position found is node 24;
With node 24 as the boundary, the pointers in the first half that point to node 24 are set to NULL. The pointers to be modified are those at the corresponding levels of nodes 5, 14 and 19.
The nodes from node 24 onward are migrated to a new partial index server, or merged one by one into an existing lightly loaded partial index server. When migrating to a new server, an optional step is to raise the first stake node to the maximum height (here, node 24 is raised to the maximum of 5 levels) so that the top level contains only one node. When migrating into an existing lightly loaded partial index, the nodes from 24 onward must be traversed one by one and reinserted into that partial index; each migrated node then takes the height generated on reinsertion.
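The split in Fig. 8 can be sketched on a bare skip list. This is a hedged illustration under several assumptions: the node heights are hypothetical (the text gives only keys 5, 14, 19 and 24, the splitting factor i = 3 and a maximum of 5 levels); wall(x, i) is read as x's successor on level i counting from 0, i.e. the first node to the right of x whose height is at least i + 1; and the helpers `build`, `split` and `keys` are invented for the example. Rather than cutting only pointers that land on the wall node, the sketch cuts every first-half pointer that crosses the boundary, which in this layout touches exactly nodes 5, 14 and 19. The pointers cut above the wall node's own height are reused to raise it to full height, the optional step for migration to a new server.

```python
from typing import List, Optional, Tuple


class Node:
    """A skip-list node; next[l] is its successor on level l (0 = bottom)."""

    def __init__(self, key: int, height: int) -> None:
        self.key = key
        self.next: List[Optional["Node"]] = [None] * height


def build(pairs: List[Tuple[int, int]]) -> Node:
    """Link (key, height) pairs, already in key order, into a skip list.
    The first pair must be the tallest: it acts as the first stake node."""
    first = Node(*pairs[0])
    last = [first] * len(first.next)  # most recent node on each level
    for key, height in pairs[1:]:
        node = Node(key, height)
        for lvl in range(height):
            last[lvl].next[lvl] = node
            last[lvl] = node
    return first


def split(x: Node, i: int, raise_wall: bool = True) -> Optional[Node]:
    """Detach every node from wall(x, i) onward; return wall (or None).

    wall(x, i) is taken to be x.next[i]: the first node right of x whose
    height is at least i + 1 (an assumed reading of the definition)."""
    wall = x.next[i]
    if wall is None:
        return None
    targets: List[Optional[Node]] = []
    cur = x
    for lvl in reversed(range(len(x.next))):
        # Advance to the last first-half node on this level (key < wall.key).
        while cur.next[lvl] is not None and cur.next[lvl].key < wall.key:
            cur = cur.next[lvl]
        targets.append(cur.next[lvl])  # first node >= wall on this level
        cur.next[lvl] = None           # cut the link into the second half
    targets.reverse()                  # targets[lvl] now belongs to level lvl
    if raise_wall:
        # Optional step: raise wall to full height, reusing the links the
        # first half had above wall's own height.
        wall.next.extend(targets[len(wall.next):])
    return wall


def keys(node: Optional[Node]) -> List[int]:
    """Bottom-level traversal, e.g. the reinsertion order for migration."""
    out: List[int] = []
    while node is not None:
        out.append(node.key)
        node = node.next[0]
    return out


# Hypothetical heights; keys 5, 14, 19, 24 and i = 3 follow Fig. 8's text.
s1 = build([(5, 5), (9, 1), (14, 3), (17, 1), (19, 2),
            (24, 4), (27, 1), (31, 2), (36, 1)])
wall = split(s1, i=3)
print(keys(s1))    # [5, 9, 14, 17, 19]
print(keys(wall))  # [24, 27, 31, 36]
```

For migration into an existing lightly loaded partial index, `keys(wall)` gives the order in which the second-half nodes would be traversed and reinserted; that insertion, with freshly generated heights, is not shown.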
Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be understood that the invention is not limited to the scope of these embodiments. To those skilled in the art, various changes are apparent as long as they fall within the spirit and scope of the invention as defined by the appended claims, and all inventions and creations that make use of the inventive concept are within the scope of protection.

Claims (3)

1. A method for implementing an extensible storage index structure in a cloud environment, characterized by comprising the following steps:
(1) Two-layer master-slave structure of the extensible storage index
The whole storage index structure is divided into an upper and a lower layer: the upper layer is a global index, managed by a global index server, and the lower layer consists of multiple partial indexes, each managed by one partial index server;
The data set to be indexed is partitioned evenly into data subsets containing equal amounts of data, with as many subsets as there are partial index servers. The subsets are then assigned one-to-one to the lower-layer index servers; each lower-layer index server builds a SkipList-based partial index and stores each data item of its subset in a node of that partial index, completing the construction of the partial indexes;
Once the partial indexes have been built, each partial index selects a node and publishes it to the upper-layer global index as the "representative" of its index range. During publication, metadata extracted from the published stake node is sent to the upper-layer global index server; the metadata comprises the index key, the IP address of the partial index server and the physical disk block number on the partial index server, which reduces the memory cost of the upper-layer index so that it can hold more nodes. After the global index server receives the metadata published by each lower-layer partial index, it organizes this metadata, in SkipList form, into a global index whose entries are global nodes, logically linking the independent lower-layer partial indexes and maintaining the global consistency of the index space;
(2) Adjusting the publication of metadata nodes
After a partial index has published the metadata of its stake node to the global index, each partial index judges, according to the estimated benefit, whether to continue publishing the next level of the partial index:
If publishing the next level of the partial index would make the rate of change in query speed of the cloud storage system positive and greater than the rate of change in memory usage of the global index server, the metadata of the partial index's next-level nodes is published to the upper-layer global index; otherwise, the next level is not published;
(3) Queries
3.1) Single-key query
A1. The upper-layer global index server serves as the entry point for query processing and first looks up, in the global index, the global node for the key to be queried;
A2. Based on the global node found in the global index, the query is directed to a specific lower-layer partial index server, and the query operation is handed over to that server;
A3. The partial index server executes the query on the partial index it manages and, after finding the data corresponding to the key, returns it directly to the originator of the query request;
3.2) Range query
B1. Following the single-key query procedure, with the lower bound of the query interval as the key, locate the corresponding node in a specific partial index;
B2. Starting from the node found, traverse the ordered data forward, caching the results, until the upper bound of the query interval is reached, then return all the data found to the originator of the query request.
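Steps (1) and 3.1 of claim 1 can be sketched end to end in a single Python process. This is a simplified illustration under stated assumptions: the data is sorted before it is cut (so the partial indexes are ordered relative to one another, as the query procedure requires); subset sizes may differ by one when the count does not divide evenly; each partial index publishes its smallest key as its representative; plain sorted lists stand in for the SkipLists on both layers; and every name (`StakeMetadata`, `partition_evenly`, `GlobalIndex`, `route`, `single_key_query`), IP address and block number is hypothetical.

```python
from bisect import bisect_right, insort
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(order=True, frozen=True)
class StakeMetadata:
    """What a partial index publishes upward (field names hypothetical)."""
    key: int          # index key of the representative (stake) node
    server_ip: str    # IP address of the partial index server
    disk_block: int   # physical disk block number on that server


def partition_evenly(data: Sequence[int], n: int) -> List[List[int]]:
    """Step (1): sort, then cut into n contiguous subsets whose sizes
    differ by at most one (one subset per partial index server)."""
    ordered = sorted(data)
    base, extra = divmod(len(ordered), n)
    parts, start = [], 0
    for s in range(n):
        size = base + (1 if s < extra else 0)
        parts.append(ordered[start:start + size])
        start += size
    return parts


class GlobalIndex:
    """Stand-in for the global index server; a key-ordered list replaces
    the SkipList described in the claim."""

    def __init__(self) -> None:
        self.entries: List[StakeMetadata] = []

    def publish(self, meta: StakeMetadata) -> None:
        insort(self.entries, meta)  # keep global nodes ordered by key

    def route(self, key: int) -> Optional[StakeMetadata]:
        """A1/A2: the entry with the largest key <= key, i.e. the partial
        index whose range should contain the key."""
        published = [m.key for m in self.entries]
        pos = bisect_right(published, key) - 1
        return self.entries[pos] if pos >= 0 else None


def single_key_query(g: GlobalIndex, servers: Dict[str, List[int]],
                     key: int) -> bool:
    """A1-A3: route via the global index, then search that partial index."""
    meta = g.route(key)
    if meta is None:
        return False
    return key in servers[meta.server_ip]  # A3 (a SkipList search in the claim)


data = [9, 3, 7, 1, 5, 2, 8, 4, 6, 10]
parts = partition_evenly(data, 3)  # [[1, 2, 3, 4], [5, 6, 7], [8, 9, 10]]
servers = {f"10.0.0.{s + 1}": p for s, p in enumerate(parts)}
g = GlobalIndex()
for s, p in enumerate(parts):
    g.publish(StakeMetadata(p[0], f"10.0.0.{s + 1}", 0))  # block 0: placeholder
print(single_key_query(g, servers, 6))  # True
```

Only the three metadata fields travel to the global layer, which is the memory saving the claim attributes to publishing metadata rather than data.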
2. The method for implementing an extensible storage index structure according to claim 1, characterized by further comprising a splitting process for partial indexes:
Taking the stake node of the partial index S1 to be split as the start node x, find the first node to the right of x whose height is i + 1, denoted wall(x, i), where i is the splitting factor, i.e. the level from which the split position is sought;
With node wall(x, i) as the boundary, the successor pointers in the first half of partial index S1 that point to wall(x, i) are set to NULL, i.e. the nodes of the second half are removed from S1;
With node wall(x, i) as the boundary, the nodes of the second half are sent to partial index S2 and inserted into S2 one by one, S2 being the partial index that receives the second-half data split off from S1;
The first half, as partial index S1, remains on the original index server, while the second half, as partial index S2, can either be stored on a new partial index server or migrated to an existing lightly loaded partial index server.
3. The method for implementing an extensible storage index structure according to claim 1, characterized in that the rule "if publishing the next level of the partial index makes the rate of change in query speed of the cloud storage system positive and greater than the rate of change in memory usage of the global index server, the metadata of the partial index's next-level nodes is published to the upper-layer global index; otherwise the next level is not published" is evaluated as follows:
Let $Q_{old}$ be the overall query speed of the index structure before the partial index publishes the metadata of its next-level nodes, and $Q_{new}$ the estimated overall query speed of the index structure after it does so. If the partial index publishes the next level, the rate of change in the overall query speed of the index structure is:
$$A_{query} = \frac{Q_{new} - Q_{old}}{Q_{old}}$$
Defining the rate of change in memory usage of the global index server, $A_{mem\_load}$, in the same way as $A_{query}$, the threshold for a lower-layer partial index to publish node metadata to the upper-layer global index is: publish the next level only if $A_{query} > 0$ and $A_{query} > A_{mem\_load}$.
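The publication rule of claims 1 and 3 reduces to comparing two relative changes. The sketch below assumes the before and after query speeds and global-server memory usages are already available as numbers; how Q_new is estimated is not specified here, and the function names are hypothetical.

```python
def rate_of_change(old: float, new: float) -> float:
    """Relative change (new - old) / old, the form claim 3 gives for A_query."""
    if old == 0:
        raise ValueError("baseline value must be non-zero")
    return (new - old) / old


def should_publish_next_level(q_old: float, q_new: float,
                              mem_old: float, mem_new: float) -> bool:
    """Publish the next level only if query speed improves (A_query > 0)
    by a larger relative amount than global-server memory usage grows."""
    a_query = rate_of_change(q_old, q_new)
    a_mem_load = rate_of_change(mem_old, mem_new)
    return a_query > 0 and a_query > a_mem_load


# Hypothetical figures: A_query = 0.2, A_mem_load = 0.1, so publish.
print(should_publish_next_level(1000, 1200, 100, 110))  # True
```

Because both quantities are relative changes, the rule compares them directly without needing a common unit for speed and memory.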
Priority Applications
CN201310530188.6A, filed 2013-10-31 (priority date 2013-10-31): A method for implementing an extensible storage index structure in a cloud environment. Status: Expired - Fee Related.

Publications
CN103544300A, published 2014-01-29
CN103544300B, published 2016-06-22 (granted)

Family ID: 49967752; country status: CN (1)

Legal Events
C06 / PB01: Publication
SE01: Entry into force of request for substantive examination
C14 / GR01: Grant of patent
CF01: Termination of patent right due to non-payment of annual fee

Granted publication date: 2016-06-22
Termination date: 2019-10-31