CN103544300A - Method for realizing extensible storage index structure in cloud environment - Google Patents

Info

Publication number
CN103544300A
Authority
CN
China
Prior art keywords
partial indexes
index
node
server
indexes
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310530188.6A
Other languages
Chinese (zh)
Other versions
CN103544300B (en)
Inventor
周维
路劲
姚绍文
罗静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN201310530188.6A priority Critical patent/CN103544300B/en
Publication of CN103544300A publication Critical patent/CN103544300A/en
Application granted granted Critical
Publication of CN103544300B publication Critical patent/CN103544300B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/907Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Abstract

The invention discloses a method for realizing an extensible storage index structure in a cloud environment. The extensible storage index structure in the cloud environment is established specifically through utilization of a Skiplist data structure, so that data sub-collection and data are stored in order. Therefore, range query can be realized once the upper and lower bounds of a section of a to-be-inquired key are known. Meanwhile, a global index node in the upper layer is composed of metadata of a node in the lower layer, so that internal memory expense of the global index of the upper layer is reduced, more global nodes can be stored, the inquiry speed of the whole cloud storage system is improved greatly, and the real-time performance is improved. In addition, the storage index structure in the invention can be adjusted dynamically and achieves very good extensibility.

Description

Method for implementing an extensible storage index structure in a cloud environment
Technical field
The invention belongs to the technical field of cloud storage and, more specifically, relates to a method for implementing an extensible storage index structure in a cloud environment.
Background art
With the development of computer and network technology, cloud computing, as a high-performance, practical and low-cost distributed computing technology, has been widely used in network applications typified by big data processing. Cloud storage systems with high scalability and reliability are gradually becoming one of the preferred options for big data processing. Well-known existing systems include Google's GFS and MapReduce together with their open-source implementation Hadoop, Amazon's Dynamo, and Facebook's Cassandra. Compared with conventional data storage systems, cloud storage systems are more widely distributed and hold more data, which means that secondary indexing in the cloud storage era must change substantially.
Most current cloud storage systems adopt the key-value model, in which a query key and its corresponding value are mapped to a key-value pair (kv-pair) for data access. The model is simple and well suited to queries by primary key: it performs well for single-key queries but cannot efficiently support range queries. Practice shows that cloud storage systems based on the key-value model still have areas in urgent need of improvement. In an online video-on-demand system, for example, users often query with more than one key, or need to query videos whose attributes fall within a certain data range. To meet such needs, the current solution is mainly to run a background batch task (for example a MapReduce job) that scans the whole data set to obtain the query result. This kind of solution lacks timeliness: newly stored data cannot be queried promptly and becomes visible only after the background batch task has completed a full scan. The analysis above shows that current cloud storage systems offer poor support and poor timeliness for multi-dimensional and range queries, so a storage index structure needs to be built in the cloud environment.
A few two-layer storage index structures based on different data structures have been proposed. These schemes make it easy to scale a cloud storage system while supporting large-scale queries. Most of them, however, use an overlay network based on a P2P protocol to run parallel queries over the global index. A P2P network is itself complex to maintain and incurs considerable network overhead during queries, which affects the query performance of the cloud storage system. In addition, because existing cloud storage systems generally have a master-slave structure, rebuilding a P2P network on their nodes can adversely affect the original storage system.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a method for implementing an extensible storage index structure in a cloud environment, so as to solve the problems that storage index structures in cloud storage systems do not support range queries and lack real-time performance.
To achieve this object, the method for implementing an extensible storage index structure in a cloud environment according to the invention is characterized by comprising the following steps:
(1) Establishing a master-slave, two-layer extensible storage index structure
The whole storage index structure is divided into an upper and a lower layer. The upper layer is the global index, managed by a global index server; the lower layer consists of a plurality of local indexes, each managed by one local index server;
The data set to be indexed is partitioned, on an equal-share basis, into data subsets containing equal amounts of data, the number of subsets being equal to the number of local index servers. The subsets are then assigned one-to-one to the lower-layer index servers, and each lower-layer index server builds a local index based on a SkipList, storing each item of its data subset in a node of the local index. This completes the construction of the local indexes;
Once the local indexes have been built, each local index selects nodes to publish to the upper-layer global index as "representatives" of its own index range. On publication, the metadata of the published stake node is extracted and sent to the upper-layer global index server. The metadata comprises the index key, the IP address of the local index server and the physical disk block number on the local index server; sending only metadata reduces the memory overhead of the upper-layer index and lets it hold more nodes. After receiving the metadata published by the lower-layer local indexes, the global index server organizes the metadata, as global index nodes, into a global index in the form of a SkipList. This logically links the independent lower-layer local indexes and maintains the global consistency of the index space;
(2) Adjusting the published metadata nodes
After the local indexes have been linked by publishing the metadata of their stake nodes to the global index, each local index decides, according to the estimated benefit, whether to go on publishing the next level down of the local index:
If publishing the next level of the local index would make the rate of change of the cloud storage system's query speed positive and greater than the rate of change of the global index server's memory usage, the metadata of the nodes on the next level of the local index is published to the upper-layer global index; otherwise the next level is not published;
(3) Querying
3.1) Single-key query
a1. The upper-layer global index server acts as the entry point for processing the query, and the global index node for the key being queried is first looked up in the global index; a2. the global index node found locates a specific lower-layer local index server, and the query is handed over to that server for further processing; a3. the local index server executes the query on the local index it manages and, once it has found the data corresponding to the key, returns it directly to the originator of the query request;
3.2) Range query
b1. In the same way as a single-key query, the lower bound of the query interval is used as the key to find the corresponding node in a specific local index; b2. starting from that node, the data is traversed in order and the results are cached until the upper bound of the query interval is reached, and all the data found is then returned to the originator of the query request.
The object of the invention is achieved as follows:
The method for implementing an extensible storage index structure in a cloud environment according to the invention uses the SkipList data structure to build an extensible storage index structure for the cloud environment, so that both the data subsets and the data within them are stored in order; a range query can therefore be performed once the upper and lower bounds of the query interval are known. In addition, the upper-layer global index nodes consist of the metadata of lower-layer nodes, which reduces the memory overhead of the upper-layer global index and lets it hold more global index nodes. This greatly improves the query speed of the whole cloud storage system and its real-time performance. Moreover, the storage index structure of the invention can be adjusted dynamically and is highly extensible.
Brief description of the drawings
Fig. 1 is a schematic diagram of the two-layer structure of the extensible storage index of the invention;
Fig. 2 is a schematic diagram of the adjustment of published metadata nodes in the invention;
Fig. 3 shows how the speed-up coefficient and memory usage change as a local index publishes to the global index;
Fig. 4 is a schematic diagram of the processing of a single-key query;
Fig. 5 is a schematic diagram of local index splitting;
Fig. 6 is a schematic diagram of the construction of the extensible index structure;
Fig. 7 is a schematic flow diagram of range query processing;
Fig. 8 is a schematic diagram of local index splitting.
Detailed description of the embodiments
Specific embodiments of the invention are described below with reference to the accompanying drawings so that those skilled in the art can better understand the invention. Note in particular that, in the following description, detailed descriptions of known functions and designs are omitted where they would obscure the main content of the invention.
Embodiment
Fig. 1 is a schematic diagram of the two-layer structure of the extensible storage index of the invention.
In this embodiment, as shown in Fig. 1, the master-slave, two-layer extensible storage index structure is established as follows:
The whole storage index structure is divided into an upper and a lower layer. The lower layer consists of a plurality of local indexes, each managed by one local index server, and the indexed data is stored in the nodes of the lower-layer local indexes. The upper-layer global index serves to locate and route, and is managed by a global index server.
When the storage index is built, the data set to be indexed is first partitioned, on an equal-share basis, into data subsets containing equal amounts of data, the number of subsets being equal to the number of lower-layer local index servers. The subsets are then assigned one-to-one to the lower-layer index servers, and each lower-layer index server builds a local index based on a SkipList, storing each item of its data subset in a node of the local index. This completes the construction of the local indexes.
Once the local indexes have been built, each local index selects nodes to publish to the upper-layer global index as "representatives" of its own index range; the selection must include the stake node of the local index, shown as the black circles in Fig. 1. On publication, a node of a lower-layer local index is not copied in full to the upper-layer global index. Instead, the metadata of each published node is extracted, comprising the index key, the IP address of the local index server and the physical disk block number on the local index server, and only this metadata is sent to the upper-layer global index server, which reduces the memory overhead of the upper-layer global index and lets it hold more nodes. After receiving the metadata published by the lower-layer local indexes, the global index server organizes the metadata, as global index nodes, into a global index in the form of a SkipList. This logically links the independent lower-layer local indexes and maintains the global consistency of the index space.
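The construction described above can be sketched in a few lines. This is a minimal illustration, not the patented implementation: sorted Python lists stand in for the SkipList, and the names `NodeMeta`, `partition_evenly` and `publish_stake_nodes`, the sample keys, the IP addresses and the block number 0 are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeMeta:
    key: int         # index key of the published node
    server_ip: str   # IP address of the local index server (made-up values below)
    disk_block: int  # physical disk block number on that server (placeholder)


def partition_evenly(keys, num_servers):
    """Sort the keys and cut them into ordered subsets of equal size
    (the last subset may be shorter if the count does not divide evenly)."""
    ordered = sorted(keys)
    size = -(-len(ordered) // num_servers)  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def publish_stake_nodes(subsets, server_ips):
    """Each local index publishes only the metadata of its first (stake) node."""
    metas = [NodeMeta(subset[0], ip, 0) for subset, ip in zip(subsets, server_ips)]
    return sorted(metas, key=lambda m: m.key)


subsets = partition_evenly([40, 10, 30, 20, 60, 50], 3)
global_index = publish_stake_nodes(subsets, ["10.0.0.1", "10.0.0.2", "10.0.0.3"])
```

The global index here holds three small metadata records rather than copies of the data, which is the memory saving the description refers to.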
Fig. 2 is a schematic diagram of the adjustment of published metadata nodes in the invention.
After the local indexes have been linked by publishing the metadata of their stake nodes to the global index, each local index decides, according to the estimated benefit, whether to go on publishing the next level down of the local index:
If publishing the next level of the local index would make the rate of change of the cloud storage system's query speed positive and greater than the rate of change of the global index server's memory usage, the metadata of the nodes on the next level of the local index is published to the upper-layer global index; otherwise the next level is not published.
In this embodiment, as shown in Fig. 2, the invention introduces a dynamic publication adjustment algorithm. A lower-layer local index judges whether publication is needed according to the estimated benefit of the next level, L2 (level L3 being the level currently published). Here the benefit of the next level L2 is that the rate of change of the cloud storage system's query speed is positive and greater than the rate of change of the global index server's memory usage, so the metadata of the L2-level nodes is published to the upper-layer global index.
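The publication test stated above can be written as a small predicate. This is a sketch only: it takes each rate of change to be the relative change (new - old) / old, which is how the description later defines the query-speed rate, and the function names and sample figures are assumptions for illustration.

```python
def rate_of_change(old, new):
    """Relative change (new - old) / old."""
    return (new - old) / old


def should_publish_next_level(q_old, q_new, mem_old, mem_new):
    """Publish the next level only if the query-speed gain is positive
    and exceeds the growth in global index memory usage."""
    a_query = rate_of_change(q_old, q_new)
    a_mem_load = rate_of_change(mem_old, mem_new)
    return a_query > 0 and a_query > a_mem_load
```

With a query speed rising from 100 to 130 (a 30% gain) against memory rising from 10 to 12 (20%), the predicate approves publication; with a 5% speed gain against 40% more memory, it does not.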
In the index storage structure of the invention, the lower-layer local indexes publish node metadata to the upper-layer global index, and the global index is built on the upper-layer global index server to maintain the integrity of the index structure. When a lower-layer local index publishes node metadata to the upper-layer global index, it increases the published node metadata step by step, from the top down. First, each local index publishes the metadata of its highest-level nodes (level L4 in Fig. 2) to the global index; the properties of the SkipList guarantee that the top level contains the stake node. Each local index then decides, according to the estimated benefit, whether to continue publishing one level further down.
The estimate is based on the rate of change of the query speed of the whole index structure, i.e. the cloud storage system, and the rate of change of the global server's memory after the local index publishes. Fig. 2 shows how the upper-layer global index changes when the nodes published by a local index are extended from level L3 to level L2. By the nature of the SkipList, the nodes of a lower level always include the nodes of the level above, so when publication is extended one level down, only the metadata of the newly included nodes needs to be sent to the upper-layer global index; that is, only the metadata of nodes not previously present is inserted into the global index.
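Because each level contains the one above it, extending publication by one level only requires sending the difference. A minimal sketch, assuming a local index is represented as a mapping from key to node height (the keys and heights below are invented for illustration):

```python
def nodes_at_level(heights, level):
    """Keys of all nodes tall enough to appear on the given level."""
    return sorted(k for k, h in heights.items() if h >= level)


def newly_published(heights, current_level):
    """Keys that must be sent when publication extends one level down."""
    already = set(nodes_at_level(heights, current_level))
    below = nodes_at_level(heights, current_level - 1)
    return [k for k in below if k not in already]


# Hypothetical local index: key -> node height.
heights = {5: 4, 9: 2, 14: 3, 19: 1, 24: 2}
delta = newly_published(heights, 3)  # extend publication from L3 to L2
```

Here level L3 already exposes keys 5 and 14, so extending to L2 sends only the metadata for keys 9 and 24.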
In a SkipList, the indexed data is stored in the bottom-level nodes, and each node is promoted to the next level up with probability p; the promoted parts serve as acceleration nodes for queries. The number of nodes in a SkipList therefore grows geometrically from the top level down. Correspondingly, the memory overhead of the upper-layer global index grows in the same way with the number of nodes published by the lower-layer local indexes. For this reason, a lower-layer local index cannot publish node metadata to the upper layer without limit; the decision to publish should be based on a combined value of the overall query speed and the memory overhead of the index structure, i.e. the cloud storage system.
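The growth can be made concrete with the standard SkipList expectation. This is a worked sketch under stated assumptions: n bottom-level nodes, independent promotion with probability p, and levels counted from 1 at the bottom; the symbols n, k and N_k are notation introduced here, not taken from the patent.

```latex
\mathbb{E}[N_k] = n\,p^{\,k-1},
\qquad
\frac{\mathbb{E}[N_{k-1}]}{\mathbb{E}[N_k]} = \frac{1}{p}
```

Since the nodes on level k are exactly those published when publication reaches level k, each further level multiplies the expected number of metadata entries in the global index by 1/p; for p = 1/2 it doubles.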
Fig. 3 is a schematic diagram of the overall query speed of the index structure and the memory usage of the global index server after a local index publishes to the global index. As Fig. 3 shows, as the number of published levels increases, the gain in overall query speed gradually levels off, while the memory usage (overhead) of the global index server rises markedly. The invention can therefore set a threshold, based on the changes in query speed and memory usage, as the criterion for publication and as the limit on how far down publication may go. Specifically, let $Q_{old}$ be the overall query speed of the index structure before the local index publishes the metadata of its next level of nodes, and $Q_{new}$ the estimated overall query speed of the index structure after the local index publishes the data of its next level of nodes. If the local index publishes the next level, the rate of change of the overall query speed of the index structure after publication is:
$$A_{query} = \frac{Q_{new} - Q_{old}}{Q_{old}}$$
The rate of change of the global server's memory usage, $A_{mem\_load}$, is defined in the same way as the rate of change of the overall query speed $A_{query}$. The threshold for a lower-layer local index to publish node metadata to the upper-layer global index is defined as follows:
The storage index of the invention uses a two-layer structure. Processing a typical single-key query, as shown in Fig. 4, involves three steps: a1. the upper-layer global index server acts as the entry point for processing the query, and the global index node for the key being queried is first looked up in the global index; a2. the global index node found locates a specific lower-layer local index server, and the query is handed over to that server for further processing; a3. the local index server executes the query on the local index it manages and, once it has found the data corresponding to the key, returns it directly to the originator of the query request.
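Steps a1 to a3 can be illustrated as follows. This is a simplified sketch: a sorted list of (key, server) pairs stands in for the SkipList-based global index, plain dictionaries stand in for the local indexes, and the keys, values and IP addresses are invented for the example.

```python
from bisect import bisect_right

# Global index: (published key, local index server), sorted by key.
GLOBAL_INDEX = [(1, "10.0.0.1"), (5, "10.0.0.2"), (9, "10.0.0.3")]

# Local indexes held by each server (dicts stand in for SkipLists here).
LOCAL_INDEXES = {
    "10.0.0.1": {k: f"value-{k}" for k in range(1, 5)},
    "10.0.0.2": {k: f"value-{k}" for k in range(5, 9)},
    "10.0.0.3": {k: f"value-{k}" for k in range(9, 13)},
}


def single_key_query(key):
    published = [k for k, _ in GLOBAL_INDEX]
    pos = bisect_right(published, key) - 1  # a1: rightmost published key <= key
    if pos < 0:
        return None                         # key lies before every local index
    server = GLOBAL_INDEX[pos][1]           # a2: route to that local index server
    return LOCAL_INDEXES[server].get(key)   # a3: the local lookup returns the data
```

Routing by the rightmost published key not greater than the query key works because the local indexes cover disjoint, ordered ranges.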
In addition to conventional single-key queries, the invention supports range queries, retrieving all the values whose keys lie within a given query interval. When the index structure is built, the data set to be indexed is divided, in order, into mutually disjoint subsets, and each subset is mapped to a separate local index. Each local index is stored as a SkipList, so the subset data it stores is also ordered. A range query over this index structure therefore mainly requires two steps: b1. in the same way as a single-key query, the lower bound of the query interval is used as the key to find the corresponding node in a specific local index; b2. starting from that node, the data is traversed in order and the results are cached until the upper bound of the query interval is reached, and all the data found is then returned to the originator of the query request.
In addition, the invention uses a dynamic splitting algorithm to resolve hot spots on local index servers and preserve the load balance of the index structure as a whole.
In the invention, the whole index structure is divided into mutually disjoint subsets, each maintained by a separate local index. As the index is adjusted by dynamic insertions and deletions, the local indexes may come to differ in size: some grow steadily while others shrink. Such changes in size can unbalance the load among local indexes, because a relatively large local index is more likely to be accessed. A dynamic splitting algorithm is therefore needed to resolve hot spots that may arise in the local indexes.
The method for splitting a local index relies on the definition of wall(x, i). wall(x, i) takes two parameters: x is the start node from which the split position is computed, generally set to the stake node of the local index to be split; i is the split factor, which determines the level height of the split position. The larger this parameter, the more nodes remain in the first half after the split; the smaller, the fewer.
A local index is split as follows:
Taking the stake node of the local index S1 to be split as the start node x, search to the right from height i for the first node whose height equals i+1; this is node wall(x, i);
With node wall(x, i) as the boundary, set the successor pointers in the first half of local index S1 that point to node wall(x, i) to NULL, and delete the nodes of the second half from local index S1;
With node wall(x, i) as the boundary, send the nodes of the second half to local index S2 and insert them into S2 one by one; that is, S2 is the local index that receives the second half of the data split off from S1.
wall(x, i) thus denotes the first node to the right of node x, reached along level i, whose height equals i+1. Fig. 5 gives an example of computing wall(5, 3) with node 5 as the start node; node 24, shown in the box, is the result.
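The split can be sketched on a flattened view of the SkipList. This is an illustration under assumptions: the local index is modelled as a key-ordered list of (key, height) pairs rather than linked nodes, and apart from node 5 as start and node 24 as result, the keys and heights below are invented so that the example reproduces wall(5, 3) = 24.

```python
def wall(nodes, x_key, i):
    """Return the key of the first node to the right of x whose height is i + 1.
    Such a node is necessarily on level i, so it is reachable along that level."""
    start = next(pos for pos, (k, _) in enumerate(nodes) if k == x_key)
    for key, height in nodes[start + 1:]:
        if height == i + 1:
            return key
    return None


def split(nodes, x_key, i):
    """Cut the local index at wall(x, i). Returns (first half, second half,
    keys of first-half nodes whose pointers to the wall node become NULL)."""
    w = wall(nodes, x_key, i)
    if w is None:
        return nodes, [], []
    cut = next(pos for pos, (k, _) in enumerate(nodes) if k == w)
    first, second = nodes[:cut], nodes[cut:]
    preds = set()
    for level in range(1, nodes[cut][1] + 1):
        # Predecessor of the wall node on this level: nearest node to its left
        # that is tall enough to carry a pointer on the level.
        for key, height in reversed(first):
            if height >= level:
                preds.add(key)
                break
    return first, second, sorted(preds)


# Hypothetical local index S1 as (key, height) pairs; node 5 is the stake node.
S1 = [(5, 5), (9, 1), (14, 3), (19, 2), (24, 4), (31, 1), (40, 2)]
first_half, second_half, nulled = split(S1, 5, 3)
```

The second half, starting at the wall node, is what would be sent to local index S2 and reinserted there.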
Example
I. Construction of the extensible storage index structure
Current Internet applications usually use cloud storage systems to hold massive amounts of business data, and these systems generally expose access through a distributed hash table (DHT). A DHT is a typical instance of the key-value model: when data is saved to the DHT, a hash of its key is computed first, and the data is then mapped to the corresponding position in the logical space according to that hash. Because hash functions scatter their inputs, the stored data is randomly distributed, so storing data in a DHT cannot support range queries well. The invention proposes a secondary index built on top of the cloud storage system; its construction is shown in Fig. 6.
1. The storage index structure partitions the storage space of the cloud storage system and sets the range managed by each local index according to an equal-share, ordered policy. In this example the storage space of the cloud storage system is 1-12, but because the DHT maps keys discretely through a hash function, the keys are stored in no particular order. Suppose three local indexes are used to hold the data in the distributed store; on the equal-share principle, each local index should store four items. From left to right, therefore, local index No. 1 manages 1-4, local index No. 2 manages 5-8, and local index No. 3 manages 9-12.
2. According to the ranges assigned in step 1, the data in the distributed storage system is mapped to the corresponding local indexes. Once the mapping is complete, each local index is ordered internally, and the local indexes are also ordered with respect to one another.
3. Each lower-layer local index publishes its highest-level node, i.e. its stake node, to the upper-layer global index. The global index is built from the metadata published by the lower layer, linking the local indexes into one complete index space. By the nature of the SkipList, its first node always belongs to the highest level; we call it the stake node. In Fig. 6, from left to right, the stake node of local index No. 1 is 1, that of No. 2 is 5, and that of No. 3 is 9. Once these top-level stake nodes have been published, the upper-layer global index consists of the three nodes 1, 5 and 9.
4. Each lower-layer local index then publishes nodes iteratively, one level down at a time. Whether to continue publishing the metadata of the local index nodes one level further down is decided from the estimated rate of change of query speed after publication and the rate of change of the global index's memory usage after publication. Taking Fig. 6 as an example, during construction, from left to right, local index No. 1 has published 1, No. 2 has published 5, and No. 3 has published 9. Suppose that continuing to publish downward yields a positive gain in the overall query speed of the index structure; local index No. 1 then also publishes 3, No. 2 also publishes 7, and No. 3 also publishes 11. After this further publication, the upper-layer global index contains six items in total: 1, 3, 5, 7, 9 and 11. If the estimated rates of change of query speed and of global index memory usage indicate a negative gain in the overall query speed of the index structure, publication to the next level down stops.
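The state of the global index in this example can be reproduced with a short sketch. The node heights are assumptions chosen so that the top level holds 1, 5 and 9 and the next level adds 3, 7 and 11, as in the text; the patent does not give the heights, and `global_index_after` is a name invented here.

```python
# Hypothetical node heights (key -> height) for the three local indexes of Fig. 6.
LOCAL_HEIGHTS = [
    {1: 3, 2: 1, 3: 2, 4: 1},
    {5: 3, 6: 1, 7: 2, 8: 1},
    {9: 3, 10: 1, 11: 2, 12: 1},
]


def global_index_after(level):
    """Keys present in the global index once every local index has
    published all of its nodes down to the given level."""
    published = set()
    for heights in LOCAL_HEIGHTS:
        published.update(k for k, h in heights.items() if h >= level)
    return sorted(published)


top_only = global_index_after(3)   # stake nodes only
one_more = global_index_after(2)   # after publishing one level further down
```

Each further level adds entries to the global index, which is why the text ties the decision to continue to the estimated gain in query speed.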
II. Query processing in the extensible index structure
The extensible index structure proposed by the invention uses a two-layer architecture. The index data is actually stored in the local indexes, while the upper-layer global index links the local indexes and maintains the global consistency of the index space. When a query is run against this index structure, the upper-layer global index is the entry point: querying the global index determines which local index actually contains the data sought. The query is then handed to that local index, which looks up the data and returns it directly to the originator of the query request.
Fig. 7 shows the processing flow of a range query. The index space of the index structure is the same as in Example I, namely 1-12, and the data to be queried is 1-6.
1. The query interval is sent to the upper-layer global server, and the lower bound of the interval (here, data 1) is used as the entry key for the lookup in the global index.
2. Once the upper-layer global index has located a specific local index from the lower-bound key, query processing is handed to the lower-layer local index that published that key. In Fig. 7, data 1 in the global index was published by the leftmost lower-layer local index, so query processing is handed over to that local index.
3. When a local index receives a handed-over query request, it first traverses its own index according to the query interval. Because each local index is ordered internally, it only needs to keep traversing forward until the upper bound of the query interval is reached. If the query interval extends beyond the range managed by one local index, the query request must be handed on to that local index's successor. Because the local indexes are also ordered with respect to one another, this hand-over preserves the completeness and correctness of the query. In Fig. 7, the query interval contains six items while each local index manages only four, so the query must be handed on from one local index to the next. When local index No. 1 reaches key 4 and finds that the upper bound of the query interval has not been met, it hands the query request to the next local index on its right (which we call local index No. 2). On receiving the request, local index No. 2 continues retrieving in order within its own space until it retrieves data 6, which meets the upper bound of the query interval. This completes the range query, and the result set is returned directly from local index No. 2 to the requester.
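The range query flow of Fig. 7, including the hand-over between neighbouring local indexes, can be sketched as below. Sorted lists stand in for the SkipLists and a list of published stake keys stands in for the global index; these representations and the function name `range_query` are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right

# Three ordered local indexes covering 1-12, and the stake keys they published.
LOCAL_INDEXES = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12]]
GLOBAL_KEYS = [1, 5, 9]


def range_query(low, high):
    """Return (keys found in [low, high], positions of the local indexes visited)."""
    # Steps 1-2: use the lower bound as the entry key to pick the starting local index.
    idx = max(bisect_right(GLOBAL_KEYS, low) - 1, 0)
    results, visited = [], []
    while idx < len(LOCAL_INDEXES):
        local = LOCAL_INDEXES[idx]
        visited.append(idx)
        # Step 3: traverse forward in order from the first key >= low.
        for key in local[bisect_left(local, low):]:
            if key > high:
                return results, visited  # upper bound passed: query complete
            results.append(key)
        idx += 1                         # interval not exhausted: hand on to successor
    return results, visited


found, hops = range_query(1, 6)
```

For the interval 1-6 the query starts at local index No. 1 (position 0), is handed on once, and finishes in local index No. 2 (position 1).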
Range queries are one of the main features of this index structure. A single-key query is a special case of a range query (a query interval of length 1) and is processed in the same way as described above. The main difference is that a single-key query involves neither traversal within a local index nor hand-over between local indexes: after the upper-layer global index hands the query to a lower-layer local index, the data is found directly within that local index and returned.
Three. The splitting process of partial indexes
Extensibility is a key property of the index structure proposed by the present invention; it requires that the index support dynamic growth. The data of this index structure is stored in the partial indexes. As data is inserted dynamically, some partial indexes may come to manage too much data, which increases the probability that they are accessed. For this reason, the present invention proposes a partial index splitting method: when a partial index stores too much data, it can be split in two. The first half remains on the original index server, while the second half can either be stored on a new partial index server or migrated to an existing partial index server with a lighter load.
Fig. 8 gives an example of splitting a partial index. The splitting proceeds as follows:
First, the split position is determined according to wall(x, i). In this example, the start node chosen for the computation is the first and highest stake node (5) of partial index S1, and the value of the splitting factor i is 3. By definition, the split position found is node 24;
With node 24 as the boundary, the pointers in the first half that point to node 24 are changed to point to NULL. The pointers that need to be modified are those at the corresponding positions in nodes 5, 14 and 19.
The nodes from node 24 onward are moved to a new partial index server, or are merged one by one into an existing partial index server with a lighter load. If they are moved to a new partial index server, an optional step is to raise the first node to the maximum height (that is, to raise node 24 to the maximum of 5 levels), which guarantees that the top level contains only one node. If they are moved into an existing, more lightly loaded partial index, the nodes from 24 onward must be traversed one by one and reinserted into that partial index. After migration, the height of each node is the height generated when it is reinserted.
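The split described above can be sketched in a few lines. This is a simplified illustration under several assumptions: `Node`, `build`, `wall`, `split` and `keys` are hypothetical names; wall(x, i) is read here as "the first node to the right of x whose height is at least i + 1"; and only the keys 5, 14, 19 and 24 come from the Fig. 8 description, while the remaining keys and all node heights are invented for the example.

```python
from typing import List, Optional, Tuple


class Node:
    """Skip-list node; forward[l] is the next node at level l (0 = bottom)."""

    def __init__(self, key: int, height: int) -> None:
        self.key = key
        self.forward: List[Optional["Node"]] = [None] * height

    @property
    def height(self) -> int:
        return len(self.forward)


def build(spec: List[Tuple[int, int]]) -> Node:
    """Link nodes given as (key, height) pairs in key order; return the first."""
    nodes = [Node(key, height) for key, height in spec]
    for level in range(max(node.height for node in nodes)):
        prev: Optional[Node] = None
        for node in nodes:
            if node.height > level:
                if prev is not None:
                    prev.forward[level] = node
                prev = node
    return nodes[0]


def wall(x: Node, i: int) -> Optional[Node]:
    """First node to the right of x with height >= i + 1 (assumed reading)."""
    node = x.forward[0]
    while node is not None and node.height < i + 1:
        node = node.forward[0]
    return node


def split(head: Node, i: int) -> Optional[Node]:
    """Cut the list at wall(head, i); return the first node of the second half."""
    boundary = wall(head, i)
    if boundary is None:
        return None
    node = head
    while node is not boundary:
        following = node.forward[0]  # remember the neighbour before cutting
        for level in range(node.height):
            target = node.forward[level]
            # Cut every pointer that reaches the boundary node or beyond it.
            if target is not None and target.key >= boundary.key:
                node.forward[level] = None
        node = following
    return boundary


def keys(node: Optional[Node]) -> List[int]:
    """Walk the bottom level and list the keys."""
    out: List[int] = []
    while node is not None:
        out.append(node.key)
        node = node.forward[0]
    return out


head = build([(5, 5), (10, 1), (14, 3), (19, 2), (24, 4), (30, 1), (35, 2)])
second_half = split(head, 3)
print(keys(head))         # prints [5, 10, 14, 19]
print(keys(second_half))  # prints [24, 30, 35]
```

With these invented heights, the pointers that get reset belong to nodes 5, 14 and 19, which matches the nodes named in the text. Reinserting the second half into another partial index, and the optional raising of node 24 to the top level, are left out of the sketch.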
Although illustrative embodiments of the present invention have been described above so that those skilled in the art can understand the invention, it should be clear that the invention is not limited to the scope of these embodiments. To those of ordinary skill in the art, variations are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all inventions and creations that make use of the inventive concept fall within the scope of protection.

Claims (3)

1. A method for realizing an extensible storage index structure in a cloud environment, characterized in that it comprises the following steps:
(1) A master-slave, two-layer extensible storage index structure
The whole storage index structure is divided into an upper layer and a lower layer. The upper layer is the global index, which is managed by a global index server; the lower layer consists of a plurality of partial indexes, each of which is managed by a partial index server;
The data set to be indexed is partitioned evenly into data subsets containing equal amounts of data, the number of data subsets being equal to the number of partial index servers. The data subsets are then assigned one-to-one to the lower-layer index servers, and on each lower-layer index server a partial index is built on the basis of a SkipList: the lower-layer index server stores each data item of its data subset in a node of the partial index, which completes the construction of the partial index;
Once the partial indexes have been built, each partial index selects a node and publishes it to the upper-layer global index as the "representative" of its own index range. On publication, the metadata of the published stake node is extracted and sent to the upper-layer global index server; the metadata comprises the index key, the IP address of the partial index server, and the physical disk block number on the partial index server, so as to reduce the memory overhead of the upper-layer index and allow it to store more nodes. After receiving the metadata published by each lower-layer partial index, the global index server organizes this metadata, as global index nodes, into a global index in the form of a SkipList, which logically links the independent lower-layer partial indexes and maintains the global consistency of the index space;
(2) Adjustment of the published metadata nodes
After the partial indexes have been linked by publishing the metadata of their stake nodes to the global index, each partial index decides, according to the estimated benefit, whether to continue publishing the next level of the partial index:
If publishing the next level of the partial index would make the rate of change of the cloud storage system's query speed positive and greater than the rate of change of the global index server's memory occupancy, the metadata of the nodes at the next level of the partial index is published to the upper-layer global index; otherwise the next level is not published;
(3) Query
3.1) Single-key query
A1. The upper-layer global index server serves as the entry point for query processing: the global index node for the key to be queried is first looked up in the global index; A2. according to the global index node found in the global index, the query is routed to a specific lower-layer partial index server, and the query operation is handed to that partial index server for further processing; A3. the partial index server executes the query on the partial index for which it is responsible and, after finding the data corresponding to the key to be queried, returns it directly to the originator of the query request;
3.2) Range query
B1. In the same way as for a single-key query, using the lower bound of the key interval to be queried as the key, the corresponding node in a specific partial index is found; B2. starting from the node found, the data is traversed sequentially in key order and the results are cached until the upper bound of the key interval is found, and all the data found is then returned to the originator of the query request.
2. The method for realizing an extensible storage index structure according to claim 1, characterized in that it further comprises a splitting process for partial indexes:
Taking a stake node of the partial index S1 to be split as the start node x, search to the right, starting from height i, for the first node whose height is i+1, denoted wall(x, i), where i is the splitting factor, i.e. the level of the split position;
With node wall(x, i) as the boundary, the successor pointers in the first half of partial index S1 that point to node wall(x, i) are changed to NULL, and the nodes of the second half are deleted from partial index S1;
With node wall(x, i) as the boundary, the nodes of the second half are sent to partial index S2 and inserted into it one by one; that is, S2 is the partial index that receives the second half of the data split off from S1;
The first half, partial index S1, remains on the original index server, while the second half, partial index S2, may be stored on a new partial index server or migrated to an existing partial index server with a lighter load.
3. The method for realizing an extensible storage index structure according to claim 1, characterized in that the step "if publishing the next level of the partial index would make the rate of change of the cloud storage system's query speed positive and greater than the rate of change of the global index server's memory occupancy, the metadata of the nodes at the next level of the partial index is published to the upper-layer global index; otherwise the next level is not published" is specified as follows:
Let $Q_{old}$ be the overall query speed of the index structure before the partial index publishes the metadata of its next-level nodes, and $Q_{new}$ the estimated overall query speed of the index structure after the partial index publishes the data of its next-level nodes. If the partial index publishes its next level, the rate of change of the overall query speed of the index structure after publication is:
$$A_{query} = \frac{Q_{new} - Q_{old}}{Q_{old}}$$
In the same way as the rate of change of the overall query speed $A_{query}$ is defined, the rate of change of the global server's memory occupancy is defined as $A_{men\_load}$; the threshold for a lower-layer partial index to publish node metadata to the upper-layer global index is then defined as follows:
Figure FDA0000405929020000031
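The threshold formula itself survives only as an image reference in this text, so the sketch below follows the wording of the claim rather than the formula: publish the next level when the rate of change of the query speed is positive and greater than the rate of change of the global index server's memory occupancy. The function names, the parameter names, and the assumption that the memory rate of change is computed as (new - old) / old, like the query-speed rate, are all hypothetical.

```python
def rate_of_change(old: float, new: float) -> float:
    """Relative change (new - old) / old, the form used for A_query."""
    return (new - old) / old


def should_publish_next_level(q_old: float, q_new: float,
                              mem_old: float, mem_new: float) -> bool:
    """Decide whether a partial index should publish its next level.

    q_old / q_new: overall query speed before and (estimated) after publishing.
    mem_old / mem_new: global index server memory occupancy before and after.
    """
    a_query = rate_of_change(q_old, q_new)
    a_mem_load = rate_of_change(mem_old, mem_new)  # assumed same definition
    return a_query > 0 and a_query > a_mem_load


# Query speed up 20 %, memory up 10 %: publishing pays off.
print(should_publish_next_level(100, 120, 1000, 1100))  # prints True
# Query speed up 5 %, memory up 10 %: keep the next level unpublished.
print(should_publish_next_level(100, 105, 1000, 1100))  # prints False
```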
CN201310530188.6A 2013-10-31 2013-10-31 Method for realizing extensible storage index structure in cloud environment Expired - Fee Related CN103544300B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310530188.6A CN103544300B (en) 2013-10-31 2013-10-31 Method for realizing extensible storage index structure in cloud environment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310530188.6A CN103544300B (en) 2013-10-31 2013-10-31 Method for realizing extensible storage index structure in cloud environment

Publications (2)

Publication Number Publication Date
CN103544300A true CN103544300A (en) 2014-01-29
CN103544300B CN103544300B (en) 2016-06-22

Family

ID=49967752

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310530188.6A Expired - Fee Related CN103544300B (en) 2013-10-31 2013-10-31 Method for realizing extensible storage index structure in cloud environment

Country Status (1)

Country Link
CN (1) CN103544300B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005588A (en) * 2015-06-26 2015-10-28 深圳市腾讯计算机系统有限公司 Training data processing method and apparatus
CN106649790A (en) * 2016-12-28 2017-05-10 华中科技大学 Multilayer link separated skiplist construction method and system
CN107679212A (en) * 2017-10-17 2018-02-09 安徽慧视金瞳科技有限公司 A kind of data query optimization method for being applied to jump list data structure
CN108121807A (en) * 2017-12-26 2018-06-05 云南大学 The implementation method of multi-dimensional index structures OBF-Index under Hadoop environment
CN108664662A (en) * 2018-05-22 2018-10-16 上海交通大学 Time travel and tense aggregate query processing method
CN109933584A (en) * 2019-01-31 2019-06-25 北京大学 A kind of unordered indexing means of multistage and system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5659739A (en) * 1995-10-02 1997-08-19 Digital Equipment Corporation Skip list data structure enhancements
CN101150489A (en) * 2007-10-19 2008-03-26 四川长虹电器股份有限公司 Resource share method based on distributed hash table
CN101272399A (en) * 2008-04-25 2008-09-24 浙江大学 Method for implementing full text retrieval system based on P2P network
CN101950300A (en) * 2010-09-20 2011-01-19 华南理工大学 Hierarchical structure, distributed search engine system and implementation method thereof

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
周维 等: "基于并发跳表的云数据处理双层索引架构研究", 《计算机研究与发展》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105005588A (en) * 2015-06-26 2015-10-28 深圳市腾讯计算机系统有限公司 Training data processing method and apparatus
CN105005588B (en) * 2015-06-26 2018-04-20 深圳市腾讯计算机系统有限公司 A kind of processing method and processing device of training data
CN106649790A (en) * 2016-12-28 2017-05-10 华中科技大学 Multilayer link separated skiplist construction method and system
CN107679212A (en) * 2017-10-17 2018-02-09 安徽慧视金瞳科技有限公司 A kind of data query optimization method for being applied to jump list data structure
CN108121807A (en) * 2017-12-26 2018-06-05 云南大学 The implementation method of multi-dimensional index structures OBF-Index under Hadoop environment
CN108664662A (en) * 2018-05-22 2018-10-16 上海交通大学 Time travel and tense aggregate query processing method
CN108664662B (en) * 2018-05-22 2021-08-31 上海交通大学 Time travel and tense aggregate query processing method
CN109933584A (en) * 2019-01-31 2019-06-25 北京大学 A kind of unordered indexing means of multistage and system
CN109933584B (en) * 2019-01-31 2021-04-02 北京大学 Multi-level unordered indexing method and system

Also Published As

Publication number Publication date
CN103544300B (en) 2016-06-22

Similar Documents

Publication Publication Date Title
CN103544300B (en) Method for realizing extensible storage index structure in cloud environment
CN104123359B (en) Resource management method of distributed object storage system
CN103810244A (en) Distributed data storage system expansion method based on data distribution
CN102169507B (en) Implementation method of distributed real-time search engine
CN105718455B (en) A kind of data query method and device
US9628438B2 (en) Consistent ring namespaces facilitating data storage and organization in network infrastructures
CN103002027B (en) Data-storage system and the method for tree directory structure is realized based on key-value pair system
CN103455531B (en) A kind of parallel index method supporting high dimensional data to have inquiry partially in real time
CN103366016A (en) Electronic file concentrated storing and optimizing method based on HDFS
CN102158546A (en) Cluster file system and file service method thereof
US10853193B2 (en) Database system recovery using non-volatile system memory
CN103577123A (en) Small file optimization storage method based on HDFS
CN104133882A (en) HDFS (Hadoop Distributed File System)-based old file processing method
CN104899297A (en) Hybrid index structure with storage perception
CN102819599A (en) Method for constructing hierarchical catalogue based on consistent hashing data distribution
CN110321325A (en) File inode lookup method, terminal, server, system and storage medium
CN104111924A (en) Database system
CN109189341B (en) Directory load balancing method, device, equipment and medium for distributed storage system
CN105791370A (en) Data processing method and related server
CN103218433A (en) Method and module for managing metadata applied to random access
CN106021414B (en) A kind of method and system accessing multi-level buffer parameter information
CN114338718B (en) Distributed storage method, device and medium for massive remote sensing data
Chihoub et al. A scalability comparison study of data management approaches for smart metering systems
CN104794196A (en) Tree structure data collecting and updating method
CN111190863B (en) Catalog management method, device, equipment and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160622

Termination date: 20191031
