CN106599040A - Layered indexing method and search method for cloud storage - Google Patents


Info

Publication number
CN106599040A
Authority
CN
China
Prior art keywords
index
paging
data
interval
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610975816.5A
Other languages
Chinese (zh)
Inventor
郭皓明
王之欣
魏闫艳
庞廓
田霂
焉丽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Software of CAS
Original Assignee
Institute of Software of CAS
Application filed by Institute of Software of CAS

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2264 Multidimensional index structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a layered indexing method and a search method for cloud storage. Because mass data in a cloud storage environment is organized and managed in a distributed manner, the method builds a two-layer index structure. The top layer is a global index that treats each data attribute as a dimension and, for each dimension, builds the mapping between attribute values and storage pages as an inverted hash. The bottom layer is a local index that holds, for each value of an upper-layer dimension, an index entry that maps to the data in the locally stored page. The index structure is well balanced and scalable, its query execution efficiency depends only weakly on the size of the data set, and it supports multidimensional and Boolean queries. When a node in the cloud environment changes, node-migration maintenance is performed only on the global index, so index maintenance is simple and meets the requirements of a cloud storage environment. The method has practical application value in the field of distributed databases.

Description

Layered indexing method and search method for cloud storage
Technical field
The invention belongs to the research and application of index technology for distributed cloud storage systems, and in particular relates to a layered indexing method and a search method for cloud storage.
Background
In recent years, with the rapid development of information technologies such as cloud computing, the Internet of Things, and the Internet, the form of information systems has changed substantially, and cloud-based services and whole-ecosystem construction have become the development trend. This trend has caused cloud data volumes to grow explosively to the TB and PB scale, at a rate far beyond the traditional Moore's Law. Take the Sea-Cloud platform of the Chinese Academy of Sciences as an example. It is a typical collaborative service platform based on cloud computing: the sea side (the terminal side) collects and preprocesses data, and the cloud side provides unified data storage, query, analysis, and knowledge extraction. The cloud side also adapts the platform through automatic load balancing. In a typical application scenario such as a smart city, the daily data increment on the cloud side of this platform exceeds 10 GB. The platform must support not only simple data retrieval but also complex operations such as multi-value queries and Boolean queries in mining, analysis, and knowledge extraction activities. As the data scale grows rapidly, providing high-performance complex queries over mass data in the dynamic environment of cloud computing has become an important problem.
Querying is a complex data operation. Once a data set reaches a certain scale, query efficiency becomes the main bottleneck of information system performance. Indexing is an important means of improving data retrieval and query efficiency. Index organization follows two main directions: the forward index and the inverted index. In traditional relational databases, the inverted index is widely used. By data structure, indexes also fall into two basic types: ordered indexes and hash indexes. The B+-Tree index is a typical ordered index. It is built on a balanced tree and organizes a one-dimensional data structure to retrieve data, and because of its efficient structure it is widely used in traditional relational databases. In the Sea-Cloud platform environment described above, however, the data set is large, which makes the B+-Tree structure expand considerably, and multi-value and Boolean queries still require complex intersection filtering. Overall query efficiency is therefore strongly affected by the size of the data set.
Since 2000, the complex-query problems caused by growing data scale have drawn the attention of researchers. Starting in 2004, research institutions such as Google successively released big-data storage and query solutions, represented by cloud-based key-value databases. Cloud data storage and query technology based on key-value has high scalability, high availability, and fault tolerance, and can store and query mass data efficiently. A key-value database is based on a hash index and builds the mapping between the hash of the rowkey and the data object value. Because a rowkey index cannot be sorted, researchers have combined it with technologies such as the B+Tree index. A key-value database queries quickly on the rowkey, but queries on non-rowkey attributes can only be served by a full table scan. Although parallel architectures such as MapReduce can improve query speed to some extent, multi-value and Boolean queries remain inefficient overall, and when the data set is large the query speed cannot meet application demands.
On the other hand, a distributed architecture is the mainstream choice for storing and managing large data sets today. In the Sea-Cloud platform mentioned above, the cloud side uses a typical parallel data storage management framework: upper-layer master nodes handle resource scheduling and data distribution, and bottom-layer worker nodes store the data locally. To reduce the I/O load on the local data set, each worker reads and writes data in pages (512 KB by default). In a cloud computing environment, data is also migrated between nodes under a scheduling policy to satisfy load balancing. The index must therefore adapt well to migration while supporting high-performance complex queries. These characteristics of the cloud computing environment make index creation, maintenance, and querying difficult.
Summary of the invention
To address the problems above, the invention studies index technology for mass data in a cloud environment and forms a two-layer index architecture. The top layer is a global index that treats each data attribute as a dimension and, for each dimension, builds the mapping between attribute values and storage pages as an inverted hash. The bottom layer is a local index that holds an index entry for each value of an upper-layer dimension and maps it to the data in the locally stored page. This index structure is efficient, well balanced, and scalable; index and query execution efficiency depend only weakly on the size of the data set; and multidimensional and Boolean queries are supported. When a node in the cloud environment changes, node-migration maintenance is performed only on the global index, so index maintenance is simple and meets the requirements of a cloud storage environment.
The invention proposes a technique for organizing, managing, and retrieving an index on the basis of a layered structure. Structurally, the index consists of two layers: a global index and a local paging index. In a distributed cloud storage system, the master node (host node) is generally used for data distribution and for scheduling operation tasks, and the worker nodes (storage nodes) are used for local data storage, retrieval, and extraction. Correspondingly, the global index is built on the master node and mainly records the mapping between data records and the storage pages on the worker nodes where they are distributed. It records, in hashed form, the queue of values that data records take on each attribute. Inside an attribute hash, an inner hash is built according to the interval division of the attribute's values, and each attribute value hash holds a pointer mapping queue. This queue consists of a set of worker node pointers and their corresponding counters. During data increment, after a data record has been stored locally on a worker node, the master obtains the mapping pointer queue of the corresponding attribute value in the global index and adds 1 to the counter of the pointer for that worker, which completes the maintenance of the global index. During data deletion, it only needs to obtain the worker pointer counter in the attribute value hash of the current data record and subtract 1 to complete the maintenance of the global index.
The local paging index is built on the worker nodes and records the position of each data record's pointer in local I/O storage. This position information is kept in an attribute value hash. As on the master node, the attribute hash consists of two layers: the attribute hash and its inner value hash. The value hash of the local paging index stores the I/O pointer information, which allows data records to be addressed and extracted directly. During data increment, after the worker node completes local storage, it submits the data record and its I/O pointer to the local index. The local paging index obtains the corresponding value hash from the attribute values in the data record, and recording the I/O pointer in that value hash completes the creation of the local index entry. During data deletion, deleting the I/O pointer information in the value hash of the record to be deleted completes the operation. The deletion is also returned to the global index so that the global index can complete its own maintenance.
During data retrieval, the value hashes of the corresponding attribute hashes are first obtained from the global index according to the query constraints, and the worker pointer sets are extracted from these value hashes. The worker pointer sets of the different attribute hashes are combined by Boolean operations according to the query request to form a worker task queue, and the query request is then sent to the worker nodes in that queue. After receiving the query request, each worker node extracts the I/O pointers in the value hashes through its local paging index to form I/O pointer sets, and combines them by Boolean operations according to the query request to form the selected I/O pointer set of the query. The worker extracts the original data records through these I/O pointers and returns them to the master node, which completes the retrieval and extraction of the data. Fig. 1 shows the basic structure of this index.
The beneficial effects of the invention are as follows:
In view of the distributed organization of mass data in a cloud storage environment, the invention builds an index system based on a layered structure. The top layer is a global index that treats each data attribute as a dimension and, for each dimension, builds the mapping between attribute values and storage pages as an inverted hash. The bottom layer is a local index that holds an index entry for each value of an upper-layer dimension and maps it to the data in the locally stored page. This index structure is efficient, well balanced, and scalable; index and query execution efficiency depend only weakly on the size of the data set; and multidimensional and Boolean queries are supported. When a node in the cloud environment changes, node-migration maintenance is performed only on the global index, so index maintenance is simple and meets the requirements of a cloud storage environment. The method has practical application value in the field of distributed databases.
Description of the drawings
Fig. 1 is a diagram of the layered index structure.
Fig. 2 is a diagram of the basic structure of the global index.
Fig. 3 is a diagram of the basic structure of the local paging index.
Fig. 4 is a flow chart of local paging index maintenance during a data write operation.
Fig. 5 is a flow chart of global index maintenance.
Fig. 6 is a flow chart of index update maintenance during a data update operation.
Fig. 7 is a flow chart of data retrieval.
Detailed description of the embodiments
The invention is further described below through specific embodiments and the drawings.
I. Index structure
As shown in Fig. 1, the index proposed by the invention consists of two parts: the global index and the local paging index. The global index records the mapping between each data storage page and the data attribute values; the local paging index records the mapping between the I/O pointers within a storage page and the data attribute values. In view of the characteristics of the cloud storage environment, the invention organizes and maintains these mappings as hashes. The index structure is described below in two parts, the global index and the local paging index.
1. Global index
The global index consists of storage node information and an attribute tree, and is expressed in the following form:
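$$\mathit{gIdx} = \{\, \mathit{attSet}_i \mid i = 1, 2, \ldots, n \,\}$$
$$\mathit{attSet} = \{\, \mathit{attTag},\ \{\, (\mathit{range}, \mathit{nodeSet})_k \mid k = 1, 2, \ldots, p \,\} \,\}$$
$$\mathit{nodeSet} = \{\, (\mathit{node}_i,\ \{\, \mathit{pageMapping}_j \mid j = 1, 2, \ldots, o \,\}) \mid i = 1, 2, \ldots, m \,\}$$
$$\mathit{pageMapping} = \{\, (\mathit{dataPage}, \mathit{count})_j \mid j = 1, 2, \ldots, n \,\}$$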
where:
The global index gIdx is a hash set of attSet elements.
attSet is a 2-tuple, in which:
attTag is the label of the attribute. Together with a group of interval queues it forms the attribute value hash pair. In the value hash, range is the identifier of an interval, and each identifier corresponds to a specific value range. During data increment, the value that a data record takes on an attribute is reduced to a specific interval, and that interval serves as the basic unit for recording storage mappings.
nodeSet is the storage node mapping table. A data record is stored in a page on a worker node, and its storage mapping information comprises the node and the page it belongs to. In the global index this storage mapping information is recorded in the storage node mapping table of the current value. The table consists of node identifiers node_i and paging count queues pageMapping_j.
node_i is the identifier of the i-th worker node. The paging count queue pageMapping consists of 2-tuples:
dataPage is the page that the paging count queue entry refers to;
count is a counter: the number of data records in the current storage page whose attribute value falls in the current interval.
In a cloud storage management system, an original data record D is assigned to a storage node and data page according to a specified primary key attribute and its value range (for example, by timestamp, or by a specific correspondence between attribute intervals and worker nodes). After the original data has been written to I/O storage on the specified worker node, a predefined set of attributes is extracted from it, and this attribute set forms the index data. Based on the index data, the three-level mapping of storage mapping information, attribute interval, and attribute label is built in gIdx, which creates the global index entry for the data record.
Fig. 2 shows the basic structure of the global index. To locate the storage node and data page of D quickly, gIdx uses a hash index to record the mapping between data attributes and their values on one side and storage nodes and data pages on the other. For example, to query records whose attribute is attTag1 and whose value falls in range1, the worker nodes holding them and the storage pages within those workers can be obtained quickly from gIdx through the key-value correspondence, which directs the data retrieval operation.
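The nesting described above (attribute label, then interval, then worker node, then page counter) can be sketched with nested Python dictionaries. This is only an illustration under stated assumptions: the fixed interval width `RANGE_WIDTH`, the helper names `range_of`, `new_global_index`, and `lookup`, and the sample attribute and node names are all hypothetical and do not come from the patent.

```python
from collections import defaultdict

# Hypothetical fixed interval width used to reduce a value to an interval id.
RANGE_WIDTH = 10


def range_of(value, width=RANGE_WIDTH):
    """Reduce a numeric attribute value to the id of the interval containing it."""
    return value // width


def new_global_index():
    """gIdx sketch: attTag -> range -> node -> dataPage -> count."""
    return defaultdict(                      # attribute label
        lambda: defaultdict(                 # interval id
            lambda: defaultdict(             # worker node id
                lambda: defaultdict(int))))  # data page -> hit counter


def lookup(gidx, att_tag, value):
    """Return the (node, page) pairs that may hold records matching the value."""
    ranges = gidx.get(att_tag, {})
    nodes = ranges.get(range_of(value), {})
    return {(node, page)
            for node, pages in nodes.items()
            for page, count in pages.items() if count > 0}


gidx = new_global_index()
gidx["temperature"][range_of(23)]["worker-1"]["page-7"] += 1
gidx["temperature"][range_of(27)]["worker-2"]["page-3"] += 1
gidx["temperature"][range_of(41)]["worker-1"]["page-8"] += 1

hits = lookup(gidx, "temperature", 25)
```

Here a query for a temperature of 25 resolves to the interval 20 to 29 and returns only the two pages that recorded hits in that interval, without touching any worker.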
2. Local paging index
On a worker node, a data record D is stored in the specified storage page and written in incremental order. After the write completes, the row number of the record within the storage page, rowId, is obtained. When data in a storage page is retrieved, the rowIds are obtained by matching attribute values against the query request, and the original data records are extracted from the underlying I/O by rowId. The local paging index builds an inverted correspondence between attribute values and storage row numbers. To improve lookup efficiency, the invention combines hashing with bitmaps to record the positions of the rowIds hit by each attribute value within the stored record set.
The local paging index pIdx is defined as follows:
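$$\mathit{pIdx} = \{\, \mathit{attSet}_k \mid k = 1, 2, \ldots, p \,\}$$
$$\mathit{attSet} = \{\, \mathit{attTag},\ \{\, \mathit{attValArray}_j \mid j = 1, 2, \ldots, m \,\} \,\}$$
$$\mathit{attValArray} = \{\, (\mathit{range}, \mathit{bitmap})_i \mid i = 1, 2, \ldots, n \,\}$$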
where:
pIdx is the local paging index, which consists of a group of attSet elements.
attSet is the value hash of an attribute label. It consists of the attribute label definition attTag and the interval queue attValArray.
attValArray is the bitmap index record of a specific interval of the attribute. It consists of the interval definition range and the corresponding bitmap, where:
range is the interval definition, that is, the definition of one segment of the attribute's value domain.
bitmap is the rowId bitmap of the data records that hit the attribute interval. It is a binary stream in which the positions corresponding to the rowIds of hit records are set to 1 and all other positions are set to 0.
Fig. 3 shows the basic structure of the local paging index. The local paging index consists of layered hashes. The attribute label definitions form the first-layer hash, and each attribute label definition contains an attribute value hash. Each attribute value hash contains a bitmap. A position set to 1 in the bitmap means that the current attribute value of the data record at the corresponding row number in the current page hits the interval; a position set to 0 means that it misses the interval.
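A minimal sketch of the layered hash with bitmaps follows, using a Python integer as the binary stream so that bit rowId is 1 when the row hits the interval. The class name `PageIndex`, the interval width, and the sample records are assumptions made for illustration only.

```python
from collections import defaultdict

RANGE_WIDTH = 10  # hypothetical interval width


def range_of(value, width=RANGE_WIDTH):
    """Reduce a numeric attribute value to the id of the interval containing it."""
    return value // width


class PageIndex:
    """pIdx sketch for one storage page: attTag -> range -> bitmap (a Python int)."""

    def __init__(self):
        self.bitmaps = defaultdict(lambda: defaultdict(int))

    def add(self, row_id, record):
        """Set bit row_id in the bitmap of every attribute interval the record hits."""
        for att_tag, value in record.items():
            self.bitmaps[att_tag][range_of(value)] |= 1 << row_id

    def rows(self, att_tag, value):
        """Return the row numbers whose attribute value falls in the value's interval."""
        bitmap = self.bitmaps.get(att_tag, {}).get(range_of(value), 0)
        return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


pidx = PageIndex()
pidx.add(0, {"temperature": 23, "humidity": 61})
pidx.add(1, {"temperature": 41, "humidity": 65})
pidx.add(2, {"temperature": 27, "humidity": 38})
```

With these three rows, `pidx.rows("temperature", 25)` yields rows 0 and 2, the two records whose temperature lies in the interval 20 to 29.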
II. Index maintenance during data writing
When a data record D is written, the cloud storage system must maintain and update the index content in turn, following the dependency between the two index layers. The system first selects the worker node according to the distribution policy and the primary key value in D. After the worker node writes the record locally, it maintains the local paging index and then returns the storage information to the master node. The master node maintains the content of the global index according to how the data is distributed across the worker nodes. Fig. 4 shows data distribution and local paging index maintenance on the worker node during data writing.
The steps are as follows:
1. The master node receives a write request for data record D;
2. The attribute value of the primary key is extracted from D;
3. The directed distribution information for that primary key value is looked up in the current global index; if it exists, go to step 4, otherwise go to step 5;
4. The worker node in the directed distribution information is selected as the distribution node; go to step 6;
5. According to the distribution policy, the current primary key value is hashed to form a mapping to a worker node, and that worker node is selected as the distribution node;
6. The data write request is sent to the selected worker node;
7. The worker node receives the write request;
8. The worker node writes D into the specified storage page and obtains rowId, the row number of the record in that page;
9. The current data record and its storage information are submitted to the local paging index, and the system extracts attribute values from D according to predefined information;
10. Set i = 1;
11. Extract the value of the i-th attribute;
12. Open the value hash of attribute i in the current paging index, find the interval corresponding to the current attribute value, open the value hash of that interval, and obtain its bitmap;
13. If the bitmap for the current attribute value does not exist, go to step 14; otherwise go to step 15;
14. Create the interval hash pair for the current attribute value under the current attribute label, create a new bitmap object, and place it in that hash pair;
15. Obtain the position corresponding to rowId from the current bitmap; if this fails, go to step 16, otherwise go to step 17;
16. Pad the current bitmap to the length required by rowId, setting the padding bits to 0, and obtain the position of rowId;
17. Set the obtained position to 1, which completes the update of the current attribute index;
18. Set i = i + 1; if index maintenance is complete for all attributes, go to step 19, otherwise go to step 11;
19. Return the storage information of D on the current worker node to the master;
20. The master receives the returned storage information;
21. The master updates the content of the global index;
22. The data record write is complete; end the current operation.
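The worker-side part of the steps above can be sketched as follows, this time with a `bytearray` as the bitmap so that the zero padding of step 16 is explicit. The function names, the use of CRC32 as the hash in step 5, and the interval function passed in by the caller are assumptions for illustration; the patent does not name a specific hash or bitmap encoding.

```python
import zlib


def select_worker(primary_key, workers):
    """Step 5: hash the primary-key value onto a worker node (CRC32 is an assumption)."""
    return workers[zlib.crc32(str(primary_key).encode()) % len(workers)]


def set_bit(bitmap, row_id):
    """Steps 15-17: pad the bitmap with zero bytes if it is too short, then set the bit."""
    byte_pos, bit_pos = divmod(row_id, 8)
    if byte_pos >= len(bitmap):
        bitmap.extend(bytes(byte_pos + 1 - len(bitmap)))  # padding bits start as 0
    bitmap[byte_pos] |= 1 << bit_pos


def index_record(page_index, row_id, record, range_of):
    """Steps 10-18: update one bitmap per indexed attribute of the record."""
    for att_tag, value in record.items():
        ranges = page_index.setdefault(att_tag, {})
        # Steps 13-14: create the interval entry and its bitmap if they are missing.
        bitmap = ranges.setdefault(range_of(value), bytearray())
        set_bit(bitmap, row_id)


def by_tens(value):
    """Hypothetical interval function: one interval per ten units."""
    return value // 10


page_index = {}
index_record(page_index, 0, {"temperature": 23}, by_tens)
index_record(page_index, 9, {"temperature": 27}, by_tens)
```

Writing row 9 into a one-byte bitmap triggers the padding branch: the bitmap grows to two bytes and only the bit for row 9 is set in the new byte.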
Fig. 5 shows the flow of global index maintenance on the master node. The steps are as follows:
1. The master node receives the storage information returned by the worker node;
2. The attribute set is obtained from D according to the predefined information;
3. Set i = 1;
4. Obtain the value of the i-th attribute;
5. Obtain the interval hash for the current attribute value from the hash of the global index; if it exists, go to step 7, otherwise go to step 6;
6. Add a new interval hash to the attribute hash of the current global index; the interval covers the value of the current attribute;
7. Obtain the mapping table for the worker node in the current attribute value hash; if it exists, go to step 9, otherwise go to step 8;
8. Create the node mapping table in the current sub-table;
9. Obtain the node's paging count queue from the current worker node mapping table; if it exists, go to step 11, otherwise go to step 10;
10. Create a new paging count queue in the current node mapping table; the queue corresponds to the page in the current storage information;
11. Obtain from the current paging queue the page handle object that corresponds to the returned information; if it exists, go to step 13, otherwise go to step 12;
12. Create a new page handle object and write it into the node mapping table;
13. Add 1 to the counter in the current object, which completes the update of the current attribute index;
14. Set i = i + 1; if i exceeds the upper bound of the current index definition set, go to step 15, otherwise go to step 4;
15. The global index update is complete; return the success message for the data write operation;
16. End.
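The get-or-create chain in steps 5 to 13 maps naturally onto `dict.setdefault`. The sketch below is one possible reading, not the patent's implementation: the function name `update_global_index`, the `delta` parameter (used here so the same routine can also serve the decrement on deletion), and the sample data are assumptions.

```python
def update_global_index(gidx, record, node, page, range_of, delta=1):
    """Add `delta` to the counter of (node, page) under every indexed attribute interval."""
    for att_tag, value in record.items():
        ranges = gidx.setdefault(att_tag, {})            # attribute hash
        nodes = ranges.setdefault(range_of(value), {})   # steps 5-6: interval hash
        pages = nodes.setdefault(node, {})               # steps 7-8: node mapping table
        pages[page] = pages.get(page, 0) + delta         # steps 9-13: page counter


def by_tens(value):
    """Hypothetical interval function: one interval per ten units."""
    return value // 10


gidx = {}
update_global_index(gidx, {"temperature": 23, "humidity": 61}, "worker-1", "page-7", by_tens)
update_global_index(gidx, {"temperature": 27, "humidity": 38}, "worker-1", "page-7", by_tens)
# Deleting the second record again decrements the same counters.
update_global_index(gidx, {"temperature": 27, "humidity": 38}, "worker-1", "page-7", by_tens, delta=-1)
```

After the delete, the counter for temperature interval 2 on page-7 is back to 1 and the counter for humidity interval 3 is 0, so a page filter that requires a positive counter no longer selects the page for that humidity interval.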
III. Index maintenance during data updating
Data update operations comprise two basic operations: data deletion and data modification.
During a data update, the system first checks whether the request changes the primary key value. If it does not, the storage page mapping information of the data is extracted from the global index according to the task request, and the update request is sent to the worker node. The worker obtains the corresponding data record, completes the I/O change, and then performs set operations on the local bitmap indexes of the affected attribute values: the bit in the old value's bitmap is set to 0 and the bit in the new value's bitmap is set to 1. The update information is then returned to the master node. On receiving it, the master node decrements the counter in the paging mapping queue of the attribute value before the change in the global index and increments the counter in the paging mapping queue of the attribute value after the update. This completes the whole data update process.
When the updated values include the primary key, the data record corresponding to the primary key is deleted and the current data is inserted again.
Fig. 6 shows this basic flow. The steps are as follows:
1. The master node receives a data update request;
2. The worker node and page information of the record are retrieved from the global index according to the record's primary key attribute value;
3. The data update request is sent to the worker node obtained in step 2;
4. The worker node receives the update request, searches the local paging index to obtain the rowId of the data record, and extracts the original data record D;
5. The bitmap indexes in the value hashes corresponding to the attribute values of D are obtained;
6. The position corresponding to the current rowId in those bitmap indexes is set to 0;
7. The current data record D is changed to D′ and written to data I/O;
8. Attribute values are extracted from D′ to obtain the set of attributes whose values have changed;
9. For each attribute value in step 8, the bitmap index of its hash in the current page is obtained in turn, and the position corresponding to rowId is set to 1;
10. The bitmap update in the local paging index is complete; the operation information is returned to the master node;
11. After receiving the update information, the master node extracts the node mapping table corresponding to data record D from the global index and subtracts 1 from the corresponding counter;
12. The changed attribute values are extracted from D′, the node mapping table of their hash is obtained, and 1 is added to the counter in its paging count queue;
13. The change of the current data record is complete.
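The update flow can be sketched for a single page as below. The sketch reads steps 5 to 12 as touching only the attributes whose values changed, which is an interpretation rather than something the text states outright. The function name `update_record`, the integer bitmaps, the flattened per-page counters, and the interval width are all assumptions.

```python
def range_of(value, width=10):
    """Hypothetical interval function: one interval per `width` units."""
    return value // width


def update_record(page_bitmaps, counters, row_id, old, new):
    """Move row_id from the old to the new interval for each changed attribute.

    page_bitmaps: attTag -> range -> int bitmap (local paging index of one page)
    counters:     attTag -> range -> int        (global counters of that same page)
    """
    changed = [tag for tag in new if new[tag] != old[tag]]
    for tag in changed:
        old_r, new_r = range_of(old[tag]), range_of(new[tag])
        bitmaps = page_bitmaps.setdefault(tag, {})
        bitmaps[old_r] = bitmaps.get(old_r, 0) & ~(1 << row_id)  # steps 5-6: clear old bit
        bitmaps[new_r] = bitmaps.get(new_r, 0) | (1 << row_id)   # steps 8-9: set new bit
        counts = counters.setdefault(tag, {})
        counts[old_r] = counts.get(old_r, 0) - 1                 # step 11
        counts[new_r] = counts.get(new_r, 0) + 1                 # step 12
    return changed


page_bitmaps = {"temperature": {2: 0b101}}   # rows 0 and 2 hit interval 2
counters = {"temperature": {2: 2}}
changed = update_record(
    page_bitmaps, counters, row_id=2,
    old={"temperature": 27, "humidity": 38},
    new={"temperature": 45, "humidity": 38},
)
```

Clearing before setting keeps the result correct when the new value stays inside the old interval: the bit ends up set and the counter change nets to zero.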
In a data deletion operation, the storage page mapping information of the data is first extracted from the global index, and the deletion request is sent to the worker node, which deletes the data record from local I/O. After the deletion, the rowId of the record is submitted to the local paging index, which performs set operations on the local bitmap indexes of the corresponding attribute values. The update information is then returned to the master node. On receiving it, the master node decrements the counters in the paging mapping queues of the corresponding attributes in the global index. This completes the whole data deletion process.
IV. Data retrieval and query
Data retrieval matches the query constraints against the index and is completed in two steps:
First-level retrieval: after receiving a query, the master node filters out from the global index the storage pages in which the data resides to form the query task set, and distributes the query task to the worker nodes that hold those storage pages.
Second-level retrieval: the master node distributes the query task to the corresponding worker nodes for retrieval. Each worker node performs intersection filtering on the query task using its page index and forms a local query result. The local query result obtained from each storage page is uploaded to the master node, which merges all local results into the final query result and returns it.
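The intersection filtering in the second-level step amounts to Boolean operations on the interval bitmaps of one page. A minimal sketch follows, with Python integers standing in for the bitmaps. The function names `combine` and `row_ids` are hypothetical, and reading the NOT connector as "AND NOT" between two sub-operations is an assumption.

```python
def combine(bitmaps, connectors, page_rows):
    """Fold interval bitmaps left to right with AND / OR / NOT connectors.

    bitmaps:    list of int bitmaps, one per query sub-operation
    connectors: list of "AND" | "OR" | "NOT" placed between consecutive bitmaps
                ("NOT" is read here as AND NOT, which is an assumption)
    page_rows:  number of rows stored in the page
    """
    mask = (1 << page_rows) - 1
    result = bitmaps[0]
    for op, bitmap in zip(connectors, bitmaps[1:]):
        if op == "AND":
            result &= bitmap
        elif op == "OR":
            result |= bitmap
        elif op == "NOT":
            result &= ~bitmap & mask
        else:
            raise ValueError(f"unknown connector: {op}")
    return result


def row_ids(bitmap):
    """Row numbers whose bit is 1; these rows are then read from the page I/O."""
    return [row for row in range(bitmap.bit_length()) if (bitmap >> row) & 1]


temperature_20s = 0b0101   # rows 0 and 2 hit the temperature interval
humidity_60s = 0b0011      # rows 0 and 1 hit the humidity interval

both = row_ids(combine([temperature_20s, humidity_60s], ["AND"], page_rows=4))
either = row_ids(combine([temperature_20s, humidity_60s], ["OR"], page_rows=4))
only_temp = row_ids(combine([temperature_20s, humidity_60s], ["NOT"], page_rows=4))
```

Because the bitmaps are interval-level, the rows they select still need to be checked against the exact query condition after they are read.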
Retrieval query can be expressed as following form:
Wherein attriIt is ith attribute, m represents the number of query child-operations;opiIt is operation Connector, value is AND, OR, NOT,Expression does not operate connector.
After receiving a query task, the master node uses each attribute label and attribute value range in the query to find, in gIdx, the sets of storage pagings that satisfy the respective conditions. It then combines these sets according to the operation connectors to obtain the storage paging set for the query task. Finally, it dispatches the query task to each data node in that storage paging set.
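The master-side filtering can be sketched as set operations over (node, paging) pairs. This is an illustration under stated assumptions: the layout of `g_idx` (attribute, then half-open value interval, then node, then paging counter), the left-to-right evaluation of connectors, and the function names are hypothetical. The sketch also treats NOT conservatively: a paging that contains rows matching the negated condition may still contain rows that satisfy the query, so NOT removes nothing at this coarse level and is resolved later on the worker.

```python
def pagings_for(g_idx, attr, lo, hi):
    """(node, paging) pairs whose interval overlaps [lo, hi] and whose counter is positive."""
    hits = set()
    for (ilo, ihi), node_map in g_idx.get(attr, {}).items():
        if ilo <= hi and lo < ihi:                 # [ilo, ihi) overlaps the queried range
            for node, queue in node_map.items():
                hits |= {(node, p) for p, count in queue.items() if count > 0}
    return hits


def coarse_set(g_idx, conditions, connectors):
    """Combine per-condition paging sets left to right with the connectors."""
    result = pagings_for(g_idx, *conditions[0])
    for op, cond in zip(connectors, conditions[1:]):
        other = pagings_for(g_idx, *cond)
        if op == "AND":
            result &= other
        elif op == "OR":
            result |= other
        elif op == "NOT":
            pass                                   # conservative: decided row by row on the worker
        else:
            raise ValueError(f"unknown connector: {op}")
    return result


def dispatch(coarse):
    """Group the coarse set into one task (a list of pagings) per worker node."""
    tasks = {}
    for node, paging in sorted(coarse):
        tasks.setdefault(node, []).append(paging)
    return tasks


g_idx = {
    "age": {(0, 30): {"w1": {"p0": 2}, "w2": {"p3": 1}}, (30, 60): {"w1": {"p1": 4}}},
    "salary": {(0, 5000): {"w1": {"p0": 1, "p1": 0}}, (5000, 10000): {"w2": {"p3": 5}}},
}
conditions = [("age", 18, 25), ("salary", 6000, 8000)]
tasks = dispatch(coarse_set(g_idx, conditions, ["AND"]))
```

For this query, only paging `p3` on worker `w2` holds rows in both value intervals, so a single task is dispatched.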
The retrieval process is shown in Fig. 7 and comprises the following steps:
1. The master node receives the query request;
2. According to the constraints in the query request, the master node obtains the hash entries of the relevant attribute values in the global index and, following the query conditions, performs Boolean operations on the node mapping tables in those hash entries;
3. The processing of step 2 yields a pointer to the coarse set of query results;
4. The query request is sent to the worker nodes in the coarse set obtained in step 3;
5. The worker node receives the query request;
6. Set i = 1;
7. Extract the i-th paging information from the query request;
8. Open the local paging index of the paging from step 7;
9. Obtain the bitmap indexes in the corresponding attribute value hash entries according to the query request;
10. Compare the lengths of the bitmap indexes of all attribute value hash entries and take the shortest one;
11. Taking the bitmap index obtained in step 10 as the base, perform Boolean operations between it and the bitmap indexes of the other value hash entries;
12. The result of step 11 forms the selected-set bitmap index of the query result;
13. According to the selected-set bitmap index of step 12, fetch from storage paging IO the data records of the rows whose bit is 1;
14. Filter these data records by the query conditions and keep those that satisfy them, forming the paging result set;
15. Cache the current paging result set;
16. Let i = i + 1; if all pagings have been queried, go to step 17, otherwise go to step 7;
17. Merge all local paging result sets of the worker and return them to the master node;
18. The master node aggregates the result sets returned by all worker nodes and returns them to the query task;
19. Task execution is complete.
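Steps 6 to 17 on the worker can be sketched as below. The sketch makes several assumptions that are not in the source: bitmaps are held as Python integers (so "length" is `int.bit_length`), all constraints are combined with AND, intervals are half-open, and the names `query_paging`, `query_worker`, and the sample data are hypothetical.

```python
def rows_of(bitmap):
    """Yield the row numbers whose bit is 1."""
    row = 0
    while bitmap:
        if bitmap & 1:
            yield row
        bitmap >>= 1
        row += 1


def query_paging(local_index, rows, constraints):
    """Steps 9-14 for one paging; constraints is a non-empty {attr: (lo, hi)} dict."""
    bitmaps = []
    for attr, (lo, hi) in constraints.items():
        merged = 0
        for (ilo, ihi), bitmap in local_index.get(attr, {}).items():
            if ilo <= hi and lo < ihi:           # interval [ilo, ihi) overlaps the range
                merged |= bitmap
        bitmaps.append(merged)
    bitmaps.sort(key=int.bit_length)             # step 10: shortest bitmap first
    selected = bitmaps[0]
    for bitmap in bitmaps[1:]:                   # step 11: Boolean AND with the others
        selected &= bitmap
    # Steps 13-14: fetch candidate rows, then filter by the exact query conditions,
    # because an interval hit does not guarantee that the value itself matches.
    return [rows[r] for r in rows_of(selected)
            if all(lo <= rows[r][a] <= hi for a, (lo, hi) in constraints.items())]


def query_worker(pagings, paging_ids, constraints):
    """Steps 6-17: query each requested paging in turn and merge the result sets."""
    merged = []
    for pid in paging_ids:
        local_index, rows = pagings[pid]
        merged.extend(query_paging(local_index, rows, constraints))
    return merged


rows_p0 = [
    {"age": 25, "salary": 7000},   # row 0
    {"age": 28, "salary": 3000},   # row 1
    {"age": 45, "salary": 8000},   # row 2
    {"age": 29, "salary": 9500},   # row 3
]
index_p0 = {
    "age": {(0, 30): 0b1011, (30, 60): 0b0100},
    "salary": {(0, 5000): 0b0010, (5000, 10000): 0b1101},
}
result = query_worker({"p0": (index_p0, rows_p0)}, ["p0"],
                      {"age": (20, 29), "salary": (6000, 9000)})
```

Here the bitmap intersection selects rows 0 and 3, and the filter of step 14 then drops row 3, whose salary falls in the hit interval but outside the queried range.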
With the hierarchical index technique described above, the invention establishes, for each dimension, a hash-based mapping between attribute values and pagings, realizing a three-level mapping of attribute label, attribute value interval, and storage mapping information. The index structure is efficient and scales well; the index remains balanced, so query execution efficiency is little affected by the size of the data set, and multidimensional and Boolean queries are supported. Compared with a conventional index technique (B+-tree), the proposed method improves query response speed by about 31% in a typical cloud storage environment and improves efficiency in write and update processes by about 23%. In addition, when nodes in the cloud environment change, only node-migration maintenance in the global index is required, so index maintenance is relatively simple and meets the requirements of a cloud storage environment.
The above embodiments merely illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or substitute equivalents without departing from the spirit and scope of the invention; the scope of protection shall be as defined in the claims.

Claims (10)

1. A hierarchical indexing method for cloud storage, comprising the steps of:
Step 1: establishing a global index on the master node and a local paging index on each worker node; the global index records the mapping between each data storage paging and data attribute values; the data paging index records the mapping between IO pointers within a storage paging and data attribute values;
Step 2: while data record D is being written, maintaining and updating the index content according to the membership relation between the two index layers: first, a worker node is selected according to the distribution policy and the primary key value of data record D; after completing the local write of the data record, the worker node maintains its local paging index information and returns it to the master node; the master node maintains the content of the global index according to the storage paging information of the data on the worker node.
2. The method of claim 1, wherein the global index consists of three levels: attribute hash, attribute value hash, and node mapping table; and the local paging index consists of three levels: attribute hash, attribute value hash, and inverted bitmap index.
3. The method of claim 1, wherein the two index levels, the global index and the local paging index, store the mapping between attribute values and storage locations; value intervals are established in the global index and the local index according to the range of attribute values, and the attribute value of a data record is assigned to the specific value interval that contains it; meanwhile, a hash pair relation is established between each value interval and the storage mapping information.
4. The method of claim 1, wherein the global index establishes a hash pair relation between attribute value intervals and storage pagings, one attribute value interval corresponding to one node mapping table; a node mapping table consists of a group of nodes and their corresponding paging count queues; through the global index, the worker node and paging position where data is stored are located quickly.
5. The method of claim 1, wherein the local paging index establishes a correspondence between attribute value intervals and the row number rowId of each data record in its storage paging; within an attribute value interval, rowIds are recorded as a bitmap to improve retrieval efficiency; the bitmap is a binary byte stream in which bit i being 1 indicates that the attribute value of the data record whose rowId is row i in the current storage paging hits the current value interval.
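The two building blocks named in claims 3 to 5, the value-interval lookup and the byte-stream bitmap, might look like the sketch below. The class names `IntervalHash` and `Bitmap`, the half-open interval convention, and the sample boundaries are assumptions made for illustration only.

```python
import bisect


class IntervalHash:
    """Maps an attribute value to the half-open interval [b[k], b[k+1]) that contains it."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)

    def interval_of(self, value):
        k = bisect.bisect_right(self.bounds, value) - 1
        if k < 0 or k >= len(self.bounds) - 1:
            raise ValueError(f"{value!r} lies outside the indexed range")
        return (self.bounds[k], self.bounds[k + 1])


class Bitmap:
    """Byte-stream bitmap: bit i is 1 when row i of the paging hits the interval."""

    def __init__(self):
        self.data = bytearray()

    def set(self, i, on=True):
        byte, bit = divmod(i, 8)
        if byte >= len(self.data):               # grow the stream on demand
            self.data.extend(b"\x00" * (byte + 1 - len(self.data)))
        if on:
            self.data[byte] |= 1 << bit
        else:
            self.data[byte] &= ~(1 << bit) & 0xFF

    def get(self, i):
        byte, bit = divmod(i, 8)
        return byte < len(self.data) and bool((self.data[byte] >> bit) & 1)


ages = IntervalHash([0, 30, 60, 120])
bitmap = Bitmap()
bitmap.set(9)                                    # row 9 hits the interval
```

A local paging index would then keep one `Bitmap` per (attribute, interval) pair, while the global index would keep a node mapping table in the same position.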
6. The method of claim 1, wherein the write process of data record D comprises the following steps:
Step 1: during the write of data record D, the master node receives the request, first extracts the attribute value corresponding to the primary key of D, selects a worker node for D according to the distribution policy, and sends the write request to that worker node;
Step 2: after receiving the write request, the worker node writes the data record into a local storage paging and obtains its row number rowId, then opens the local paging index; the local paging index extracts the attributes in D, compares each value with the corresponding value interval ranges, and selects the interval that is hit; it then opens the bitmap therein and sets the bit corresponding to rowId to 1; after all attributes have been indexed, the storage paging and worker node information is returned to the master node;
Step 3: after receiving the returned storage information, the master node extracts the attributes in D and opens the global index; for each attribute in turn, it compares the value with the corresponding value interval ranges, selects the interval that is hit, and opens the node mapping table therein; in the node mapping table it selects the paging count queue corresponding to the aforementioned worker; from the paging count queue it extracts the paging count object corresponding to the storage paging in the returned information and adds 1 to its counter; after all attributes have been indexed, the index update for the write is complete, and the operation result is returned.
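The three write steps can be sketched end to end as follows. Everything concrete here is an assumption made for illustration: the interval boundaries in `BOUNDS`, a modulo distribution policy on an integer primary key `id`, a single storage paging `"p0"` per worker, and bitmaps held as integers.

```python
import bisect
from collections import defaultdict

BOUNDS = {"age": [0, 30, 60, 120], "salary": [0, 5000, 10000, 20000]}  # hypothetical intervals


def interval_of(attr, value):
    """Return the half-open interval of `attr` that contains `value`."""
    b = BOUNDS[attr]
    k = bisect.bisect_right(b, value) - 1
    if not 0 <= k < len(b) - 1:
        raise ValueError(f"{attr}={value!r} lies outside the indexed range")
    return (b[k], b[k + 1])


class Worker:
    def __init__(self, name):
        self.name = name
        self.rows = []                            # the single storage paging "p0"
        self.local_index = defaultdict(int)       # (attr, interval) -> bitmap as int

    def write(self, record):
        row_id = len(self.rows)                   # step 2: write locally, obtain rowId
        self.rows.append(record)
        for attr, value in record.items():
            if attr in BOUNDS:                    # set the rowId bit in the hit interval
                self.local_index[(attr, interval_of(attr, value))] |= 1 << row_id
        return self.name, "p0", row_id


class Master:
    def __init__(self, workers):
        self.workers = workers
        # (attr, interval) -> node -> paging -> counter
        self.global_index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))

    def write(self, record, key="id"):
        worker = self.workers[record[key] % len(self.workers)]   # step 1: distribution policy
        node, paging, row_id = worker.write(record)
        for attr, value in record.items():        # step 3: bump the paging counters
            if attr in BOUNDS:
                self.global_index[(attr, interval_of(attr, value))][node][paging] += 1
        return node, paging, row_id


w0, w1 = Worker("w0"), Worker("w1")
master = Master([w0, w1])
first = master.write({"id": 7, "age": 25, "salary": 7000})
second = master.write({"id": 9, "age": 28, "salary": 12000})
```

Both records are routed to `w1`; its bitmap for the age interval (0, 30) ends up with bits 0 and 1 set, and the matching global counter equals 2.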
7. The method of claim 1, wherein index maintenance during the update of data record D proceeds as follows:
Step 1: the master node receives the data update request, determines from the primary key value in D the worker node that stores the record, and sends the request to that worker node;
Step 2: the worker node receives the request, extracts the original data record D according to the content of the local paging index, and changes it to D' according to the request; after the change, it first modifies the bitmap in the value interval corresponding to the attribute value of D in the local paging index, setting the bit corresponding to the storage row number of D to 0; it then modifies the bitmap of the value interval corresponding to D', setting the bit corresponding to the storage row number of D' to 1; after these operations, the storage paging information of D and D' is returned to the master node;
Step 3: after receiving the returned storage paging information, the master node first opens the node mapping table in the value interval corresponding to D in the global index, extracts from its paging count queue the counter corresponding to the paging to which D belongs, and subtracts 1 from it; it then opens the node mapping table in the value interval corresponding to D', extracts from its paging count queue the counter corresponding to the paging to which D' belongs, and adds 1 to it; after these operations, the operation information is returned.
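For a single changed attribute, the index maintenance of steps 2 and 3 reduces to moving one bit on the worker and one count on the master. The sketch below assumes integer bitmaps and nested dictionaries; `move_bit`, `move_count`, and the sample intervals are hypothetical names and values.

```python
from collections import defaultdict


def move_bit(local_index, attr, old_interval, new_interval, row_id):
    """Worker side: clear the row's bit under D's interval, then set it under D''s."""
    local_index[(attr, old_interval)] &= ~(1 << row_id)
    local_index[(attr, new_interval)] |= 1 << row_id


def move_count(global_index, attr, old_interval, new_interval, node, paging):
    """Master side: subtract 1 for D's interval, add 1 for D''s interval."""
    old_queue = global_index[(attr, old_interval)][node]
    old_queue[paging] = max(0, old_queue[paging] - 1)
    global_index[(attr, new_interval)][node][paging] += 1


local_index = defaultdict(int)
local_index[("age", (0, 30))] = 0b101             # rows 0 and 2 hit the interval (0, 30)
global_index = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
global_index[("age", (0, 30))]["w1"]["p0"] = 2

# Row 2 changes its age from 25 to 45, so it moves from interval (0, 30) to (30, 60).
move_bit(local_index, "age", (0, 30), (30, 60), row_id=2)
move_count(global_index, "age", (0, 30), (30, 60), "w1", "p0")
```

When the old and new values fall in the same interval, both functions leave the indexes unchanged, because the clear is followed by a set and the decrement by an increment.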
8. The method of claim 7, wherein, if the primary key value of the data is changed, the update operation is encapsulated as two parts, deleting the original data record and adding a new data record, which are executed step by step.
9. The method of claim 8, wherein, when a data record is deleted, the request is first sent to the worker node according to the primary key value; the worker node deletes the record from the local paging, and the local paging index extracts the bitmap in the value interval corresponding to the data record and sets the bit corresponding to the row number of the record to 0; the result is then returned to the master node; the master node extracts the paging count object in the node mapping table of the value interval corresponding to the data record and subtracts 1 from the counter of the paging count object; if the counter < 0, it is set to 0.
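The decomposition in claim 8 and the counter floor in claim 9 can be illustrated with a small planning function. The modulo routing on an integer primary key `id` and the names `route`, `plan_update`, and `decrement` are assumptions made for this sketch.

```python
def route(key, n_workers):
    """Hypothetical distribution policy: primary key modulo the number of workers."""
    return key % n_workers


def plan_update(old, new, n_workers, key="id"):
    """Return the operations needed to turn record `old` into record `new`."""
    if old[key] == new[key]:
        return [("update", route(old[key], n_workers), new)]
    # The primary key changed, so the record may now belong to another worker:
    # delete the original record first, then add the new one.
    return [
        ("delete", route(old[key], n_workers), old),
        ("add", route(new[key], n_workers), new),
    ]


def decrement(counter):
    """Subtract 1 from a paging counter and reset it to 0 if it would go negative."""
    return max(0, counter - 1)


same_key = plan_update({"id": 7, "age": 25}, {"id": 7, "age": 26}, n_workers=2)
new_key = plan_update({"id": 7, "age": 25}, {"id": 8, "age": 25}, n_workers=2)
```

With two workers, changing the key from 7 to 8 yields a delete on worker 1 followed by an add on worker 0, whereas a change that keeps the key stays a single in-place update.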
10. A data retrieval method based on the hierarchical indexing method for cloud storage of claim 1, comprising the following steps:
Step 1: the master node receives a retrieval request, opens the global index, opens the node mapping table in the attribute value hash entry corresponding to each value constraint in the retrieval request, and extracts the paging count queues;
Step 2: after obtaining the paging and node information in all paging count queues, the master node performs Boolean operations according to the retrieval request to obtain the final set of worker nodes and paging information;
Step 3: the master node sends the retrieval request and paging information in turn to the worker nodes obtained in Step 2;
Step 4: after receiving the retrieval request, the worker node opens the local paging index of each paging and obtains the bitmap indexes of the value intervals corresponding to the value constraints of the query request;
Step 5: the worker node performs Boolean operations on the bitmap indexes of Step 4 according to the query request, finally obtaining the selected-set bitmap that satisfies the request in the current paging, and extracts the corresponding data records from the paging;
Step 6: after the data records of all pagings have been extracted, the selected set is filtered to form a result set, which is returned to the master node;
Step 7: the master node aggregates all data records and returns them in response to the query request.
CN201610975816.5A 2016-11-07 2016-11-07 Layered indexing method and search method for cloud storage Pending CN106599040A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610975816.5A CN106599040A (en) 2016-11-07 2016-11-07 Layered indexing method and search method for cloud storage

Publications (1)

Publication Number Publication Date
CN106599040A true CN106599040A (en) 2017-04-26

Family

ID=58589854

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610975816.5A Pending CN106599040A (en) 2016-11-07 2016-11-07 Layered indexing method and search method for cloud storage

Country Status (1)

Country Link
CN (1) CN106599040A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101370025A (en) * 2007-08-17 2009-02-18 北京灵图软件技术有限公司 Storing method, scheduling method and management system for geographic information data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
孟必平: ""分片位图索引:一种适用于云数据管理的辅助索引机制"", 《计算机学报》 *
朱春莹: ""面向大数据查询的索引技术研究"", 《万方》 *
王振: ""面向海量数据的位图索引技术及应用研究"", 《万方》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107341221B (en) * 2017-06-28 2020-08-11 百度在线网络技术(北京)有限公司 Index structure establishing and associated retrieving method, device, equipment and storage medium
CN107341221A (en) * 2017-06-28 2017-11-10 百度在线网络技术(北京)有限公司 Foundation, associative search method, apparatus, equipment and the storage medium of index structure
CN107451243A (en) * 2017-07-27 2017-12-08 迪尚集团有限公司 Complex query method based on attribute
CN107451243B (en) * 2017-07-27 2024-04-12 迪尚集团有限公司 Complex query method based on attribute
CN107436736A (en) * 2017-08-08 2017-12-05 郑州云海信息技术有限公司 The storage method and device of file in a kind of HDFS
CN108229358A (en) * 2017-12-22 2018-06-29 北京市商汤科技开发有限公司 Index establishing method and device, electronic equipment, computer storage media, program
CN108427748A (en) * 2018-03-12 2018-08-21 北京奇艺世纪科技有限公司 Distributed data base secondary index querying method, device and server
CN109471840A (en) * 2018-10-15 2019-03-15 北京海数宝科技有限公司 Fileview method, apparatus, computer equipment and storage medium
CN109783513A (en) * 2018-12-20 2019-05-21 北京大米科技有限公司 Data processing method, device, server and computer readable storage medium
CN109783513B (en) * 2018-12-20 2021-03-16 北京大米科技有限公司 Data processing method, device, server and computer readable storage medium
CN109978081A (en) * 2019-05-05 2019-07-05 星环信息科技(上海)有限公司 Determination method, apparatus, equipment and the medium of eigentransformation mode
CN109978081B (en) * 2019-05-05 2019-12-24 星环信息科技(上海)有限公司 Method, apparatus, device and medium for determining feature transformation mode
CN112948386A (en) * 2021-03-04 2021-06-11 电信科学技术第五研究所有限公司 Simple indexing and encryption tray-dropping mechanism for ETL abnormal data
CN112948386B (en) * 2021-03-04 2023-09-22 电信科学技术第五研究所有限公司 Simple indexing and encrypting disk-dropping mechanism for ETL abnormal data
CN114791913A (en) * 2022-04-26 2022-07-26 北京人大金仓信息技术股份有限公司 Method, storage medium and device for processing shared memory buffer pool of database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20170426)