CN106599040A - Layered indexing method and search method for cloud storage
- Publication number: CN106599040A
- Application number: CN201610975816.5A
- Authority: CN (China)
- Prior art keywords: index, paging, data, interval, attribute
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/21—Design, administration or maintenance of databases
- G06F16/214—Database migration support
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2264—Multidimensional index structures
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computing Systems (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to a layered indexing method and a search method for cloud storage. Aiming at the characteristic that in a cloud storage environment mass data is organized and managed in a distributed manner, a layered indexing structure is built by using the method provided by the invention, wherein a top layer is an overall index which forms dimensions according to different attributes of data and builds a mapping relation between values and pages in a Hash inversion manner for each dimension; an index corresponding to the value of the upper-layer dimension is built in a local index at a bottom layer, and thus mapping with the data in the local storage page is achieved. The indexing structure is good in balance and expansibility, has the characteristic that the scale of a data set has less influence on query execution efficiency, and supports multidimensional and Boolean queries; meanwhile, when a node of the cloud environment changes, node migration maintenance is only performed at the overall index, so that index maintenance is simple, and requirements of the cloud storage environment can be met. The methods provided by the invention have positive application values in the field of distributed databases.
Description
Technical field
The invention belongs to the field of index technology for distributed cloud storage systems, and in particular relates to a layered indexing method and a search method for cloud storage.
Background art
In recent years, with the rapid development of information technologies such as cloud computing, the Internet of Things, and the Internet, the form of information systems has changed greatly: moving services to the cloud and building a complete ecosystem are increasingly the development trend of information systems. This trend has caused cloud data volumes to grow explosively to the TB and PB scale, at a rate far exceeding the traditional Moore's law. Take the Sea-Cloud platform of the Chinese Academy of Sciences as an example. It is a typical collaborative service platform based on cloud computing: the sea side (the terminal side) performs data collection and preprocessing, while the cloud side performs unified data storage, querying, analysis, and knowledge extraction. The cloud side also achieves platform self-adaptation through automatic load balancing. In smart-city scenarios, a typical application, the daily data increment on the cloud side of this platform exceeds 10G. Moreover, the platform must not only satisfy simple data retrieval but also support complex operations such as multi-value queries and Boolean queries in mining, analysis, and knowledge-extraction activities. As data scale grows rapidly, how to provide high-performance support for complex queries over massive data in the dynamic environment of cloud computing has become an important problem.
Querying is a complex data operation; once a data set reaches a certain scale, query efficiency becomes the main bottleneck of information system performance. Indexing is an important means of improving data retrieval and query efficiency. Indexes are organized in two main ways: forward indexes and inverted indexes. In traditional relational databases, inverted indexes are widely used. According to the data structure of the index, there are also two basic types: ordered indexes and hash indexes. The B+-tree index is a typical ordered index: it is based on a balanced tree and organizes a one-dimensional data structure to retrieve data. Because of its efficient structure, it is widely used in traditional relational databases. However, in the Sea-Cloud platform environment described above, the data sets are large, so the B+-tree structure grows considerably; in addition, multi-value and Boolean queries still require complex intersection filtering. Overall query efficiency is therefore strongly affected by data set scale.
Since 2000, the difficulty of complex queries caused by growing data scale has attracted the attention of researchers. In 2004, Google and a number of other research institutions successively released big-data storage and query solutions, represented by cloud-based key-value databases. Cloud data storage and query technology based on key-value has high scalability, high availability, and fault tolerance, and can store and query massive data efficiently. A key-value database is based on a hash index: it maps the hash of the rowkey to the data object value. To address the problem that a rowkey index cannot be sorted, researchers have combined it with techniques such as B+-tree indexes. Key-value databases query quickly on the rowkey, but queries on non-rowkey attributes can only be answered by a full table scan. Although parallel architectures such as MapReduce can improve query speed to some extent, the efficiency of multi-value and Boolean queries is still generally low, and when the data set is large the query speed cannot meet application needs.
On the other hand, distributed storage is the mainstream architecture for managing large-scale data sets today. Taking the Sea-Cloud platform as an example again, its cloud side uses a typical parallel data storage management architecture: upper-layer master nodes perform resource scheduling and data distribution, and bottom-layer worker nodes store data locally. To reduce the I/O load on local data sets, worker nodes read and write data in pages (512k by default). In addition, in a cloud computing environment, data is migrated between nodes according to a scheduling strategy to meet load-balancing requirements. This requires that the index, while supporting high-performance complex queries, also adapt well to migration. These problems make index creation, maintenance, and query operations difficult in a cloud computing environment.
Summary of the invention
To address these problems, the present invention studies the indexing of massive data in cloud environments and forms a two-layer index architecture. The top layer is a global index, which forms dimensions from the different attributes of the data and, for each dimension, builds a mapping between values and pages in the manner of a hash inverted list. In the local index at the bottom layer, an index corresponding to the upper-layer dimension values is built, mapping to the data in locally stored pages. This index structure is efficient, well balanced, and extensible; its query execution efficiency is little affected by data set scale; and it supports multidimensional and Boolean queries. Moreover, when nodes in the cloud environment change, node migration maintenance is performed only in the global index, so index maintenance is relatively simple and meets the requirements of a cloud storage environment.
The present invention proposes a technique for index organization, management, and retrieval based on a layered structure. In this technique, the index structure consists of two layers: a global index and local paging indexes. In a distributed cloud storage system, the master node is generally used for data distribution and for scheduling operation tasks, while worker nodes (storage nodes) are used for local storage, retrieval, and extraction of data. Correspondingly, the global index is built on the master node and mainly records the mapping from data records to the storage pages on the worker nodes to which they are distributed. It records, in hashed form, queues of the values that data records take on different attributes. Within each attribute hash, an inner hash is built according to the division of the attribute's value range into intervals, and a pointer mapping queue is built in each attribute value hash. This queue consists of a group of worker node pointers and their corresponding counters. During data insertion, after a data record has been stored locally on a worker node, the mapping pointer queue for the corresponding attribute value is obtained from the global index, and the counter of the pointer for that worker is incremented by 1, which maintains the global index. During data deletion, it is only necessary to obtain the corresponding worker pointer counter in the value hash of the current record's attribute value and decrement it by 1 to complete maintenance of the global index.
The local paging index is built on the worker nodes and records the position of each data record's storage pointer in local I/O. This position information is kept in an attribute value hash. As on the master node, the attribute hash consists of two layers: the attribute hash and its inner value hash. The value hash of the local paging index stores I/O pointer information, which allows data records to be addressed and extracted directly. During data insertion, after the worker node completes local storage, it submits the data record and its I/O pointer to the local index. The local paging index obtains the corresponding value hash from the attribute values in the record, and recording the I/O pointer in that value hash completes creation of the local index entry. During data deletion, it suffices to delete the I/O pointer information from the value hashes of the record being deleted. The deletion is also reported to the global index so that the global index can complete its own maintenance.
During data retrieval, the value hashes in the corresponding attribute hashes are first obtained from the global index according to the query constraints, and the worker pointer sets are extracted from them. The worker pointer sets of the different attribute hashes are combined by Boolean operations according to the query request to form a worker task queue, and the query request is then sent to the worker nodes in that queue. On receiving the request, each worker node extracts the I/O pointers in the value hashes through its local paging index to form I/O pointer sets, and then applies Boolean operations according to the query request to form the I/O pointer set of the refined query result. The original data records are extracted through these I/O pointers and returned to the master node, completing the retrieval and extraction of the data. The basic structure of this index is shown in Fig. 1.
The beneficial effects of the present invention are as follows:
In view of the distributed organization of massive data in cloud storage environments, the present invention establishes an index system based on a layered structure. The top layer is a global index, which forms dimensions from the different attributes of the data and, for each dimension, builds a mapping between values and pages in the manner of a hash inverted list. In the local index at the bottom layer, an index corresponding to the upper-layer dimension values is built, mapping to the data in locally stored pages. This index structure is efficient, well balanced, and extensible; its query execution efficiency is little affected by data set scale; and it supports multidimensional and Boolean queries. Moreover, when nodes in the cloud environment change, node migration maintenance is performed only in the global index, so index maintenance is relatively simple and meets the requirements of a cloud storage environment. The method has positive application value in the field of distributed databases.
Brief description of the drawings
Fig. 1 is a diagram of the layered index structure.
Fig. 2 is a diagram of the basic structure of the global index.
Fig. 3 is a diagram of the basic structure of the local paging index.
Fig. 4 is a flow chart of local paging index maintenance during a data write operation.
Fig. 5 is a flow chart of global index maintenance.
Fig. 6 is a flow chart of index update maintenance during a data update operation.
Fig. 7 is a flow chart of data retrieval.
Detailed description
The present invention is further described below through specific embodiments and the accompanying drawings.
I. Index structure
As shown in Fig. 1, the index proposed by the invention consists structurally of two parts: a global index and local paging indexes. The global index records the mapping between each data storage page and data attribute values; the local paging index records the mapping between the I/O pointers within a storage page and data attribute values. Given the characteristics of cloud storage environments, the invention organizes and maintains the various kinds of mapping information using hashes. The index structure is described below in terms of the global index and the local paging index.
1. Global index
The global index consists of storage node information and an attribute tree, and is expressed in the following form:
gIdx = { attSet_i | i = 1, 2, ..., n }
attSet = { attTag, { (range, nodeSet)_k | k = 1, 2, ..., p } }
nodeSet = { (node_i, { pageMapping_j | j = 1, 2, ..., o }) | i = 1, 2, ..., m }
pageMapping = { (dataPage, count)_j | j = 1, 2, ..., n }
where:
The global index gIdx is a hash set of attSet elements.
Each attSet is a 2-tuple, in which:
attTag is the label of the attribute. It forms an attribute value hash pair with a group of interval queues. In the value hash, range is the mark of an interval, and the mark corresponds to a specific range of the attribute's value domain. During data insertion, the value of an attribute of a data record is reduced to a specific interval, and that interval serves as the basic unit for recording storage mappings.
nodeSet is the storage node mapping table. A data record is stored in a page of a worker node, so its storage mapping information comprises the node and the page to which it belongs. In the global index, this storage mapping information is recorded in the storage node mapping table of the current value. The table consists of node identifiers node_i and paging count queues pageMapping_j.
node_i is the identifier of the i-th worker node. The paging count queue pageMapping consists of 2-tuples:
dataPage is the page information for the entry in the paging count queue;
count is a counter giving the number of data records in the current storage page whose attribute value falls in the current interval.
In the cloud storage management system, the storage node and data page of an original data record D are assigned according to a specified primary key attribute and its value range (for example, according to a timestamp, or according to a specific correspondence between attribute intervals and worker nodes). After the original data has been written to I/O on the specified worker node, a predefined set of attributes is extracted from it, and this attribute set forms the index data. Based on the index data, the three-level mapping of storage mapping information, attribute interval, and attribute tag is established in gIdx, which creates the global index entry for the data record.
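The reduction of an attribute value to an interval mark could look like the following minimal sketch. It assumes numeric attribute values and fixed, sorted interval boundaries; the name `range_of` and the boundary values are hypothetical and do not come from the patent text.

```python
from bisect import bisect_right

# Hypothetical boundaries for one attribute. They define the intervals
# (-inf, 10), [10, 20), [20, 30), and [30, +inf), marked 0 to 3.
BOUNDARIES = [10, 20, 30]


def range_of(value, boundaries=BOUNDARIES):
    """Return the interval mark (an integer) that the value falls into."""
    return bisect_right(boundaries, value)


mark = range_of(21)  # 21 lies in [20, 30)
```

Using a sorted boundary list keeps the lookup logarithmic in the number of intervals, and the resulting mark can serve directly as the key of the value hash.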
Fig. 2 shows the basic structure of the global index. As shown, to locate quickly the storage node and data page where D resides, gIdx uses a hash index to record the mapping between data attributes and their values on the one hand and storage nodes and data pages on the other. For example, to query records whose attribute is attTag_1 and whose value belongs to range_1, the worker nodes on which they are stored and the storage pages within those workers can be obtained quickly from gIdx through the key-value correspondence. This directs the data retrieval operation to the relevant nodes and pages.
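A minimal sketch of how such a gIdx lookup might work, using nested Python dictionaries in place of the hash structures. The function name `locate` and the node and page names are illustrative assumptions, not the patent's implementation.

```python
def locate(gidx, att_tag, rng):
    """Return {node: [dataPage, ...]} for pages holding at least one
    record whose attribute att_tag falls in interval rng."""
    node_set = gidx.get(att_tag, {}).get(rng, {})
    result = {}
    for node, page_mapping in node_set.items():
        # Only pages whose counter is positive contain matching records.
        pages = sorted(p for p, count in page_mapping.items() if count > 0)
        if pages:
            result[node] = pages
    return result


# Illustrative index content: attTag -> range -> node -> dataPage -> count
gidx = {
    "attTag1": {
        "range1": {
            "worker-1": {"page-0": 3, "page-4": 0},
            "worker-2": {"page-7": 1},
        }
    }
}
targets = locate(gidx, "attTag1", "range1")
```

The counters let the master skip pages that no longer hold any matching record without contacting the worker.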
2. Local paging index
On a worker node, a data record D is stored in the specified storage page and written in append order. After the write completes, the row number of the record within the storage page, rowId, is obtained. When data in a storage page is retrieved, the rowIds are obtained by matching attribute values against the query request, and the original data records are extracted from the underlying I/O by rowId. The local paging index establishes, in inverted form, the correspondence between attribute values and storage row numbers. To improve search efficiency, the invention uses hashes combined with bitmaps to record, for each attribute value, the positions of the rowIds that it hits in the stored record set.
The local paging index pIdx is defined as follows:
pIdx = { attSet_k | k = 1, 2, ..., p }
attSet = { attTag, { attValArray_j | j = 1, 2, ..., m } }
attValArray = { (range, bitmap)_i | i = 1, 2, ..., n }
where:
pIdx is the local paging index, which consists of a group of attSet elements.
attSet is the value hash of an attribute tag; it consists of the attribute tag definition attTag and the interval queue attValArray.
attValArray is the bitmap index record for a specific interval of the attribute; it consists of the interval definition range and the corresponding bitmap. Here:
range is the interval definition, that is, the definition of a segment of the attribute's value domain.
bitmap is the bitmap of rowIds of the data records that hit this attribute interval. It is a binary stream in which the bit corresponding to the rowId of each hit record is set to 1 and all other bits are set to 0.
Fig. 3 shows the basic structure of the local paging index. As shown, the local paging index consists of layered hashes. The attribute tag definitions form the first-layer hash, and within each attribute tag definition there is an attribute value hash. Each attribute value hash entry contains a bitmap: a bit set to 1 means that the attribute value of the record at the corresponding row number in the current page hits the interval, and a bit set to 0 means that the attribute value of that record does not hit the interval.
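The layered hash with bitmaps can be illustrated with a small sketch. It is an assumption-laden illustration rather than the patent's implementation: each bitmap is held as a Python integer whose bit number equals the rowId, and the class and method names are hypothetical.

```python
class PageIndex:
    """pIdx for one storage page: attTag -> range -> bitmap (a Python int)."""

    def __init__(self):
        self.att_sets = {}

    def mark(self, att_tag, rng, row_id):
        """Set the bit for row_id in the bitmap of (att_tag, rng)."""
        ranges = self.att_sets.setdefault(att_tag, {})
        ranges[rng] = ranges.get(rng, 0) | (1 << row_id)

    def rows(self, att_tag, rng):
        """Return the rowIds whose bit is 1 in the bitmap of (att_tag, rng)."""
        bitmap = self.att_sets.get(att_tag, {}).get(rng, 0)
        return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


page = PageIndex()
page.mark("temp", 1, 0)  # row 0 has a "temp" value in interval 1
page.mark("temp", 1, 5)  # so does row 5
page.mark("temp", 2, 3)  # row 3 falls in interval 2
hits = page.rows("temp", 1)
```

An arbitrary-precision integer grows as needed, so no explicit padding is required in this representation.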
II. Index maintenance during data writing
When a data record D is written, the cloud storage system must maintain and update the index contents in order, following the dependency between the two index layers. The system first selects the worker node according to the distribution policy and the primary key value in D. After the worker node has written the record locally, it maintains its local paging index information and then returns this information to the master node. The master node maintains the contents of the global index according to where the data has been distributed among the worker nodes. Fig. 4 shows data distribution and local paging index maintenance on the worker node during data writing.
The steps are as follows:
1. The master node receives a write request for data record D.
2. The attribute value of the primary key is extracted from D.
3. The directed distribution information corresponding to this primary key value is looked up in the current global index. If it exists, go to step 4; otherwise go to step 5.
4. The worker node in the directed distribution information is selected as the distribution node. Go to step 6.
5. According to the distribution policy, the primary key value is hashed to form a mapping to a worker node, and that worker node is selected as the distribution node.
6. The data write request is sent to the worker node selected above.
7. The worker node receives the write request.
8. The worker node writes D into the specified storage page and obtains the row number rowId of the record within that page.
9. The current data record and its storage information are submitted to the local paging index. The system extracts attribute values from D according to the predefined information.
10. Set i = 1.
11. Extract the value of the i-th attribute.
12. In the current paging index, open the value hash of attribute i, find the interval corresponding to the current attribute value, open the value hash entry for that interval, and obtain the bitmap in it.
13. If the bitmap for the current attribute value does not exist, go to step 14; otherwise go to step 15.
14. Create the interval hash pair for the current attribute value under the current attribute tag, create a new bitmap object, and place it in that hash pair.
15. Obtain the bit corresponding to row number rowId from the current bitmap. If this fails, go to step 16; otherwise go to step 17.
16. Pad the current bitmap to the length required by rowId, initializing the added bits to 0, and obtain the bit for rowId.
17. Set the obtained bit to 1, completing maintenance of the index for the current attribute.
18. Let i = i + 1. If index maintenance has been completed for all attributes, go to step 19; otherwise go to step 11.
19. Return the storage information of D on the current worker node to the master.
20. The master receives the returned storage information.
21. The master updates the contents of the global index.
22. The data record write is complete; end the current operation.
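Steps 9 to 18 on the worker node can be sketched as follows. This is an illustration under assumptions: the bitmap is held as a `bytearray` with row 0 in the lowest bit of byte 0, the page index is a plain nested dictionary, and `to_range` is a hypothetical caller-supplied function that maps an attribute value to its interval mark.

```python
def set_bit(bitmap: bytearray, row_id: int) -> None:
    """Steps 15 to 17: pad with zero bytes if needed, then set the bit."""
    byte_pos, bit_pos = divmod(row_id, 8)
    if byte_pos >= len(bitmap):
        # Step 16: extend the bitmap; the new bytes are initialized to 0.
        bitmap.extend(bytes(byte_pos + 1 - len(bitmap)))
    bitmap[byte_pos] |= 1 << bit_pos


def index_record(page_index: dict, record: dict, row_id: int, to_range) -> None:
    """Steps 10 to 18: update one bitmap per indexed attribute."""
    for att_tag, value in record.items():
        rng = to_range(att_tag, value)
        # Steps 12 to 14: find the bitmap for the interval, creating it if absent.
        bitmap = page_index.setdefault(att_tag, {}).setdefault(rng, bytearray())
        set_bit(bitmap, row_id)
```

A possible call, with a hypothetical interval function that buckets numeric values by tens and uses other values as their own interval, would be `index_record(idx, {"city": "A", "temp": 21}, 10, lambda tag, v: v if tag == "city" else v // 10)`.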
Fig. 5 shows the flow of global index maintenance on the master node. The steps are as follows:
1. The master node receives the storage information returned by the worker node.
2. The attribute set is obtained from data record D according to the predefined definition.
3. Set i = 1.
4. Obtain the value of the i-th attribute.
5. Obtain the interval hash corresponding to the current attribute value from the hash of the global index. If it exists, go to step 7; otherwise go to step 6.
6. Add a new interval hash to the current global index attribute hash; this interval covers the current attribute value.
7. In the current attribute value hash, obtain the mapping table for the worker node. If it exists, go to step 9; otherwise go to step 8.
8. Create the node mapping table in the current sub-table.
9. Obtain the node's paging count queue from the current worker node mapping table. If it exists, go to step 11; otherwise go to step 10.
10. Create a new paging count queue in the current node mapping table; this queue corresponds to the page in the current storage information.
11. In the current paging queue, obtain the page handle object corresponding to the returned information. If it exists, go to step 13; otherwise go to step 12.
12. Create a new page handle object and write it to the node mapping table.
13. Add 1 to the counter in the current object, completing the update of the index for the current attribute.
14. Let i = i + 1. If i exceeds the upper bound of the current index definition set, go to step 15; otherwise go to step 4.
15. The global index update is complete; return a message that the data write succeeded.
16. End.
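The find-or-create chain in steps 3 to 14 might be sketched like this. The nested dictionaries stand in for the hash structures, and `to_range` is a hypothetical function mapping an attribute value to its interval mark; neither is taken from the patent text.

```python
def update_global_index(gidx: dict, record: dict, node: str, page: str,
                        to_range) -> None:
    """For each indexed attribute, find or create the interval, the node
    mapping table, and the page entry, then add 1 to the page counter."""
    for att_tag, value in record.items():
        rng = to_range(att_tag, value)
        intervals = gidx.setdefault(att_tag, {})      # attribute hash
        node_set = intervals.setdefault(rng, {})      # steps 5 and 6
        page_mapping = node_set.setdefault(node, {})  # steps 7 to 10
        # Steps 11 to 13: create the page entry if absent, then count the record.
        page_mapping[page] = page_mapping.get(page, 0) + 1
```

Because every level is created on demand, the same function serves both the first record of a new interval and later records that only increment an existing counter.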
III. Index maintenance during data updates
Data update operations comprise two basic operations: data deletion and data modification.
For a data update, the system first checks whether the request changes the primary key value. If it does not, the storage page mapping information of the data is extracted from the global index according to the task request, and the update request is sent to the worker node. The worker node obtains the data record, completes the I/O change, and then updates the local bitmap indexes for the affected attribute values: the bit in the bitmap of the old value is set to 0 and the bit in the bitmap of the new value is set to 1. It then returns the update information to the master node. On receiving this information, the master node decrements the counter in the paging mapping queue of the attribute value before the change in the global index, and increments the counter in the paging mapping queue of the attribute value after the update. This completes the data update operation.
When the updated values include the primary key, the data record for that primary key is deleted and the current data is reinserted.
Fig. 6 shows this basic flow. The steps are as follows:
1. The master node receives a data update request.
2. The worker node and page information for the data record are retrieved from the global index according to the record's primary key attribute value.
3. The update request is sent to the worker node obtained in step 2.
4. The worker node receives the update request, searches the local paging index to obtain the row number rowId of the record, and extracts the original data record D.
5. The bitmap indexes in the corresponding value hashes are obtained according to the attribute values of D.
6. The bit corresponding to the current rowId in those bitmap indexes is set to 0.
7. The current data record D is changed to D' and written to data I/O.
8. Attribute values are extracted from D', and the set of attributes whose values have changed is obtained.
9. The bitmap indexes in the current page for the hashes of the attribute values from step 8 are obtained in turn, and the bit corresponding to rowId in each is set to 1.
10. With the bitmap updates in the local paging index complete, the operation information is returned to the master node.
11. On receiving the update information, the master node extracts the node mapping table for data record D from the global index and subtracts 1 from the corresponding counter.
12. The changed attribute values are extracted from D', the node mapping table of their hash is obtained, and the counter in its paging count queue is incremented by 1.
13. The change to the current data record is complete.
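The two halves of the update flow can be sketched as follows. The sketch makes several assumptions that are not in the patent text: bitmaps are Python integers with bit number equal to rowId, the indexes are nested dictionaries, the interval marks of D and D' are already computed, and only attributes whose interval actually changed are touched.

```python
def update_local(page_index: dict, row_id: int,
                 old_ranges: dict, new_ranges: dict) -> dict:
    """Worker side: clear the bit in each old interval bitmap and set it in
    the new one. Returns {att_tag: (old_rng, new_rng)} for changed attributes."""
    changed = {}
    for att_tag, new_rng in new_ranges.items():
        old_rng = old_ranges[att_tag]
        if old_rng == new_rng:
            continue  # same interval, so the bitmap is already correct
        bitmaps = page_index.setdefault(att_tag, {})
        bitmaps[old_rng] = bitmaps.get(old_rng, 0) & ~(1 << row_id)
        bitmaps[new_rng] = bitmaps.get(new_rng, 0) | (1 << row_id)
        changed[att_tag] = (old_rng, new_rng)
    return changed


def update_global(gidx: dict, node: str, page: str, changed: dict) -> None:
    """Master side: subtract 1 from the old interval's page counter and
    add 1 to the new interval's page counter."""
    for att_tag, (old_rng, new_rng) in changed.items():
        intervals = gidx.setdefault(att_tag, {})
        old_pages = intervals.setdefault(old_rng, {}).setdefault(node, {})
        old_pages[page] = max(old_pages.get(page, 0) - 1, 0)
        new_pages = intervals.setdefault(new_rng, {}).setdefault(node, {})
        new_pages[page] = new_pages.get(page, 0) + 1
```

Returning the set of changed attributes from the worker gives the master exactly the information it needs to adjust its counters, so the record itself does not have to travel back.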
In a data deletion operation, the storage page mapping information of the data is first extracted from the global index, and the deletion request is sent to the worker node, which deletes the data record from local I/O. After the deletion, the row number rowId of the record is submitted to the local paging index, which updates the corresponding bits in the local attribute value bitmap indexes. The update information is then returned to the master node. On receiving it, the master node decrements the counters in the paging mapping queues of the corresponding attributes in the global index. This completes the data deletion operation.
IV. Data retrieval and querying
Data retrieval is achieved by matching query constraints against the index. The process is completed in two steps:
First-level retrieval: after receiving a query, the master node filters out from the global index the storage pages in which the data resides, forming a query task set, and distributes the query tasks to the slave nodes on which those storage pages reside.
Second-level retrieval: the master node distributes the query tasks to the corresponding slave nodes for retrieval. Each slave node performs intersection filtering for the query task according to its data page indexes and forms a local query result. The local query results obtained from each storage page are uploaded to the master node, which aggregates all the local results into the final query result and returns it.
A retrieval query can be expressed as a series of m query sub-operations on attributes joined by operation connectors, where attr_i is the i-th attribute, m is the number of query sub-operations, and op_i is an operation connector whose value is AND, OR, or NOT, or is empty, which indicates that there is no operation connector.
After receiving a query task, the master node finds in gIdx the sets of storage pages that satisfy each condition, according to the attribute tags and attribute value ranges in the query. It then combines these storage page sets according to the operation connectors to obtain the storage page set for the query task. Finally, it distributes the query task to the data nodes that hold the pages in that set.
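A minimal sketch of this coarse planning step on the master node. Several choices here are assumptions rather than the patent's method: conditions are evaluated left to right, the first condition carries no connector, and NOT is treated conservatively, because a page with hits in an interval may still hold rows outside it, so whole pages cannot be pruned at this level. The function names and the nested-dictionary index are hypothetical.

```python
def pages_for(gidx, att_tag, rng):
    """Return the set of (node, dataPage) pairs with at least one hit in rng."""
    node_set = gidx.get(att_tag, {}).get(rng, {})
    return {(node, page)
            for node, pages in node_set.items()
            for page, count in pages.items() if count > 0}


def plan(gidx, conditions):
    """conditions: list of (op, att_tag, rng); op is None for the first one.
    Returns {node: [dataPage, ...]}, the pages each worker must search."""
    candidates = set()
    for op, att_tag, rng in conditions:
        pages = pages_for(gidx, att_tag, rng)
        if op is None:
            candidates = pages
        elif op == "AND":
            candidates &= pages
        elif op == "OR":
            candidates |= pages
        elif op == "NOT":
            # Left to the worker bitmaps; pruning pages here could lose rows.
            continue
        else:
            raise ValueError(f"unknown connector: {op!r}")
    tasks = {}
    for node, page in sorted(candidates):
        tasks.setdefault(node, []).append(page)
    return tasks
```

The result is a superset of the pages that truly contain matches, which is what a coarse filter needs: the exact row-level answer is worked out later from the bitmaps on each worker.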
This process is shown in Fig. 7 and comprises the following steps:
1. The master node receives the query request;
2. Using the constraints in the query request, the corresponding attribute value hashes are looked up in the global index, and Boolean operations are performed on the node mapping tables in those hashes according to the query conditions;
3. After step 2, a pointer to the coarse result set of the query is obtained;
4. The query request is sent to the corresponding worker nodes in the coarse set obtained in step 3;
5. The worker node receives the query request;
6. Set i = 1;
7. Obtain the i-th page information from the query request;
8. Open the local paging index corresponding to the page from step 7;
9. Obtain the bitmap indexes in the corresponding attribute value hashes according to the query request;
10. Compare the lengths of the bitmap indexes of all attribute value hashes to find the shortest bitmap index;
11. Taking the bitmap index obtained in step 10 as the operand, perform Boolean operations between it and the bitmap indexes of the other value hashes in turn;
12. Take the result of step 11 as the candidate bitmap index of the query result;
13. According to the candidate bitmap index from step 12, read from storage page IO the data records of the rows whose bit is 1;
14. Filter these data records against the query conditions, selecting the records that satisfy them to form the page result set;
15. Cache the current page result set;
16. Set i = i + 1; if all page queries have been processed, go to step 17; otherwise go to step 7;
17. Merge all of the worker's local page result sets and return them to the master node;
18. The master node aggregates the result sets returned by all worker nodes and returns them to the query task;
19. Task execution is complete.
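Steps 9 to 14 on the worker side can be sketched as below for the simple case in which all conditions are joined by AND. The sketch is illustrative only: it represents each bitmap as a Python integer whose bit i stands for row i (an assumption, as are the names `intersect_bitmaps`, `rows_of`, and `query_page`), and it takes "length" to mean the integer's bit length.

```python
from typing import Callable, Dict, List


def intersect_bitmaps(bitmaps: List[int]) -> int:
    """AND all bitmaps together, starting from the shortest one (steps 10 to 12)."""
    if not bitmaps:
        return 0
    ordered = sorted(bitmaps, key=int.bit_length)
    result = ordered[0]
    for bitmap in ordered[1:]:
        result &= bitmap
        if result == 0:  # no candidate rows left, so stop early
            break
    return result


def rows_of(bitmap: int) -> List[int]:
    """Return the row numbers whose bit is 1 (step 13)."""
    rows, row_id = [], 0
    while bitmap:
        if bitmap & 1:
            rows.append(row_id)
        bitmap >>= 1
        row_id += 1
    return rows


def query_page(page_rows: List[Dict], bitmaps: List[int],
               predicate: Callable[[Dict], bool]) -> List[Dict]:
    """Read the candidate rows and filter them against the exact condition (step 14)."""
    candidates = rows_of(intersect_bitmaps(bitmaps))
    return [page_rows[r] for r in candidates if predicate(page_rows[r])]
```

The final filter is needed because a bitmap only records that a row's value falls inside an interval, which is coarser than the exact query condition.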
With the layered indexing technique described above, the invention establishes, for each dimension, a mapping between attribute values and storage pages in the form of hashed value intervals. This realizes a three-level mapping among storage mapping information, attribute intervals, and attribute labels. The index structure is efficient and scales well; the index is balanced, so query execution efficiency is only slightly affected by the size of the data set, and it supports multidimensional and Boolean queries. Compared with a conventional indexing technique (B+-tree), the proposed method improves query response speed by about 31% in a typical cloud storage environment and improves efficiency during writes and updates by about 23%. In addition, when nodes in the cloud environment change, the maintenance work for node migration is carried out only in the global index, so index maintenance is relatively simple and meets the requirements of a cloud storage environment.
The above embodiments only illustrate the technical solution of the invention and do not limit it. A person of ordinary skill in the art may modify the technical solution or substitute equivalents without departing from the spirit and scope of the invention; the scope of protection of the invention shall be as defined in the claims.
Claims (10)
1. A layered indexing method for cloud storage, comprising the steps of:
Step 1: establishing a global index on the master node and a local paging index on each worker node; the global index records the mapping between each data storage page and data attribute values; the data page index records the mapping between IO pointers within a data storage page and data attribute values;
Step 2: while a data record D is being written, maintaining and updating the index content according to the subordinate relationship between the two index layers: first, a worker node is selected according to the distribution policy and the primary key value in data record D; after the worker node completes the local write of the data record, it maintains the local paging index information and returns to the master node; the master node maintains the content of the global index according to the information about the storage page in which the data reside on the worker node.
2. The method of claim 1, wherein the global index consists of three levels: attribute hash, attribute value hash, and node mapping table; and the local paging index consists of three levels: attribute hash, attribute value hash, and inverted bitmap index.
3. The method of claim 1, wherein the two index levels, the global index and the local paging index, store the mapping between attribute values and storage locations; in both the global index and the local index, intervals are established over the range of attribute values, and the attribute value of a data record is assigned to the specific interval that contains it; in addition, each interval is associated with its storage mapping information as a hash pair.
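One way to picture the interval lookup in this claim is the sketch below. It assumes sorted boundaries and half-open intervals [low, high), neither of which the source specifies; `IntervalIndex` and its methods are hypothetical names. The payload stored per interval stands in for the "storage mapping information" (a node mapping table in the global index, a bitmap in the local index).

```python
import bisect
from typing import Any, Dict, List, Tuple

Interval = Tuple[float, float]


class IntervalIndex:
    """Maps an attribute value to the interval containing it, and the interval to a payload."""

    def __init__(self, boundaries: List[float]) -> None:
        # Consecutive boundaries delimit half-open intervals [low, high); an assumption.
        self.boundaries = sorted(boundaries)
        self.payloads: Dict[Interval, Any] = {}

    def interval_for(self, value: float) -> Interval:
        i = bisect.bisect_right(self.boundaries, value)
        if i == 0 or i == len(self.boundaries):
            raise KeyError(f"value {value} lies outside the indexed range")
        return (self.boundaries[i - 1], self.boundaries[i])

    def put(self, value: float, payload: Any) -> None:
        self.payloads[self.interval_for(value)] = payload

    def get(self, value: float, default: Any = None) -> Any:
        return self.payloads.get(self.interval_for(value), default)
```

Binary search keeps the lookup logarithmic in the number of intervals, and the hash pair from interval to payload is an ordinary dictionary entry.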
4. The method of claim 1, wherein the global index associates attribute intervals with storage pages as hash pairs, each attribute interval corresponding to one node mapping table; a node mapping table consists of a set of nodes and their corresponding page count queues; through the global index, the worker node and page position storing the data are located quickly.
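A node mapping table of this kind might be modelled as below. This is a sketch under assumptions: the page count queue is represented as a per-worker dictionary from page id to counter, and `NodeMappingTable` with its methods is a hypothetical name. Clamping the counter at zero follows the rule the claims give for deletion.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class NodeMappingTable:
    """For one attribute interval: worker node -> {page id: number of matching records}."""

    def __init__(self) -> None:
        self.queues: Dict[str, Dict[int, int]] = defaultdict(dict)

    def increment(self, worker: str, page_id: int) -> int:
        queue = self.queues[worker]
        queue[page_id] = queue.get(page_id, 0) + 1
        return queue[page_id]

    def decrement(self, worker: str, page_id: int) -> int:
        queue = self.queues[worker]
        queue[page_id] = max(0, queue.get(page_id, 0) - 1)  # never drop below zero
        return queue[page_id]

    def locations(self) -> List[Tuple[str, int]]:
        """Pages that currently hold at least one record in this interval."""
        return sorted(
            (worker, page_id)
            for worker, queue in self.queues.items()
            for page_id, count in queue.items()
            if count > 0
        )
```

Keeping a count rather than a flag lets the master node tell when the last record of an interval has left a page, so that page can be skipped in later queries.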
5. The method of claim 1, wherein the local paging index associates attribute intervals with the row numbers rowId of data records within the storage page; the rowIds within an attribute interval are recorded as a bitmap to improve retrieval efficiency; the bitmap is a binary byte stream in which bit i being 1 indicates that the attribute value of the data record at row rowId = i of the current storage page falls in the current interval.
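The byte-stream bitmap described in this claim can be sketched as follows. The class name `RowBitmap` and the bit order within each byte (least significant bit first) are assumptions made for illustration; the source only states that bit i corresponds to row i.

```python
from typing import List


class RowBitmap:
    """Bitmap over the rows of one storage page, stored as a growable byte stream."""

    def __init__(self) -> None:
        self.bits = bytearray()

    def set(self, row_id: int, value: bool = True) -> None:
        byte, offset = divmod(row_id, 8)
        if byte >= len(self.bits):
            if not value:
                return  # clearing a bit beyond the end changes nothing
            self.bits.extend(b"\x00" * (byte + 1 - len(self.bits)))
        if value:
            self.bits[byte] |= 1 << offset
        else:
            self.bits[byte] &= ~(1 << offset) & 0xFF

    def test(self, row_id: int) -> bool:
        byte, offset = divmod(row_id, 8)
        return byte < len(self.bits) and bool(self.bits[byte] & (1 << offset))

    def rows(self) -> List[int]:
        """Row numbers whose attribute value falls in this bitmap's interval."""
        return [i for i in range(len(self.bits) * 8) if self.test(i)]
```

One such bitmap would be kept per attribute interval per storage page, so a write sets a single bit and a delete clears one.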
6. The method of claim 1, wherein writing data record D comprises the following steps:
Step 1: during the write of data record D, the master node receives the request, first extracts the attribute value corresponding to the primary key of D, selects a worker node for D according to the distribution policy, and sends the write request to that worker node;
Step 2: upon receiving the write request, the worker node writes the data record into a local storage page and obtains its row number rowId, then opens the local paging index; for each attribute extracted from D, the local paging index compares the attribute's value against the corresponding value intervals and selects the interval that is hit; it then opens the bitmap in that interval and sets the bit corresponding to rowId to 1; after all attribute indexes have been processed, the storage page and worker node information are returned to the master node;
Step 3: upon receiving the returned storage information, the master node extracts the attributes of D and opens the global index; for each attribute in turn, it compares the value against the corresponding intervals, selects the interval that is hit, and opens the node mapping table in it; in the node mapping table it selects the page count queue corresponding to the aforementioned worker; from the page count queue it extracts the page count object corresponding to the storage page in the returned information and increments its counter by 1; after all attribute indexes have been processed, the index update for the write is complete and the operation result is returned.
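The three write steps can be sketched end to end as below. Everything concrete here is an assumption made for illustration: the worker names, the interval boundaries in `BOUNDS`, the CRC32-based distribution policy, a single storage page per worker, integer bitmaps, and the class names `Worker` and `Master`.

```python
import bisect
import zlib
from typing import Dict, List, Tuple

WORKERS = ["w1", "w2", "w3"]                                # hypothetical worker nodes
BOUNDS = {"age": [0, 30, 60, 120], "score": [0, 50, 101]}   # hypothetical interval boundaries


def pick_worker(primary_key: str) -> str:
    """Assumed distribution policy: hash the primary key onto a worker."""
    return WORKERS[zlib.crc32(primary_key.encode()) % len(WORKERS)]


def interval_of(attr: str, value: float) -> Tuple[float, float]:
    bounds = BOUNDS[attr]
    i = bisect.bisect_right(bounds, value)
    if i == 0 or i == len(bounds):
        raise ValueError(f"{attr}={value} lies outside the indexed range")
    return (bounds[i - 1], bounds[i])


class Worker:
    def __init__(self, name: str) -> None:
        self.name, self.page_id = name, 0     # one storage page per worker, for brevity
        self.rows: List[Dict] = []
        self.local_index: Dict[str, Dict[Tuple[float, float], int]] = {}

    def write(self, record: Dict) -> Tuple[Tuple[str, int], int]:
        self.rows.append(record)              # step 2: write the record, obtain rowId
        row_id = len(self.rows) - 1
        for attr in BOUNDS:
            bitmaps = self.local_index.setdefault(attr, {})
            interval = interval_of(attr, record[attr])
            bitmaps[interval] = bitmaps.get(interval, 0) | (1 << row_id)
        return (self.name, self.page_id), row_id


class Master:
    def __init__(self) -> None:
        self.workers = {name: Worker(name) for name in WORKERS}
        self.global_index: Dict[str, Dict[Tuple[float, float], Dict[Tuple[str, int], int]]] = {}

    def write(self, record: Dict) -> Tuple[Tuple[str, int], int]:
        worker = self.workers[pick_worker(record["id"])]    # step 1: route by primary key
        location, row_id = worker.write(record)
        for attr in BOUNDS:                                 # step 3: bump page counters
            interval = interval_of(attr, record[attr])
            counts = self.global_index.setdefault(attr, {}).setdefault(interval, {})
            counts[location] = counts.get(location, 0) + 1
        return location, row_id
```

A call such as `Master().write({"id": "u1", "age": 25, "score": 80})` sets one bit per indexed attribute on the chosen worker and raises one counter per indexed attribute on the master.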
7. The method of claim 1, wherein index maintenance during an update of data record D proceeds as follows:
Step 1: the master node receives a data update request, determines the worker node that stores D according to the primary key value in D, and sends the request to that worker node;
Step 2: the worker node receives the request, extracts the original data record D according to the content of the local paging index, and changes it to D' according to the request; after the change, the bitmap in the value interval corresponding to D in the local paging index is modified first, setting the bit corresponding to D's stored row number to 0; then the bitmap of the value interval corresponding to D' is modified, setting the bit corresponding to the stored row number of D' to 1; after these operations, the storage page information of D and D' is returned to the master node;
Step 3: upon receiving the returned storage page information, the master node first opens the node mapping table in the value interval corresponding to D in the global index, extracts from its page count queue the counter corresponding to the page to which D belongs, and decrements it by 1; it then opens the node mapping table in the interval corresponding to D', extracts from its page count queue the counter corresponding to the page to which D' belongs, and increments it by 1; after these operations, the operation information is returned.
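The update bookkeeping in steps 2 and 3 can be sketched as below. The sketch assumes an in-place update (D and D' share the same page and row), integer bitmaps, and hypothetical boundaries in `BOUNDS`; the function names are illustrative. It skips attributes whose interval does not change, which gives the same net result as clearing and then setting the same bit.

```python
import bisect
from typing import Dict, List, Tuple

Interval = Tuple[float, float]
Location = Tuple[str, int]                      # (worker node id, page id); hypothetical
BOUNDS = {"age": [0, 30, 60, 120]}              # hypothetical interval boundaries


def interval_of(attr: str, value: float) -> Interval:
    bounds = BOUNDS[attr]
    i = bisect.bisect_right(bounds, value)
    if i == 0 or i == len(bounds):
        raise ValueError(f"{attr}={value} lies outside the indexed range")
    return (bounds[i - 1], bounds[i])


def worker_update(local_index: Dict[str, Dict[Interval, int]], row_id: int,
                  old: Dict, new: Dict) -> List[Tuple[str, Interval, Interval]]:
    """Step 2: clear the bit in D's interval and set it in the interval of D'."""
    moved = []
    for attr in BOUNDS:
        src, dst = interval_of(attr, old[attr]), interval_of(attr, new[attr])
        if src == dst:
            continue
        bitmaps = local_index.setdefault(attr, {})
        bitmaps[src] = bitmaps.get(src, 0) & ~(1 << row_id)
        bitmaps[dst] = bitmaps.get(dst, 0) | (1 << row_id)
        moved.append((attr, src, dst))
    return moved


def master_update(global_index: Dict[str, Dict[Interval, Dict[Location, int]]],
                  location: Location, moved: List[Tuple[str, Interval, Interval]]) -> None:
    """Step 3: decrement the old interval's page counter, increment the new one."""
    for attr, src, dst in moved:
        table = global_index.setdefault(attr, {})
        src_counts = table.setdefault(src, {})
        src_counts[location] = max(0, src_counts.get(location, 0) - 1)
        dst_counts = table.setdefault(dst, {})
        dst_counts[location] = dst_counts.get(location, 0) + 1
```

Only the intervals that the record leaves and enters are touched, so the cost of an update does not depend on the size of the page or of the data set.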
8. The method of claim 7, wherein, if the primary key value of the data is changed, the update operation is encapsulated as two parts, deleting the original data record and adding a new data record, which are executed in sequence.
9. The method of claim 8, wherein, when a data record is deleted, the request is first sent to the worker node according to the primary key value; the worker node deletes the record from the local page, and the local paging index extracts the bitmap in the interval corresponding to the record and sets the bit corresponding to the record's row number to 0; the result is then returned to the master node; the master node extracts the page count object in the node mapping table of the interval corresponding to the data record and decrements the counter in the page count object by 1; if the counter falls below 0, it is set to 0.
10. A data retrieval method based on the layered indexing method for cloud storage of claim 1, comprising the following steps:
Step 1: the master node receives a retrieval request, opens the global index, and, for each value constraint in the retrieval request, opens the node mapping table in the corresponding attribute value hash and extracts the page count queues;
Step 2: after obtaining the page and node information from all of the page count queues, the master node performs Boolean operations according to the retrieval request to obtain the final set of worker nodes and page information;
Step 3: the master node sends the retrieval request and page information in turn to the worker nodes obtained in Step 2;
Step 4: upon receiving the retrieval request, the worker node opens the local paging index of the corresponding page and obtains the bitmap indexes of the intervals that correspond to the value constraints of the query request;
Step 5: the worker node performs Boolean operations on the bitmap indexes from Step 4 according to the content of the query request, finally obtaining the candidate bitmap of the records in the current page that satisfy the request, and extracts the corresponding data records from the page;
Step 6: after the data records of all pages have been extracted, the candidates are filtered to form a result set, which is returned to the master node;
Step 7: the master node aggregates all of the data records and returns them in response to the query request.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610975816.5A CN106599040A (en) | 2016-11-07 | 2016-11-07 | Layered indexing method and search method for cloud storage |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610975816.5A CN106599040A (en) | 2016-11-07 | 2016-11-07 | Layered indexing method and search method for cloud storage |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106599040A true CN106599040A (en) | 2017-04-26 |
Family
ID=58589854
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610975816.5A Pending CN106599040A (en) | 2016-11-07 | 2016-11-07 | Layered indexing method and search method for cloud storage |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599040A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341221A (en) * | 2017-06-28 | 2017-11-10 | 百度在线网络技术(北京)有限公司 | Foundation, associative search method, apparatus, equipment and the storage medium of index structure |
CN107436736A (en) * | 2017-08-08 | 2017-12-05 | 郑州云海信息技术有限公司 | The storage method and device of file in a kind of HDFS |
CN107451243A (en) * | 2017-07-27 | 2017-12-08 | 迪尚集团有限公司 | Complex query method based on attribute |
CN108229358A (en) * | 2017-12-22 | 2018-06-29 | 北京市商汤科技开发有限公司 | Index establishing method and device, electronic equipment, computer storage media, program |
CN108427748A (en) * | 2018-03-12 | 2018-08-21 | 北京奇艺世纪科技有限公司 | Distributed data base secondary index querying method, device and server |
CN109471840A (en) * | 2018-10-15 | 2019-03-15 | 北京海数宝科技有限公司 | Fileview method, apparatus, computer equipment and storage medium |
CN109783513A (en) * | 2018-12-20 | 2019-05-21 | 北京大米科技有限公司 | Data processing method, device, server and computer readable storage medium |
CN109978081A (en) * | 2019-05-05 | 2019-07-05 | 星环信息科技(上海)有限公司 | Determination method, apparatus, equipment and the medium of eigentransformation mode |
CN112948386A (en) * | 2021-03-04 | 2021-06-11 | 电信科学技术第五研究所有限公司 | Simple indexing and encryption tray-dropping mechanism for ETL abnormal data |
CN114791913A (en) * | 2022-04-26 | 2022-07-26 | 北京人大金仓信息技术股份有限公司 | Method, storage medium and device for processing shared memory buffer pool of database |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101370025A (en) * | 2007-08-17 | 2009-02-18 | 北京灵图软件技术有限公司 | Storing method, scheduling method and management system for geographic information data |
- 2016-11-07 CN CN201610975816.5A patent/CN106599040A/en active Pending
Non-Patent Citations (3)
Title |
---|
孟必平: ""分片位图索引:一种适用于云数据管理的辅助索引机制"", 《计算机学报》 * |
朱春莹: ""面向大数据查询的索引技术研究"", 《万方》 * |
王振: ""面向海量数据的位图索引技术及应用研究"", 《万方》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107341221B (en) * | 2017-06-28 | 2020-08-11 | 百度在线网络技术(北京)有限公司 | Index structure establishing and associated retrieving method, device, equipment and storage medium |
CN107341221A (en) * | 2017-06-28 | 2017-11-10 | 百度在线网络技术(北京)有限公司 | Foundation, associative search method, apparatus, equipment and the storage medium of index structure |
CN107451243A (en) * | 2017-07-27 | 2017-12-08 | 迪尚集团有限公司 | Complex query method based on attribute |
CN107451243B (en) * | 2017-07-27 | 2024-04-12 | 迪尚集团有限公司 | Complex query method based on attribute |
CN107436736A (en) * | 2017-08-08 | 2017-12-05 | 郑州云海信息技术有限公司 | The storage method and device of file in a kind of HDFS |
CN108229358A (en) * | 2017-12-22 | 2018-06-29 | 北京市商汤科技开发有限公司 | Index establishing method and device, electronic equipment, computer storage media, program |
CN108427748A (en) * | 2018-03-12 | 2018-08-21 | 北京奇艺世纪科技有限公司 | Distributed data base secondary index querying method, device and server |
CN109471840A (en) * | 2018-10-15 | 2019-03-15 | 北京海数宝科技有限公司 | Fileview method, apparatus, computer equipment and storage medium |
CN109783513A (en) * | 2018-12-20 | 2019-05-21 | 北京大米科技有限公司 | Data processing method, device, server and computer readable storage medium |
CN109783513B (en) * | 2018-12-20 | 2021-03-16 | 北京大米科技有限公司 | Data processing method, device, server and computer readable storage medium |
CN109978081A (en) * | 2019-05-05 | 2019-07-05 | 星环信息科技(上海)有限公司 | Determination method, apparatus, equipment and the medium of eigentransformation mode |
CN109978081B (en) * | 2019-05-05 | 2019-12-24 | 星环信息科技(上海)有限公司 | Method, apparatus, device and medium for determining feature transformation mode |
CN112948386A (en) * | 2021-03-04 | 2021-06-11 | 电信科学技术第五研究所有限公司 | Simple indexing and encryption tray-dropping mechanism for ETL abnormal data |
CN112948386B (en) * | 2021-03-04 | 2023-09-22 | 电信科学技术第五研究所有限公司 | Simple indexing and encrypting disk-dropping mechanism for ETL abnormal data |
CN114791913A (en) * | 2022-04-26 | 2022-07-26 | 北京人大金仓信息技术股份有限公司 | Method, storage medium and device for processing shared memory buffer pool of database |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106599040A (en) | Layered indexing method and search method for cloud storage | |
US11238098B2 (en) | Heterogenous key-value sets in tree database | |
CN107122443B (en) | A kind of distributed full-text search system and method based on Spark SQL | |
CN110291518A (en) | Merge tree garbage index | |
CN103177055B (en) | It is stored as row storage and row stores the hybrid database table of the two | |
CN107423422B (en) | Spatial data distributed storage and search method and system based on grid | |
CN104765876B (en) | Magnanimity GNSS small documents cloud storage methods | |
CN110268399A (en) | Merging tree for attended operation is modified | |
CN110383261A (en) | Stream for multithread storage device selects | |
CN110268394A (en) | KVS tree | |
CN108874971A (en) | A kind of tool and method applied to the storage of magnanimity labeling solid data | |
CN111427847B (en) | Indexing and querying method and system for user-defined metadata | |
CN102982103A (en) | On-line analytical processing (OLAP) massive multidimensional data dimension storage method | |
CN106716409A (en) | Method and system for adaptively building and updating column store database from row store database based on query demands | |
CN108536692A (en) | A kind of generation method of executive plan, device and database server | |
CN105912666A (en) | Method for high-performance storage and inquiry of hybrid structure data aiming at cloud platform | |
CN108009265B (en) | Spatial data indexing method in cloud computing environment | |
CN108021702A (en) | Classification storage method, device, OLAP database system and medium based on LSM-tree | |
CN104346444B (en) | A kind of the best site selection method based on the anti-spatial key inquiry of road network | |
CN1845093A (en) | Attribute extensible object file system | |
CN104408128B (en) | A kind of reading optimization method indexed based on B+ trees asynchronous refresh | |
CN111126461A (en) | Intelligent auditing method based on machine learning model explanation | |
CN107273443B (en) | Mixed indexing method based on metadata of big data model | |
JP6006740B2 (en) | Index management device | |
CN109213760B (en) | High-load service storage and retrieval method for non-relational data storage |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170426 |