CN102722531A - Query method based on regional bitmap indexes in cloud environment - Google Patents


Publication number
CN102722531A
Authority
CN
China
Prior art keywords
bitmap
condition
tuple
cloud environment
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2012101552537A
Other languages
Chinese (zh)
Other versions
CN102722531B (en
Inventor
孟必平
王腾蛟
李红燕
高军
杨冬青
唐世渭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN201210155253.7A
Publication of CN102722531A
Application granted
Publication of CN102722531B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a query method based on regional bitmap indexes in a cloud environment. The method comprises the following steps: 1) establishing the regional bitmap indexes: 1.1) performing range division on the index attributes of a data table in the cloud environment to generate a global sequencing table of attribute values, wherein the global sequencing table sequences tuples by a set rule; 1.2) establishing an indicating bitmap on each data node according to the range division result, wherein the indicating bitmap records which local attribute values are stored; 1.3) establishing a local bitmap index on each data node according to the architecture of the cloud environment to finish establishing the regional bitmap indexes; and 2) inputting a query condition, wherein a main node establishes a condition bitmap according to the query condition and distributes it to each data node, the condition bitmap covering all possibilities included in the query condition; the data nodes execute the retrieval task concurrently, and the main node acquires the query result of each data node and returns the union of the query results to the user. By establishing the regional bitmap indexes, the configurable parallel computing resources in the cloud environment can be fully utilized, and data query requests that use size comparisons as conditions can be answered quickly.

Description

Query method based on regional bitmap indexes in a cloud environment
Technical field
The invention belongs to the field of information technology. It relates to a distributed bitmap indexing method for cloud environments and to querying data with that method.
Background art
Cloud computing environments and data management
The rapid development of cloud computing has made it possible to store and manage massive data. Compared with a traditional single-machine computing environment, a cloud environment can draw on the large computing resources of a distributed cluster to meet the computing and storage demands of massive data management, and it is easy to maintain, extend and manage. When data volume grows quickly, cloud computing can rapidly adjust and allocate the required resources to keep up with the explosive growth of data; it can also provide an elastic, loosely structured storage model for unstructured data, together with distributed parallel computing resources built on that storage model. With the rapid expansion of data in the network era, managing large-scale data has become an urgent need. The advantages of the cloud environment in computing and storage resources give it the ability to store and manage large-scale data.
Bitmap indexes
A bitmap index is a database indexing technique that uses bitmaps. In a bitmap index, every bit of a bitmap is 0 or 1 and indicates whether the corresponding tuple takes a given value on the indexed attribute; the length of a bitmap therefore equals the total number of tuples. A bitmap index on attribute A builds one bitmap for every possible value of that attribute, indicating which tuples take that value. If attribute A has many possible values, many bitmaps are produced, and a B+ tree can be used to organize them so that the bitmap being sought is located quickly. The advantage of a bitmap index is that compound query conditions can be processed quickly with the efficient bitwise logical operations of a conventional computer. For example, applying a bitwise AND to the bitmap retrieved on attribute A_1 and the bitmap retrieved on attribute A_2 yields the tuples that satisfy the query conditions on both attributes.
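As an illustration only (it is not taken from the patent), the following Python sketch builds one bitmap per distinct attribute value, using Python integers as bit sets, and answers a compound condition with a single bitwise AND. The function name build_bitmap_index, the attribute names and the sample rows are assumptions made for this sketch.

```python
from collections import defaultdict

def build_bitmap_index(values):
    """Return {value: bitmap}; bit k of a bitmap is 1 if tuple k has that value."""
    index = defaultdict(int)
    for k, v in enumerate(values):
        index[v] |= 1 << k
    return dict(index)

# Hypothetical table with two indexed attributes, one list entry per tuple.
gender = ["male", "female", "male", "female"]
dept = ["ops", "ops", "dev", "dev"]

gender_idx = build_bitmap_index(gender)
dept_idx = build_bitmap_index(dept)

# Compound condition: gender = 'male' AND dept = 'dev', as one bitwise AND.
hits = gender_idx["male"] & dept_idx["dev"]
matching_rows = [k for k in range(len(gender)) if (hits >> k) & 1]
```

Here only tuple 2 is both male and in the hypothetical dev department, so matching_rows contains just that position.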
Centralized management of indexes in a cloud environment
In the centralized scheme, all values of the indexed field are globally sorted and managed centrally. Specifically, each record has one index entry, which contains the value of the indexed field and the primary key of the record. Within the index structure, the index entries are globally sorted by the value of the indexed field. The system processes a query on the indexed field in two steps. First, it finds the qualifying index entries in the globally sorted index structure and thereby learns the primary keys of the matching records. Then it uses those primary keys to access the clustered index and locate the complete records.
The greatest challenge of the centralized scheme is managing the globally sorted index entries. The most direct approach is to store the index entries in the data management system in a distributed fashion, like any other data. Taking the primary-key lookup mechanism of the open-source project HBase as an example, Fig. 1 shows the composition and access pattern of the index structure in the centralized scheme: the root table records which metadata table a target tuple corresponds to, and the metadata table records the actual position of the target tuple in the data table. During retrieval, steps 1, 2 and 3 first obtain the primary key of the target tuple, and steps 5, 6 and 7 then locate the target tuple by that primary key. In this way, even a huge index structure enjoys the reliability, scalability and manageability offered by the massive data management system. However, each step of the access process is completed by a single data node, which fails to exploit the advantages of distributed computing resources, so the query response time is long.
Distributed management of indexes in a cloud environment
In the distributed scheme, each data node independently builds an index on the local data it manages. The distributed scheme does not maintain a global ordering of index values; the ordering is localized on each data node. The data nodes therefore do not depend on one another, which makes concurrent execution of queries straightforward. When a query on the indexed key arrives, the retrieval task is distributed to all data nodes and executed concurrently, and the final query result is the union of the results returned by all data nodes. Because each data node maintains its index independently, the local index structure is very flexible. The nodes may use the same indexing technique (homogeneous), for example B+ trees everywhere, or different techniques (heterogeneous), for example B+ trees on some nodes and bitmap indexes on others. Heterogeneous local indexes let each data node choose an indexing technique that suits its own computing resources. For example, a node short of main memory can use a B+ tree index and keep only the two levels nearest the root in main memory, while a node with weak CPU performance can use a bitmap index and reduce computation through logical operations on bitmaps. Fig. 2 shows the composition and access pattern of the index structure in the distributed scheme.
In this way, the index structure follows the data and is spread across the nodes, which are independent of one another. Retrieval tasks are likewise assigned to the nodes and executed independently, so parallel computing resources are well utilized. However, most retrieval tasks, especially those with the most common equality conditions, have very few target records. Executing such a task in parallel across the cluster often triggers retrieval on many data nodes that store no target record at all, and these nodes eventually return empty sets. When retrieval tasks are frequent, this parallel execution wastes a large amount of computing resources and ultimately reduces system throughput.
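The drawback described above can be made concrete with a small Python sketch. It is a simplified model rather than the prior-art implementation: the nodes are plain lists scanned sequentially instead of concurrently, and the function name distributed_lookup and the sample rows are hypothetical.

```python
def distributed_lookup(nodes, key):
    """Send the lookup to every node; return (union of hits, number of empty answers)."""
    results, empty_answers = set(), 0
    for local_rows in nodes:
        # Every node runs the retrieval, whether or not it stores the key.
        local_hits = {row for row in local_rows if row[0] == key}
        if not local_hits:
            empty_answers += 1
        results |= local_hits
    return results, empty_answers

# Hypothetical cluster of three data nodes holding (key, value) rows.
nodes = [
    [("alice", 1), ("bob", 2)],
    [("carol", 3)],
    [("dave", 4), ("erin", 5)],
]
hits, wasted = distributed_lookup(nodes, "carol")
```

For this equality lookup, two of the three nodes do retrieval work only to return an empty set, which is the overhead the paragraph above refers to.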
Summary of the invention
The object of the present invention is to provide a distributed bitmap indexing method for cloud environments, the Regional Bitmap Index (RBI), and to use it to query large-scale data efficiently. The invention combines the respective advantages of the centralized and distributed index schemes. It proposes the regional bitmap index structure and builds indicating bitmaps through a global ordering of attribute values, so that each data node knows how its local data is distributed within the global data. It also localizes the index structure so that data nodes are as independent of one another as possible, which facilitates parallel retrieval. The method makes full use of the parallel computing resources in the cloud environment to significantly shorten query response time. At the same time, each data node uses the value distribution information over the value domain to avoid the computation and overhead of retrieving on the many data nodes that contain no hits, so query throughput also improves.
The present invention proposes a query method based on regional bitmap indexes in a cloud environment, comprising the following steps:
1) establishing the regional bitmap index:
1.1) performing range division on the indexed attributes of the data table in the cloud environment and generating a global sequencing table of attribute values, the global sequencing table ordering the tuples by a set rule;
1.2) establishing an indicating bitmap on each distributed data node according to the range division result, the indicating bitmap recording which local attribute values are stored on the node;
1.3) establishing a local bitmap index on each distributed data node according to the architecture of the cloud environment, which completes the regional bitmap index;
2) inputting a query condition; the main node builds a condition bitmap from the query condition and distributes it to each data node, the condition bitmap covering all possibilities included in the query condition; the data nodes execute the retrieval task concurrently, and the main node collects the query results of the data nodes and returns their union to the user.
Each tuple is represented by a bit string of length $\sum_{i=1}^{f} c_i$, where the value domain of attribute i of the tuple is divided into c_i sub-domains, f is the number of attributes that take part in the division, and 1 ≤ i ≤ f.
The c_i sub-domains form the set C_i, the Cartesian product of the sub-domain sets is $Des_{1 \ldots f} = C_1 \times C_2 \times C_3 \times \cdots \times C_f$, and its size is $B = \prod_{i=1}^{f} c_i$.
The bit strings are globally sorted; each resulting ranking value corresponds uniquely to the value of a tuple on the queried fields, and the possible values of the bit string of a tuple are ranked in ascending order.
The length of the indicating bitmap equals the number of sub-domain combinations produced by dividing the attribute value domains, that is, the size B of the Cartesian product.
The local bitmap index of step 1.3) is established as follows: a B+ tree is built over the global ranking values of the bit strings of the tuples present on the node, each key in a leaf node of the tree corresponding to one ranking value; to each key in a leaf node of the B+ tree a bitmap is attached whose length is the total number of tuples managed by the data node, and which serves as the tuple bitmap of that ranking value.
When the query condition is a single query condition:
a) each computing node splits the condition bitmap into the bit strings of the corresponding elements of the attribute Cartesian product, converts those bit strings into ranking values, and forms the target ranking value set;
b) an all-zero bit string cb of length B is generated, and the positions that belong to the target ranking value set are set to 1;
c) the node checks whether the bitwise AND eb & cb is 0, where eb is the indicating bitmap of the computing node and cb is the bit string generated in step b);
d) if the result is 0, the computing node directly returns an empty set as its result;
e) otherwise, the node searches its local B+ tree, finds the corresponding leaf nodes and their attached tuple bitmaps, and checks one by one whether the tuples whose bits are set to 1 in the tuple bitmaps satisfy the condition.
When the query condition consists of multiple query conditions:
ⅰ) each query condition is evaluated as in the single-condition case;
ⅱ) the query results of the individual conditions from step ⅰ) are combined with the bitwise logical operations that correspond to the way the conditions are combined in the original query, and the tuples whose bits are set to 1 in the result are checked one by one against the condition;
ⅲ) finally, all results that satisfy the condition are returned to the main node as the query result.
The condition bitmap is the bitwise OR of the bit strings of the qualifying elements of the Cartesian product of the tuple attribute sub-domains.
When a query request arrives at a distributed data node, the node checks the indicating bitmap to determine whether it holds any target tuple; if not, it directly returns an empty query result without executing the retrieval task.
The beneficial effects of the present invention are as follows:
Building on an analysis and comparison of the differences between the centralized and distributed schemes, the invention designs an index mechanism that combines the advantages of both. The method makes full use of the configurable parallel computing resources in the cloud environment and responds quickly to data query requests whose conditions are size comparisons. Thanks to the indicating bitmap, the invention effectively avoids unnecessary computing overhead. Deployed in a cloud environment and faced with large-scale data, the retrieval method provided by the invention scales well.
Description of drawings
Fig. 1 is a schematic diagram of the prior-art centralized scheme;
Fig. 2 is a schematic diagram of the prior-art distributed scheme;
Fig. 3 compares the response time of the query method based on regional bitmap indexes in a cloud environment with that of prior-art retrieval methods;
Fig. 4 compares the query throughput of the query method based on regional bitmap indexes in a cloud environment with that of prior-art retrieval methods;
Fig. 5 shows the tuples on a data node together with their bit string values and ranking values in one embodiment of the query method based on regional bitmap indexes in a cloud environment;
Fig. 6 is a schematic diagram of the local B+ tree structure and the tuple bitmaps in one embodiment of the query method based on regional bitmap indexes in a cloud environment;
Fig. 7 is a flow chart of the query steps of the query method based on regional bitmap indexes in a cloud environment.
Embodiments
Experiments on a data set of Twitter retweet relationships, containing 450 million tuples and 52 GB in size, show that the indexing method proposed in the invention performs well in both response time and query throughput. Fig. 3 and Fig. 4 compare a basic implementation of the centralized index scheme (Global Approach, GA), a basic implementation of the distributed index scheme (Distributed Approach, DA) and the regional bitmap index (Regional Bitmap Index, RBI) proposed here. The experiments used a distributed cluster of 4 machines.
The steps for building the regional bitmap index proposed by the invention are described in detail below with reference to Fig. 7:
First, a global sequencing table is generated for the values of the data on the indexed attributes. The global sequencing table is generated as follows:
1. Suppose a bitmap index is to be built on f numeric attributes A_1, A_2, A_3, ..., A_f of the data table. First, the value domain of each attribute A_i is divided into c_i sub-domains (if the attribute takes discrete values, each value can form a sub-domain of its own). Let C_i be the set of these sub-domains; the size of the sub-domain Cartesian product $Des_{1 \ldots f} = C_1 \times C_2 \times C_3 \times \cdots \times C_f$ is then:
$B = \prod_{i=1}^{f} c_i$
2. Each tuple is represented by a bit string of length
$\sum_{i=1}^{f} c_i$
where c_i is the size of the sub-domain set C_i. If b_{i,j} denotes the bit that corresponds to the j-th sub-domain of attribute A_i of tuple t, the bit string of the tuple is:
$b_{1,1} b_{1,2} \ldots b_{1,c_1} \; b_{2,1} b_{2,2} \ldots b_{2,c_2} \; \ldots \; b_{f,1} b_{f,2} \ldots b_{f,c_f}$
3. In the bit string of any given tuple t, for each attribute A_i there is exactly one j ∈ [1, c_i] such that b_{i,j} = 1. The possible values of the bit string therefore correspond one to one to the elements of the sub-domain Cartesian product Des_{1...f}, so the bit string of any tuple has
$B = \prod_{i=1}^{f} c_i$
possible values. Different tuples in the same data table share the same table schema, and the schema determines the attributes of the table. Given the attribute columns and the division of their value domains into sub-domains, all possible values of the bit string of a tuple are ranked in ascending order, and each possible value corresponds to exactly one ranking value r ∈ [1, B]. In this way, the value of any tuple on the indexed fields corresponds to a unique ranking value r. This completes the global ordering of the index values of all tuples.
Taking a composite index on two attributes of a company's employee information table as an example, the following shows how the bit string of a tuple is generated and how its ranking value is then computed.
Example 1: Consider the employee information table of a company. Attribute A_1 gives the employee's gender and takes the two values male and female; attribute A_2 records the employee's salary, an integer in the range [0, 3000]. First, the value domain of A_1 is divided into two sub-domains that contain only the value male and only the value female, respectively; the value domain of A_2 is divided into three sub-domains: [0, 1000], (1000, 2000] and (2000, 3000]. Consider employee 1, who is male with a salary of 1300; the bit string of employee 1 is '10010'. The first two bits '10' indicate that the value on A_1 is male, and the last three bits '010' indicate that the salary lies in (1000, 2000]. Now consider employee 2, who is female with a salary of 2600; the bit string of this employee is '01001'. The first two bits '01' indicate that the value on A_1 is female, and the last three bits '001' indicate that the salary lies in (2000, 3000]. From this division of the attribute value domains, the size of the sub-domain Cartesian product is B = 2 × 3 = 6. The bit string of any tuple has 6 possible values, which in ascending order are '01001', '01010', '01100', '10001', '10010' and '10100'. The bit string of employee 1 is 5th in this list and that of employee 2 is 1st, so the ranking values of the bit strings of employee 1 and employee 2 are 5 and 1, respectively.
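The encoding in Example 1 can be reproduced with a short Python sketch. The sub-domains come from the example; the helper names one_hot, bit_string and rank, and the choice of bisect to locate the salary sub-domain, are assumptions of this sketch rather than part of the patent.

```python
from bisect import bisect_left
from itertools import product

# Sub-domains from Example 1; the salary list holds the inclusive upper bounds.
GENDERS = ["male", "female"]
SALARY_UPPER = [1000, 2000, 3000]

def one_hot(position, width):
    """Bit string of the given width with a single 1 at `position` (0-based)."""
    return "".join("1" if k == position else "0" for k in range(width))

def bit_string(gender, salary):
    g = GENDERS.index(gender)
    s = bisect_left(SALARY_UPPER, salary)  # first sub-domain whose upper bound >= salary
    return one_hot(g, len(GENDERS)) + one_hot(s, len(SALARY_UPPER))

# All B = 2 * 3 = 6 possible bit strings in ascending order.
ALL_STRINGS = sorted(
    one_hot(g, len(GENDERS)) + one_hot(s, len(SALARY_UPPER))
    for g, s in product(range(len(GENDERS)), range(len(SALARY_UPPER)))
)

def rank(bits):
    """Ranking value = 1-based position of the bit string in the sorted list."""
    return ALL_STRINGS.index(bits) + 1

employee_1 = bit_string("male", 1300)
employee_2 = bit_string("female", 2600)
```

With these definitions, employee_1 is '10010' with ranking value 5 and employee_2 is '01001' with ranking value 1, matching the example.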
Next, an indicating bitmap is generated on each data node. The indicating bitmap is a bit string whose length is the size B of the sub-domain Cartesian product. If the data node holds a tuple with ranking value r, bit r of its indicating bitmap is 1; if it holds no tuple with ranking value r (every local ranking value is greater or less than r), bit r is 0. The indicating bitmap reflects how the local data is distributed over the whole value domain of the attributes, and it is stored in the memory of the data node.
Example 2: The assumptions about the employee information table are the same as in Example 1. As shown in Fig. 5, the table has 7 records on a certain data node. Since B = 6, the indicating bitmap on this data node has length 6. Because the data node holds only tuples with ranking values 1, 3, 4 and 5, its indicating bitmap is 101110.
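A minimal Python sketch of Example 2 follows. Only the set of ranking values {1, 3, 4, 5} and B = 6 come from the example; the per-record ranking values listed below are hypothetical, since Fig. 5 is not reproduced here, and the function name indicator_bitmap is an assumption.

```python
def indicator_bitmap(local_ranks, B):
    """Return B bits; bit r (1-based, counted from the left) is '1' if rank r occurs locally."""
    present = set(local_ranks)
    return "".join("1" if r in present else "0" for r in range(1, B + 1))

# Hypothetical ranking values of the 7 local records.
local_ranks = [5, 1, 3, 4, 1, 5, 3]
eb = indicator_bitmap(local_ranks, B=6)
```

The resulting eb is '101110', the indicating bitmap given in the example.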
The indicating bitmap records which attribute values exist locally. When a query request arrives at a distributed data node, the node first checks the indicating bitmap to determine whether it holds any target tuple; if not, it directly returns an empty result and does not execute the retrieval task. The check is performed with a bitwise AND of bit strings.
Example 3: The assumptions about the employee information table are the same as in Example 1. As shown in Fig. 5, the table has 7 records on a certain data node, and by Example 2 its indicating bitmap is 101110. Suppose a user queries for female employees whose salary lies in (1000, 2000]. The bit string of the target records is 01010, and its ranking value is 2. The node therefore constructs the bit string 010000, in which only bit 2 is set, and computes its bitwise AND with the indicating bitmap 101110: 010000 & 101110 = 000000. The result is all zeros, so the node holds no target tuple and can directly return an empty set as its query result.
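The pruning check of Example 3 can be sketched in Python as below. The bitmap 101110 and the target ranking value 2 come from the example; the function name may_contain is an assumption of this sketch.

```python
def may_contain(eb, target_ranks):
    """Return True if the indicating bitmap eb has a 1 at any target ranking value."""
    B = len(eb)
    # cb: all-zero string of length B with the target positions set to 1.
    cb = "".join("1" if r in target_ranks else "0" for r in range(1, B + 1))
    return (int(eb, 2) & int(cb, 2)) != 0

eb = "101110"
skip_node = not may_contain(eb, {2})   # query for ranking value 2
```

Because bit 2 of eb is 0, skip_node is True and the node can answer with an empty set without touching its local index; a query for ranking value 5 would not be pruned.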
Finally, each data node builds a bitmap index on the data it manages locally.
As in the distributed scheme described above, each data node independently manages the index on its local data, which increases the degree of parallelism during retrieval. The following steps are executed in parallel on each data node:
1. A B+ tree is built over the global ranking values of the bit strings of the tuples present on the node; each leaf node of the tree corresponds to one ranking value. The construction of the B+ tree is as described in Goetz Graefe and Harumi A. Kuno, Modern B-Tree techniques (In ICDE, pages 1370-1373, 2011).
2. To each leaf node of the B+ tree a bitmap is attached whose length is the total number of tuples managed by the data node; this bitmap is the data of the leaf node. In the bitmap, the bits of the tuples that have the leaf's ranking value are set to 1 and all other bits are set to 0. This bitmap is called the tuple bitmap. Clearly, the B+ tree on a single data node has at most B leaf nodes and therefore at most B tuple bitmaps.
Continuing with the assumptions of Example 1, the following example shows how the local bitmap index on a data node is generated.
Example 4: The assumptions about the employee information table are the same as in Example 1. As shown in Fig. 5, the table has 7 records on a certain data node. The bit string (Bit String) and ranking value (Rank) are not attributes of the original table; they are the bit string generated from the values of the gender and salary attributes and the ranking value of that bit string. The B+ tree built from the ranking values of the records is shown in Fig. 6 (each node of this B+ tree has at most 3 child nodes).
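The local index of Example 4 can be approximated in Python as follows. This sketch swaps the B+ tree for a dictionary with keys inserted in sorted order, which is enough to show the mapping from ranking values to tuple bitmaps but is not the structure of Fig. 6. The per-record ranking values are hypothetical and the function name build_local_index is an assumption.

```python
def build_local_index(local_ranks):
    """Map each ranking value present on the node to its tuple bitmap.

    A tuple bitmap has one bit per local tuple; bit k is '1' if tuple k
    has that ranking value. A sorted dict stands in for the B+ tree here.
    """
    n = len(local_ranks)
    index = {}
    for r in sorted(set(local_ranks)):
        index[r] = "".join("1" if local_ranks[k] == r else "0" for k in range(n))
    return index

# Hypothetical ranking values of the 7 local records (set {1, 3, 4, 5} as in Example 2).
local_ranks = [5, 1, 3, 4, 1, 5, 3]
index = build_local_index(local_ranks)
```

Each tuple bitmap has length 7, the number of local tuples, and every tuple is marked in exactly one of the four bitmaps.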
The steps for executing a retrieval task in a distributed manner for a given query condition are described in detail below with reference to Fig. 7:
First, the main node generates the condition bitmap from the query condition and distributes it to each data node. The condition bitmap is the bitwise OR of the bit strings of the qualifying elements of the attribute Cartesian product. For example, a query for female employees with a salary between 1500 and 1800 yields the condition bitmap 01010, and a query for male employees with a salary below 1300 is converted into the condition bitmap 10110. Note that the generated condition bitmap must cover all possibilities included in the query condition. In addition, if the query is a compound of several query conditions on attributes that belong to several index structures, an independent condition bitmap is generated for each condition.
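One way to generate such condition bitmaps for the employee table is sketched below in Python. The sub-domains are those of Example 1 and salaries are taken to be integers, as the example states; the function names overlaps and condition_bitmap, and the restriction to one gender plus a closed salary interval, are simplifying assumptions of this sketch.

```python
GENDERS = ["male", "female"]
# Salary sub-domains from Example 1: [0, 1000], (1000, 2000], (2000, 3000].
SALARY_RANGES = [(0, 1000), (1000, 2000), (2000, 3000)]

def overlaps(k, lo, hi):
    """True if salary sub-domain k intersects the closed query interval [lo, hi]."""
    sub_lo, sub_hi = SALARY_RANGES[k]
    lowest = sub_lo if k == 0 else sub_lo + 1  # integer salaries: (a, b] starts at a + 1
    return lowest <= hi and lo <= sub_hi

def condition_bitmap(gender, lo, hi):
    """Set one bit per sub-domain that may contain a qualifying value."""
    g_bits = "".join("1" if g == gender else "0" for g in GENDERS)
    s_bits = "".join("1" if overlaps(k, lo, hi) else "0" for k in range(len(SALARY_RANGES)))
    return g_bits + s_bits

women_1500_1800 = condition_bitmap("female", 1500, 1800)
men_below_1300 = condition_bitmap("male", 0, 1299)
```

The two calls give 01010 and 10110, the condition bitmaps quoted above. The second bitmap sets two salary bits because salaries below 1300 may fall in either of the first two sub-domains.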
Then, the data nodes execute the retrieval task in parallel. The cases of a single query condition and of a compound query made up of several query conditions are described separately below:
1. Single query condition:
a) The condition bitmap is split into the bit strings of the corresponding elements of the attribute Cartesian product, and these bit strings are converted into ranking values, which form the target ranking value set.
b) An all-zero bit string cb of length B is generated, and the positions that belong to the target ranking value set are set to 1.
c) The node checks whether the bitwise AND eb & cb is 0, where eb is the indicating bitmap of this data node.
d) If the result is 0, the data node directly returns an empty set as its result.
e) Otherwise, the node searches its local B+ tree, finds the corresponding leaf nodes and their attached tuple bitmaps, and checks one by one whether the tuples whose bits are set to 1 in the tuple bitmaps satisfy the condition. Finally, all results that satisfy the condition are returned to the main node as the query result.
2. Compound query made up of several query conditions:
a) For each individual query condition, the following steps are carried out in turn:
ⅰ Each data node converts the condition bit string into the corresponding range of ranking values.
ⅱ An all-zero bit string cb of length B is generated, and the positions that fall within the ranking value range are set to 1.
ⅲ The node checks whether the bitwise AND eb & cb is 0.
ⅳ If the result is 0, the data node directly generates an all-zero bitmap whose length equals the number of local tuples it manages, as the query result of this condition.
ⅴ Otherwise, the node searches its local B+ tree, finds the corresponding leaf nodes, and takes their attached tuple bitmaps as the query result of this condition.
b) The query results of the individual conditions from step a) are combined with the bitwise logical operations that correspond to the way the conditions are combined in the original query, and the tuples whose bits are set to 1 in the result are checked one by one against the condition.
c) Finally, all results that satisfy the condition are returned to the main node as the query result of this data node.
Finally, the main node collects the query results returned by the data nodes and takes their union as the final result.
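The single-condition procedure above, together with the final union at the main node, can be sketched end to end in Python. This is a simplified model under several assumptions: a dictionary replaces the local B+ tree, the data nodes are visited sequentially rather than concurrently, compound queries are not covered, and the sample rows and all function names are hypothetical.

```python
from itertools import product

WIDTHS = [2, 3]  # c_1 = 2 gender sub-domains, c_2 = 3 salary sub-domains (Example 1)

def one_hot(pos, width):
    return "".join("1" if k == pos else "0" for k in range(width))

# All B possible bit strings in ascending order; ranking value = 1-based position.
ALL_STRINGS = sorted(
    "".join(one_hot(p, w) for p, w in zip(combo, WIDTHS))
    for combo in product(*(range(w) for w in WIDTHS))
)
B = len(ALL_STRINGS)

def make_node(rows, ranks):
    """Build the indicating bitmap eb and the rank -> tuple bitmap map for one node."""
    bits = {}
    for k, r in enumerate(ranks):
        bits.setdefault(r, ["0"] * len(rows))[k] = "1"
    eb = "".join("1" if r in bits else "0" for r in range(1, B + 1))
    return {"rows": rows, "eb": eb, "index": {r: "".join(b) for r, b in bits.items()}}

def target_ranks(condition_bitmap):
    """Step a): ranking values of all element bit strings covered by the condition bitmap."""
    cond = int(condition_bitmap, 2)
    return {r for r, s in enumerate(ALL_STRINGS, start=1) if (int(s, 2) & cond) == int(s, 2)}

def node_query(node, condition_bitmap, predicate):
    ranks = target_ranks(condition_bitmap)
    cb = sum(1 << (B - r) for r in ranks)          # step b): bit r counted from the left
    if (int(node["eb"], 2) & cb) == 0:             # steps c) and d): prune this node
        return set()
    hits = set()
    for r in ranks.intersection(node["index"]):    # step e): inspect tuple bitmaps
        for k, bit in enumerate(node["index"][r]):
            if bit == "1" and predicate(node["rows"][k]):
                hits.add(node["rows"][k])
    return hits

def main_node_query(nodes, condition_bitmap, predicate):
    """Union of the per-node results, as collected by the main node."""
    return set().union(*(node_query(n, condition_bitmap, predicate) for n in nodes))

# Hypothetical rows (name, gender, salary) with their ranking values.
nodes = [
    make_node([("ann", "female", 2600), ("bob", "male", 1300)], ranks=[1, 5]),
    make_node([("cat", "female", 1700), ("dan", "male", 500)], ranks=[2, 6]),
]
result = main_node_query(nodes, "10110", lambda row: row[1] == "male" and row[2] < 1300)
```

For the query "male employees with a salary below 1300", the condition bitmap 10110 covers ranking values 5 and 6. The first node is not pruned because it holds a tuple with ranking value 5, but that tuple (salary 1300) fails the final one-by-one check, which is why step e) is still needed after the bitmap lookup.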

Claims (10)

  1. A query method based on regional bitmap indexes in a cloud environment, comprising the steps of:
    1) establishing the regional bitmap index:
    1.1) performing range division on the attribute values of each tuple in the cloud environment to generate a global sequencing table of attribute values, the global sequencing table ordering the tuples according to a set rule;
    1.2) establishing, according to the range division result, an indicating bitmap on each distributed data node, the indicating bitmap recording which attribute values are stored locally;
    1.3) establishing a local bitmap index on each distributed data node according to the architecture of the cloud environment, thereby completing the establishment of the regional bitmap index;
    2) inputting a query condition; the main node builds a condition bitmap from the query condition and distributes it to each data node, the condition bitmap covering all possibilities included in the query condition; the data nodes execute the retrieval task concurrently; the main node collects the query result of each data node and returns the union of these query results to the user.
  2. The query method based on regional bitmap indexes in a cloud environment as claimed in claim 1, characterized in that each tuple is represented by a bit string of length […], wherein the value range of attribute i of the tuple is divided into $c_i$ subdomains, f is the number of attributes participating in the division, and $1 \le i \le f$.
  3. The query method based on regional bitmap indexes in a cloud environment as claimed in claim 2, characterized in that the $c_i$ subdomains form a set $C_i$, the Cartesian product is expressed as $Des_{1 \ldots f} = C_1 \times C_2 \times C_3 \times \cdots \times C_f$, and the size of the Cartesian product is $B = \prod_{i=1}^{f} c_i$.
  4. The query method based on regional bitmap indexes in a cloud environment as claimed in claim 2, characterized in that the bit strings are globally ordered; each resulting rank value corresponds uniquely to the values of a tuple on the queried fields, and all possible values of the bit string corresponding to a tuple are ranked in ascending order.
  5. The query method based on regional bitmap indexes in a cloud environment as claimed in claim 1 or 3, characterized in that the length of the indicating bitmap equals the number of subdomains into which the tuple attribute value ranges are divided, which is the same as the size B of the Cartesian product.
  6. The query method based on regional bitmap indexes in a cloud environment as claimed in claim 1, characterized in that the bitmap index in step 1.3) is established as follows: a B+ tree is built over the global rank values of the bit strings corresponding to the tuples present on the node, each key in a leaf node of the tree corresponding to one rank value; to each key in the leaf nodes of the B+ tree is attached a bitmap whose length is the total number of tuples managed by this data node, serving as the tuple bitmap of the corresponding rank value.
  7. The query method based on regional bitmap indexes in a cloud environment as claimed in claim 1, characterized in that, when the query condition is a single query condition,
    A) each computing node splits the condition bitmap into the bit strings corresponding to elements of the attribute Cartesian product, converts the resulting bit strings into their rank values, and builds a target rank-value set;
    B) an all-zero bit string cb of length B is generated, and the positions belonging to the target rank-value set are set to 1;
    C) the result of the bitwise logical AND eb & cb is checked against 0, where eb denotes the indicating bitmap on this computing node;
    D) if the result is 0, this computing node directly returns the empty set as its result;
    E) otherwise, the local B+ tree of this computing node is searched to find the corresponding leaf nodes and their attached tuple bitmaps, and the tuples corresponding to the positions set to 1 in the tuple bitmaps are checked one by one against the condition.
  8. The query method based on distributed bitmap indexes in a cloud environment as claimed in claim 1 or 7, characterized in that, when the query condition consists of a plurality of query conditions,
    ⅰ) the query is executed for each condition as in the single-query-condition case;
    ⅱ) the tuple bitmaps retrieved for each query condition in step ⅰ) are combined by the bitwise logical operations that correspond to the way the conditions are combined in the original query, and the tuples corresponding to the positions set to 1 in the result are checked one by one against the conditions;
    ⅲ) finally, all results that satisfy the conditions are returned to the main node as the query result.
  9. The query method based on regional bitmap indexes in a cloud environment as claimed in claim 1, characterized in that the condition bitmap is the bitwise logical AND of the bit strings corresponding to the qualifying elements in the Cartesian product of the tuple attribute bit strings.
  10. The query method based on regional bitmap indexes in a cloud environment as claimed in claim 1, characterized in that, when a query request arrives at a distributed data node, the indicating bitmap is compared to determine whether this data node contains target tuples; if it does not, a null value is returned directly as the query result of this data node, without executing the retrieval task.
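The index construction in claims 2 to 6 can be illustrated with a short sketch. It is written under several assumptions that the claims do not fix: subdomains are defined by sorted cut points per attribute, the global rank of an element of the Cartesian product is taken as its mixed-radix number (one possible "set rule" for ascending order), bitmaps are Python integers, and a dictionary stands in for the B+ tree whose keys are rank values. All function names are hypothetical.

```python
from bisect import bisect_right
from math import prod


def subdomain_index(value, cuts):
    """Index of the subdomain containing value; cuts are sorted cut points."""
    return bisect_right(cuts, value)


def global_rank(sub_indices, counts):
    """Mixed-radix rank of one element of C_1 x ... x C_f (assumed ordering)."""
    rank = 0
    for idx, c in zip(sub_indices, counts):
        rank = rank * c + idx
    return rank


def build_node_index(tuples, cuts_per_attr):
    """Build B, the indicating bitmap eb, and rank -> tuple bitmap for one node."""
    counts = [len(cuts) + 1 for cuts in cuts_per_attr]  # c_i subdomains per attribute
    B = prod(counts)                                    # size of the Cartesian product
    eb = 0                                              # bit r set: rank r is stored here
    tuple_bitmaps = {}                                  # stand-in for B+ tree leaf keys
    for pos, t in enumerate(tuples):
        idxs = [subdomain_index(v, cuts) for v, cuts in zip(t, cuts_per_attr)]
        rank = global_rank(idxs, counts)
        eb |= 1 << rank
        # Bit pos of the tuple bitmap marks the local tuple at position pos.
        tuple_bitmaps[rank] = tuple_bitmaps.get(rank, 0) | (1 << pos)
    return B, eb, tuple_bitmaps


if __name__ == "__main__":
    cuts = [[10, 20], [100]]  # attribute 1: 3 subdomains, attribute 2: 2 subdomains
    rows = [(5, 50), (15, 150), (25, 50), (7, 60)]
    B, eb, bitmaps = build_node_index(rows, cuts)
    print(B, bin(eb), bitmaps)
```

With these cut points, B = 3 × 2 = 6. The first and fourth rows fall into the same element of the Cartesian product, so they share one rank value and one tuple bitmap, which is what keeps the index small when many tuples land in the same subdomain combination.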
CN201210155253.7A 2012-05-17 2012-05-17 Query method based on regional bitmap indexes in cloud environment Expired - Fee Related CN102722531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201210155253.7A CN102722531B (en) 2012-05-17 2012-05-17 Query method based on regional bitmap indexes in cloud environment

Publications (2)

Publication Number Publication Date
CN102722531A (en) 2012-10-10
CN102722531B (en) 2014-04-16

Family

ID=46948292

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201210155253.7A Expired - Fee Related CN102722531B (en) 2012-05-17 2012-05-17 Query method based on regional bitmap indexes in cloud environment

Country Status (1)

Country Link
CN (1) CN102722531B (en)

Cited By (28)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968309A (en) * 2012-11-30 2013-03-13 亚信联创科技(中国)有限公司 Method and device for realizing rule matching based on rule engine
CN102968309B (en) * 2012-11-30 2016-01-20 亚信科技(中国)有限公司 A kind of rule matching method and device realizing rule-based engine
WO2014086019A1 (en) * 2012-12-06 2014-06-12 Empire Technology Development Llc Decentralizing a hadoop cluster
CN104838374A (en) * 2012-12-06 2015-08-12 英派尔科技开发有限公司 Decentralizing a HADOOP cluster
US9588984B2 (en) 2012-12-06 2017-03-07 Empire Technology Development Llc Peer-to-peer data management for a distributed file system
CN103310023A (en) * 2013-07-05 2013-09-18 深圳中兴网信科技有限公司 Distributed searching system and method
CN105573824A (en) * 2014-10-10 2016-05-11 腾讯科技(深圳)有限公司 Monitoring method and system of distributed computing system
CN105573824B (en) * 2014-10-10 2020-04-03 腾讯科技(深圳)有限公司 Monitoring method and system for distributed computing system
CN106250565A (en) * 2016-08-30 2016-12-21 福建天晴数码有限公司 Querying method based on burst relevant database and system
CN106250565B (en) * 2016-08-30 2019-05-07 福建天晴数码有限公司 Querying method and system based on fragment relevant database
CN107704527A (en) * 2017-09-18 2018-02-16 华为技术有限公司 Date storage method, device and storage medium
CN107704527B (en) * 2017-09-18 2020-05-08 华为技术有限公司 Data storage method, device and storage medium
CN110019204A (en) * 2017-10-27 2019-07-16 航天信息股份有限公司 Method and apparatus are indexed inside split towards HDFS
CN109960944A (en) * 2017-12-14 2019-07-02 中兴通讯股份有限公司 A kind of data desensitization method, server, terminal and computer readable storage medium
CN109086344A (en) * 2018-07-12 2018-12-25 广州市闲愉凡生信息科技有限公司 A kind of method of the full-text search of cloud computing platform
CN109165262B (en) * 2018-10-16 2022-05-10 成都索贝数码科技股份有限公司 Fragmentation clustering system and fragmentation method of relational large table
CN109165262A (en) * 2018-10-16 2019-01-08 成都索贝数码科技股份有限公司 Fragmentation clustering system and fragmentation method of relational large table
CN109960695B (en) * 2019-04-09 2020-03-13 苏州浪潮智能科技有限公司 Management method and device for database in cloud computing system
CN109960695A (en) * 2019-04-09 2019-07-02 苏州浪潮智能科技有限公司 The management method and device of database in cloud computing system
CN110968762A (en) * 2019-12-05 2020-04-07 北京天融信网络安全技术有限公司 Adjusting method and device for retrieval
CN110968762B (en) * 2019-12-05 2023-07-18 北京天融信网络安全技术有限公司 Adjustment method and device for retrieval
CN111737264A (en) * 2020-07-20 2020-10-02 智者四海(北京)技术有限公司 Information processing method and system
CN112765171A (en) * 2021-01-12 2021-05-07 湖北宸威玺链信息技术有限公司 Optimization algorithm for multi-field combined index access of block chain data uplink
CN112783835A (en) * 2021-03-11 2021-05-11 百果园技术(新加坡)有限公司 Index management method and device and electronic equipment
CN112783835B (en) * 2021-03-11 2024-06-04 百果园技术(新加坡)有限公司 Index management method and device and electronic equipment
WO2023160115A1 (en) * 2022-02-28 2023-08-31 华为技术有限公司 Key-value pair retrieving method and apparatus, and storage medium
CN117555906A (en) * 2024-01-12 2024-02-13 腾讯科技(深圳)有限公司 Data processing method, device, electronic equipment and storage medium
CN117555906B (en) * 2024-01-12 2024-04-05 腾讯科技(深圳)有限公司 Data processing method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN102722531B (en) 2014-04-16

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140416

Termination date: 20170517
