CN108287868B - Database query and data block partitioning methods and devices - Google Patents


Info

Publication number
CN108287868B
Authority
CN
China
Prior art keywords
vector
tuple
query
data block
query unit
Legal status
Active
Application number
CN201711378123.9A
Other languages
Chinese (zh)
Other versions
CN108287868A
Inventor
王继业
曾楠
孙乔
张春光
邓卜侨
孙雷
王晋雄
付兰梅
崔伟
刘炜
王思宁
冷曼
赵蕾
李华勤
曲传哲
Current Assignee
State Grid Corp of China SGCC
Beijing Guodiantong Network Technology Co Ltd
Beijing China Power Information Technology Co Ltd
Original Assignee
State Grid Corp of China SGCC
Beijing Guodiantong Network Technology Co Ltd
Application filed by State Grid Corp of China SGCC and Beijing Guodiantong Network Technology Co Ltd
Priority to CN201711378123.9A
Publication of CN108287868A
Application granted
Publication of CN108287868B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a database query method, a data block partitioning method, and corresponding devices. The database query method includes: comparing a query condition sent by a user with each query unit in a query unit set, and generating a query vector from the comparison results; performing an OR (union) operation between the query vector and the vector corresponding to each data block in the database, and selecting vectors according to the results of the OR operations; and, according to the query condition, querying the data blocks corresponding to the selected vectors to obtain the query result. The query unit set is formed in advance from the query units decomposed from a query workload. The invention enables finer-grained data skipping during queries, improves query efficiency, and incurs a smaller storage cost.

Description

Database query and data block partitioning methods and devices
Technical field
The present invention relates to the field of database technology, and in particular to a database query method, a data block partitioning method, and corresponding devices.
Background
As data volumes grow and users demand ever lower query latency, database query processors face a huge challenge: they must process large amounts of data in near real time. One important technique for speeding up data processing is to avoid scanning data that is irrelevant to a query. By maintaining descriptive information (metadata) about the records contained in each data block, a query processor can skip scanning a data block whenever the metadata shows that the block contains no data relevant to the query. However, how much data can actually be skipped depends on how well the partitioning of the data into blocks matches the query conditions (filter conditions).
One existing query method that supports skipping is based on horizontally partitioned database tables: a database can use range or hash partitioning to achieve load balancing. After the data are partitioned, statistics are maintained for each resulting data block, such as the maximum value, minimum value, and record count of certain query fields. Using these statistics, the query processor can skip some data blocks without scanning them. For example, when the query condition is "time >= 20170101 and time <= 20170105", any data block whose "time" values lie entirely outside this range can be skipped without being scanned. This method, however, produces a relatively coarse-grained partitioning and cannot achieve finer-grained skipping.
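As a rough illustration of this min/max style of skipping (often called a zone map), the sketch below keeps only the minimum and maximum of one field per block and scans a block only if that range overlaps the query range. The block boundaries and the names `BlockStats` and `blocks_to_scan` are hypothetical; dates are encoded as YYYYMMDD integers to match the example condition.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BlockStats:
    """Metadata kept for one data block: min and max of a single field."""
    block_id: int
    min_value: int
    max_value: int


def blocks_to_scan(stats: List[BlockStats], low: int, high: int) -> List[int]:
    """Return ids of blocks whose [min, max] range overlaps the query range [low, high].

    A block lying entirely outside [low, high] cannot hold a matching record,
    so it is skipped without being scanned.
    """
    return [s.block_id for s in stats if s.max_value >= low and s.min_value <= high]


# Hypothetical blocks of a table range-partitioned on "time" (YYYYMMDD integers).
stats = [
    BlockStats(0, 20161201, 20161231),
    BlockStats(1, 20170101, 20170103),
    BlockStats(2, 20170104, 20170110),
    BlockStats(3, 20170111, 20170131),
]

# time >= 20170101 and time <= 20170105
print(blocks_to_scan(stats, 20170101, 20170105))  # [1, 2]
```

Block 2 must still be scanned in full even though only part of its range matches, which is the coarse granularity the paragraph refers to.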
Another query method that supports skipping is based on materialized views. A materialized view stores the result of a query; when the same query arrives again with an identical query condition, the stored result is returned directly without re-executing the query, which speeds up query processing. If a later query's condition differs from that of the existing query (the one from which the materialized view was built), but analysis shows that the new query's result can be formed from a subset of the records in the materialized view, the materialized view is still useful. Materialized view selection means choosing, when a new query arrives, suitable materialized views from which the answer can be assembled or filtered directly. Materialized views, however, incur a considerable storage cost, because every materialized query requires its own storage space.
Summary of the invention
In view of this, the object of the present invention is to propose a database query method, a data block partitioning method, and corresponding devices that achieve finer-grained skipping during queries, improve query efficiency, and incur a smaller storage cost.
To achieve the above object, the present invention provides a database query method, comprising:
Comparing a query condition sent by a user with each query unit in a query unit set, and generating a query vector from the comparison results;
Performing an OR (union) operation between the query vector and the vector corresponding to each data block in the database, and selecting vectors according to the results of the OR operations;
According to the query condition, querying the data blocks corresponding to the selected vectors to obtain a query result;
Wherein the query unit set is formed in advance from the query units decomposed from a query workload.
The elements of the query vector correspond one-to-one to the query units in the query unit set, and the value of each element of the query vector is determined by whether the query condition includes the query unit corresponding to that element.
Selecting vectors according to the results of the OR operations specifically comprises:
Selecting the vectors for which every element of the result of the OR operation with the query vector is 1.
The present invention also provides a data block partitioning method, comprising:
For each tuple in the database, generating a tuple vector corresponding to the tuple, wherein the elements of the tuple vector correspond one-to-one to the query units in a query unit set and are ordered in the same way as the elements of the query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element;
Merging tuples with identical tuple vectors into the same data block, the data block corresponding to that tuple vector;
Wherein the query unit set is formed in advance from the query units decomposed from a query workload.
Further, after merging tuples with identical tuple vectors into the same data block, the method further comprises:
Dividing similar tuple vectors into the same Vector Group;
Merging the data blocks corresponding to the tuple vectors in the same Vector Group, wherein the vector corresponding to a merged data block is the vector obtained by OR-ing the similar tuple vectors together.
Dividing similar tuple vectors into the same Vector Group specifically comprises:
Placing each tuple vector in its own Vector Group, these groups forming the initial division;
Performing at least one Vector Group merge, where in each merge:
The two most similar Vector Groups are merged according to the overall cost index C(P) of the current Vector Groups, so that similar tuple vectors are divided into the same Vector Group, where C(P) is calculated according to formula (2):
$$C(P) = \sum_{i=1}^{n} C(P_i) \qquad (2)$$
Here n denotes the number of current Vector Groups, and C(P_i) is calculated according to formula (1):
$$C(P_i) = |P_i| \sum_{j=1}^{d} w_j \left(1 - u(P_i)_j\right) \qquad (1)$$
In formula (1), d is the number of query units in the query unit set; |P_i| is the size of P_i, i.e., the number of tuples corresponding to the tuple vectors in P_i; w_j is the number of query conditions related to F_j, where F_j is the j-th query unit in the query unit set; and u(P_i)_j is the j-th bit of u(P_i), which is 0 if no tuple corresponding to P_i satisfies F_j and 1 otherwise. P_i denotes the i-th Vector Group.
The present invention also provides a database query device, comprising:
A query vector generation module, configured to compare a query condition sent by a user with each query unit in a query unit set and to generate a query vector from the comparison results;
A vector selection module, configured to perform an OR operation between the query vector and the vector corresponding to each data block in the database, and to select vectors according to the results of the OR operations;
A data query module, configured to query, according to the query condition, the data blocks corresponding to the selected vectors to obtain a query result.
The elements of the query vector correspond one-to-one to the query units in the query unit set, and the value of each element of the query vector is determined by whether the query condition includes the query unit corresponding to that element.
Preferably, the vector selection module is specifically configured to perform an OR operation between the query vector and the vector corresponding to each data block in the database, and to select the vectors for which every element of the result is 1.
Further, the device further comprises:
A data block partitioning module, configured to generate, for each tuple in the database, a tuple vector corresponding to the tuple, and to merge tuples with identical tuple vectors into the same data block, the data block corresponding to that tuple vector; wherein the elements of the tuple vector correspond one-to-one to the query units in the query unit set and are ordered in the same way as the elements of the query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element.
Further, the data block partitioning module is also configured to divide similar tuple vectors into the same Vector Group and to merge the data blocks corresponding to the tuple vectors in the same Vector Group, wherein the vector corresponding to a merged data block is the vector obtained by OR-ing the similar tuple vectors together.
The present invention also provides a data block partitioning device, comprising the above data block partitioning module.
In the technical solution of the embodiments of the present invention, the mapping from tuples to data blocks is established automatically from the decomposition of the query workload by means of an optimization technique (such as a clustering algorithm), and a vector is built for each data block to serve as an index for subsequent queries. Compared with the prior art, which partitions data blocks by simple range or hash partitioning or by manually specified partitioning strategies, this yields finer-grained data blocks and a more reasonable partitioning, which makes it easier for subsequent queries over these blocks to achieve fine-grained skipping and improves query efficiency.
Description of the drawings
Fig. 1 is a flowchart of a data block partitioning method provided by an embodiment of the present invention;
Fig. 2 is a flowchart of a method, provided by an embodiment of the present invention, for merging the data blocks corresponding to similar tuple vectors;
Fig. 3 is a flowchart of a method for merging the two most similar Vector Groups provided by an embodiment of the present invention;
Fig. 4 is a flowchart of a database query method provided by an embodiment of the present invention;
Fig. 5 is an internal structure diagram of a database query device provided by an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals denote the same or similar elements, or elements with the same or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and should not be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include a wireless connection or wireless coupling. The term "and/or" as used herein includes all or any of one or more of the associated listed items and all combinations thereof.
It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two non-identical entities or parameters that share a name; "first" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention. This is not repeated in the subsequent embodiments.
In the technical solution of the present invention, data blocks are first partitioned automatically according to the decomposition of the query workload. The detailed process, shown in Fig. 1, includes the following steps:
Step S100: analyze the query workload to obtain the query unit set.
A query workload is a set of queries; a database system can usually construct a typical query workload by analyzing its log files. Each query in the workload contains a query condition, and evaluating this condition against a tuple (record) of the database determines whether the tuple satisfies it. For example, if the query filter condition is "age >= 30", only tuples with an age of 30 or more satisfy the condition, and tuples with an age under 30 do not. Here we assume that a query condition is a conjunction of query units, each query unit containing one predicate; that is, a query condition consists of at least one query unit (predicate) joined by the logical "and" operation. For example, the query condition "age >= 30 and gender = 'male'" consists of two query units (predicates): "age >= 30" and "gender = 'male'".
Query workload analysis extracts these conjunctive expressions. For example, suppose a query workload consists of four query conditions, denoted Q1, Q2, Q3, and Q4:
Q1: product = 'clothes';
Q2: product in ('clothes', 'shirt') and sales volume > 320,000 yuan;
Q3: product = 'shirt' and sales volume > 210,000 yuan and client age > 30;
Q4: client age > 30;
A frequent itemset algorithm is used to compute how frequently each query unit (predicate) occurs in the query workload; the K most frequent query units (predicates) then form the query unit set, where K is a parameter that can be specified by those skilled in the art.
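A minimal sketch of the counting step, assuming each query condition is modelled as a set of predicate strings. For single predicates, the first pass of a frequent-itemset algorithm such as Apriori reduces to counting how many query conditions contain each predicate, which is all this sketch does; the name `top_k_query_units` and the alphabetical tie-breaking are illustrative choices, not taken from the patent.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

# Each query condition is a conjunction, modelled as a set of predicate strings.
workload: Dict[str, Set[str]] = {
    "Q1": {"product = 'clothes'"},
    "Q2": {"product in ('clothes', 'shirt')", "sales volume > 320000"},
    "Q3": {"product = 'shirt'", "sales volume > 210000", "client age > 30"},
    "Q4": {"client age > 30"},
}


def top_k_query_units(workload: Dict[str, Set[str]], k: int) -> List[Tuple[str, int]]:
    """Count how many query conditions contain each predicate; keep the K most frequent."""
    counts = Counter(p for predicates in workload.values() for p in predicates)
    # Highest frequency first; ties broken alphabetically so the result is deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


for predicate, freq in top_k_query_units(workload, 3):
    print(freq, predicate)
```

On this raw workload only "client age > 30" occurs more than once: "product in ('clothes', 'shirt')" is credited only to Q2, because Q1 and Q3 use narrower equality predicates that a plain count does not relate to it.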
Preferably, before the query unit set is formed in this step, query conditions with containment relationships can be further merged:
A query condition usually contains several query units (predicates); for example, "product = 'shirt' and sales volume > 210,000 yuan" contains two query units: "product = 'shirt'" and "sales volume > 210,000 yuan". Preferably, individual query units (predicates) can be analyzed further to find subtle containment relationships between them. For example, the set of records satisfying "sales volume > 210,000 yuan" contains the set of records satisfying "sales volume > 320,000 yuan", just as the group of people older than 10 contains the group of people older than 30. After the containment relationships between query units (predicates) are analyzed, the query workload above takes the following improved form:
Q1: product = 'clothes' and product in ('clothes', 'shirt');
Q2: product in ('clothes', 'shirt') and sales volume > 320,000 yuan and sales volume > 210,000 yuan;
Q3: product = 'shirt' and sales volume > 210,000 yuan and client age > 30 and product in ('clothes', 'shirt');
Q4: client age > 30;
The added predicates (underlined in the original figure text) are "product in ('clothes', 'shirt')" in Q1 and Q3, and "sales volume > 210,000 yuan" in Q2. Adding these predicates does not change the meaning of the original query conditions; it makes the containment relationships described above explicit, which allows the frequency of each query unit (predicate) to be counted more accurately.
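The containment analysis can be sketched as follows, under stated assumptions: predicates are limited to three hypothetical forms (`=`, `in`, and `>` on a single column), and each query gains every workload predicate implied by one of its own predicates. The `Predicate` type and the implication rules are illustrative, not the patent's; a real system would need rules for its full predicate language. Sales volumes are written in yuan.

```python
from typing import Dict, NamedTuple, Set


class Predicate(NamedTuple):
    """column op value, with op one of '=', 'in', '>' (a hypothetical predicate language)."""
    column: str
    op: str
    value: object  # scalar for '=' and '>', frozenset for 'in'


def implies(p: Predicate, c: Predicate) -> bool:
    """True if every record satisfying p also satisfies c."""
    if p == c:
        return True
    if p.column != c.column:
        return False
    if p.op == "=" and c.op == "in":
        return p.value in c.value   # product = 'shirt' implies product in (..., 'shirt')
    if p.op == "in" and c.op == "in":
        return p.value <= c.value   # a subset of allowed values implies the superset
    if p.op == ">" and c.op == ">":
        return p.value >= c.value   # sales > 320000 implies sales > 210000
    return False


def augment(workload: Dict[str, Set[Predicate]]) -> Dict[str, Set[Predicate]]:
    """Add to each query every workload predicate implied by one of its own predicates."""
    vocabulary = {p for preds in workload.values() for p in preds}
    return {
        name: preds | {c for c in vocabulary for p in preds if implies(p, c)}
        for name, preds in workload.items()
    }


CLOTHES_OR_SHIRT = frozenset({"clothes", "shirt"})
workload = {
    "Q1": {Predicate("product", "=", "clothes")},
    "Q2": {Predicate("product", "in", CLOTHES_OR_SHIRT), Predicate("sales", ">", 320000)},
    "Q3": {Predicate("product", "=", "shirt"), Predicate("sales", ">", 210000),
           Predicate("client_age", ">", 30)},
    "Q4": {Predicate("client_age", ">", 30)},
}

augmented = augment(workload)
for name in sorted(augmented):
    print(name, len(augmented[name]))  # Q1 2, Q2 3, Q3 4, Q4 1 (one per line)
```

The resulting sets match the improved form of Q1 to Q4 listed above.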
On this basis, a frequent itemset algorithm computes the frequency of each query unit (predicate) in the query workload after containment analysis, and the K most frequent query units (predicates) form the query unit set. In other words, the query unit set consists of the K query units that occur most frequently in the query workload.
Step S101: for each tuple in the database, generate the tuple vector corresponding to that tuple.
In this step, the entire database table is scanned. For each tuple (record) obtained, and for each query unit in the query unit set, the method determines whether the tuple satisfies the query unit and records the result: if the tuple satisfies the query unit, the recorded result is 1; otherwise it is 0.
For example, suppose the database contains the 6 tuples shown in Table 1 below, with ids 102 to 107. Each tuple consists of several fields: the tuple id, the transaction time, the category of product sold, the client age, the sales volume, and so on.
Table 1
Description | id | Time | Product | Client age | Sales volume (10,000 yuan)
Tuple 1 | 102 | 09:15:00 | Shoes | 35 | 15
Tuple 2 | 103 | 09:16:00 | Shoes | 20 | 22
Tuple 3 | 104 | 09:17:00 | Cap | 20 | 15
Tuple 4 | 105 | 09:31:00 | Clothes | 36 | 10
Tuple 5 | 106 | 09:34:00 | Cap | 36 | 23
Tuple 6 | 107 | 09:35:00 | Shirt | 29 | 15
The query unit set contains the 3 query units shown in Table 2 below:
Table 2
Code | Query unit | Frequency
F1 | product in ('clothes', 'shirt') | 4
F2 | client age > 30 | 2
F3 | sales volume > 210,000 yuan | 2
Tuple 1 is filtered with each of the query units F1, F2, and F3 above; that is, for each query unit in the query unit set, the method determines whether tuple 1 satisfies it and records the result. Tuple 1 does not satisfy F1 (0), satisfies F2 (1), and does not satisfy F3 (0), so the recorded results are 0, 1, 0. For each tuple, the results of whether it satisfies each query unit in the query unit set together form its tuple vector. Table 3 shows the tuple vectors of tuples 1 to 6.
Table 3
Tuple | F1 | F2 | F3
Tuple 1 | 0 | 1 | 0
Tuple 2 | 0 | 0 | 1
Tuple 3 | 0 | 0 | 0
Tuple 4 | 1 | 1 | 0
Tuple 5 | 0 | 1 | 1
Tuple 6 | 1 | 0 | 0
That is, the elements of a tuple's tuple vector correspond one-to-one to the query units in the query unit set, and the value of each element is determined by whether the tuple satisfies the query unit corresponding to that element.
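The tuple-vector construction of Step S101 can be sketched in a few lines. This assumes each record is a Python dict and each query unit is a boolean test on one record; the field names are hypothetical, the time field is omitted because no query unit uses it, and sales volumes are in units of 10,000 yuan as in Table 1.

```python
from typing import Callable, Dict, List, Tuple

# Table 1 (time omitted; sales volume in units of 10,000 yuan).
TUPLES: List[Dict] = [
    {"id": 102, "product": "shoes",   "client_age": 35, "sales": 15},
    {"id": 103, "product": "shoes",   "client_age": 20, "sales": 22},
    {"id": 104, "product": "cap",     "client_age": 20, "sales": 15},
    {"id": 105, "product": "clothes", "client_age": 36, "sales": 10},
    {"id": 106, "product": "cap",     "client_age": 36, "sales": 23},
    {"id": 107, "product": "shirt",   "client_age": 29, "sales": 15},
]

# Table 2, in query-unit-set order F1, F2, F3.
QUERY_UNITS: List[Tuple[str, Callable[[Dict], bool]]] = [
    ("F1", lambda t: t["product"] in ("clothes", "shirt")),
    ("F2", lambda t: t["client_age"] > 30),
    ("F3", lambda t: t["sales"] > 21),
]


def tuple_vector(row: Dict) -> Tuple[int, ...]:
    """Bit j is 1 if the row satisfies query unit F_j, otherwise 0."""
    return tuple(1 if test(row) else 0 for _, test in QUERY_UNITS)


for row in TUPLES:
    print(row["id"], tuple_vector(row))
```

The printed vectors reproduce Table 3 row by row.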
Step S102: merge tuples with identical tuple vectors into the same data block.
In this step, tuples with identical tuple vectors are merged into the same data block, and that data block corresponds to the tuple vector. The number of tuples in the data block corresponding to each tuple vector can also be counted, forming <tuple vector, tuple count> pairs. In the example of Table 3, the data block of every tuple vector contains exactly 1 tuple.
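A sketch of Step S102, assuming tuple vectors are hashable tuples keyed by tuple id: a dictionary groups tuples by vector, and the group sizes give the <tuple vector, tuple count> pairs. The function name `group_identical` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]


def group_identical(vectors_by_id: Dict[int, Vector]) -> Dict[Vector, List[int]]:
    """Map each distinct tuple vector to the ids of the tuples sharing it (one data block each)."""
    blocks: Dict[Vector, List[int]] = defaultdict(list)
    for tuple_id, vec in vectors_by_id.items():
        blocks[vec].append(tuple_id)
    return dict(blocks)


# Tuple vectors of Table 3, keyed by tuple id.
vectors = {102: (0, 1, 0), 103: (0, 0, 1), 104: (0, 0, 0),
           105: (1, 1, 0), 106: (0, 1, 1), 107: (1, 0, 0)}

blocks = group_identical(vectors)
pairs = [(vec, len(ids)) for vec, ids in blocks.items()]  # <tuple vector, tuple count>
print(len(blocks))  # 6: all vectors differ, so each block holds one tuple

# Hypothetical variant: if tuple 3 (id 104) had the same vector as tuple 2 (id 103),
# both would fall into a single block.
variant = group_identical({**vectors, 104: (0, 0, 1)})
print(variant[(0, 0, 1)])  # [103, 104]
```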
Preferably, to reduce the number of tuple vectors, that is, the number of data blocks produced by the partitioning, and thus save the space needed to maintain data block management information, the following Step S103 can also be performed to merge data blocks.
Step S103: merge the data blocks corresponding to similar tuple vectors. The detailed process, shown in Fig. 2, includes the following steps:
Step S201: place each tuple vector in its own Vector Group; these groups form the initial division.
For example, if there are m tuple vectors, they are placed in Vector Groups P1, P2, …, Pm respectively, where P_i denotes the i-th Vector Group and holds the i-th tuple vector. The tuple vectors in Table 3 above, for instance, can be placed in Vector Groups P1, P2, …, P6.
The tuple count in each <tuple vector, tuple count> pair determines how many tuples correspond to the tuple vector in each Vector Group of the division, and also determines how many Vector Groups are formed. For example, each of the 6 tuple vectors in the example above has a tuple count of 1, i.e., no two tuples share a vector, so the tuples are initially divided into 6 Vector Groups of 1 tuple vector each. If instead the tuple vector of tuple 2 had a tuple count of 2, corresponding to tuples 2 and 3, those two tuples would belong to the same Vector Group.
Step S202: perform at least one Vector Group merge. In each merge, the two most similar Vector Groups are merged according to the overall cost index C(P) of the current Vector Groups, so that similar tuple vectors are divided into the same Vector Group. The detailed process, shown in Fig. 3, includes the following steps:
Step S301: compute the weight of each query unit.
Specifically, the weight w_j of the j-th query unit is the number of query conditions related to F_j, where F_j is the j-th query unit in the query unit set. For example, in the example above, F2 ("client age > 30") appears in query conditions Q3 and Q4, so w_2 = 2.
Step S302: for each Vector Group, cut the Vector Group with each query unit.
Specifically, cutting P_i with F_j works as follows: if no tuple corresponding to P_i satisfies F_j, the j-th bit of the cut result u(P_i), written u(P_i)_j, is 0; otherwise it is 1.
Step S303: merge the two most similar Vector Groups, chosen according to how the merge changes the overall cost index C(P).
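Specifically, an important cost index C(P_i) is defined for P_i as shown in formula (1) below; it measures how many tuples corresponding to the tuple vectors in P_i can be skipped, summed over the execution of all queries in the workload:
$$C(P_i) = |P_i| \sum_{j=1}^{d} w_j \left(1 - u(P_i)_j\right) \qquad (1)$$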
In formula (1), d is the number of query units in the query unit set and |P_i| is the number of tuples in P_i. The formula works as follows: when u(P_i)_j is 1, 1 - 1 = 0, so no tuples of P_i can be skipped for F_j; when it is 0, 1 - 0 = 1, so the tuples can be skipped. The overall cost index C(P) of the current Vector Groups is calculated as shown in formula (2):
$$C(P) = \sum_{i=1}^{n} C(P_i) \qquad (2)$$
In formula (2), n is the number of current Vector Groups. The two closest Vector Groups are merged, where closeness is judged by the change in C(P) that merging them causes.
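The two cost formulas can be sketched as below, assuming a Vector Group is a list of <tuple vector, tuple count> pairs, so that |P_i| is the sum of the counts. The weights (4, 2, 2) are taken from the Frequency column of Table 2 purely for illustration; the sketch does not recompute w_j from the workload.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]
Group = List[Tuple[Vector, int]]  # <tuple vector, tuple count> pairs

WEIGHTS = (4, 2, 2)  # illustrative w_j, taken from the Frequency column of Table 2


def cut(vectors: Sequence[Vector]) -> Vector:
    """u(P_i): bitwise OR of the group's tuple vectors."""
    return tuple(int(any(v[j] for v in vectors)) for j in range(len(vectors[0])))


def group_cost(group: Group, weights: Sequence[int]) -> int:
    """Formula (1): C(P_i) = |P_i| * sum_j w_j * (1 - u(P_i)_j)."""
    size = sum(count for _, count in group)  # |P_i|: tuples in the group
    u = cut([vec for vec, _ in group])
    return size * sum(w * (1 - bit) for w, bit in zip(weights, u))


def total_cost(groups: List[Group], weights: Sequence[int]) -> int:
    """Formula (2): C(P) = sum of C(P_i) over the n current groups."""
    return sum(group_cost(g, weights) for g in groups)


# Initial division of Table 3: six singleton groups, one tuple each.
initial = [[(v, 1)] for v in [(0, 1, 0), (0, 0, 1), (0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 0)]]
print(total_cost(initial, WEIGHTS))  # 30

# Merging tuples 1 and 4: before 6 + 2 = 8, after 2 * 2 = 4.
print(group_cost([((0, 1, 0), 1), ((1, 1, 0), 1)], WEIGHTS))  # 4
```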
After two Vector Groups are merged into a new Vector Group, Steps S302 and S303 above can be repeated several times. In this way, driven by the overall cost index, the standard agglomerative hierarchical clustering method groups the tuple vectors cluster by cluster, and with them the tuples they correspond to. The tuples corresponding to the tuple vectors in one Vector Group form one data block.
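A minimal sketch of this greedy bottom-up merging, with two assumptions the text leaves open. First, because OR-ing two groups can only turn 0 bits into 1 bits, a merge never increases C(P); the sketch therefore reads "the change in C(P)" as the loss in C(P) and merges the pair that loses the least. Second, no stopping rule is given, so a hypothetical `target` number of groups is used. The weights are again the illustrative (4, 2, 2).

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]
Group = List[Tuple[Vector, int]]  # <tuple vector, tuple count> pairs

WEIGHTS = (4, 2, 2)  # illustrative weights (Frequency column of Table 2)


def group_cost(group: Group, weights: Sequence[int]) -> int:
    """Formula (1), with u(P_i) computed as the OR of the group's tuple vectors."""
    size = sum(count for _, count in group)
    u = [int(any(vec[j] for vec, _ in group)) for j in range(len(weights))]
    return size * sum(w * (1 - bit) for w, bit in zip(weights, u))


def merge_loss(a: Group, b: Group, weights: Sequence[int]) -> int:
    """Drop in C(P) caused by merging a and b (never negative: OR only adds 1 bits)."""
    return group_cost(a, weights) + group_cost(b, weights) - group_cost(a + b, weights)


def agglomerate(groups: List[Group], weights: Sequence[int], target: int) -> List[Group]:
    """Greedy bottom-up clustering: merge the least-costly pair until `target` groups remain."""
    groups = [list(g) for g in groups]
    while len(groups) > target:
        # min() returns the first pair with the smallest loss, in index order.
        i, j = min(
            combinations(range(len(groups)), 2),
            key=lambda ij: merge_loss(groups[ij[0]], groups[ij[1]], weights),
        )
        merged = groups[i] + groups[j]
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups


vectors = [(0, 1, 0), (0, 0, 1), (0, 0, 0), (1, 1, 0), (0, 1, 1), (1, 0, 0)]  # Table 3
result = agglomerate([[(v, 1)] for v in vectors], WEIGHTS, target=3)
for g in result:
    print([vec for vec, _ in g])
print(sum(group_cost(g, WEIGHTS) for g in result))  # 24
```

Under these assumptions the six tuples end up in three blocks: {tuple 1, tuple 3}, {tuple 2, tuple 5}, and {tuple 4, tuple 6}. Other weights or tie-breaking rules can produce a different grouping.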
The vector obtained by OR-ing together the tuple vectors of the Vector Group corresponding to a data block serves as that data block's vector. For example, the tuples in Table 3 above can be divided into the data blocks corresponding to the Vector Groups shown in Table 4, where the vector of the i-th data block is u(P_i).
Table 4
As shown in Table 5, in the Vector Group corresponding to P1, the tuple vector of tuple 1 is (0,1,0) and that of tuple 4 is (1,1,0); their OR (union) is (1,1,0), so the vector of the data block corresponding to P1 is u(P1) = (1,1,0).
Table 5
Record | F1 | F2 | F3
Record 1 | 0 | 1 | 0
Record 4 | 1 | 1 | 0
OR (union) | 0 OR 1 = 1 | 1 OR 1 = 1 | 0 OR 0 = 0
u(P1) | 1 | 1 | 0
After the database is partitioned, the necessary management information of each resulting data block can be recorded. For example, the management information of a data block may include the block header, the block tail, and descriptive information, where the descriptive information may include the number of records, the data block size, and so on.
The data block partitioning method provided by the embodiments of the present invention partitions data blocks automatically according to the decomposition of the query workload. Compared with the prior art, in which data block partitioning strategies are specified manually, it yields finer-grained data blocks and a more reasonable partitioning, which makes fine-grained skipping easier in subsequent queries and improves query efficiency.
Based on data blocks partitioned in advance by the above method, a database query method provided by an embodiment of the present invention proceeds as shown in Fig. 4 and includes the following steps:
Step S401: compare the query condition sent by the user with each query unit in the query unit set, and generate a query vector from the comparison results.
In this step, the elements of the generated query vector correspond one-to-one to the query units in the query unit set, and they are ordered in the same way as the elements of the tuple vectors above. The value of each element is determined by whether the query condition includes the query unit corresponding to that element: if it does, the element is 0; otherwise it is 1. For example, suppose a query has the condition "product in ('clothes', 'shirt') and client age > 30". Checking it against F1, F2, and F3 shows that the condition includes F1 and F2, so the query vector is (0,0,1): the first two 0s correspond to F1 and F2, and the final 1 corresponds to F3.
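A sketch of Step S401, assuming the query condition has already been decomposed into predicates written exactly as the query units are (for instance, after the containment analysis of Step S100); matching by exact text is a simplification, and the names are illustrative.

```python
from typing import Sequence, Set, Tuple

# Query unit set F1, F2, F3 (Table 2), in the same order as the tuple vectors.
QUERY_UNITS = ["product in ('clothes', 'shirt')", "client age > 30", "sales volume > 210000"]


def query_vector(query_predicates: Set[str], units: Sequence[str] = QUERY_UNITS) -> Tuple[int, ...]:
    """Bit j is 0 if the query condition includes query unit F_j, otherwise 1."""
    return tuple(0 if unit in query_predicates else 1 for unit in units)


q = {"product in ('clothes', 'shirt')", "client age > 30"}
print(query_vector(q))  # (0, 0, 1)
```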
Step S402: perform an OR (union) operation between the query vector and the vector corresponding to each data block in the database, and select vectors according to the results.
Specifically, after the query vector is OR-ed with the vector of each data block in the database, the vectors for which every element of the result is 1 are identified and selected. A 0 in position j of the result means that the query condition includes F_j while no tuple in the block satisfies F_j, so the block cannot contain any matching tuple.
For example, OR-ing the query vector (0,0,1) with u(P1), u(P2), and u(P3) above gives: for u(P1), (0,0,1) OR (1,1,0) = (1,1,1); for u(P2), (0,0,1) OR (0,1,1) = (0,1,1); and for u(P3), (0,0,1) OR (1,0,0) = (1,0,1). Only the OR with u(P1) yields a vector whose bits are all 1, so u(P1) is selected.
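The selection rule can be checked with a short sketch, assuming block vectors are stored in a dictionary keyed by a block label (the labels and the function name are illustrative):

```python
from typing import Dict, List, Tuple

Vector = Tuple[int, ...]


def select_blocks(query_vec: Vector, block_vectors: Dict[str, Vector]) -> List[str]:
    """Return blocks whose OR with the query vector is all 1s; the rest can be skipped.

    A 0 at position j means the query requires F_j but no tuple in the block satisfies F_j.
    """
    return [
        name
        for name, vec in block_vectors.items()
        if all(q | b for q, b in zip(query_vec, vec))
    ]


BLOCK_VECTORS = {"P1": (1, 1, 0), "P2": (0, 1, 1), "P3": (1, 0, 0)}
print(select_blocks((0, 0, 1), BLOCK_VECTORS))  # ['P1']
```

Storing each vector as an integer bitmask instead of a tuple would reduce the test to `(q | b) == (1 << d) - 1`, where d is the number of query units; the tuple form is used here only for readability.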
Step S403: according to the query condition, query the data blocks corresponding to the selected vectors to obtain the query result.
For example, once u(P1) has been selected in the step above, the query only needs to be executed in the data block corresponding to u(P1) before the result is returned. The data blocks corresponding to u(P2) and u(P3) can be skipped, which improves query efficiency.
Based on the above method, a database query device provided by an embodiment of the present invention, whose structure is shown in Fig. 5, includes: a query vector generation module 501, a vector selection module 502, and a data query module 503.
The query vector generation module 501 is configured to compare the query condition sent by the user with each query unit in the query unit set and to generate a query vector from the comparison results. The elements of the query vector correspond one-to-one to the query units in the query unit set, and the value of each element is determined by whether the query condition includes the query unit corresponding to that element.
The vector selection module 502 is configured to perform an OR operation between the query vector and the vector corresponding to each data block in the database and to select vectors according to the results; specifically, it selects the vectors for which every element of the result of the OR with the query vector is 1.
The data query module 503 is configured to query, according to the query condition, the data blocks corresponding to the selected vectors to obtain the query result.
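How modules 501 to 503 fit together can be sketched as one class. Everything here is a hypothetical illustration: the class and method names are invented, the query is passed both as the set of query-unit names it contains and as a full row predicate, and the block contents are one grouping consistent with u(P1) = (1,1,0), u(P2) = (0,1,1), and u(P3) = (1,0,0). P1 holds tuples 1 and 4 as in Table 5; placing tuples 2 and 5 in P2 and tuples 3 and 6 in P3 is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Set, Tuple

Vector = Tuple[int, ...]
Row = dict


@dataclass
class Block:
    vector: Vector   # data block vector u(P_i)
    rows: List[Row]  # records stored in the block


class DatabaseQueryDevice:
    """Hypothetical sketch of modules 501-503 over blocks partitioned in advance."""

    def __init__(self, unit_names: List[str], blocks: List[Block]):
        self.unit_names = unit_names  # query unit set, in tuple-vector order
        self.blocks = blocks

    def generate_query_vector(self, units_in_query: Set[str]) -> Vector:
        """Module 501: bit j is 0 if the query includes unit F_j, else 1."""
        return tuple(0 if name in units_in_query else 1 for name in self.unit_names)

    def choose_blocks(self, qvec: Vector) -> List[Block]:
        """Module 502: keep blocks whose OR with the query vector is all 1s."""
        return [b for b in self.blocks if all(x | y for x, y in zip(qvec, b.vector))]

    def query(self, units_in_query: Set[str], condition: Callable[[Row], bool]) -> List[Row]:
        """Module 503: evaluate the full condition only inside the chosen blocks."""
        chosen = self.choose_blocks(self.generate_query_vector(units_in_query))
        return [r for b in chosen for r in b.rows if condition(r)]


def make_row(id_: int, product: str, age: int, sales: int) -> Row:
    return {"id": id_, "product": product, "client_age": age, "sales": sales}


device = DatabaseQueryDevice(
    ["F1", "F2", "F3"],
    [
        Block((1, 1, 0), [make_row(102, "shoes", 35, 15), make_row(105, "clothes", 36, 10)]),  # P1
        Block((0, 1, 1), [make_row(103, "shoes", 20, 22), make_row(106, "cap", 36, 23)]),      # P2 (assumed)
        Block((1, 0, 0), [make_row(104, "cap", 20, 15), make_row(107, "shirt", 29, 15)]),      # P3 (assumed)
    ],
)

result = device.query(
    {"F1", "F2"},
    lambda r: r["product"] in ("clothes", "shirt") and r["client_age"] > 30,
)
print([r["id"] for r in result])  # [105]
```

Only P1 is scanned; of its two records, tuple 4 (id 105) satisfies the full condition.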
Further, a kind of database inquiry device provided in an embodiment of the present invention can also include: data block division module 504。
The data block division module 504 is configured to generate, for each tuple in the database, a tuple vector corresponding to that tuple, and to merge tuples with identical tuple vectors into the same data block. The elements of the tuple vector correspond one-to-one to the query units in the query unit set, in the same element order as the query vector. The value of each element is determined by whether the tuple satisfies the query unit corresponding to that element.
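The following Python sketch covers the first stage of module 504. It assumes that each query unit F_j can be evaluated on a tuple as a Boolean predicate. The predicates, sample rows and function names are hypothetical. Bit j of a tuple vector is 1 when the tuple satisfies F_j and 0 otherwise, which matches the definition of u(P_i)_j given in the claims. Tuples that share the same vector are collected into one data block keyed by that vector.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, object]
Vector = Tuple[int, ...]
QueryUnit = Callable[[Row], bool]

def tuple_vector(row: Row, units: Sequence[QueryUnit]) -> Vector:
    """Bit j is 1 if the tuple satisfies query unit F_j, otherwise 0."""
    return tuple(1 if unit(row) else 0 for unit in units)

def partition_by_tuple_vector(rows: List[Row], units: Sequence[QueryUnit]) -> Dict[Vector, List[Row]]:
    """Merge tuples with identical tuple vectors into the same data block."""
    blocks: Dict[Vector, List[Row]] = defaultdict(list)
    for row in rows:
        blocks[tuple_vector(row, units)].append(row)
    return dict(blocks)  # each block is keyed by (corresponds to) its tuple vector

# Hypothetical query units F1: age > 30, F2: city = 'Beijing'.
units = [lambda r: r["age"] > 30, lambda r: r["city"] == "Beijing"]
rows = [{"age": 40, "city": "Beijing"},
        {"age": 20, "city": "Beijing"},
        {"age": 50, "city": "Beijing"}]
blocks = partition_by_tuple_vector(rows, units)
print({vec: len(block) for vec, block in blocks.items()})  # {(1, 1): 2, (0, 1): 1}
```

Keying the dictionary by the tuple vector gives each data block exactly one vector, which is the one-to-one correspondence that the later selection step relies on.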
Further, the data block division module 504 is also configured to divide similar tuple vectors into the same vector group and to merge the data blocks corresponding to the tuple vectors in the same vector group. The vector corresponding to the merged data block is the vector obtained by performing the union operation on those similar tuple vectors. For the specific method by which module 504 divides tuple vectors into vector groups, refer to the steps of the flows shown in Fig. 2 and Fig. 3 above; it is not repeated here.
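The merging of a vector group can be sketched as follows. How similar tuple vectors are grouped (the procedure of Fig. 2 and Fig. 3) is not reproduced here, so the groups are passed in as input. The function names and sample tuple identifiers are hypothetical. The vector of the merged block is the element-wise OR of its members. A bit therefore stays 1 whenever at least one tuple in the merged block satisfies the corresponding query unit, so skipping decisions based on the merged vector remain safe.

```python
from functools import reduce
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[int, ...]

def union_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise OR (union) of all tuple vectors in a vector group."""
    return reduce(lambda a, b: tuple(x | y for x, y in zip(a, b)), vectors)

def merge_vector_groups(blocks: Dict[Vector, List[str]],
                        groups: List[List[Vector]]) -> List[Tuple[Vector, List[str]]]:
    """Merge the data blocks of each vector group into one block.

    `blocks` maps a tuple vector to the tuples of its data block; `groups` is the
    given division of tuple vectors into groups of similar vectors. A list is
    returned because two different groups may end up with the same union vector.
    """
    merged = []
    for group in groups:
        tuples = [t for vec in group for t in blocks[vec]]
        merged.append((union_vector(group), tuples))
    return merged

blocks = {(1, 1, 0): ["t1"], (1, 0, 0): ["t2", "t3"], (0, 0, 1): ["t4"]}
groups = [[(1, 1, 0), (1, 0, 0)], [(0, 0, 1)]]
print(merge_vector_groups(blocks, groups))
# [((1, 1, 0), ['t1', 't2', 't3']), ((0, 0, 1), ['t4'])]
```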
Of course, the data block division module may also be included in a separate data block division device, independent of the database query device described above.
In the technical solution of the embodiment of the present invention, the query workload is decomposed. An optimization technique (such as a clustering algorithm) then automatically determines the mapping from tuples to data blocks. The vector corresponding to each data block is built to serve as an index for subsequent queries. The prior art divides data blocks by simple range partitioning, hash partitioning, or manually formulated partitioning strategies. Compared with these, this approach yields finer-grained data blocks and a more reasonable division. This makes fine-grained skipping easier in later queries over the resulting blocks and improves query efficiency.
Those skilled in the art will appreciate that the present invention includes devices for performing one or more of the operations described herein. These devices may be specially designed and manufactured for the required purposes, or may include known devices in a general-purpose computer. These devices have computer programs stored in them, which are selectively activated or reconfigured. Such computer programs may be stored in a device-readable (e.g. computer-readable) medium, or in any type of medium that is suitable for storing electronic instructions and is coupled to a bus. The computer-readable medium includes, but is not limited to, any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g. a computer).
Those skilled in the art will appreciate that each block in the structure diagrams and/or block diagrams and/or flow diagrams, and combinations of such blocks, can be implemented by computer program instructions. These instructions can be provided to the processor of a general-purpose computer, a special-purpose computer or another programmable data processing apparatus. That processor then executes the schemes specified in one or more blocks of the structure diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention.
Those skilled in the art will appreciate that the steps, measures and schemes in the various operations, methods and processes discussed in the present invention may be alternated, changed, combined or deleted. Further, other steps, measures and schemes in the various operations, methods and processes discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined or deleted. Further, prior-art steps, measures and schemes corresponding to those in the various operations, methods and processes disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined or deleted.
Those of ordinary skill in the art should understand that the discussion of any embodiment above is exemplary only. It is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples. Within the concept of the present invention, technical features in the above embodiments or in different embodiments may be combined, and steps may be implemented in any order. There are also many other variations of the different aspects of the present invention described above, which are not given in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. A database query method, comprising:
comparing a query condition sent by a user with each query unit in a query unit set, and generating a query vector according to the comparison results;
performing a union (bitwise OR) operation between the query vector and the vector corresponding to each data block in the database, and selecting vectors according to the union results;
querying, according to the query condition, the data blocks corresponding to the selected vectors to obtain a query result;
wherein the query unit set is composed in advance of the query units decomposed from a query workload.
2. The method according to claim 1, wherein the elements of the query vector correspond one-to-one to the query units in the query unit set, and the value of each element of the query vector is determined by whether the query condition includes the query unit corresponding to that element.
3. The method according to claim 1, wherein selecting vectors according to the union results specifically comprises:
selecting the vectors for which every element of the result of the union operation with the query vector is 1.
4. The method according to claim 2, wherein the data blocks in the database are divided in advance as follows:
for each tuple in the database, generating a tuple vector corresponding to that tuple; wherein the elements of the tuple vector correspond one-to-one to the query units in the query unit set, in the same element order as the query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element;
merging tuples with identical tuple vectors into the same data block, the data block corresponding to that tuple vector.
5. The method according to claim 4, further comprising, after merging tuples with identical tuple vectors into the same data block:
dividing similar tuple vectors into the same vector group;
merging the data blocks corresponding to the tuple vectors in the same vector group; wherein the vector corresponding to the merged data block is the vector obtained by performing the union operation on the similar tuple vectors.
6. The method according to claim 5, wherein dividing similar tuple vectors into the same vector group specifically comprises:
placing each tuple vector in a vector group of its own, these groups forming the initial division;
performing at least one vector-group merge, where each merge comprises:
merging two similar vector groups according to the overall cost index C(P) of the current vector groups, so that similar tuple vectors are divided into the same vector group; wherein C(P) is calculated according to the following Formula 2:
wherein n denotes the number of current vector groups, and C(P_i) is calculated according to the following Formula 1:
In Formula 1, d is the number of query units in the query unit set; |P_i| denotes the modulus of the tuple vectors in P_i; w_j is the number of query conditions related to F_j; F_j is the j-th query unit in the query unit set; u(P_i)_j is the j-th bit of u(P_i), whose value is 0 when none of the tuples corresponding to P_i satisfies F_j and 1 otherwise; and P_i denotes the i-th vector group.
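Claim 6 describes a bottom-up merge of vector groups driven by C(P), but the expressions for Formula 1 and Formula 2 do not appear above. The Python sketch below therefore rests on several assumptions:

- C(P) = Σ_{i=1}^{n} C(P_i), and C(P_i) = |P_i| · Σ_{j=1}^{d} w_j · u(P_i)_j. This is roughly the weighted number of tuples that must still be scanned.
- |P_i| is the number of tuples a group covers.
- The stopping rule `target_groups` and all function names are hypothetical.

The total Σ_i |P_i| Σ_j w_j does not change when groups merge. Minimising this assumed cost therefore picks the same pair as maximising the complementary skip count |P_i| Σ_j w_j (1 − u(P_i)_j).

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]
Group = Tuple[List[Vector], int, Vector]  # (member tuple vectors, |P_i|, u(P_i))

def union(a: Vector, b: Vector) -> Vector:
    return tuple(x | y for x, y in zip(a, b))

def group_cost(size: int, u: Vector, w: Sequence[int]) -> int:
    # ASSUMED Formula 1: C(P_i) = |P_i| * sum_j w_j * u(P_i)_j
    return size * sum(wj * uj for wj, uj in zip(w, u))

def merge_vector_groups_by_cost(vectors: List[Vector], sizes: List[int],
                                w: Sequence[int], target_groups: int) -> List[Group]:
    # Initial division: each tuple vector forms its own vector group.
    groups: List[Group] = [([v], s, v) for v, s in zip(vectors, sizes)]
    while len(groups) > max(target_groups, 1):
        best = None  # (increase in C(P), i, j)
        for i, j in combinations(range(len(groups)), 2):
            _, si, ui = groups[i]
            _, sj, uj = groups[j]
            # Change in the (assumed) overall cost C(P) = sum_i C(P_i) if P_i and P_j merge.
            delta = (group_cost(si + sj, union(ui, uj), w)
                     - group_cost(si, ui, w) - group_cost(sj, uj, w))
            if best is None or delta < best[0]:
                best = (delta, i, j)
        _, i, j = best
        mi, si, ui = groups[i]
        mj, sj, uj = groups[j]
        merged: Group = (mi + mj, si + sj, union(ui, uj))
        groups = [g for k, g in enumerate(groups) if k not in (i, j)] + [merged]
    return groups

result = merge_vector_groups_by_cost(
    [(1, 1, 0), (1, 0, 0), (0, 0, 1)], sizes=[1, 1, 1], w=[1, 1, 1], target_groups=2)
for members, size, u in result:
    print(members, size, u)
# [(0, 0, 1)] 1 (0, 0, 1)
# [(1, 1, 0), (1, 0, 0)] 2 (1, 1, 0)
```

Each pass tries every pair, so one merge costs O(n²) cost evaluations. That is acceptable for a sketch but may be too slow for very large numbers of distinct tuple vectors.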
7. A data block division method, comprising:
for each tuple in a database, generating a tuple vector corresponding to that tuple; wherein the elements of the tuple vector correspond one-to-one to the query units in a query unit set, in the same element order as a query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element;
merging tuples with identical tuple vectors into the same data block, the data block corresponding to that tuple vector;
wherein the query unit set is composed in advance of the query units decomposed from a query workload, and the query vector is generated by comparing a query condition sent by a user with each query unit in the query unit set and according to the comparison results.
8. The method according to claim 7, further comprising, after merging tuples with identical tuple vectors into the same data block:
dividing similar tuple vectors into the same vector group;
merging the data blocks corresponding to the tuple vectors in the same vector group; wherein the vector corresponding to the merged data block is the vector obtained by performing the union operation on the similar tuple vectors.
9. The method according to claim 8, wherein dividing similar tuple vectors into the same vector group specifically comprises:
placing each tuple vector in a vector group of its own, these groups forming the initial division;
performing at least one vector-group merge, where each merge comprises:
merging two similar vector groups according to the overall cost index C(P) of the current vector groups, so that similar tuple vectors are divided into the same vector group; wherein C(P) is calculated according to the following Formula 2:
wherein n denotes the number of current vector groups, and C(P_i) is calculated according to the following Formula 1:
In Formula 1, d is the number of query units in the query unit set; |P_i| denotes the modulus of the tuple vectors in P_i; w_j is the number of query conditions related to F_j; F_j is the j-th query unit in the query unit set; u(P_i)_j is the j-th bit of u(P_i), whose value is 0 when none of the tuples corresponding to P_i satisfies F_j and 1 otherwise; and P_i denotes the i-th vector group.
10. A database query device, comprising:
a query vector generation module, configured to compare a query condition sent by a user with each query unit in a query unit set and to generate a query vector according to the comparison results;
a vector selection module, configured to perform a union operation between the query vector and the vector corresponding to each data block in the database, and to select vectors according to the union results;
a data query module, configured to query, according to the query condition, the data blocks corresponding to the selected vectors to obtain a query result; wherein the query unit set is composed in advance of the query units decomposed from a query workload.
11. The device according to claim 10, wherein the elements of the query vector correspond one-to-one to the query units in the query unit set, and the value of each element of the query vector is determined by whether the query condition includes the query unit corresponding to that element.
12. The device according to claim 10, wherein
the vector selection module is specifically configured to perform a union operation between the query vector and the vector corresponding to each data block in the database, and to select the vectors for which every element of the result of the union operation with the query vector is 1.
13. The device according to claim 11, further comprising:
a data block division module, configured to generate, for each tuple in the database, a tuple vector corresponding to that tuple, and to merge tuples with identical tuple vectors into the same data block, the data block corresponding to that tuple vector; wherein the elements of the tuple vector correspond one-to-one to the query units in the query unit set, in the same element order as the query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element.
14. The device according to claim 13, wherein
the data block division module is further configured to divide similar tuple vectors into the same vector group, and to merge the data blocks corresponding to the tuple vectors in the same vector group; wherein the vector corresponding to the merged data block is the vector obtained by performing the union operation on the similar tuple vectors.
15. A data block division device, comprising:
a data block division module, configured to generate, for each tuple in a database, a tuple vector corresponding to that tuple, and to merge tuples with identical tuple vectors into the same data block, the data block corresponding to that tuple vector; wherein the elements of the tuple vector correspond one-to-one to the query units in a query unit set, in the same element order as a query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element; wherein the query unit set is composed in advance of the query units decomposed from a query workload, and the query vector is generated by comparing a query condition sent by a user with each query unit in the query unit set and according to the comparison results.
16. The device according to claim 15, wherein
the data block division module is further configured to divide similar tuple vectors into the same vector group, and to merge the data blocks corresponding to the tuple vectors in the same vector group; wherein the vector corresponding to the merged data block is the vector obtained by performing the union operation on the similar tuple vectors.
CN201711378123.9A 2017-12-19 2017-12-19 Database query and data block division method and device Active CN108287868B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711378123.9A CN108287868B (en) 2017-12-19 2017-12-19 Database query and data block division method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711378123.9A CN108287868B (en) 2017-12-19 2017-12-19 Database query and data block division method and device

Publications (2)

Publication Number Publication Date
CN108287868A (en) 2018-07-17
CN108287868B (en) 2019-02-26

Family

ID=62832165

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711378123.9A Active CN108287868B (en) 2017-12-19 2017-12-19 Database query and data block division method and device

Country Status (1)

Country Link
CN (1) CN108287868B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109962920B (en) * 2019-03-29 2022-02-08 北京奇艺世纪科技有限公司 Method, device and system for determining split page number
CN111400346A (en) * 2020-03-13 2020-07-10 苏州浪潮智能科技有限公司 Method, equipment, device and medium for improving execution efficiency of database all-in-one machine

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102663116A (en) * 2012-04-11 2012-09-12 中国人民大学 Multi-dimensional OLAP (On-Line Analytical Processing) query processing method for column-store data warehouses
CN103207919A (en) * 2013-04-26 2013-07-17 北京亿赞普网络技术有限公司 Method and device for fast query and computation on a MangoDB cluster
CN103336792A (en) * 2013-06-07 2013-10-02 华为技术有限公司 Method and device for data partition
CN105045877A (en) * 2015-07-20 2015-11-11 深圳市深信服电子科技有限公司 Database data fragmentation storage method and apparatus and data query method and apparatus

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105517644B (en) * 2014-03-05 2020-04-21 华为技术有限公司 Data partitioning method and equipment
CN103927331B (en) * 2014-03-21 2017-03-22 珠海多玩信息技术有限公司 Data querying method, data querying device and data querying system
CN104679858B (en) * 2015-02-16 2018-10-09 华为技术有限公司 Method and apparatus for querying data
CN104731951B (en) * 2015-03-31 2018-08-07 北京奇艺世纪科技有限公司 Data query method and device
CN105975617A (en) * 2016-05-20 2016-09-28 北京京东尚科信息技术有限公司 Multi-partition-table query processing method and device

Also Published As

Publication number Publication date
CN108287868A (en) 2018-07-17

Similar Documents

Publication Publication Date Title
US11977541B2 (en) Systems and methods for rapid data analysis
US8682923B2 (en) Set-level comparisons in dynamically formed groups
Cormode et al. Approximation algorithms for clustering uncertain data
US7069264B2 (en) Stratified sampling of data in a database system
US6212526B1 (en) Method for apparatus for efficient mining of classification models from databases
US8326869B2 (en) Analysis of object structures such as benefits and provider contracts
CN104361113B (en) OLAP query optimization method under a hybrid memory and flash storage mode
WO2002075598A1 (en) Methods and system for handling mulitple dimensions in relational databases
KR20020034998A (en) Method and apparatus for populating multiple data marts in a single aggregation process
CN107180093A (en) Information search method and device, and time-sensitive query word recognition method and device
CN109582849A (en) Knowledge-graph-based intelligent search method for Internet resources
CN108287868B (en) Database query and data block division method and device
CN102521706A (en) KPI data analysis method and device for the same
CN111125352A (en) Knowledge graph-based associated data visualization data cockpit construction method
US20090100038A1 (en) Information Analysis System
CN104809210B (en) Weighted top-k query method for massive data under a distributed computing framework
TW201942834A (en) Item recommendation
Li et al. Set predicates in sql: Enabling set-level comparisons for dynamically formed groups
CN115936572A (en) Crop germplasm resource information management method and system
Reynolds et al. A multiobjective GRASP for rule selection
CN105989060A (en) Data management method and device
Tang et al. A multidimensional collaborative filtering fusion approach with dimensionality reduction
CN110766591A (en) Intelligent service management method, device, terminal and storage medium
Zhou et al. Olap on search logs: an infrastructure supporting data-driven applications in search engines
Chettri et al. A comparative study on microaggregation techniques for microdata protection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20190110

Address after: 28th Floor, Fortune Plaza, No. 1 Hangfeng Road, Fengtai District, Beijing 100070

Applicant after: BEIJING GUODIANTONG NETWORK TECHNOLOGY Co.,Ltd.

Applicant after: State Grid Corporation of China

Address before: 28th Floor, Fortune Plaza, No. 1 Hangfeng Road, Fengtai District, Beijing 100070

Applicant before: BEIJING GUODIANTONG NETWORK TECHNOLOGY Co.,Ltd.

GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 28th Floor, Fortune Plaza, No. 1 Hangfeng Road, Fengtai District, Beijing 100070

Co-patentee after: STATE GRID CORPORATION OF CHINA

Patentee after: BEIJING GUODIANTONG NETWORK TECHNOLOGY Co.,Ltd.

Address before: 28th Floor, Fortune Plaza, No. 1 Hangfeng Road, Fengtai District, Beijing 100070

Co-patentee before: State Grid Corporation of China

Patentee before: BEIJING GUODIANTONG NETWORK TECHNOLOGY Co.,Ltd.

TR01 Transfer of patent right

Effective date of registration: 20190605

Address after: No. 15 Xiaoying Road, Qinghe, Haidian District, Beijing 100085

Co-patentee after: STATE GRID CORPORATION OF CHINA

Patentee after: BEIJING CHINA POWER INFORMATION TECHNOLOGY Co.,Ltd.

Address before: 28th Floor, Fortune Plaza, No. 1 Hangfeng Road, Fengtai District, Beijing 100070

Co-patentee before: State Grid Corporation of China

Patentee before: BEIJING GUODIANTONG NETWORK TECHNOLOGY Co.,Ltd.
