Summary of the invention
In view of this, an object of the invention is to propose a database query method and device, and a data block division method and device, which achieve finer-grained skipping during queries and improve query efficiency at a small storage cost.
Based on the above object, the present invention provides a database query method, comprising:
comparing the query condition sent by a user with each query unit in a query unit set, and generating a query vector according to the comparison results;
performing a union (bitwise OR) operation between the query vector and the vector corresponding to each data block in the database, and selecting vectors according to the union results;
querying, according to the query condition, the data blocks corresponding to the selected vectors to obtain a query result;
wherein the query unit set is composed in advance of the query units decomposed from a query workload.
Wherein, the elements of the query vector correspond one-to-one to the query units in the query unit set, and the value of each element of the query vector is determined by whether the query condition includes the query unit corresponding to that element.
Wherein, selecting vectors according to the union results specifically comprises:
selecting the vectors for which every element of the result of the union with the query vector is 1.
The present invention also provides a data block division method, comprising:
generating, for each tuple in the database, a tuple vector corresponding to the tuple, wherein the elements of the tuple vector correspond one-to-one to the query units in the query unit set and are ordered in the same way as the elements of the query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element;
merging tuples with identical tuple vectors into the same data block, the data block corresponding to that tuple vector;
wherein the query unit set is composed in advance of the query units decomposed from a query workload.
Further, after the merging of tuples with identical tuple vectors into the same data block, the method further comprises:
dividing similar tuple vectors into the same vector group;
merging the data blocks corresponding to the tuple vectors in the same vector group, wherein the vector corresponding to the merged data block is the vector obtained by performing the union operation over the similar tuple vectors.
Wherein, dividing similar tuple vectors into the same vector group specifically comprises:
placing each tuple vector in a vector group of its own, the resulting vector groups forming the initial division;
performing at least one vector group merge, wherein in each merge:
the two most similar vector groups are merged according to the overall cost index C(P) of the current vector groups, so that similar tuple vectors are placed in the same vector group; C(P) is calculated according to formula two:

$$C(P) = \sum_{i=1}^{n} C(P_i) \qquad \text{(formula two)}$$

where n is the number of current vector groups, and C(P_i) is calculated according to formula one:

$$C(P_i) = |P_i| \sum_{j=1}^{d} w_j \bigl(1 - u(P_i)_j\bigr) \qquad \text{(formula one)}$$

In formula one, d is the number of query units in the query unit set; |P_i| is the size of P_i, i.e., the number of tuples it covers; w_j is the number of query conditions involving F_j, where F_j is the j-th query unit in the query unit set; u(P_i)_j is the j-th bit of u(P_i), which is 0 if no tuple corresponding to P_i satisfies F_j and 1 otherwise; and P_i denotes the i-th vector group.
The present invention also provides a database query device, comprising:
a query vector generation module, configured to compare the query condition sent by a user with each query unit in the query unit set and to generate a query vector according to the comparison results;
a vector selection module, configured to perform a union operation between the query vector and the vector corresponding to each data block in the database and to select vectors according to the union results;
a data query module, configured to query, according to the query condition, the data blocks corresponding to the selected vectors to obtain a query result.
Wherein, the elements of the query vector correspond one-to-one to the query units in the query unit set, and the value of each element of the query vector is determined by whether the query condition includes the query unit corresponding to that element.
Preferably, the vector selection module is specifically configured to perform a union operation between the query vector and the vector corresponding to each data block in the database, and to select the vectors for which every element of the result of the union with the query vector is 1.
Further, the device further comprises:
a data block division module, configured to generate, for each tuple in the database, a tuple vector corresponding to the tuple, and to merge tuples with identical tuple vectors into the same data block, the data block corresponding to that tuple vector; wherein the elements of the tuple vector correspond one-to-one to the query units in the query unit set and are ordered in the same way as the elements of the query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element.
Further, the data block division module is also configured to divide similar tuple vectors into the same vector group and to merge the data blocks corresponding to the tuple vectors in the same vector group, wherein the vector corresponding to a merged data block is the vector obtained by performing the union operation over the similar tuple vectors.
The present invention also provides a data block division device, comprising the above data block division module.
In the technical solution of the embodiments of the present invention, the mapping of tuples to data blocks is completed automatically from the decomposition of the query workload by means of an optimization technique (such as a clustering algorithm), and the vector corresponding to each data block is built to serve as an index for subsequent queries. Compared with prior-art simple range partitioning or hash partitioning, or with data block division based on manually formulated partitioning strategies, this yields finer-grained data blocks and a more reasonable division, which makes it easier to achieve fine-grained skipping over the resulting data blocks in subsequent queries and improves query efficiency.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in further detail below with reference to specific embodiments and the accompanying drawings.
Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements, or elements with the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting the claims.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intervening elements may be present. In addition, "connected" or "coupled" as used herein may include wireless connection or wireless coupling. The term "and/or" as used herein includes any one and all combinations of one or more of the associated listed items.
It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two non-identical entities or non-identical parameters that share the same name. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; this is not repeated in the subsequent embodiments.
In the technical solution of the present invention, data blocks are first divided automatically according to the decomposition of the query workload. The process, shown in Figure 1, comprises the following steps:
Step S100: analyze the query workload to obtain the query unit set.
A query workload is a set of queries; a database system typically constructs a representative query workload by analyzing its log files. Each query in the query workload contains a query condition, and evaluating this condition against a tuple (record) of the database determines whether the tuple satisfies it. For example, if the query filter condition is "age >= 30", only tuples whose age is greater than or equal to 30 satisfy the condition, and tuples whose age is below 30 do not. Here we assume that a query condition is a conjunction of query units, each query unit containing one predicate; that is, a query condition consists of at least one query unit (predicate) joined by the logical "and" operator. For example, the query condition "age >= 30 and gender = 'male'" consists of two query units (predicates): "age >= 30" and "gender = 'male'".
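The conjunctive model above can be sketched in a few lines of Python. This is a minimal illustration only: it assumes a hypothetical encoding in which a query unit is a (column, operator, value) triple and a tuple is a dict of field values; the names `OPS`, `satisfies_unit`, and `satisfies_condition` are illustrative and not part of the invention.

```python
import operator

# Hypothetical encoding: a query unit (predicate) is a (column, op, value) triple,
# and a query condition is a list of query units joined by "and".
OPS = {
    ">=": operator.ge,
    ">": operator.gt,
    "=": operator.eq,
    "in": lambda field, values: field in values,
}


def satisfies_unit(record, unit):
    """Return True if the record (a dict of field values) satisfies one query unit."""
    column, op, value = unit
    return OPS[op](record[column], value)


def satisfies_condition(record, condition):
    """A conjunctive query condition holds only if every query unit holds."""
    return all(satisfies_unit(record, unit) for unit in condition)


# "age >= 30 and gender = 'male'" decomposed into two query units
condition = [("age", ">=", 30), ("gender", "=", "male")]
print(satisfies_condition({"age": 35, "gender": "male"}, condition))  # True
print(satisfies_condition({"age": 25, "gender": "male"}, condition))  # False
```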
Query workload analysis extracts these conjunctive expressions. For example, suppose a query workload consists of four query conditions, Q1, Q2, Q3, and Q4:
Q1: product = 'clothes';
Q2: product in ('clothes', 'shirt') and sales volume > 320,000 yuan;
Q3: product = 'shirt' and sales volume > 210,000 yuan and client age > 30;
Q4: client age > 30;
A frequent itemset algorithm is used to compute the frequency of each query unit (predicate) in the query workload; the K most frequent query units (predicates) then form the query unit set, where K is a parameter that can be specified by those skilled in the art.
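For single query units, the frequent-itemset computation reduces to counting, for each predicate, how many query conditions contain it. The Python sketch below makes exactly that simplification (a full frequent itemset algorithm such as Apriori would also mine multi-predicate itemsets). It assumes the hypothetical (column, op, value) encoding of predicates, expresses sales volume in units of 10,000 yuan, and `top_k_query_units` is an illustrative name.

```python
from collections import Counter

# Q1-Q4 from the example; sales volume is in units of 10,000 yuan.
WORKLOAD = [
    [("product", "=", "clothes")],                                             # Q1
    [("product", "in", ("clothes", "shirt")), ("sales", ">", 32)],             # Q2
    [("product", "=", "shirt"), ("sales", ">", 21), ("client_age", ">", 30)],  # Q3
    [("client_age", ">", 30)],                                                 # Q4
]


def top_k_query_units(workload, k):
    """Count how many query conditions contain each query unit; keep the k most frequent."""
    counts = Counter()
    for condition in workload:
        counts.update(set(condition))  # count a predicate at most once per query
    return counts.most_common(k)


print(top_k_query_units(WORKLOAD, 1))  # [(('client_age', '>', 30), 2)]
```

Before any inclusion analysis, "client age > 30" is the only predicate shared by two of the raw queries; every other predicate occurs once.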
Preferably, before the query unit set is formed in this step, the query conditions can additionally be processed to account for inclusion relationships between query units:
A query condition usually contains multiple query units (predicates). For example, "product = 'shirt' and sales volume > 210,000 yuan" contains two query units (predicates): "product = 'shirt'" and "sales volume > 210,000 yuan". Preferably, individual query units (predicates) can be analyzed further to find subtle inclusion relationships between them. For example, the set of records satisfying "sales volume > 210,000 yuan" contains the set of records satisfying "sales volume > 320,000 yuan", just as the population older than 10 contains the population older than 30. After the inclusion relationships between query units (predicates) are analyzed, the above query workload takes the following improved form:
Q1: product = 'clothes' and product in ('clothes', 'shirt');
Q2: product in ('clothes', 'shirt') and sales volume > 320,000 yuan and sales volume > 210,000 yuan;
Q3: product = 'shirt' and sales volume > 210,000 yuan and client age > 30 and product in ('clothes', 'shirt');
Q4: client age > 30;
The added predicates are product in ('clothes', 'shirt') in Q1 and Q3, and sales volume > 210,000 yuan in Q2. Adding them does not change the meaning of the original query conditions; it only makes the inclusion relationships described above explicit, which allows the frequency of each query unit (predicate) to be counted more accurately.
On this basis, a frequent itemset algorithm computes the frequency of each query unit (predicate) in the query workload after inclusion analysis, and the K most frequent query units (predicates) form the query unit set. In other words, the query unit set retains only the K query units (predicates) that occur most often in the query workload.
Step S101: for each tuple in the database, generate the tuple vector corresponding to the tuple.
In this step, the entire database table is scanned. For each tuple (record) obtained by the scan, the tuple is checked against each query unit in the query unit set and the result is recorded: if the tuple satisfies the query unit, the recorded result is 1; otherwise it is 0.
For example, suppose the database contains the 6 tuples shown in Table 1 below, with ids 102 to 107. Each tuple consists of several fields: the tuple id, the transaction time, the category of product sold, the client age, the sales volume, and so on.
Table 1
Tuple | id | Time | Product | Client age | Sales volume (10,000 yuan)
Tuple 1 | 102 | 09:15:00 | Shoes | 35 | 15
Tuple 2 | 103 | 09:16:00 | Shoes | 20 | 22
Tuple 3 | 104 | 09:17:00 | Cap | 20 | 15
Tuple 4 | 105 | 09:31:00 | Clothes | 36 | 10
Tuple 5 | 106 | 09:34:00 | Cap | 36 | 23
Tuple 6 | 107 | 09:35:00 | Shirt | 29 | 15
The query unit set contains the 3 query units shown in Table 2 below:
Table 2
Code name | Query unit | Frequency
F1 | product in ('clothes', 'shirt') | 4
F2 | client age > 30 | 2
F3 | sales volume > 210,000 yuan | 2
Tuple 1 is filtered with query units F1, F2, and F3 in turn; that is, for each query unit in the query unit set, we determine whether tuple 1 satisfies it and record the result. Tuple 1 does not satisfy F1 (0), satisfies F2 (1), and does not satisfy F3 (0), so the recorded results are 0, 1, 0. For each tuple, the results of checking it against every query unit of the query unit set are combined into a vector, which is the tuple vector of that tuple. For example, Table 3 shows the tuple vectors of tuples 1 to 6.
Table 3
That is, the elements of a tuple's vector correspond one-to-one to the query units in the query unit set, and the value of each element is determined by whether the tuple satisfies the query unit corresponding to that element.
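The following Python sketch recomputes the tuple vectors of tuples 1 to 6 from Table 1 and Table 2. It assumes a hypothetical dict representation of each tuple (sales volume in units of 10,000 yuan; the time column is omitted because no query unit uses it) and writes each query unit as a Python predicate; `tuple_vector` is an illustrative name.

```python
# Table 1 (sales volume in units of 10,000 yuan; time column omitted)
TUPLES = [
    {"id": 102, "product": "shoes", "age": 35, "sales": 15},
    {"id": 103, "product": "shoes", "age": 20, "sales": 22},
    {"id": 104, "product": "cap", "age": 20, "sales": 15},
    {"id": 105, "product": "clothes", "age": 36, "sales": 10},
    {"id": 106, "product": "cap", "age": 36, "sales": 23},
    {"id": 107, "product": "shirt", "age": 29, "sales": 15},
]

# Table 2: query units F1..F3, in tuple-vector order
QUERY_UNITS = [
    lambda t: t["product"] in ("clothes", "shirt"),  # F1
    lambda t: t["age"] > 30,                         # F2
    lambda t: t["sales"] > 21,                       # F3
]


def tuple_vector(t):
    """Element j is 1 if the tuple satisfies query unit F_j, otherwise 0."""
    return tuple(1 if unit(t) else 0 for unit in QUERY_UNITS)


for t in TUPLES:
    print(t["id"], tuple_vector(t))
```

The first line printed is `102 (0, 1, 0)`, matching the result derived for tuple 1 above.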
Step S102: merge tuples with identical tuple vectors into the same data block.
In this step, tuples with identical tuple vectors are placed in the same data block, and the data block corresponds to that tuple vector. The number of tuples in the data block corresponding to each tuple vector can also be counted, forming <tuple vector, tuple count> pairs. In the example of Table 3, every tuple vector is distinct, so the data block corresponding to each tuple vector contains exactly 1 tuple.
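A minimal Python sketch of this grouping step, assuming each tuple is given as a hypothetical (tuple id, tuple vector) pair; the six pairs below are the tuple vectors obtained from Table 1 and Table 2, and `build_blocks` is an illustrative name.

```python
from collections import defaultdict

# (tuple id, tuple vector) for tuples 1-6, derived from Table 1 and Table 2
TUPLE_VECTORS = [
    (102, (0, 1, 0)), (103, (0, 0, 1)), (104, (0, 0, 0)),
    (105, (1, 1, 0)), (106, (0, 1, 1)), (107, (1, 0, 0)),
]


def build_blocks(tuple_vectors):
    """Put tuples with identical tuple vectors into the same data block."""
    blocks = defaultdict(list)
    for tuple_id, vector in tuple_vectors:
        blocks[vector].append(tuple_id)
    return dict(blocks)


blocks = build_blocks(TUPLE_VECTORS)
pairs = [(vector, len(ids)) for vector, ids in blocks.items()]  # <tuple vector, tuple count>
print(pairs)
```

With the example data every count is 1; if two tuples shared a vector, they would land in the same block and that block's count would be 2.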
Preferably, to reduce the number of tuple vectors, that is, the number of data blocks produced by the division, and thus save the space needed to maintain data block management information, the following step S103 can also be performed to merge data blocks.
Step S103: merge the data blocks corresponding to similar tuple vectors. The process, shown in Figure 2, comprises the following steps:
Step S201: place each tuple vector in a vector group of its own; these vector groups form the initial division.
For example, if there are m tuple vectors, they are placed in the vector groups P_1, P_2, ..., P_m respectively, where P_i denotes the i-th vector group and the i-th tuple vector is placed in P_i. For example, the tuple vectors in Table 3 can be placed in the vector groups P_1, P_2, ..., P_6 respectively.
The tuple count in each <tuple vector, tuple count> pair gives the number of tuples corresponding to that tuple vector in each initial vector group, and the pairs also determine the number of initial vector groups (one per distinct tuple vector). For example, each of the 6 tuple vectors in the example above has a tuple count of 1, i.e., no two tuples share a vector, so there are initially 6 vector groups, each containing 1 tuple vector. If instead tuple vector 2 had a tuple count of 2, corresponding to tuples 2 and 3, both tuples would belong to the same vector group.
Step S202: perform at least one vector group merge. In each merge, the two most similar vector groups are merged according to the overall cost index C(P) of the current vector groups, so that similar tuple vectors are placed in the same vector group. The process, shown in Figure 3, comprises the following steps:
Step S301: compute the weight of each query unit.
Specifically, the weight w_j of the j-th query unit is the number of query conditions involving F_j, where F_j is the j-th query unit in the query unit set. For example, in the example above, F_2 "client age > 30" is involved in query conditions Q3 and Q4, so w_2 = 2.
Step S302: for each vector group, cut the vector group with each query unit.
Specifically, cutting P_i with F_j works as follows: if no tuple corresponding to P_i satisfies F_j, the j-th bit of the cut result u(P_i), i.e., u(P_i)_j, is 0; otherwise it is 1.
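Cutting a vector group with every query unit amounts to taking the bitwise union (OR) of the group's tuple vectors: bit j ends up 1 exactly when at least one tuple in the group satisfies F_j. A minimal sketch, with `cut` as a hypothetical helper name:

```python
def cut(group_vectors):
    """u(P_i): bit j is 1 if at least one tuple vector in the group has bit j set."""
    return tuple(int(any(bits)) for bits in zip(*group_vectors))


# Tuple 1 (0, 1, 0) and tuple 4 (1, 1, 0) in one vector group
print(cut([(0, 1, 0), (1, 1, 0)]))  # (1, 1, 0)
```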
Step S303: merge the two most similar vector groups, so that the change in the overall cost index C(P) is maximal.
Specifically, a cost index C(P_i) is defined for P_i as shown in formula one below; it indicates how many tuples of P_i can be skipped when all queries are executed:

$$C(P_i) = |P_i| \sum_{j=1}^{d} w_j \bigl(1 - u(P_i)_j\bigr) \qquad \text{(formula one)}$$

In formula one, d is the number of query units in the query unit set, and |P_i| is the number of tuples in P_i. The formula expresses that when u(P_i)_j is 1, 1 - 1 = 0, so no tuple of P_i can be skipped for queries involving F_j; when u(P_i)_j is 0, 1 - 0 = 1, so all tuples of P_i can be skipped for each of the w_j queries involving F_j. The overall cost index C(P) of the current vector groups is calculated as shown in formula two:

$$C(P) = \sum_{i=1}^{n} C(P_i) \qquad \text{(formula two)}$$

In formula two, n is the number of current vector groups. The two most similar vector groups are merged, where "most similar" means that merging these two vector groups produces the largest change in C(P). Because merging can only turn bits of u from 0 to 1, C(P) never increases; the chosen pair is therefore the one whose merge loses the fewest skippable tuples.
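Formulas one and two translate directly into Python. The sketch below assumes weights (3, 2, 2), counted as the number of augmented queries Q1 to Q4 that involve F1, F2, and F3 respectively; substitute other weights as needed. The function names are illustrative.

```python
def group_cost(size, u, weights):
    """Formula one: C(P_i) = |P_i| * sum_j w_j * (1 - u(P_i)_j)."""
    return size * sum(w * (1 - bit) for w, bit in zip(weights, u))


def total_cost(groups, weights):
    """Formula two: C(P) = sum over i of C(P_i); groups is a list of (|P_i|, u(P_i))."""
    return sum(group_cost(size, u, weights) for size, u in groups)


WEIGHTS = (3, 2, 2)  # assumed w_1..w_3
initial = [(1, (0, 1, 0)), (1, (0, 0, 1)), (1, (0, 0, 0)),
           (1, (1, 1, 0)), (1, (0, 1, 1)), (1, (1, 0, 0))]
print(total_cost(initial, WEIGHTS))  # 26
```

Merging tuples 1 and 4 into one group with u = (1, 1, 0) gives that group a cost of 2 × 2 = 4 in place of 5 + 2 = 7, so C(P) drops from 26 to 23.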
After two vector groups have been merged into a new vector group, steps S302 and S303 can be repeated multiple times. In this way, the overall cost index lets the tuple vectors be grouped with the standard agglomerative hierarchical clustering method, which in turn groups the tuples corresponding to those tuple vectors. The tuples corresponding to the tuple vectors in one vector group can be placed in one data block.
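A greedy agglomerative sketch of steps S302 and S303 in Python follows. The text does not say when merging stops, so this sketch assumes a hypothetical stopping rule: merge until `target` groups remain. Ties are broken in favor of the first pair enumerated, the search is a naive O(n^3) loop, and the weights (3, 2, 2) are assumed.

```python
from itertools import combinations


def union(u, v):
    """Bitwise union (OR) of two vectors."""
    return tuple(a | b for a, b in zip(u, v))


def group_cost(size, u, weights):
    """Formula one: C(P_i) = |P_i| * sum_j w_j * (1 - u(P_i)_j)."""
    return size * sum(w * (1 - bit) for w, bit in zip(weights, u))


def agglomerate(groups, weights, target):
    """Greedily merge vector groups until `target` groups remain.

    groups: list of (tuple_ids, size, u) triples, one per initial vector group.
    Each round merges the pair with the largest signed change in C(P), i.e. the
    pair whose merge loses the fewest skippable tuples; ties go to the first pair.
    """
    groups = list(groups)
    while len(groups) > target:
        best = None
        for a, b in combinations(range(len(groups)), 2):
            ids_a, n_a, u_a = groups[a]
            ids_b, n_b, u_b = groups[b]
            merged = (ids_a + ids_b, n_a + n_b, union(u_a, u_b))
            delta = (group_cost(merged[1], merged[2], weights)
                     - group_cost(n_a, u_a, weights)
                     - group_cost(n_b, u_b, weights))
            if best is None or delta > best[0]:
                best = (delta, a, b, merged)
        _, a, b, merged = best
        groups = [g for k, g in enumerate(groups) if k not in (a, b)] + [merged]
    return groups


WEIGHTS = (3, 2, 2)  # assumed weights
initial = [([102], 1, (0, 1, 0)), ([103], 1, (0, 0, 1)), ([104], 1, (0, 0, 0)),
           ([105], 1, (1, 1, 0)), ([106], 1, (0, 1, 1)), ([107], 1, (1, 0, 0))]
for ids, size, u in agglomerate(initial, WEIGHTS, target=3):
    print(ids, u)
```

With these assumed inputs the sketch ends with the groups {102, 104}, {103, 106}, and {105, 107}; a different weighting or stopping rule can yield a different division.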
The vector obtained by performing the union operation over all tuple vectors in the vector group corresponding to a data block serves as that data block's data block vector. For example, the tuples in Table 3 can be divided into the data blocks corresponding to the vector groups shown in Table 4, where the data block vector of the i-th data block is u(P_i).
Table 4
For example, as shown in Table 5, in the vector group P_1 the tuple vector of tuple 1 is (0,1,0) and that of tuple 4 is (1,1,0); their union is (1,1,0), so the vector corresponding to the data block of P_1 is u(P_1) = (1,1,0).
Table 5
Row | F1 | F2 | F3
Record1 | 0 | 1 | 0
Record4 | 1 | 1 | 0
Union operation | 0 ∪ 1 = 1 | 1 ∪ 1 = 1 | 0 ∪ 0 = 0
u(P_1) | 1 | 1 | 0
After the database has been divided, the necessary management information of each data block can be recorded. The management information of a data block may include the block header, the block tail, descriptive information, and so on; the descriptive information may include the number of records, the data block size, and similar information.
The data block division method provided by the embodiments of the present invention divides data blocks automatically according to the decomposition of the query workload. Compared with prior-art data block division based on manually formulated partitioning strategies, it yields finer-grained data blocks and a more reasonable division, which makes fine-grained skipping easier to achieve in subsequent queries and improves query efficiency.
Based on data blocks divided in advance by the above method, the database query method provided by the embodiments of the present invention proceeds as shown in Figure 4 and comprises the following steps:
Step S401: compare the query condition sent by the user with each query unit in the query unit set, and generate a query vector according to the comparison results.
In this step, the elements of the generated query vector correspond one-to-one to the query units in the query unit set, and the elements of the query vector are ordered in the same way as those of the tuple vectors above. The value of each element of the query vector is determined by whether the query condition includes the query unit corresponding to that element: if the query condition includes the query unit, the element is 0; otherwise it is 1. For example, suppose the query condition of some query is "product in ('clothes', 'shirt') and client age > 30". Comparing it with F1, F2, and F3 shows that the condition includes F1 and F2, so the query vector is (0,0,1): the first two 0s correspond to F1 and F2, and the final 1 corresponds to F3.
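A minimal sketch of query vector generation in Python. It assumes, for illustration only, that the user's query condition has already been decomposed into query units whose text matches the entries of Table 2 exactly; a real system would match predicates semantically. `QUERY_UNITS` and `query_vector` are hypothetical names.

```python
# Table 2 query units, in the same order as the tuple vector elements
QUERY_UNITS = [
    "product in ('clothes', 'shirt')",  # F1
    "client age > 30",                  # F2
    "sales volume > 210,000 yuan",      # F3
]


def query_vector(condition_units):
    """Element j is 0 if the query condition includes query unit F_j, else 1."""
    return tuple(0 if unit in condition_units else 1 for unit in QUERY_UNITS)


condition = {"product in ('clothes', 'shirt')", "client age > 30"}
print(query_vector(condition))  # (0, 0, 1)
```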
Step S402: perform a union operation between the query vector and the vector corresponding to each data block in the database, and select vectors according to the union results.
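Specifically, after the union of the query vector with the vector of each data block in the database is computed, the vectors for which every element of the union result is 1 are identified and selected. A 0 at position j of the result means that the query condition includes F_j while no tuple in the data block satisfies F_j, so the data block cannot contain any query result.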
For example, the query vector (0,0,1) is combined by the union operation with u(P_1), u(P_2), and u(P_3) above: for u(P_1), the union of (0,0,1) and (1,1,0) is (1,1,1); for u(P_2), the union of (0,0,1) and (0,1,1) is (0,1,1); for u(P_3), the union of (0,0,1) and (1,0,0) is (1,0,1). Only the union with u(P_1) has all bits equal to 1, so u(P_1) is selected.
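The selection rule can be sketched in Python using the three data block vectors from the example; `select_blocks` and the block names are illustrative.

```python
def union(u, v):
    """Bitwise union (OR) of two vectors."""
    return tuple(a | b for a, b in zip(u, v))


def select_blocks(query_vec, block_vectors):
    """Keep the blocks whose union with the query vector is all ones.

    A 0 at position j means the query requires F_j but no tuple in the block
    satisfies F_j, so the block cannot hold a result and is skipped.
    """
    return [name for name, u in block_vectors.items() if all(union(query_vec, u))]


BLOCK_VECTORS = {"P1": (1, 1, 0), "P2": (0, 1, 1), "P3": (1, 0, 0)}
print(select_blocks((0, 0, 1), BLOCK_VECTORS))  # ['P1']
```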
Step S403: query the data blocks corresponding to the selected vectors according to the query condition to obtain the query result.
For example, once u(P_1) has been selected in the previous step, the query only needs to be executed in the data block corresponding to u(P_1), and the query result is returned. The data blocks corresponding to u(P_2) and u(P_3) are skipped, which improves query efficiency.
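Steps S401 to S403 can be combined into a small end-to-end sketch that also checks that skipping loses no results. Assumptions, all for illustration: tuples use the hypothetical dict form of Table 1, the query condition consists only of units from the query unit set, block P1 holds tuples 1 and 4 as in Table 5, and the contents of P2 and P3 are a hypothetical assignment chosen to reproduce the block vectors u(P_2) = (0,1,1) and u(P_3) = (1,0,0) used above.

```python
# Table 1 tuples (sales volume in units of 10,000 yuan; time column omitted)
TUPLES = {
    102: {"product": "shoes", "age": 35, "sales": 15},
    103: {"product": "shoes", "age": 20, "sales": 22},
    104: {"product": "cap", "age": 20, "sales": 15},
    105: {"product": "clothes", "age": 36, "sales": 10},
    106: {"product": "cap", "age": 36, "sales": 23},
    107: {"product": "shirt", "age": 29, "sales": 15},
}
UNITS = {  # Table 2, in tuple-vector order
    "F1": lambda t: t["product"] in ("clothes", "shirt"),
    "F2": lambda t: t["age"] > 30,
    "F3": lambda t: t["sales"] > 21,
}
# P1 holds tuples 1 and 4 as in Table 5; the P2/P3 contents are hypothetical
BLOCKS = {"P1": [102, 105], "P2": [103, 106], "P3": [104, 107]}


def block_vector(ids):
    """u(P): union of the tuple vectors of the block's tuples."""
    return tuple(int(any(test(TUPLES[i]) for i in ids)) for test in UNITS.values())


def run_query(condition):
    """Build the query vector, skip blocks, and scan only the selected blocks."""
    q = tuple(0 if name in condition else 1 for name in UNITS)
    hits, scanned = [], []
    for block, ids in BLOCKS.items():
        if not all(a | b for a, b in zip(q, block_vector(ids))):
            continue  # skipped: a required F_j is satisfied by no tuple in this block
        scanned.append(block)
        hits += [i for i in ids if all(UNITS[n](TUPLES[i]) for n in condition)]
    return hits, scanned


print(run_query({"F1", "F2"}))  # ([105], ['P1'])
```

Only P1 is scanned, and the single hit, tuple 105, is the same result a full scan of all six tuples would return.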
Based on the above method, the database query device provided by the embodiments of the present invention has the structure shown in Figure 5, comprising: a query vector generation module 501, a vector selection module 502, and a data query module 503.
The query vector generation module 501 is configured to compare the query condition sent by the user with each query unit in the query unit set and to generate a query vector according to the comparison results, wherein the elements of the query vector correspond one-to-one to the query units in the query unit set, and the value of each element of the query vector is determined by whether the query condition includes the query unit corresponding to that element.
The vector selection module 502 is configured to perform a union operation between the query vector and the vector corresponding to each data block in the database and to select vectors according to the union results; specifically, the vector selection module 502 selects the vectors for which every element of the result of the union with the query vector is 1.
The data query module 503 is configured to query, according to the query condition, the data blocks corresponding to the selected vectors to obtain the query result.
Further, the database query device provided by the embodiments of the present invention may also comprise a data block division module 504.
The data block division module 504 is configured to generate, for each tuple in the database, a tuple vector corresponding to the tuple, and to merge tuples with identical tuple vectors into the same data block; wherein the elements of the tuple vector correspond one-to-one to the query units in the query unit set and are ordered in the same way as the elements of the query vector, and the value of each element of the tuple vector is determined by whether the tuple satisfies the query unit corresponding to that element.
Further, the data block division module 504 is also configured to divide similar tuple vectors into the same vector group and to merge the data blocks corresponding to the tuple vectors in the same vector group, wherein the vector corresponding to a merged data block is the vector obtained by performing the union operation over the similar tuple vectors. For the specific method by which the data block division module 504 divides tuple vectors into vector groups, refer to the steps of the processes shown in Figure 2 and Figure 3 above; it is not repeated here.
Of course, the above data block division module may also be included in a separate data block division device, independent of the above database query device.
In the technical solution of the embodiments of the present invention, the mapping of tuples to data blocks is completed automatically from the decomposition of the query workload by means of an optimization technique (such as a clustering algorithm), and the vector corresponding to each data block is built to serve as an index for subsequent queries. Compared with prior-art simple range partitioning or hash partitioning, or with data block division based on manually formulated partitioning strategies, this yields finer-grained data blocks and a more reasonable division, which makes it easier to achieve fine-grained skipping over the resulting data blocks in subsequent queries and improves query efficiency.
Those skilled in the art will understand that the present invention includes devices for performing one or more of the operations described herein. These devices may be specially designed and manufactured for the required purposes, or may include known devices in a general-purpose computer. These devices have computer programs stored in them that are selectively activated or reconfigured. Such computer programs may be stored in a device-readable (e.g., computer-readable) medium or in any type of medium suitable for storing electronic instructions and coupled to a bus. Computer-readable media include, but are not limited to, any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
Those skilled in the art will understand that each block of the structure diagrams and/or block diagrams and/or flowcharts, and combinations of blocks in them, can be implemented by computer program instructions. Those skilled in the art will also understand that these computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, so that the processor of the computer or other programmable data processing apparatus executes the schemes specified in one or more blocks of the structure diagrams and/or block diagrams and/or flowcharts disclosed by the present invention.
Those skilled in the art will understand that the steps, measures, and schemes in the various operations, methods, and processes discussed in the present invention can be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and processes discussed in the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in prior-art operations, methods, and processes corresponding to those disclosed in the present invention can also be alternated, changed, rearranged, decomposed, combined, or deleted.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope of the present disclosure (including the claims) is limited to these examples. Within the concept of the invention, technical features of the above embodiments or of different embodiments may be combined, steps may be performed in any order, and many other variations of the different aspects of the invention as described above exist, which are not provided in detail for brevity. Therefore, any omission, modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.