CN104809210B - A weighted top-k query method for massive data under a distributed computing framework - Google Patents

Publication number
CN104809210B
CN104809210B CN201510209691.0A CN201510209691A CN104809210B CN 104809210 B CN104809210 B CN 104809210B CN 201510209691 A CN201510209691 A CN 201510209691A CN 104809210 B CN104809210 B CN 104809210B
Authority
CN
China
Prior art keywords
region
data
attribute
point
maximum
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510209691.0A
Other languages
Chinese (zh)
Other versions
CN104809210A (en)
Inventor
何洁月
罗浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Application filed by Southeast University
Priority to CN201510209691.0A
Publication of CN104809210A
Application granted
Publication of CN104809210B
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 - Querying
    • G06F16/245 - Query processing
    • G06F16/2457 - Query processing with adaptation to user needs
    • G06F16/24578 - Query processing with adaptation to user needs using ranking

Abstract

The invention discloses a Spark-based top-k query optimization method for massive data under a distributed computing framework. The massive data set is partitioned in advance, mainly with a grid-like data partitioning method, so that the raw data set is divided into different sub-data sets. Then, according to the weight the user assigns to each attribute of the data objects and the query value k, a small number of suitable sub-data sets are selected and queried in place of the whole data set. Experimental results show that the proposed method answers queries quickly and scales well. Compared with the traditional top-k query method and with the data partitioning method based on angle and distance, it improves query speed and returns the information the user asks for within a short time.

Description

A weighted top-k query method for massive data under a distributed computing framework
Technical field
The present invention relates to a data query method, in particular to a weighted top-k query method over massive data sets.
Background art
Top-k queries, also called rank-aware queries, are one of the most basic operations in databases and an important tool for data analysis. In business analysis in particular, often only the most useful data matter, not the whole data set.
Top-k queries are defined as follows. Let $D = \{T_1, T_2, \ldots, T_n\}$ denote the set of all data objects, where $T_i$ is the i-th data object; each data object has d dimensions and is a point in space. For a top-k query Q(f, k), f is the scoring function and k is the number of results to return that satisfy the query. Here f is a weighted-sum function: for a data object $T(t_1, t_2, \ldots, t_d)$ in the sample, the user assigns a weight to each attribute, $W(w_1, w_2, \ldots, w_d)$, and the score of the data object is the weighted sum of its attribute values, i.e. the scoring function is:
$$f_W(T) = \sum_{i=1}^{d} w_i t_i$$
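The weighted-sum score and the baseline query it implies can be sketched as follows. This is a minimal illustration rather than the method of the invention: the names `score` and `top_k` and the sample data are assumptions made for the example, and the baseline simply scores every object.

```python
import heapq

def score(weights, obj):
    """Weighted-sum score f_W(T) = sum_i w_i * t_i."""
    return sum(w * t for w, t in zip(weights, obj))

def top_k(data, weights, k):
    """Brute-force baseline: score every object and keep the k best."""
    return heapq.nlargest(k, data, key=lambda obj: score(weights, obj))

# Hypothetical 2-attribute objects and user weights.
data = [(1, 9), (5, 5), (8, 2), (3, 3)]
weights = (0.7, 0.3)
best = top_k(data, weights, 2)
```

Because every object is scored, the cost of this baseline grows with the size of the whole data set, which is the cost the partitioning described later tries to avoid.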
A top-k query ultimately returns a result set of only k elements, so it is enough to sort a set of data far smaller than the input data set; the full data need not be processed. With the explosive growth of data scale in recent years, massive data pose great challenges to data storage, management and analysis. As a basic operation in data analysis, a top-k query must return its results quickly. For example, among the massive number of products on Taobao, a user assigns different weights to product attributes according to personal preference, and the system then quickly returns the top k products that meet the user's request.
Top-k queries over massive data nevertheless face two major challenges: first, the data scale reaches the TB or PB level, so traditional centralized data processing methods no longer apply; second, query results must be obtained quickly and accurately over a massive data set.
Top-k queries in traditional centralized data systems hit performance bottlenecks on massive data and are therefore unsuitable for processing massive data sets. In traditional distributed environments, some studies improve query efficiency by caching query results, which does not in itself solve the massive-data top-k query problem; others process data with Skyline queries and propose a complete top-k processing framework, DiTo, but this too targets traditional distributed environments.
In recent years, the most basic solution to the top-k problem in cloud environments has been to sort all the data and return the first k results. Every query then has to process the raw data set, which causes redundant work and long query times, so this approach is not advisable. RanKloud and others propose, under the MapReduce framework, computing a threshold for early query termination from statistics collected at run time, but this method cannot guarantee k exact results. Other work uses a caching mechanism that compares a new query with cached queries and skips re-execution when the two are similar enough; this speeds up queries but the results are inaccurate. A top-k query based on data partitioning by angle and distance has also been proposed, but in the angle-based partitioning scheme the coordinate transformation of the data is complex and time-consuming, so it too is unsuitable for massive data sets.
Summary of the invention
Purpose of the invention: to overcome the deficiencies of the prior art, the present invention provides a weighted top-k query method for massive data based on a distributed computing framework, solving the technical problem that existing top-k queries cannot obtain query results quickly, accurately and easily when processing massive data.
Technical solution: to achieve the above purpose, the invention adopts the following technical solution:
The following 4 reasonable assumptions are made first:
(1) Every attribute value of a data object is non-negative; a negative value can be made non-negative by normalizing the data.
(2) The data set is relatively fixed, or its update rate is negligible relative to the whole data set over a certain period. For example, although the product data of JD.com are updated constantly, given the huge number of products they can be regarded as changing little within a given period. The method therefore does not apply to stream data processing.
(3) The data are uniformly distributed in space; for massive data sets this assumption holds in many scenarios.
(4) An input weight W satisfies $\sum_{i=1}^{d} w_i = 1$; if it does not, it can be made to do so by normalization.
On the basis of these 4 reasonable assumptions, a weighted top-k query method for massive data under a distributed computing framework is proposed, comprising the following steps performed in order:
Step 1: establish the data space
First, the attribute values of all data objects, each comprising d attributes, are converted to non-negative values and normalized. A d-dimensional coordinate system is then established whose axes correspond one-to-one to the attributes of the data objects, and all data objects are placed in the coordinate system to form the data space;
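Step 1 and assumptions (1) and (4) can be illustrated with a short sketch. The source only says that values are "normalized"; the min-max scaling used here, and the names `normalize` and `normalize_weights`, are assumptions chosen for the example.

```python
def normalize(data, scale=1.0):
    """Min-max scale every attribute (column) into [0, scale].

    Assumption: min-max scaling is one possible normalization; it also
    removes negative values, as assumption (1) requires.
    """
    columns = list(zip(*data))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    out = []
    for obj in data:
        row = []
        for t, lo, hi in zip(obj, lows, highs):
            span = hi - lo
            row.append(scale * (t - lo) / span if span else 0.0)
        out.append(tuple(row))
    return out

def normalize_weights(weights):
    """Rescale user weights so that they sum to 1 (assumption (4))."""
    total = sum(weights)
    return tuple(w / total for w in weights)

points = normalize([(-2, 10), (0, 20), (2, 30)])
unit_weights = normalize_weights((2, 1, 1))
```

After this step every object is a point in the non-negative part of a d-dimensional space, which is what the partitioning in step 2 relies on.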
Step 2: data partitioning
Starting from the origin, the whole coordinate system is divided from inside to outside into m regions. The value of m must not be too large, otherwise the amount of computation increases; at current data scales m is typically taken as 3 to 5, a range that balances data scale against computation. As data scale grows further, m can be increased appropriately to reduce the total amount of data in each resulting region. The regions are numbered 1, 2, ..., m in order from outside to inside, and the boundary of region 1 together with the coordinate axes encloses all data objects. Within any one region the maximum of every attribute is the same, and every point on the outer boundary of a region has at least one coordinate equal to that maximum. With the attribute maximum of region 1 set to $a_1$, equal-volume regions give the attribute maximum of the i-th region as $a_i = a_1 \left(\frac{m-i+1}{m}\right)^{1/d}$, i = 1, 2, ..., m. With the regions divided in this way, assumption (3) above implies that every region holds the same amount of data.
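The nested regions can be sketched as below. The boundary formula $a_i = a_1 ((m-i+1)/m)^{1/d}$ is an assumption of this sketch, derived from requiring all m regions to have equal volume; the names `boundaries` and `region_of` are hypothetical.

```python
def boundaries(a1, m, d):
    """Attribute maxima a_1 > a_2 > ... > a_m of the m nested regions,
    assuming equal-volume regions in d dimensions."""
    return [a1 * ((m - i + 1) / m) ** (1.0 / d) for i in range(1, m + 1)]

def region_of(obj, bounds):
    """Number of the region holding obj (1 = outermost).

    A point lies in region i when its largest coordinate is at most a_i
    but exceeds a_{i+1}; the innermost region m takes everything up to a_m.
    """
    peak = max(obj)
    region = 1
    for i, a in enumerate(bounds, start=1):
        if peak <= a:
            region = i
    return region

bounds = boundaries(1000.0, 3, 2)   # 2-D example with m = 3
```

In the 2-D example the three regions are two L-shaped bands and one inner square, each covering one third of the total area.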
Except for the outermost region, for each remaining region the point that belongs to the region and whose attribute on every axis equals the region's maximum is taken as a base point. For each base point, the part of the whole coordinate system in which every attribute value is greater than or equal to the corresponding attribute value at the base point is marked off. These newly marked-off regions are numbered 1, 2, ..., M in order from outside to inside, where M = m-1, and serve as judgment areas.
By the Skyline principle, for any two points, point 1 and point 2, if every attribute value of point 1 is smaller than that of point 2, then point 1 dominates point 2. Based on this principle, given two data objects $T_1$ and $T_2$, if for all i with $1 \le i \le d$ the attribute value of $T_1$ is greater than or equal to the corresponding attribute value of $T_2$, i.e. $T_1.t_i \ge T_2.t_i$, where $t_i$ denotes the value of the i-th attribute, then for any given input weight $W(w_1, w_2, \ldots, w_d)$ the score of $T_1$ is at least the score of $T_2$, i.e. $f_W(T_1) \ge f_W(T_2)$.
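The step from attribute-wise dominance to score order can be written out in one line. The sketch assumes non-negative weights, which the normalization of W suggests but the text does not state explicitly:

```latex
f_W(T_1) - f_W(T_2)
  = \sum_{i=1}^{d} w_i \,\bigl(T_1.t_i - T_2.t_i\bigr) \;\ge\; 0
  \qquad \text{whenever } w_i \ge 0 \text{ and } T_1.t_i \ge T_2.t_i \text{ for all } i .
```

Every term of the sum is a product of two non-negative factors, so the difference cannot be negative.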
From this analysis, the score of any data object in a judgment area is necessarily at least the score of every data object in the regions lying inside the region to which that judgment area belongs. The algorithm returns the k highest-scoring objects, where k is the number of data objects in the result set, so once the number of data objects in some judgment area is greater than or equal to k, none of the k results needs to be taken from the regions inside the region to which that judgment area belongs. The judgment areas are therefore tested as follows:
In ascending order of number, test in turn whether $N_i \ge k$ holds, where $N_i$ is the number of data objects in judgment area i and k is the number of data objects in the result set returned by the algorithm. As soon as judgment area i satisfies the inequality, stop testing and take the i outermost regions (regions 1 to i) as the search region.
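The judgment-area test can be sketched as follows. The equal-volume boundary formula, the name `choose_search_regions` and the fallback to all m regions when no judgment area holds k objects are assumptions of this sketch; the text does not say what happens in that last case.

```python
def boundaries(a1, m, d):
    """Assumed equal-volume region maxima a_1 > ... > a_m."""
    return [a1 * ((m - i + 1) / m) ** (1.0 / d) for i in range(1, m + 1)]

def choose_search_regions(data, bounds, k):
    """Return i such that regions 1..i form the search region."""
    m = len(bounds)
    for i in range(1, m):                  # judgment areas 1 .. M, M = m - 1
        base = bounds[i]                   # a_{i+1}: every coordinate of the base point
        n_i = sum(1 for obj in data if all(t >= base for t in obj))
        if n_i >= k:
            return i
    return m                               # assumed fallback: search everything

bounds = boundaries(10.0, 3, 2)            # roughly [10.0, 8.165, 5.774]
data = [(9, 9), (9, 8.5), (7, 7), (6, 6), (1, 1), (9, 1)]
```

With this toy data, judgment area 1 holds two objects and judgment area 2 holds four, so `k = 2` needs only region 1 while `k = 3` needs regions 1 and 2.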
Further, in the invention, the innermost region of the search region, numbered i, is subdivided as follows:
The region is divided into d+1 blocks, of which d blocks are candidate search areas; the remainder of the region, outside the candidate search areas, is the mandatory search area;
The candidate search areas are numbered n = 1, 2, ..., d. Any data point $T_{nj}(t_{n1}, t_{n2}, \ldots, t_{nd})$ in the n-th candidate search area, where $t_{nj}$ denotes the attribute value of the point on axis j, satisfies the following 2 inequalities:
$0 \le t_{nj} \le 2a_{i+1} - a_i$, where $1 \le j \le d$ and $j \ne n$ (1)
$a_i - a_{i+1} \le t_{nj} \le a_i$, where $j = n$ (2)
In the n-th candidate search area, the data point whose attribute value on one axis is $a_i$ and whose attribute values on all remaining axes are $2a_{i+1} - a_i$ is taken as the maximum boundary point of that candidate search area;
For the maximum boundary point of each candidate search area, check whether any of the user-given attribute weights $w_j$ satisfies the following condition: $w_j > \frac{1}{2}$
If an attribute weight $w_j$ satisfying the condition exists for a maximum boundary point, the search region is narrowed to the i-1 outermost regions, the mandatory search area of region i, and the candidate search area to which that maximum boundary point belongs;
If no attribute weight $w_j$ satisfying the condition exists, the search region is narrowed to the i-1 outermost regions and the mandatory search area of region i.
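The subdivision of one region into d candidate blocks and a mandatory block can be sketched as below. The function name `block_of` is hypothetical. The sketch assumes the point is already known to lie in region i, so its largest coordinate exceeds $a_{i+1}$; under that assumption inequality (2) reduces to requiring that coordinate n be the one above $a_{i+1}$.

```python
def block_of(obj, a_i, a_next):
    """Classify a point of region i.

    Returns n (1-based) when the point lies in candidate block n, i.e. its
    n-th coordinate is above a_{i+1} and all other coordinates are at most
    2*a_{i+1} - a_i; returns 0 for the mandatory block.
    Assumes a_next < max(obj) <= a_i.
    """
    cut = 2 * a_next - a_i
    for n, t_n in enumerate(obj, start=1):
        others = (t for j, t in enumerate(obj, start=1) if j != n)
        if t_n > a_next and all(t <= cut for t in others):
            return n
    return 0

# 2-D example: a_1 = 1000, a_2 = 816.5, so the cut-off is 633.
labels = [block_of(p, 1000.0, 816.5)
          for p in [(900, 100), (100, 900), (900, 700), (900, 900)]]
```

The first two points fall into the candidate blocks along axis 1 and axis 2; the last two stay in the mandatory block that is always searched.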
Provided that $N_i \ge k$ holds for judgment area i, the subdivision splits the innermost region of the search region into a mandatory search area and candidate search areas, and then selects from the candidate search areas, by the test above, only the part that needs to be searched, further narrowing the search range. By the argument above, the score at the maximum boundary point of each candidate search area is necessarily greater than or equal to the score of every other data point in that candidate search area. Hence, if the score at the maximum boundary point of a candidate search area is less than the score at the base point of judgment area $N_i$, that candidate search area need not be searched; otherwise it must be searched.
For illustration, take one candidate search area whose maximum boundary point has coordinates $T(a_i, 2a_{i+1}-a_i, 2a_{i+1}-a_i, \ldots, 2a_{i+1}-a_i)$; the base point of the corresponding judgment area $N_i$ has coordinates $T(a_{i+1}, a_{i+1}, \ldots, a_{i+1})$. Substituting these coordinates into the scoring function, if $(a_i, 2a_{i+1}-a_i, \ldots, 2a_{i+1}-a_i) \cdot W > (a_{i+1}, a_{i+1}, \ldots, a_{i+1}) \cdot W$ with $W = (w_1, w_2, \ldots, w_d)$, then, using $\sum_{j} w_j = 1$, the inequality can be rewritten as $(2w_1 - 1)(a_i - a_{i+1}) > 0$, which gives $w_1 > \frac{1}{2}$; the candidate search area corresponding to this maximum boundary point must then be searched. Testing all candidate search areas in the same way yields the unified expression $w_j > \frac{1}{2}$.
It further follows that if a candidate search area satisfies the condition, the attribute weight $w_j$ that makes the inequality hold is necessarily the weight of the attribute whose value at that area's maximum boundary point is $a_i$. During the traversal it is therefore enough, for each candidate search area, to substitute the weight of the attribute whose value at the maximum boundary point is $a_i$ into $w_j > \frac{1}{2}$; the area must be searched if the inequality holds. Once one attribute weight satisfying the inequality has been found, the remaining candidate search areas need not be checked: because the attribute weights are normalized, 2 or more attribute weights cannot satisfy the inequality at the same time. When selecting the candidate search area, one can therefore quickly test whether the largest of the attribute weights input by the user satisfies $w_j > \frac{1}{2}$ and, if it does, pick the candidate search area whose maximum boundary point has $a_i$ as its j-th attribute.
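The weight test can be checked numerically. The sketch below assumes the test reduces to $w_j > 1/2$ for normalized weights, as obtained by comparing the score at a candidate block's maximum boundary point with the score at the base point; the names `candidate_to_search` and `needs_search` are hypothetical.

```python
def candidate_to_search(weights):
    """1-based index j of the only weight above 1/2 after normalization,
    or None when no candidate block has to be searched."""
    total = sum(weights)
    for j, w in enumerate(weights, start=1):
        if w / total > 0.5:
            return j
    return None

def needs_search(a_i, a_next, weights, n):
    """Direct check: does the maximum boundary point of candidate block n
    score higher than the base point of the judgment area?"""
    def f(point):
        return sum(w * t for w, t in zip(weights, point))
    corner = [2 * a_next - a_i] * len(weights)
    corner[n - 1] = a_i
    base = [a_next] * len(weights)
    return f(corner) > f(base)

picked = candidate_to_search((0.8, 0.1, 0.1))
```

For the weights (0.8, 0.1, 0.1) only the block along axis 1 passes the direct comparison, which agrees with the shortcut of looking for a weight above one half.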
Beneficial effects:
The weighted top-k query method for massive data under a distributed computing framework provided by the invention proposes a new data partitioning scheme similar to grid partitioning. It first determines the search region by comparing the amount of data in the judgment areas with the result-set size k, which greatly reduces the search range; it then further narrows the search range within the innermost region of the search region, so that the final search region is smaller and query efficiency and speed improve.
According to statistics, when a data space holding 1 billion records is divided into m = 3 regions, the amount of data in the outermost judgment area varies with the dimension d, i.e. the number of attributes of each data object, as shown in the following table:
Table 1
As the table shows, when the dimension d is below 8 the outermost judgment area still contains 18 data objects. In practical applications the required number of results is often small, for example 10, so most queries only need the data set in the outermost region. The search range is thus reduced by at least 2/3 and a large amount of irrelevant data is filtered out. The method of the invention therefore significantly improves top-k query performance over massive data and speeds up top-k queries over massive, higher-dimensional data sets.
Brief description of the drawings
Fig. 1 is a schematic diagram of the data partitioning method of the invention;
Fig. 2 is a schematic diagram of the data subdivision method of the invention;
Fig. 3 compares query time against data dimension for three data partitioning approaches, where DistImprove is the method of the invention, AngleDistTop_k is the data partitioning method based on angle and distance, and BasicTop_k queries the raw data set without partitioning;
Fig. 4 compares query times for different user input weights at dimension 4;
Fig. 5 shows the speedup of the method of the invention on different numbers of cluster nodes;
Fig. 6 is a flow chart of the top-k query process in a specific implementation of the invention.
Detailed description of the embodiments
The invention is further described below with reference to the drawings.
The experiments were run on a 7-node Spark cluster. Spark is deployed on top of Hadoop, using Hadoop's YARN resource manager and the HDFS file storage system. Of the 7 nodes, the master node serves both as the Driver node and as a worker node; the remaining 6 nodes are worker nodes. The basic configuration of the experimental environment is shown in Table 2 below:
Table 2
A uniformly distributed data set is used. Each record has 8 attributes, each an integer in the range [0, 1000]; 1 billion records are generated in total, about 40 GB of data. 4-dimensional and 6-dimensional data sets are generated in the same way as the 8-dimensional one; all data sets are randomly generated and each contains 1 billion records.
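A small-scale version of such a data set can be produced as below. The generator name `generate_records`, its parameters and the seed are assumptions for illustration; the source does not describe how its data were actually generated.

```python
import random

def generate_records(n, d, low=0, high=1000, seed=None):
    """Yield n records of d integer attributes drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    for _ in range(n):
        yield tuple(rng.randint(low, high) for _ in range(d))

# Five 8-attribute records; the experiments use 1 billion such records.
sample = list(generate_records(5, 8, seed=42))
```

Using a generator keeps memory use constant, so the same function could stream records to a file when a much larger n is requested.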
Unless otherwise stated, the experiments below use equal weights and k = 100, and each reported query time is the average of 10 runs. Data preprocessing is performed only once and need not be repeated for every query, so the query times reported below exclude the substantial preprocessing time. For the 8-dimensional data, preprocessing with the method of the invention takes about 42 minutes.
As shown in Fig. 6, this embodiment is divided into two major steps:
Step 1: data preprocessing. The raw data set is partitioned with the data partitioning method proposed here into different Blocks, each Block is labeled, and the Blocks are then stored on HDFS. In Spark, HDFS storage is mainly made up of the disks of the worker nodes. Queries are answered using the data partitioning scheme and the query-data selection rule set out in the claims. The partitioning proceeds as follows:
First step: overall partitioning from inside to outside
Following the equal-area principle, the whole data space is divided from inside to outside into m = 3 large partitions. Fig. 1 illustrates the partitioning in two dimensions: the horizontal x-axis corresponds to attribute 1, the vertical y-axis corresponds to attribute 2, and the solid lines divide the space into 3 equal parts, numbered 1, 2, 3 in order from outside to inside. Except for the outermost region, for each remaining region the point that belongs to the region and whose attribute on every axis equals the region's maximum is taken as a base point, and the part of the whole coordinate system in which every attribute value is greater than or equal to the corresponding attribute value at the base point is marked off; these areas are labeled A, B in order from outside to inside. Test in turn whether $n_A \ge k$ or $n_B \ge k$ holds: if the amount of data $n_A$ in square A satisfies $n_A \ge k$, only the data set in region 1 needs to be queried; otherwise check whether the amount of data $n_B$ in square B satisfies $n_B \ge k$. For queries over massive data, the amount of data in square A is generally far greater than k, so querying the data in region 1 alone yields the top-k result. To reduce the amount of queried data further, each large partition is then subdivided.
Second step: subdivision of each large partition
Each of partitions 1 and 2 is further divided. Taking large partition 1 as an example, it can be divided as shown in Fig. 2 into three regions, (ABC), D and E, where the three regions A, B and C have equal areas; A, B and C form the mandatory search area, and D and E are candidate search areas.
The following fact also holds: the score of a data object $T_1$ is $f_W(T_1) = \sum_{i=1}^{d} w_i t_i$, so $f_W(T_1)$ is proportional to the length of the projection of the spatial data point $T_1$ onto the line through the origin along the weight vector W. Projection length can therefore be used to measure the score function.
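The proportionality between score and projection length follows from the standard formula for scalar projection. The sketch takes the line to be the one through the origin along the weight vector W, which is an assumption of this illustration:

```latex
\operatorname{proj}_W(T)
  = \frac{W \cdot T}{\lVert W \rVert}
  = \frac{\sum_{i=1}^{d} w_i t_i}{\sqrt{\sum_{i=1}^{d} w_i^{2}}}
  = \frac{f_W(T)}{\lVert W \rVert}
```

For a fixed query the norm of W is a constant, so ranking points by projection length is the same as ranking them by score.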
Therefore, assuming that square A contains at least k data objects, whether D and E need to be searched can be verified as follows:
If the weights input by the user satisfy $w_1 + w_2 = 1$ with $w_1 = w_2 = 0.5$, compare two distances in the figure: the distance from the origin to the projection of d, the maximum boundary point of region D, onto the line through the origin along W, and the distance from the origin to the projection of a, the base point of square A, onto the same line. The two are equal. Similarly, the distance from the origin to the projection of e, the maximum boundary point of region E, onto that line equals the distance from the origin to the projection of the base point a of square A. Regions D and E therefore need not be queried: the top-k result can be obtained by querying regions A, B and C alone, and by the same reasoning the top-k results found in regions A, B and C necessarily lie in the part above and to the right of the dashed line in Fig. 2;
By the same principle, if $w_1 > w_2$, region D need not be queried; if $w_1 < w_2$, region E need not be queried. The proofs are not given one by one here;
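In summary, the regions to be queried are given by:
$$\text{query regions} = \begin{cases} A \cup B \cup C \cup E, & w_1 > w_2 \\ A \cup B \cup C, & w_1 = w_2 \\ A \cup B \cup C \cup D, & w_1 < w_2 \end{cases}$$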
Generalizing to a d-dimensional data space, let $S_i$, $1 \le i \le 3$, denote one of the 3 large partitions divided from inside to outside; let $S_{ij}$, $1 \le j \le d$, denote the j-th sub-region of $S_i$ that is analogous to D or E; and let $S_{i(d+1)}$ denote the sub-region of $S_i$ that is analogous to A, B, C. When the weight of the j-th attribute of the data objects satisfies $w_j > \frac{1}{2}$, only 2 regions of the large partition $S_i$ need to be queried, namely $S_{i(d+1)}$ and $S_{ij}$; otherwise only the one region $S_{i(d+1)}$ needs to be queried.
Step 2: query processing. For a user query f(W, k), part of the data set is selected on the Driver node according to the query input and then queried. This embodiment takes k = 100 and equal attribute weights, in which case only the data area $S_{1(d+1)}$ needs to be queried.
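The two steps can be put together in a single-machine sketch and checked against a brute-force query. Everything here is illustrative: the names `label`, `preprocess`, `select` and `query` are hypothetical, the equal-volume boundaries and the $w_j > 1/2$ test are assumptions, the innermost region is left unsubdivided, and a plain dictionary stands in for the labeled Blocks on HDFS.

```python
import heapq
import random

def score(weights, obj):
    """Weighted-sum score of one data object."""
    return sum(w * t for w, t in zip(weights, obj))

def boundaries(a1, m, d):
    """Assumed equal-volume region maxima a_1 > a_2 > ... > a_m."""
    return [a1 * ((m - i + 1) / m) ** (1.0 / d) for i in range(1, m + 1)]

def label(obj, bounds):
    """Return (region, block). Region 1 is outermost; block 0 is the
    mandatory block, block n >= 1 the candidate block along axis n."""
    peak = max(obj)
    region = max(i for i, a in enumerate(bounds, start=1) if peak <= a)
    if region == len(bounds):          # innermost region: not subdivided here
        return region, 0
    a_i, a_next = bounds[region - 1], bounds[region]
    cut = 2 * a_next - a_i
    for n, t_n in enumerate(obj, start=1):
        others = (t for j, t in enumerate(obj, start=1) if j != n)
        if t_n > a_next and all(t <= cut for t in others):
            return region, n
    return region, 0

def preprocess(data, bounds):
    """One-off step: group the objects into labeled blocks."""
    blocks = {}
    for obj in data:
        blocks.setdefault(label(obj, bounds), []).append(obj)
    return blocks

def select(blocks, bounds, weights, k):
    """Collect the objects that have to be scanned for this query."""
    m = len(bounds)
    chosen = m                         # assumed fallback: all regions
    for i in range(1, m):              # judgment areas 1 .. m-1
        base = bounds[i]               # a_{i+1}, coordinate of the base point
        # A real system would store these counts during preprocessing.
        n_i = sum(1 for objs in blocks.values() for o in objs if min(o) >= base)
        if n_i >= k:
            chosen = i
            break
    heavy = next((j for j, w in enumerate(weights, start=1) if w > 0.5), None)
    picked = []
    for (region, block), objs in blocks.items():
        if region < chosen or (region == chosen and block in (0, heavy)):
            picked.extend(objs)
    return picked

def query(blocks, bounds, weights, k):
    """Top-k over the selected blocks only."""
    picked = select(blocks, bounds, weights, k)
    return heapq.nlargest(k, picked, key=lambda o: score(weights, o))

random.seed(0)
data = [tuple(random.uniform(0, 1000) for _ in range(3)) for _ in range(2000)]
bounds = boundaries(1000.0, 3, 3)
blocks = preprocess(data, bounds)
weights = (0.7, 0.2, 0.1)              # normalized, one weight above 1/2
result = query(blocks, bounds, weights, 10)
```

The query scans only the selected blocks, yet returns the same scores as sorting the full data set, because every skipped object scores no higher than the base point of the judgment area that already holds k objects.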
Fig. 3 compares query time against data dimension for three data partitioning approaches, where DistImprove is the method of the invention, AngleDistTop_k is the data partitioning method based on angle and distance, and BasicTop_k queries the raw data set without partitioning. The experiments show that the method of the invention is faster than the data partitioning method based on angle and distance, improving query speed by about 15%, and that query time grows steadily with dimension, without large fluctuations.
The user's input weights $W = (w_1, w_2, \ldots, w_d)$ affect the size of the queried data range. Fig. 4 shows the query times for different weight values in 4 dimensions. The first class is strongly biased toward one attribute weight and includes $W_2 = (0.06, 0.06, 0.07, 0.8)$ and $W_4 = (0.56, 0.14, 0.25, 0.03)$; the second class, $W_1 = (0.25, 0.25, 0.25, 0.25)$, has equal weights; the third class, $W_3 = (0.16, 0.32, 0.34, 0.18)$, has no pronounced bias. In the figure the query times of $W_1$ and $W_3$ are roughly equal, those of $W_2$ and $W_4$ are about the same, and $W_1$ and $W_3$ take less time than $W_2$ and $W_4$. The main reason is that, on low-dimensional data sets, different weights require different data blocks to be queried: $W_2$ and $W_4$ are strongly biased toward one attribute weight, so some additional data blocks must be queried, which makes their query times longer than those of $W_1$ and $W_3$.
The scalability of the method of the invention is shown in Fig. 5, which plots the speedup on the 8-dimensional data set for different numbers of nodes. The speedup is close to the ideal speedup: as the number of worker processors doubles, execution speed also roughly doubles, i.e. the data partitioning method of the invention scales well.
The above is only the preferred embodiment of the invention. It should be noted that a person of ordinary skill in the art may make improvements and modifications without departing from the principle of the invention, and these improvements and modifications should also be regarded as falling within the scope of protection of the invention.

Claims (2)

1. A weighted top-k query method for massive data under a distributed computing framework, characterized by comprising the following steps performed in order:
Step 1: establish the data space
First, the attribute values of all data objects, each comprising d attributes, are converted to non-negative values and normalized; a d-dimensional coordinate system is established whose axes correspond one-to-one to the attributes of the data objects, and all data objects are placed in the coordinate system to form the data space;
Step 2: data partitioning
Starting from the origin, the whole coordinate system is divided from inside to outside into m regions, which are numbered 1, 2, ..., m in order from outside to inside, the boundary of region 1 together with the coordinate axes enclosing all data objects; within any one region the maximum of every attribute is the same, and every point on the outer boundary of a region has at least one coordinate equal to that maximum; with the attribute maximum of region 1 set to $a_1$, the attribute maximum of the i-th region is $a_i = a_1 \left(\frac{m-i+1}{m}\right)^{1/d}$, i = 1, 2, ..., m; except for the outermost region, for each remaining region the point that belongs to the region and whose attribute on every axis equals the region's maximum is taken as a base point, and the part of the whole coordinate system in which every attribute value is greater than or equal to the corresponding attribute value at the base point is marked off; these newly marked-off regions are numbered 1, 2, ..., M in order from outside to inside, where M = m-1, and as judgment areas are tested as follows:
In ascending order of number, test in turn whether $N_i \ge k$ holds, where $N_i$ is the number of data objects in judgment area i and k is the number of data objects in the result set returned by the algorithm; as soon as judgment area i satisfies the inequality, stop testing and take the i outermost regions as the search region.
2. The weighted top-k query method for massive data under a distributed computing framework according to claim 1, characterized in that the innermost region of the search region, numbered i, is subdivided as follows:
The region is divided into d+1 blocks, of which d blocks are candidate search areas; the remainder of the region, outside the candidate search areas, is the mandatory search area;
The candidate search areas are numbered n = 1, 2, ..., d, and any data point $T_{nj}(t_{n1}, t_{n2}, \ldots, t_{nd})$ in the n-th candidate search area, where $t_{nj}$ denotes the attribute value of the point on axis j, satisfies the following 2 inequalities:
$0 \le t_{nj} \le 2a_{i+1} - a_i$, where $1 \le j \le d$ and $j \ne n$ (1)
$a_i - a_{i+1} \le t_{nj} \le a_i$, where $j = n$ (2)
In the n-th candidate search area, the data point whose attribute value on one axis is $a_i$ and whose attribute values on all remaining axes are $2a_{i+1} - a_i$ is taken as the maximum boundary point of that candidate search area;
For the maximum boundary point of each candidate search area, check whether any of the user-given attribute weights $w_j$ satisfies the following condition: $w_j > \frac{1}{2}$
If an attribute weight $w_j$ satisfying the condition exists for a maximum boundary point, the search region is narrowed to the i-1 outermost regions, the mandatory search area of region i, and the candidate search area to which that maximum boundary point belongs;
If no attribute weight $w_j$ satisfying the condition exists, the search region is narrowed to the i-1 outermost regions and the mandatory search area of region i.
CN201510209691.0A 2015-04-28 2015-04-28 A weighted top-k query method for massive data under a distributed computing framework Active CN104809210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510209691.0A CN104809210B (en) 2015-04-28 2015-04-28 A weighted top-k query method for massive data under a distributed computing framework

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510209691.0A CN104809210B (en) 2015-04-28 2015-04-28 A weighted top-k query method for massive data under a distributed computing framework

Publications (2)

Publication Number Publication Date
CN104809210A CN104809210A (en) 2015-07-29
CN104809210B true CN104809210B (en) 2017-12-26

Family

ID=53694032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510209691.0A Active CN104809210B (en) 2015-04-28 2015-04-28 A weighted top-k query method for massive data under a distributed computing framework

Country Status (1)

Country Link
CN (1) CN104809210B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106777091A (en) * 2016-12-14 2017-05-31 大连大学 Skyline dual-filtering retrieval system based on multiple medical factors in a mobile O2O environment
CN106777095A (en) * 2016-12-14 2017-05-31 大连交通大学 Skyline dual-filtering retrieval method based on multiple medical factors in a mobile O2O environment
CN108491541A (en) * 2018-04-03 2018-09-04 哈工大大数据(哈尔滨)智能科技有限公司 Joint query method and system for distributed multi-dimensional databases
CN110245022B (en) * 2019-06-21 2021-11-12 齐鲁工业大学 Parallel Skyline processing method and system for massive data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314521A (en) * 2011-10-25 2012-01-11 中国人民解放军国防科学技术大学 Distributed parallel Skyline query method based on a cloud computing environment
CN103177130A (en) * 2013-04-25 2013-06-26 苏州大学 Continuous query method and continuous query system for K-Skyband on distributed data stream

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Multi-dimensional top-k dominating queries; Man Lung Yiu et al.; The VLDB Journal; 2009-06-30; Vol. 18, No. 3; full text *
Top-k Dominant Web Services Under Multi-Criteria Matching; Dimitrios Skoutas et al.; EDBT'09 Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database; 2009-03-26; full text *
Top-k reverse Skyline query algorithm in metric space; Zhang Bin et al.; Journal of Computer Research and Development; 2014-03-15; Vol. 51, No. 3; full text *

Also Published As

Publication number Publication date
CN104809210A (en) 2015-07-29

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
EXSB Decision made by SIPO to initiate substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant