CN102799681A - Top-k query method oriented to any data segment - Google Patents
- Publication number
- CN102799681A
- Authority
- CN
- China
- Prior art keywords
- data
- node
- index
- result
- inquiry
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a Top-k query method oriented to arbitrary data segments, comprising the following steps: first, acquire the data; then analyze the characteristics of the data and build an index structure accordingly. If the data size is small and a DG (dominant graph) index has been created, perform the DG-index-based Top-k query over an arbitrary data segment; if the data size is large and the nodes on the DG index are sparse, perform the Top-k query based on a double-layer dominant graph (DDG) index structure; and if the segment is hard to determine, perform a hybrid index query based on DG and GS (general subject). The disclosed method provides an index suited to both global Top-k queries and local Top-k queries over arbitrary data segments, improving the freedom and arbitrariness of Top-k query applications.
Description
Technical field
The present invention relates to a Top-k query method oriented to arbitrary data segments and belongs to the technical field of information retrieval.
Background technology
With the continuous development of information technology, people's requirements for information retrieval keep rising. Top-k queries are widely used in information retrieval, multimedia similarity search, text and data integration, business analysis, product catalogs and preference queries, distributed network aggregation and sensor data recording, and other applications based on Internet recommendation sources.
At present, algorithms for preference Top-k queries fall into four main categories: 1) sorted-list methods; 2) layer-based methods; 3) view-based methods; 4) summary-based methods.
The most classical sorted-list method is the TA algorithm (H. Bast, D. Majumdar, R. Schenkel, M. Theobald, and G. Weikum. IO-Top-k: Index-access optimized top-k query processing. In VLDB, pages 475-486, 2006. Fagin R, Lotem A, Naor M. Optimal aggregation algorithms for middleware. Journal of Computer and System Sciences 66, 2003, pp. 614-656.). The algorithm sorts each attribute dimension independently and builds one sorted list per dimension. It finds all tuples whose scores exceed a given threshold rather than searching every tuple directly. During computation the lists are scanned sequentially; whenever sequential access encounters a tuple identifier, the other lists are immediately accessed at random to compute that tuple's Top-k score. The visited tuples are then sorted to obtain the Top-k result. The main difficulty of this method is choosing the threshold: too loose a threshold returns too many results, and too tight a threshold returns too few.
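The sorted-list scan described above can be sketched as follows. This is a minimal illustration of the threshold algorithm in its commonly published form, assuming the aggregate function is a plain sum and that higher scores are better; the function name `threshold_algorithm` and the in-memory `scores` dictionary are hypothetical and not taken from the patent.

```python
import heapq

def threshold_algorithm(scores, k):
    """scores maps tuple_id -> list of per-attribute scores (higher is better).
    The aggregate function is assumed to be the sum of the attributes."""
    dims = len(next(iter(scores.values())))
    # One list of tuple ids per attribute, sorted by that attribute, descending.
    lists = [sorted(scores, key=lambda t, d=d: scores[t][d], reverse=True)
             for d in range(dims)]
    seen = set()
    top = []  # min-heap holding at most k pairs (aggregate, tuple_id)
    for depth in range(len(scores)):
        last = []  # scores seen at this depth under sequential access
        for d in range(dims):
            t = lists[d][depth]
            last.append(scores[t][d])
            if t not in seen:
                seen.add(t)
                total = sum(scores[t])  # random access to the other lists
                if len(top) < k:
                    heapq.heappush(top, (total, t))
                elif total > top[0][0]:
                    heapq.heapreplace(top, (total, t))
        # No unseen tuple can score higher than the sum of the last seen values.
        threshold = sum(last)
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)
```

In this form the threshold is recomputed at every depth from the last values read, so the scan stops as soon as the k-th best aggregate can no longer be beaten by an unseen tuple.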
In layer-based methods, the tuples of the data set are divided into layers according to a given layering rule. A Top-k query with any function F obtains its result from the first k layers. Several layering methods exist: DG (Zou L, Chen L. Dominant graph: An efficient indexing structure to answer top-k queries [C] //Proc of the IEEE 24th Int Conf on Data Engineering. Washington, DC: IEEE Computer Society, 2008: 536-545.), AppRI (Xin D, Chen C, Han J. Towards robust indexing for ranked queries [C] //Proc of the 32nd Int Conf on Very Large Data Bases. Trondheim, Norway: VLDB Endowment, 2006: 235-246.) and Onion (Chang Y C, Bergman L D, Castelli V, et al. The onion technique: Indexing for linear optimization queries [J]. ACM SIGMOD Record, 2000, 29(4): 391-402.). The Onion method uses convex hulls as its layering rule: for a given linear query function, the tuples of interest lie only on the convex hull. Onion builds its index by computing convex hulls of the tuples: it first computes the 1st convex hull, then the 2nd convex hull of the remaining tuples, and so on until all tuples have been processed. The layering rule defined by AppRI is that tuple t is placed in layer l if and only if two conditions hold: 1) for any linear query, t is not in the Top-(l-1) result; 2) at least one query places t in the top-l result. The layering rule defined by DG is that each layer is the skyline of the tuples left after the previous layers are removed: the 1st skyline is computed first, then the 2nd skyline of the remaining tuples, and so on until all tuples have been processed. Unlike the two methods above, DG also records the dominance relations between data points, so a query does not need to visit and evaluate the query function on every tuple in the first k layers.
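The skyline layering used by DG can be illustrated with a short sketch. It assumes that larger attribute values are preferred and uses a simple quadratic peel of successive skylines; the names `dominates` and `skyline_layers` are hypothetical.

```python
def dominates(a, b):
    """a dominates b if a is at least as good in every attribute
    and strictly better in at least one (larger is better here)."""
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def skyline_layers(points):
    """Peel off successive skylines: layer 1 is the skyline of all points,
    layer 2 is the skyline of what remains, and so on."""
    remaining = list(points)
    layers = []
    while remaining:
        layer = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        layers.append(layer)
        remaining = [p for p in remaining if p not in layer]
    return layers
```

For a monotone scoring function, the best remaining tuple is always on the current skyline, which is why a Top-k answer can be found within the first k layers.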
View-based methods match results in views that have been pre-sorted by given functions. Typical methods are PREFER (Hristidis V, Koudas N, Papakonstantinou Y. Prefer: A system for the efficient execution of multi-parametric ranked queries [J]. ACM SIGMOD Record, 2001, 30(2): 259-270.) and LPTA (Das G, Gunopulos D, Koudas N, et al. Answering top-k queries using views [C] //Proc of the 32nd Int Conf on Very Large Data Bases. Trondheim, Norway: VLDB Endowment, 2006: 451-462.). In this class of algorithms, the closer the query function is to the function used to pre-sort the view, the faster the query. The PREFER algorithm uses a view sequence Rv in which the tuples are sorted by a preference function. When a preference function is queried, the watermark in Rv is computed so that it is guaranteed to be the 1st value returned by the query; this process is repeated to obtain the Top-k values. The LPTA algorithm maintains several tuple-ID lists sorted by preference functions and searches these lists until the Top-k values are found.
Summary-based methods generally partition the data set with a grid (equal-depth or equal-width) and record the data points in each grid cell. At query time, approximate function scores of the data points are computed from the grid summary information in order to prune data points that cannot be in the query result. In the grid cells that satisfy the condition, exact function scores and the ordering are obtained by accessing the data points themselves, which yields the query result. RankCube applies a summary-based method to multidimensional selection queries over historical data sets. This approach builds the grid quickly but its computation is rather coarse, so it suits continuous queries over data whose index must be built quickly. Researchers in China have also done extensive work on Top-k computation, such as a Top-k frequent itemset mining method for data streams (Yang Bei, Huang Houkuan. Mining Top-K frequent itemsets in landmark windows over data streams [J]. Computer Research and Development, 2010, 47(3): 463-473.) and Top-k outlier detection methods for data streams.
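A grid summary of this kind can be sketched as follows. The example assumes an equal-width grid, non-negative coordinates and a linear scoring function with positive weights, so that the upper corner of a cell bounds the score of every point inside it; `grid_topk` and its parameters are hypothetical names used only for illustration.

```python
import heapq
from collections import defaultdict

def grid_topk(points, weights, k, cell=10.0):
    """Top-k under a linear score, pruning whole cells by their upper bound."""
    cells = defaultdict(list)
    for p in points:
        cells[tuple(int(x // cell) for x in p)].append(p)

    def upper(key):
        # Score of the cell's upper corner: no point in the cell can exceed it.
        return sum(w * (c + 1) * cell for w, c in zip(weights, key))

    def score(p):
        return sum(w * x for w, x in zip(weights, p))

    best = []  # min-heap holding at most k pairs (score, point)
    for key in sorted(cells, key=upper, reverse=True):
        if len(best) == k and best[0][0] >= upper(key):
            break  # every remaining cell is bounded below the k-th best score
        for p in cells[key]:
            item = (score(p), p)
            if len(best) < k:
                heapq.heappush(best, item)
            elif item > best[0]:
                heapq.heapreplace(best, item)
    return sorted(best, reverse=True)
```

Only the cells whose bound can still beat the current k-th score are opened, which corresponds to the pruning step described above; exact scores are computed only inside those cells.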
The Top-k query methods above concentrate on obtaining the globally optimal Top-k result set and rarely study Top-k queries over the data in an arbitrary segment, which reduces the freedom and arbitrariness of Top-k query applications. It is therefore necessary to study and build an index that suits both global Top-k queries and Top-k queries over arbitrary data segments.
Summary of the invention
Object of the invention: in view of the problems in the prior art, the present invention provides a Top-k query method oriented to arbitrary data segments. The method has an index that suits both global Top-k queries and Top-k queries over arbitrary data segments, improving the freedom and arbitrariness of Top-k query applications.
Technical scheme: a Top-k query method oriented to arbitrary data segments, comprising the following steps:
Step A: read the data;
Step B: analyze the data characteristics and build an index structure accordingly: if the data volume is small and the DG index has been built, go to step B-1; if the data volume is large and the nodes on the DG index corresponding to the data set are relatively "sparse" (50% or more "pseudo-nodes" must be added before the data can be restored to a continuous subgraph of the DG index), go to step B-2; if the arbitrary segment is difficult to determine, go to step B-3;
Step B-1: the Top-k query method for arbitrary data segments based on the DG index comprises the following steps:
Step B-1-1: add some pseudo-nodes to restore the DG index;
Step B-1-2: perform the DG-based Traveler processing, which comprises the following steps:
Step B-1-2-1: scan the layer numbers of the data segment to be queried, add the nodes of the smallest layer, minlayer, to the candidate set RS in non-decreasing order, and add the maximum value R in RS to the result set;
Step B-1-2-2: compare the size of the result set with K; if the result set is smaller than K, go to step B-1-2-3, otherwise go to step B-1-3;
Step B-1-2-3: scan the child nodes C of R; if all parent nodes of C are in the candidate set and C has not been visited, add node C to the candidate set and add the maximum node in the candidate set to the result set; otherwise, when the node entering the result set is within the query range, increment the size of the result set by 1;
Step B-1-3: delete the pseudo-records from the result set to obtain the final Top-k query result;
Step B-2: the Top-k query method based on the double-layer dominant graph (DDG) index structure comprises the following steps:
Step B-2-1: segment the data;
Step B-2-2: build the DDG index structure over the segmented data;
Step B-2-3: perform the Top-k query, which comprises the following steps:
Step B-2-3-1: determine the DG indexes in which the query segment lies;
Step B-2-3-2: perform basic Traveler processing on each DG index covered by the query to form the result set result;
Step B-2-3-3: perform DG-based Traveler processing on the bottom DG index covered by the query and write the results to result;
Step B-2-3-4: perform DG-based Traveler processing on the top DG index covered by the query and write the results to result, forming the final Top-K query result.
Step B-3: the hybrid index query method based on DG and GS comprises the following steps:
Step B-3-1: build the DGS dominance network, which is divided into an upper and a lower layer. The upper layer is the DG index structure, which suits global Top-k queries; for Top-k queries over arbitrary data segments, the lower-layer GS data structure preserves the throwback dominance relation well.
Step B-3-2: adjust according to the concept of an adaptive grid: every dimension of the GS layer is adjusted adaptively so that the data is uniformly distributed along each dimension;
Step B-3-3: adjust the GS structure according to the DG index so that the nodes within the same level of the GS network keep a certain order, reducing the number of comparisons between data of the same level in the DG index;
Step B-3-4: query on the DGS dominance network, which comprises the following steps:
Step B-3-4-1: compute the column number (column) and row number (rower) of the grid cell in which the queried data segment lies;
Step B-3-4-2: process in turn the grid nodes whose column number is column and whose row number runs from 0 to rower, add the data nodes of these cells that fall within the query interval to the candidate set in non-decreasing order, and compute the largest column number (col) that satisfies the condition;
Step B-3-4-3: process in turn the grid nodes whose row number is rower and whose column number runs from 0 to column, add the data nodes of these cells that fall within the query interval to the candidate set in non-decreasing order, and compute the largest row number (row) that satisfies the condition;
Step B-3-4-4: add the data nodes of the grid nodes with row number i and column number j, where row < i < rower and col < j < column, to the candidate set in non-decreasing order;
Step B-3-4-5: add the first node in the candidate set to the result set;
Step B-3-4-6: compare the size len of the result set with K; if len < K, go to step B-3-4-7; otherwise go to step B-3-5 and finish the query;
Step B-3-4-7: check whether the number of nodes in the result set equals the number of nodes in the query range; if they are not equal, no successor nodes are added; if they are equal, add the successor nodes to the candidate set;
Step B-3-4-8: add the len-th data node in the candidate set to the result set, increment len by 1, and return to step B-3-4-6;
Step B-3-5: return the result set as the Top-k query result and finish the query.
Beneficial effect: compared with the prior art, the Top-k query method oriented to arbitrary data segments provided by the invention has an index that suits both global Top-k queries and Top-k queries over arbitrary data segments, improving the freedom and arbitrariness of Top-k query applications.
Description of drawings
Fig. 1 is the flowchart of the embodiment of the invention;
Fig. 2 is the uniformly distributed data plot of the embodiment of the invention;
Fig. 3 is the normally distributed data plot of the embodiment of the invention;
Fig. 4 is a schematic diagram of the DGS-based Top-k query of the embodiment of the invention.
Embodiment
The present invention is further illustrated below with reference to a specific embodiment. It should be understood that the embodiment only illustrates the invention and does not limit its scope; after reading the present disclosure, modifications of various equivalent forms made by those skilled in the art all fall within the scope defined by the appended claims of this application.
As shown in Fig. 1, the detailed technical scheme of this embodiment is:
Step A: read the data
This embodiment uses two randomly generated data sets, each with 20K records and two attributes per record. One data set, S, is uniformly distributed (on the interval 0 to 1000); the other, N, is normally distributed (μ=500, σ=1), as shown in Fig. 2 and Fig. 3.
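Test data of this shape can be generated with a few lines of code. The sketch below uses the sizes and distribution parameters stated above; the function name `make_datasets`, the fixed seed and the choice to draw both attributes from the same distribution are assumptions made for illustration.

```python
import random

def make_datasets(n=20_000, seed=0):
    """Return two lists of n two-attribute records:
    S uniform on [0, 1000] and N normal with mu=500, sigma=1."""
    rng = random.Random(seed)  # fixed seed (an assumption) for repeatability
    uniform = [(rng.uniform(0, 1000), rng.uniform(0, 1000)) for _ in range(n)]
    normal = [(rng.gauss(500, 1), rng.gauss(500, 1)) for _ in range(n)]
    return uniform, normal
```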
Step B: analyze the characteristics of the two data sets above and build an index structure accordingly: if the data volume is small and the DG index has been built, go to step B-1; if the data volume is large and the nodes on the DG index corresponding to the data set are relatively "sparse" (50% or more "pseudo-nodes" must be added before the data can be restored to a continuous subgraph of the DG index), go to step B-2; if the arbitrary segment is difficult to determine, go to step B-3.
Step B-1: perform the Top-k query over an arbitrary data segment based on the DG index;
The Top-k query method for arbitrary data segments based on the DG index (the DG-based Traveler processing) is shown in Algorithm 1. First, scan the layer numbers of the data segment to be queried and add the nodes of the smallest layer, minlayer, to the candidate set RS in non-decreasing order. Add the maximum value R in RS to the result set, then scan all child nodes C of R: if all parent nodes of C are in the candidate set and C has not been visited, add C to the candidate set, then add the maximum node in the candidate set to the result set, and so on. If a node entering the result set is not within the query range, the number of query results K is incremented by 1. The query stops if and only if the number of nodes that have entered the result set is K. Pseudo-nodes are then removed from the result set: any node in the result set that is not within the query range is rejected. The final result is the Top-k result set result of the queried data segment.
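The traversal can be sketched as below. This is a simplified illustration under several assumptions: a data segment is taken to be a range of record positions `lo..hi`, the scoring function is linear with positive weights, and a child becomes a candidate once all of its parents have been output (the usual Traveler condition). Instead of incrementing K for out-of-range nodes, the sketch simply counts only in-range nodes, which has the same effect. The names `build_dg` and `traveler_segment_topk` are hypothetical.

```python
import heapq

def dominates(a, b):
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def build_dg(records):
    """Skyline layers over record indexes, plus parent/child links
    between adjacent layers wherever the parent dominates the child."""
    remaining = set(range(len(records)))
    layers = []
    while remaining:
        layer = [i for i in remaining
                 if not any(dominates(records[j], records[i]) for j in remaining)]
        layers.append(sorted(layer))
        remaining -= set(layer)
    parents = {i: [] for i in range(len(records))}
    children = {i: [] for i in range(len(records))}
    for upper, lower in zip(layers, layers[1:]):
        for c in lower:
            for p in upper:
                if dominates(records[p], records[c]):
                    parents[c].append(p)
                    children[p].append(c)
    return layers, parents, children

def traveler_segment_topk(records, weights, k, lo, hi):
    """Return the indexes of the k best records whose position is in lo..hi."""
    layers, parents, children = build_dg(records)
    score = lambda i: sum(w * x for w, x in zip(weights, records[i]))
    candidates = [(-score(i), i) for i in layers[0]]  # max-heap via negation
    heapq.heapify(candidates)
    visited, result = set(), []
    while candidates and len(result) < k:
        _, r = heapq.heappop(candidates)
        visited.add(r)
        if lo <= r <= hi:  # out-of-range nodes are traversed but not counted
            result.append(r)
        for c in children[r]:
            if all(p in visited for p in parents[c]):
                heapq.heappush(candidates, (-score(c), c))
    return result
```

Because nodes leave the candidate heap in descending score order, filtering by position yields the Top-k of the segment without scoring records that lie below unvisited parents.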
Step B-2: perform the Top-k query using the query method based on the double-layer dominant graph (DDG) index structure, which comprises the following steps:
Step B-2-1: segment the data and build the DDG index structure over the segmented data; the creation algorithm is shown in Algorithm 2.
Algorithm 2: Building the DDG index structure
Here, CreateDGIndex() is the basic DG index creation method (shown in Algorithm 3), and SkylineNode(i) finds the i-th layer. The data is segmented and a separate DG index is created for each segment; a further DG index is then created over the first-layer data of each of these indexes, which yields the DDG index structure.
Algorithm 3: DG index construction algorithm
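The two-level construction described for Algorithms 2 and 3 can be sketched as follows. The sketch is an assumption-laden illustration, not the patent's listing: it segments the data into fixed-size runs of record positions, represents each DG index only by its skyline layers (parent-child links are omitted), and uses `create_dg_index` and `create_ddg_index` as hypothetical stand-ins for CreateDGIndex() and the DDG builder.

```python
def dominates(a, b):
    return (all(x >= y for x, y in zip(a, b))
            and any(x > y for x, y in zip(a, b)))

def create_dg_index(ids, records):
    """Skyline layers over the given record ids (layer i plays the
    role of SkylineNode(i) in the text)."""
    remaining = list(ids)
    layers = []
    while remaining:
        layer = [i for i in remaining
                 if not any(dominates(records[j], records[i]) for j in remaining)]
        layers.append(layer)
        chosen = set(layer)
        remaining = [i for i in remaining if i not in chosen]
    return layers

def create_ddg_index(records, segment_size):
    """Bottom level: one DG per segment. Top level: a DG built over
    the first layer of every bottom-level DG."""
    bottom = []
    for start in range(0, len(records), segment_size):
        ids = range(start, min(start + segment_size, len(records)))
        bottom.append(create_dg_index(ids, records))
    first_layer_ids = [i for dg in bottom for i in dg[0]]
    top = create_dg_index(first_layer_ids, records)
    return bottom, top
```

Since every globally non-dominated record is also non-dominated within its own segment, the first layer of the top-level index coincides with the global skyline, which is what lets the top level serve global queries.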
Step B-2-3: perform the Top-k query, as shown in Algorithm 4, which comprises the following steps:
Step B-2-3-1: determine the DG indexes in which the query segment lies;
Step B-2-3-2: perform basic Traveler processing on each DG index covered by the query to form the result set result;
Step B-2-3-3: perform DG-based Traveler processing on the bottom DG index covered by the query and write the results to result;
Step B-2-3-4: perform DG-based Traveler processing on the top DG index covered by the query and write the results to result, forming the final Top-K query result.
Algorithm 4: Top-k query algorithm for arbitrary data segments based on the DDG index
Step B-3: the hybrid index query method based on DG and GS comprises the following steps:
Step B-3-1: build the DGS dominance network, which is divided into an upper and a lower layer. The upper layer is the DG index structure, which suits global Top-k queries; for Top-k queries over arbitrary data segments, the lower-layer GS data structure preserves the throwback dominance relation well.
To solve the problem that the DG index cannot preserve the throwback dominance of the data, we combine the GS network, which preserves the throwback dominance relation well, with the DG index. The GS network has problems of its own, however. If the data is not uniformly distributed, some grid cells may hold too much data while others are too sparse; and for grid cells of the same level, the dominance relations among the data inside them are not well preserved. We therefore adjust the GS grid, as described in steps B-3-2 and B-3-3.
Step B-3-2: adjust according to the concept of an adaptive grid;
To make the data uniformly distributed along each dimension and to keep the data in any grid cell from becoming too dense, we adaptively adjust each dimension of the GS grid. The adaptive adjustment algorithm is shown in Algorithm 6.
Algorithm 6: Adaptive adjustment algorithm
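One common way to make each dimension evenly populated is equal-depth partitioning, in which the cell boundaries are placed at quantiles of the data. The sketch below illustrates that idea for a single dimension; it is offered as an assumption about what an adaptive adjustment could look like, and the names `equal_depth_boundaries` and `cell_index` are hypothetical.

```python
import bisect

def equal_depth_boundaries(values, cells):
    """Cut points that split the values of one dimension into `cells`
    intervals holding roughly the same number of values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // cells] for i in range(1, cells)]

def cell_index(value, boundaries):
    """Index of the interval that contains value (0 .. len(boundaries))."""
    return bisect.bisect_right(boundaries, value)
```

Applying the same procedure to every dimension gives a grid whose rows and columns each hold about the same number of records, even when the raw data is skewed, as with the normally distributed data set N.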
Step B-3-3: adjust the GS structure according to the DG index to add dominance relations within the same level;
The purpose of adjusting the GS network with the DG index is to make the nodes within the same level of the GS network keep a certain order. Because a query on the GS index structure is likewise a level-by-level query, this reduces the number of comparisons between data of the same level, and pruning can be used to exclude nodes that need not appear in the candidate node set, which improves query efficiency.
Step B-3-4: query on the DGS dominance network, as shown in Algorithm 6, which comprises the following steps:
Step B-3-4-1: compute the column number (column) and row number (rower) of the grid cell in which the queried data segment lies;
Step B-3-4-2: process in turn the grid nodes whose column number is column and whose row number runs from 0 to rower, add the data nodes of these cells that fall within the query interval to the candidate set in non-decreasing order, and compute the largest column number (col) that satisfies the condition;
Step B-3-4-3: process in turn the grid nodes whose row number is rower and whose column number runs from 0 to column, add the data nodes of these cells that fall within the query interval to the candidate set in non-decreasing order, and compute the largest row number (row) that satisfies the condition;
Step B-3-4-4: add the data nodes of the grid nodes with row number i and column number j, where row < i < rower and col < j < column, to the candidate set in non-decreasing order;
Step B-3-4-5: add the first node in the candidate set to the result set;
Step B-3-4-6: compare the size len of the result set with K; if len < K, go to step B-3-4-7; otherwise go to step B-3-5 and finish the query;
Step B-3-4-7: check whether the number of nodes in the result set equals the number of nodes in the query range; if they are not equal, no successor nodes are added; if they are equal, add the successor nodes to the candidate set;
Step B-3-4-8: add the len-th data node in the candidate set to the result set, increment len by 1, and return to step B-3-4-6;
Step B-3-5: return the result set as the Top-k query result and finish the query.
Take the Top-k query over the nodes in Fig. 4 as an example. From the existing dominance relations, grid nodes c[3][2] and c[2][3] enter the candidate set, that is, nodes 3, 11 and 4 enter the candidate set. Node 4 is then added to the result set according to the aggregate function F. The successors of node 4 are 6, 2 and 1, but because node 1 is also dominated by grid node c[3][2], only nodes 6 and 2 are added to the candidate set. Next, node 6 is added to the result set; at this point the nodes in grid c[1][3] are not yet all in the result set, so no successor nodes need to be sought for the candidate set. Node 3 is added to the result set, and then node 2. Now all nodes in grid node c[1][3] are in the result set, so its successor node, node 7, is considered; since node 7 is dominated by c[3][2], it does not need to be added to the candidate set. Node 11 is then added to the result set. All nodes in grid c[3][2] have now been added to the result set, so its successors, nodes 5, 1 and 7, are added to the candidate set. Likewise, nodes 5 and 1 are added to the result set and the successors of node 1, nodes 8, 9 and 10, are added to the candidate set. After node 10 enters the result set, node 0 is added to the candidate set, which completes the Top-k query.
Claims (5)
1. A Top-k query method oriented to arbitrary data segments, characterized in that it comprises the following steps:
Step A: read the data;
Step B: analyze the data characteristics and build an index structure accordingly: if the data volume is small and the DG index has been built, perform the Top-k query over an arbitrary data segment based on the DG index; if the data volume is large and the nodes on the DG index corresponding to the data set are relatively sparse, perform the Top-k query based on the double-layer dominant graph (DDG) index structure; if the arbitrary segment is difficult to determine, perform the hybrid index query based on DG and GS; the nodes being relatively sparse means that 50% or more pseudo-nodes must be added before the data can be restored to a continuous subgraph of the DG index.
2. The Top-k query method oriented to arbitrary data segments as claimed in claim 1, characterized in that the Top-k query method for arbitrary data segments based on the DG index comprises the following steps:
Step B-1-1: add pseudo-nodes to restore the DG index;
Step B-1-2: perform the DG-based Traveler processing:
Step B-1-2-1: scan the layer numbers of the data segment to be queried, add the nodes of the smallest layer, minlayer, to the candidate set RS in non-decreasing order, and add the maximum value R in RS to the result set;
Step B-1-2-2: compare the size of the result set with K; if the result set is smaller than K, go to step B-1-2-3, otherwise go to step B-1-3;
Step B-1-2-3: scan the child nodes C of R; if all parent nodes of C are in the candidate set and C has not been visited, add node C to the candidate set and add the maximum node in the candidate set to the result set; otherwise, when the node entering the result set is within the query range, increment the size of the result set by 1;
Step B-1-3: delete the pseudo-records from the result set to obtain the final Top-k query result result.
3. The Top-k query method oriented to arbitrary data segments as claimed in claim 1, characterized in that the Top-k query method based on the double-layer dominant graph (DDG) index structure comprises the following steps:
Step B-2-1: segment the data;
Step B-2-2: build the DDG index structure over the segmented data;
Step B-2-3: perform the Top-k query, which comprises the following steps:
Step B-2-3-1: determine the DG indexes in which the query segment lies;
Step B-2-3-2: perform basic Traveler processing on each DG index covered by the query to form the result set result;
Step B-2-3-3: perform DG-based Traveler processing on the bottom DG index covered by the query and write the results to result;
Step B-2-3-4: perform DG-based Traveler processing on the top DG index covered by the query and write the results to result, forming the final Top-K query result.
4. The Top-k query method oriented to arbitrary data segments as claimed in claim 1, characterized in that the hybrid index query method based on DG and GS comprises the following steps:
Step B-3-1: build the DGS dominance network, which is divided into an upper and a lower layer. The upper layer is the DG index structure, which suits global Top-k queries; for Top-k queries over arbitrary data segments, the lower-layer GS data structure preserves the throwback dominance relation well.
5. Step B-3-2: adjust according to the concept of an adaptive grid: every dimension of the GS layer is adjusted adaptively so that the data is uniformly distributed along each dimension;
Step B-3-3: adjust the GS structure according to the DG index so that the nodes within the same level of the GS network keep a certain order, reducing the number of comparisons between data of the same level in the DG index;
Step B-3-4: query on the DGS dominance network, which comprises the following steps:
Step B-3-4-1: compute the column number column and row number rower of the grid cell in which the queried data segment lies;
Step B-3-4-2: process in turn the grid nodes whose column number is column and whose row number runs from 0 to rower, add the data nodes of these cells that fall within the query interval to the candidate set in non-decreasing order, and compute the largest column number col that satisfies the condition;
Step B-3-4-3: process in turn the grid nodes whose row number is rower and whose column number runs from 0 to column, add the data nodes of these cells that fall within the query interval to the candidate set in non-decreasing order, and compute the largest row number row that satisfies the condition;
Step B-3-4-4: add the data nodes of the grid nodes with row number i and column number j to the candidate set in non-decreasing order, where row < i < rower and col < j < column;
Step B-3-4-5: add the first node in the candidate set to the result set;
Step B-3-4-6: compare the size len of the result set with K; if len < K, go to step B-3-4-7; otherwise go to step B-3-5 and finish the query;
Step B-3-4-7: check whether the number of nodes in the result set equals the number of nodes in the query range; if they are not equal, no successor nodes are added; if they are equal, add the successor nodes to the candidate set;
Step B-3-4-8: add the len-th data node in the candidate set to the result set, increment len by 1, and return to step B-3-4-6;
Step B-3-5: return the result set as the Top-k query result and finish the query.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210257640.1A CN102799681B (en) | 2012-07-24 | 2012-07-24 | Top-k query method oriented to any data segment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210257640.1A CN102799681B (en) | 2012-07-24 | 2012-07-24 | Top-k query method oriented to any data segment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102799681A (en) | 2012-11-28 |
CN102799681B (en) | 2014-11-12 |
Family
ID=47198791
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210257640.1A Active CN102799681B (en) | 2012-07-24 | 2012-07-24 | Top-k query method oriented to any data segment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102799681B (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106055674A (en) * | 2016-06-03 | 2016-10-26 | 东南大学 | top-k arrangement query method based on metric space in distributed environment |
CN108616837A (en) * | 2018-04-13 | 2018-10-02 | 中南大学 | A kind of Top-k query method for the double-deck Sensor Network |
CN110046265A (en) * | 2019-03-08 | 2019-07-23 | 西安理工大学 | A kind of subgraph query method based on bilayer index |
CN112328877A (en) * | 2020-11-03 | 2021-02-05 | 南京航空航天大学 | Skyline inquiry method for multiple users on time-dependent road network |
CN113032400A (en) * | 2021-03-31 | 2021-06-25 | 上海天旦网络科技发展有限公司 | High-performance TopN query method, system and medium for mass data |
US20210406312A1 (en) * | 2020-02-20 | 2021-12-30 | Yahoo Japan Corporation | Information processing apparatus and information processing method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110202526A1 (en) * | 2010-02-12 | 2011-08-18 | Korea Advanced Institute Of Science And Technology | Semantic search system using semantic ranking scheme |
CN102163218A (en) * | 2011-03-28 | 2011-08-24 | 武汉大学 | Graph-index-based graph database keyword vicinity searching method |
CN102375852A (en) * | 2010-08-24 | 2012-03-14 | 中国移动通信集团公司 | Method for building data index as well as method and system using data index for inquiring data |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110202526A1 (en) * | 2010-02-12 | 2011-08-18 | Korea Advanced Institute Of Science And Technology | Semantic search system using semantic ranking scheme |
CN102375852A (en) * | 2010-08-24 | 2012-03-14 | 中国移动通信集团公司 | Method for building data index as well as method and system using data index for inquiring data |
CN102163218A (en) * | 2011-03-28 | 2011-08-24 | 武汉大学 | Graph-index-based graph database keyword vicinity searching method |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106055674A (en) * | 2016-06-03 | 2016-10-26 | 东南大学 | top-k arrangement query method based on metric space in distributed environment |
CN106055674B (en) * | 2016-06-03 | 2019-05-31 | 东南大学 | A kind of top-k under distributed environment based on metric space dominates querying method |
CN108616837A (en) * | 2018-04-13 | 2018-10-02 | 中南大学 | A kind of Top-k query method for the double-deck Sensor Network |
CN110046265A (en) * | 2019-03-08 | 2019-07-23 | 西安理工大学 | A kind of subgraph query method based on bilayer index |
CN110046265B (en) * | 2019-03-08 | 2022-10-11 | 西安理工大学 | Subgraph query method based on double-layer index |
US20210406312A1 (en) * | 2020-02-20 | 2021-12-30 | Yahoo Japan Corporation | Information processing apparatus and information processing method |
CN112328877A (en) * | 2020-11-03 | 2021-02-05 | 南京航空航天大学 | Skyline inquiry method for multiple users on time-dependent road network |
CN113032400A (en) * | 2021-03-31 | 2021-06-25 | 上海天旦网络科技发展有限公司 | High-performance TopN query method, system and medium for mass data |
Also Published As
Publication number | Publication date |
---|---|
CN102799681B (en) | 2014-11-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102799681B (en) | Top-k query method oriented to any data segment | |
CN102201001B (en) | Fast retrieval method based on inverted technology | |
Kranen et al. | The clustree: indexing micro-clusters for anytime stream mining | |
Combe et al. | Combining relations and text in scientific network clustering | |
CN102306176B (en) | On-line analytical processing (OLAP) keyword query method based on intrinsic characteristic of data warehouse | |
Zou et al. | Pareto-based dominant graph: An efficient indexing structure to answer top-k queries | |
Zhao et al. | ICFS clustering with multiple representatives for large data | |
Deng et al. | An improved fuzzy clustering method for text mining | |
CN111259933B (en) | High-dimensional characteristic data classification method and system based on distributed parallel decision tree | |
Wang et al. | Distance-based outlier detection on uncertain data | |
CN107341199B (en) | Recommendation method based on document information commonality mode | |
Hosseini Rad et al. | A new hybridization of DBSCAN and fuzzy earthworm optimization algorithm for data cube clustering | |
CN114186121A (en) | Mixed recommendation algorithm system based on service record | |
CN111309944A (en) | Digital human search algorithm based on graph database | |
CN106611016A (en) | Image retrieval method based on decomposable word pack model | |
CN111639673A (en) | Self-interpretation protocol modeling method for processing mixed feature data | |
Singh et al. | Knowledge based retrieval scheme from big data for aviation industry | |
Arslan et al. | Comparison of feature-based and image registration-based retrieval of image data using multidimensional data access methods | |
Lai et al. | Approximate minimum spanning tree clustering in high-dimensional space | |
Siddique et al. | Extended k-dominant skyline in high dimensional space | |
Ngo et al. | Distribution pattern of free living nematode communities in the eight Mekong estuaries by seasonal factor | |
CN113792202B (en) | User classification screening method | |
Ramsak et al. | Interactive ROLAP on large datasets: a case study with UB-trees | |
Yavtushenko | Peculiarities of data processing methods in a business organization’s CRM system | |
Sairam et al. | Optimizined skyline queries over uncertain data using improved scalable framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |