CN105045806A - Dynamic splitting and maintenance method of quantile query oriented summary data - Google Patents



Publication number
CN105045806A
Authority
CN
China
Prior art keywords
data
digit
summary data
node
fractile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510304691.9A
Other languages
Chinese (zh)
Other versions
CN105045806B (en)
Inventor
王树鹏
张燕琴
吴广君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS
Priority to CN201510304691.9A
Publication of CN105045806A
Application granted
Publication of CN105045806B
Legal status: Expired - Fee Related
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/40: Information retrieval; Database structures therefor; File system structures therefor of multimedia data, e.g. slideshows comprising image and additional audio data
    • G06F16/43: Querying

Abstract

The present invention relates to a dynamic splitting and maintenance method for quantile-query-oriented summary data. The method comprises: first, sampling the written data items to construct q-digit summary data; second, querying the intermediate point of the data items in the q-digit summary data according to the q-digit post-order traversal quantile query rule; and then traversing the q-digit summary data in reverse from the intermediate point to establish a split path, and splitting the q-digit summary data along the split path into two summary data structures of approximately equal data volume. After splitting, each structure is still an independent q-digit structure and can receive and process newly arrived data normally. The method can be used to manage q-digit summary data dynamically in a distributed environment, and it effectively supports the maintenance and management of summary data in a big-data environment as well as quantile queries and computation.

Description

A dynamic splitting and maintenance method for quantile-query-oriented summary data
Technical field
The invention belongs to the field of information technology and proposes a q-digit-based method for dynamically splitting and maintaining summary data. The method covers the selection of the split point of the summary data structure, the dynamic splitting algorithm, and error estimation after the split. It can be used to manage q-digit summary data dynamically in a distributed environment, effectively supporting the maintenance and management of summary data in big-data environments as well as quantile queries and computation.
Background art
In streaming big-data environments, an important class of queries is the quantile query over stream data, usually written as a φ-quantile query (quantile query for short): after the data are sorted, it returns the element of rank ⌈φN⌉, where N is the number of data items. The quantile φ is a real number in (0, 1]. The 1-quantile (φ = 1) is the maximum of the queried data set, and the 0.5-quantile (φ = 0.5) is the middle value of the data set, also called the median. For example, given the stream data set D = {6, 1, 8, 7, 9, 0, 4, 2, 5, 3}, sorting gives D′ = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}; the 0.1-quantile query returns 0, the 0.5-quantile query returns 4, and the 1-quantile query returns the maximum, 9.
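The definition can be illustrated with a minimal exact (non-streaming) sketch. It assumes that the φ-quantile is the element of rank ⌈φN⌉ in the sorted data, which is the reading consistent with the example above; the function name `exact_quantile` is hypothetical.

```python
import math


def exact_quantile(data, phi):
    """Return the element of rank ceil(phi * N) in the sorted data (assumed definition)."""
    if not 0 < phi <= 1:
        raise ValueError("phi must lie in (0, 1]")
    ordered = sorted(data)
    rank = math.ceil(phi * len(ordered))  # 1-based rank
    return ordered[rank - 1]


D = [6, 1, 8, 7, 9, 0, 4, 2, 5, 3]
print(exact_quantile(D, 0.1), exact_quantile(D, 0.5), exact_quantile(D, 1))
```

This version needs the whole data set in memory, which is exactly what a streaming setting rules out; it only serves as a reference point for the approximate methods discussed next.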
In a stream data environment the full data set is never available, so the data cannot be sorted effectively, which makes quantile queries particularly important. For example, when monitoring temperature trends at different sites, one may query in real time the maximum temperature (1-quantile) or the median temperature (0.5-quantile) of a sensor node over a recent period, or even the temperature at any given proportion of the distribution. Quantile queries are also used in stock market trend analysis, web aggregation queries, web log mining, distributed storage data management, and similar fields.
Because stream data arrive at high speed, the complete data cannot be obtained and stored. The industry therefore mostly uses approximate quantile query methods, which answer quantile queries approximately from a sample of the data, so that quantiles can be computed in real time in a stream data environment.
Current research on approximate quantile computation focuses mainly on optimizing the computational efficiency and storage efficiency of the algorithms. Typical results are summarized as follows. The MRL99 algorithm proposed by Manku et al. (G. S. Manku, S. Rajagopalan, and B. G. Lindsay. Random sampling techniques for space efficient online computation of order statistics of large datasets. In ACM SIGMOD, 1999.) is a single-pass query algorithm. The space complexity of this algorithm is [expression missing in source], and it returns a deterministic uniform approximation (|r′ − r| ≤ εN). Its weakness is that the exact number of data items N in the data stream must be known in advance. Greenwald and Khanna proposed another quantile query algorithm, the GK algorithm (M. Greenwald and S. Khanna. Space-efficient online computation of quantile summaries. In ACM SIGMOD, 2001.). This algorithm not only reduces the space complexity of the previous algorithm to [expression missing in source], but also does not need to know the number of data items N in advance. When the value domain of the data stream is known, Cormode and Muthukrishnan further proposed applying the count-min technique (G. Cormode, S. Muthukrishnan. An improved data stream summary: the count-min sketch and its applications. Journal of Algorithms, 2005, vol. 55, no. 1, pp. 58-75.) to manage intervals, with space complexity [expression missing in source]. The space complexity of this algorithm depends only on the divided value domain and not on the number of data items that actually arrive in the data stream, which reduces space consumption, but the method cannot effectively support dynamic division of arbitrary value-domain intervals.
The q-digit approximate query method proposed by Shrivastava et al. (N. Shrivastava, C. Buragohain, D. Agrawal, and S. Suri. Medians and beyond: New aggregation techniques for sensor networks. In ACM SenSys, 2004; the structure is called q-digest in that paper) dynamically adjusts the numeric intervals that the summary data are responsible for as data items keep arriving, and supports quantile queries over stream data through a fixed traversal rule. The summary data built by q-digit approximately capture the distribution of the data without storing and sorting every data item that arrives. The core idea of building the summary is to group the sampled values automatically according to the data distribution and to place them into variable-size buckets of similar weight. Q-digit can further support more complex operations, such as median queries, quantile queries, inverse quantile queries, range queries, frequent-item queries, and consensus queries.
In addition, the q-digit algorithm has a controllable error. If the integer range of the data item keys is [1, σ] and the sample size in the q-digit summary data is m, the error of a quantile query is less than O(log(σ)/m). Q-digit is currently a widely adopted quantile query method for stream data.
Summary of the invention
Existing quantile query algorithms and their applications are developed mainly for centralized storage environments and focus on improving the approximation accuracy and the efficiency of the algorithms. In a distributed environment, however, data are distributed across different storage and loading devices, so mutually independent data partition modules must be built; as data are written continuously, the summary data of each partition also face operations such as separation and merging.
For summary data that support quantile queries in a distributed environment, the invention proposes a high-precision method for separating (splitting) summary data: the summary data structure of one partition is separated at the intermediate point (φ = 0.5), which halves the data volume, and is thereby split into two summary data structures of approximately equal data volume. After the split, each summary data structure independently supports subsequent data queries and processing.
Specifically, the technical solution adopted by the invention is as follows:
A summary data splitting method for quantile queries, comprising the steps of:
1) sampling the written data items and building q-digit summary data;
2) querying the intermediate point of the data items in the q-digit summary data according to the q-digit post-order traversal quantile query rule;
3) traversing the q-digit summary data in reverse from the intermediate point to establish a split path, and splitting the q-digit summary data along the split path into two summary data structures of approximately equal data volume.
Further, the data organization structure of the q-digit summary data in step 1) may be a tree structure, an array, a linked list, or the like.
Preferably, the data organization structure of the q-digit summary data is a tree structure, and splitting it comprises the following steps:
a) according to the required split point, finding the intermediate point with the q-digit post-order traversal quantile query rule and using it as the split point;
b) starting from the split point, walking back up the tree structure through the parent nodes until the root node is reached, which gives the split path; based on this split path, dividing the nodes of the q-digit summary data into a left and a right subtree, with the nodes on the split path kept in both the left subtree and the right subtree;
c) on the left and right subtrees respectively, revising the range of the data space value domain that each internal node is responsible for, and merging intermediate nodes that become responsible for the same range.
A dynamic maintenance method for quantile-query-oriented summary data: when the load becomes unbalanced, or when new processing equipment needs to be added, the above method is used to split the summary data so that part of the data is shared out to other processing nodes; the summary data after the split independently support data queries over the data interval after the split.
The key technical points of the invention are the following three:
1. Combining the quantile query rule with the error analysis method, a method for traversing a q-digit in reverse is proposed. A q-digit query traverses the structure bottom-up to obtain the quantile query result for any point. Based on this query rule, the invention proposes a reverse traversal of the tree structure starting from any quantile point; this effectively establishes the split path for any quantile point, and the split path divides the summary into two summary data sets in a given proportion;
2. Using the split path from point 1, a splitting method for q-digit is proposed. The method first establishes the split path from the intermediate point φ = 0.5, obtains the left and right binary subtrees by post-order traversal, and revises the data interval of the internal nodes of each binary tree, thereby rebuilding the q-digit summary data over the new data intervals;
3. Error estimation and analysis are carried out on the summary data separated by methods (1) and (2). Theoretical analysis shows that the summary data after the split can fully and independently support data queries over the data interval after the split, and that the maximum error does not change.
Compared with the prior art, the invention has the following beneficial effects:
1. The proposed splitting method follows the q-digit query rule, which guarantees that the split result does not change the original q-digit query method, error estimation method, or the applications built on them, so the method has good application prospects and a sound theoretical basis;
2. The invention uses only the original q-digit summary data structure to implement the splitting of summary data, which guarantees that the split executes quickly. After the split, each structure remains an independent q-digit structure and can receive and process newly arrived data normally, so the method effectively supports operations such as dynamic splitting and merging of arbitrary data partitions in a distributed environment;
3. The invention can be used for the dynamic maintenance and management of q-digit summary data in a distributed environment, and the corresponding structures can be obtained at any time with the method of the invention. For example, when the load becomes unbalanced or new processing equipment is added, the method can be used to share part of the data out to other processing nodes. The upper-layer service can trigger the splitting operation with the proposed method according to the situation at the time.
Brief description of the drawings
Fig. 1 is a schematic diagram of the q-digit summary data structure in the embodiment.
Fig. 2 is a schematic diagram of the left subtree q1 and the right subtree q2 produced by splitting along the split path in the embodiment, where (a) is an example of the q1 subtree and (b) is an example of the q2 subtree.
Fig. 3 is a schematic diagram of the maintenance of the left subtree q1 after the split in the embodiment, where (a) shows the revised node ranges of q1 after the split and (b) shows the merging of q1 nodes after the split.
Fig. 4 is a schematic diagram of the node maintenance of the right subtree q2 after the split in the embodiment, where (a) shows the revised node ranges of q2 after the split, (b) shows the merging of q2 nodes after the split, and (c) shows the final result of merging the q2 nodes after the split.
Fig. 5 is a schematic diagram of the application of q-digit in a distributed environment in the embodiment.
Embodiments
To make the above objects, features, and advantages of the invention easier to understand, the invention is further described below through specific embodiments and the drawings.
Based on q-digit and its quantile query rule, the invention queries the intermediate point of the data items, i.e. φ = 0.5. Starting from the intermediate point, it traverses the q-digit summary data in reverse to establish a split path, and along that path divides the summary data into left and right subsets of approximately equal size, so that each of the two resulting sub-intervals covers about 50% of the original data volume. The split uses only the summary data stored in the q-digit, preserves the original functions and properties of q-digit, and leaves the quantile query error unchanged.
Q-digit can be implemented with a tree structure, an array, a linked list, and so on; the description below uses a binary tree as the example. Other data organization structures, such as arrays or linked lists, can be implemented by reference to this structure: the correspondence between their levels must still satisfy the tree described in the invention, and the relationships between elements and the sampling principle are the same as for the tree structure.
1. Node rules of the q-digit summary data binary tree
The nodes of a q-digit are divided into the root node, leaf nodes, and internal nodes. Internal nodes must satisfy the following two conditions:
(1) $\mathrm{count}(v) \le \lfloor n/k \rfloor$
(2) $\mathrm{count}(v) + \mathrm{count}(v_p) + \mathrm{count}(v_s) > \lfloor n/k \rfloor$
where count(v) is the value of node v, $v_p$ is the parent node of v, $v_s$ is the sibling node of v, n is the L1 norm (data scale) of all data items, and k is the compression parameter set for the algorithm.
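As a small illustration of how such node rules can be checked, the sketch below assumes that the two conditions are the standard digest properties of Shrivastava et al., namely count(v) ≤ ⌊n/k⌋ and count(v) + count(v_p) + count(v_s) > ⌊n/k⌋. The function name and argument names are hypothetical.

```python
def satisfies_node_rules(count_v, count_parent, count_sibling, n, k):
    """Check the two assumed internal-node conditions for a single node v."""
    threshold = n // k  # floor(n / k)
    # Condition (1): the node itself must not be too heavy.
    not_too_heavy = count_v <= threshold
    # Condition (2): node, parent, and sibling together must not be too light;
    # otherwise the two children would be compressed into the parent.
    not_too_light = count_v + count_parent + count_sibling > threshold
    return not_too_heavy and not_too_light


# With n = 16 and k = 5 the threshold is 3.
print(satisfies_node_rules(2, 1, 1, n=16, k=5))
print(satisfies_node_rules(4, 0, 0, n=16, k=5))
```

A node that fails the second check is a candidate for being merged into its parent during compression, which is how k controls the size of the summary.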
2. Core method and steps of summary data splitting
1) Building the q-digit summary data. The written data items are sampled according to the node construction conditions above, and a binary tree structure is built; for the structure of the binary tree and the query process, refer to the conventional q-digit algorithm;
2) Querying the split point. When splitting, the node corresponding to the requested split point is found with the q-digit post-order traversal quantile query method;
3) Computing the reverse split path. Starting from the split point, walk back up the binary tree through the parent nodes until the root node is reached; based on this path, the nodes of the q-digit summary data are divided into a left and a right subtree (the nodes on the path are kept in both the left subtree and the right subtree);
4) Adjusting the value-domain ranges recorded in the nodes. On the left and right subtrees respectively, the range of the data space value domain that each internal node is responsible for is revised, and intermediate nodes that become responsible for the same range are merged.
3. Concrete example
The implementation of the above steps is described below with concrete data.
1) Building the q-digit summary data
Assume the data items are of key-value type. When a data item arrives, it is written to the leaf node corresponding to its key, and the whole structure is maintained according to the conditions that q-digit nodes must satisfy. The usual approach is bottom-up: nodes are compressed and merged level by level, and the degree of compression of the summary data depends on the parameter k above.
Fig. 1 shows the q-digit summary data structure built according to the above rules; the dashed part marks the right subtree (q2) selected by the split. White nodes in the figure are empty (nodes removed by compression and merging) and are drawn only to keep the binary tree complete. Solid nodes are those that still exist after compression. The number inside a node is its label: the root node is 1, the left child of the root is 2, the right child of the root is 3, and so on. The range [min, max] marked next to a node is the range of keys that the node is responsible for. In Fig. 1 the key range of the whole q-digit is [1-16] and k = 5.
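The labelling scheme (root 1, left child 2, right child 3, and so on) matches the usual heap numbering of a complete binary tree, in which node i has children 2i and 2i + 1. Under that assumption, and assuming σ is a power of two, the key range of a node follows directly from its label. The sketch below is illustrative only; `node_range` is a hypothetical helper.

```python
def node_range(node_id, sigma):
    """Key range [lo, hi] covered by a heap-numbered node over keys 1..sigma.

    Assumes sigma is a power of two and that node 1 is the root,
    with children 2*i and 2*i + 1 for node i.
    """
    depth = node_id.bit_length() - 1        # root has depth 0
    width = sigma >> depth                  # number of keys covered at this depth
    if node_id < 1 or width == 0:
        raise ValueError("node label outside the tree")
    index_on_level = node_id - (1 << depth)  # 0-based position within the level
    lo = index_on_level * width + 1
    return lo, lo + width - 1


print(node_range(13, 16))  # node used as the split point in this embodiment
print(node_range(3, 16))
print(node_range(28, 16))
```

With σ = 16 this gives [11, 12] for node 13, [9, 16] for node 3, and the single key 13 for leaf 28, in line with the ranges quoted in this embodiment.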
2) Querying the split point
The q-digit is split as follows. A quantile query with φ = 0.5 is run on the q-digit using a post-order traversal of the binary tree (black nodes): left subtree, right subtree, then root, applied recursively. The resulting node sequence is <8>, <18>, <9>, <20>, <26>, <13>, <6> ... <15>, <3>, <1>. Suppose the node labelled <13> corresponds to φ = 0.5. It represents the key range [11-12], so the φ = 0.5 quantile is key = 12, and key = 12 is taken as the intermediate point of the split.
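A minimal sketch of this query follows. It assumes heap-numbered nodes over keys 1..σ (σ a power of two) and uses the fact that sorting stored nodes by right endpoint, narrower ranges first, reproduces the post-order sequence. The node counts are hypothetical, since the figure values are not given in the text, and are chosen so that node 13 is reached at φ = 0.5.

```python
def node_range(node_id, sigma):
    """Key range of a heap-numbered node over keys 1..sigma (sigma a power of two)."""
    depth = node_id.bit_length() - 1
    width = sigma >> depth
    lo = (node_id - (1 << depth)) * width + 1
    return lo, lo + width - 1


def postorder(node_ids, sigma):
    """Order stored nodes as a post-order traversal would visit them."""
    def sort_key(v):
        lo, hi = node_range(v, sigma)
        return (hi, hi - lo)  # right endpoint first, then narrower range first
    return sorted(node_ids, key=sort_key)


def quantile_node(counts, sigma, phi):
    """Return (node, key) where the running count first reaches phi * n."""
    target = phi * sum(counts.values())
    running = 0
    for v in postorder(counts, sigma):
        running += counts[v]
        if running >= target:
            return v, node_range(v, sigma)[1]  # report the node's upper key
    raise ValueError("empty summary")


# Hypothetical counts (n = 16); only the node labels come from the embodiment.
counts = {8: 2, 18: 1, 9: 1, 20: 1, 26: 1, 13: 2, 6: 1,
          28: 1, 14: 2, 15: 2, 3: 1, 1: 1}
print(postorder(counts, 16))
print(quantile_node(counts, 16, 0.5))
```

With these assumed counts the traversal order begins 8, 18, 9, 20, 26, 13, 6 and ends 15, 3, 1, and the query stops at node 13 with key 12.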
3) Computing the reverse split path
The reverse split path is computed from the node of the quantile key = 12 upward through the tree to the root node, and it divides the tree into a left and a right part. The split path is <13>, <6>, <3>, <1>. Along this reverse path the tree is divided into two subtrees. Left subtree: <8>, <18>, <26>, <13>, <6>, <3>, <1>; right subtree: <6>, <28>, <14>, <15>, <3>, <1>. The split path is kept in both the left and right subtrees, while the split point stays only in the left subtree. In Fig. 2 the left and right subtrees are labelled q1 and q2: (a) is an example of the q1 subtree and (b) is an example of the q2 subtree. q1 is responsible for the key range [1-12]; q2 is responsible for the data partition [13-16].
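The path computation and the division into two node sets can be sketched as follows, assuming heap numbering (the parent of node i is i // 2) and a node list already in post-order. How the counts of shared path nodes are divided between the two copies is not specified in the text, so this sketch handles node labels only; the node list is hypothetical apart from the labels quoted in the embodiment.

```python
def split_path(split_node):
    """Walk from the split node up to the root (label 1) via parent = node // 2."""
    path = []
    v = split_node
    while v >= 1:
        path.append(v)
        v //= 2
    return path


def partition(postorder_ids, split_node):
    """Divide post-ordered node labels into left and right sets along the split path."""
    path = set(split_path(split_node))
    cut = postorder_ids.index(split_node)
    before = postorder_ids[:cut + 1]   # everything up to and including the split point
    after = postorder_ids[cut + 1:]    # ancestors of the split point and nodes to its right
    left = before + [v for v in after if v in path]  # path ancestors are kept on the left too
    right = list(after)                # the split point itself stays only on the left
    return left, right


order = [8, 18, 9, 20, 26, 13, 6, 28, 14, 15, 3, 1]  # hypothetical stored nodes
print(split_path(13))
print(partition(order, 13))
```

For split node 13 the path is 13, 6, 3, 1 and the right set is 6, 28, 14, 15, 3, 1, which agrees with the right subtree listed in the embodiment.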
4) Adjusting the value-domain ranges recorded in the nodes
The node ranges in q1 and q2 are revised. Let the range that each node is responsible for be written as [min, max]. The revision rule is: in q1, the range of any node with max > 12 is changed to [min, 12]; in q2, the range of any node with min < 13 is changed to [13, max]. The key values are assumed to be integers here. Fig. 3(a) is an example of the revised q1 node ranges, and Fig. 4(a) is an example of the revised q2 node ranges.
Because the ranges of some nodes are changed during the split, some nodes end up representing the same key range and must be merged into a single node; the value of the merged node is the sum of the values of the two nodes being merged. For example, in q2 a node with max < 13, such as the node labelled <6>, has its range changed to [13, 13], which is now the same as the range represented by leaf node <28>, so it must be merged with that leaf node. The merging process is shown in Fig. 4(b).
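The range revision can be sketched with two small helpers, assuming integer keys and a split key of 12. Collapsing a right-side range that lies entirely below the split to a single key follows the node 6 example of this embodiment (its range becomes [13, 13]); treating every such node this way is an assumption of the sketch, and the function names are hypothetical.

```python
def clamp_left(rng, split_key):
    """Range of a node kept in q1: cut the upper bound at the split key."""
    lo, hi = rng
    return lo, min(hi, split_key)


def clamp_right(rng, split_key):
    """Range of a node kept in q2: raise the lower bound to split_key + 1.

    If the original range lies entirely at or below the split key, it is
    collapsed to the single key split_key + 1 (assumption, see lead-in).
    """
    lo, hi = rng
    new_lo = max(lo, split_key + 1)
    return new_lo, max(hi, new_lo)


print(clamp_left((1, 16), 12), clamp_left((9, 16), 12))
print(clamp_right((9, 16), 12), clamp_right((9, 12), 12))
```

Nodes whose ranges already lie on the correct side of the split are returned unchanged, so the helpers can be applied to every node of a subtree without special cases.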
In q1, the range [1-16] of node <1> is changed to [1-12], the range [9-16] of node <3> is changed to [9-12], and the values of <3> and <6> are merged into <6>, as shown in Fig. 3(b).
In q2, nodes <1>, <3>, and <7> have the same range and are merged into <7>; nodes <6> and <28> are merged into leaf node <28>, as shown in Fig. 4(b) and Fig. 4(c).
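The q1 merge just described can be sketched as a grouping of nodes by their revised range. The sketch assumes heap-numbered labels and keeps the largest label of each group, which corresponds to the deepest node and matches the examples in this embodiment (3 and 6 merge into 6). The counts are hypothetical.

```python
def merge_same_range(ranges, counts):
    """Merge nodes whose revised ranges are identical, summing their counts.

    The surviving label is the largest one in each group (the deepest node
    under heap numbering); this choice is inferred from the examples.
    """
    groups = {}
    for node, rng in ranges.items():
        groups.setdefault(rng, []).append(node)
    merged_ranges, merged_counts = {}, {}
    for rng, nodes in groups.items():
        keep = max(nodes)
        merged_ranges[keep] = rng
        merged_counts[keep] = sum(counts[v] for v in nodes)
    return merged_ranges, merged_counts


# Path nodes of q1 after the range revision; counts are hypothetical.
q1_ranges = {1: (1, 12), 3: (9, 12), 6: (9, 12), 13: (11, 12)}
q1_counts = {1: 1, 3: 1, 6: 1, 13: 2}
print(merge_same_range(q1_ranges, q1_counts))
```

Because counts are only added together, the total weight of the subtree is unchanged by the merge.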
4. Error analysis
1) Query error of q-digit
Let v be a node of the q-digit and x an ancestor of v. From the construction process of the q-digit, the following inequality holds:
$$\mathrm{error}(v) \le \sum_{x \in \mathrm{ancestor}(v)} \mathrm{count}(x)$$
Because every node of the q-digit must satisfy the node conditions, each term of the sum is at most $n/k$, which gives
$$\mathrm{error}(v) \le \sum_{x \in \mathrm{ancestor}(v)} \mathrm{count}(x) \le \sum_{x \in \mathrm{ancestor}(v)} \frac{n}{k} \le \log\sigma \cdot \frac{n}{k}$$
where log σ is the height of the binary tree, n is the L1 norm (data scale) of all data items, and k is the compression parameter.
Interval query (range query) definition: the sum of the values of the data items whose keys lie between $key_1$ and $key_2$, i.e. $\sum_i value_i$, where $value_i$ is the value of a data item in the interval $[key_1, key_2]$. From the construction process of the q-digit, the maximum error of an interval query is likewise [expression missing in source].
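To give a feel for the size of the bound log σ · n/k, the short sketch below evaluates it numerically. It assumes the logarithm is base 2, since log σ is described as the height of a binary tree; the function name and the sample values of n are hypothetical.

```python
import math


def max_quantile_error(sigma, n, k):
    """Upper bound log2(sigma) * n / k on the rank error of a quantile query."""
    return math.log2(sigma) * n / k


# sigma = 16 and k = 5 as in Fig. 1; n = 100 is an illustrative data scale.
print(max_quantile_error(16, 100, 5))
# Doubling the compression parameter k halves the bound.
print(max_quantile_error(16, 100, 10))
```

The bound grows linearly in n and only logarithmically in the key range, so the relative error error(v)/n is controlled by log σ / k.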
2) Query error of q-digit after the split
After the split, queries in effect still use the original binary tree: it is as if the tree were copied into two identical trees, one of which has some unnecessary nodes removed to form the left subtree q1, while the other has some unnecessary nodes removed and its remaining part forms q2. Whether a query touches nodes in q1 or in q2, the node order produced by the post-order traversal is the same as in the original tree, so the error is identical to that of the original q-digit.
5. Application of q-digit in a distributed environment
Fig. 5 is a schematic diagram of the application of q-digit in a distributed environment. When data arrive, they are fed into the "data distribution statistics" module for analysis. This module consists of several mutually independent q-digit structures, each of which is responsible for the statistics of one data interval. In a distributed environment, the quantile structures are split and maintained among themselves as required.
The above embodiments are intended only to illustrate the technical solution of the invention, not to limit it. A person of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the invention, and the scope of protection of the invention shall be as defined in the claims.
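One way to picture the module is as a sorted list of interval upper bounds, each owned by one summary, with a lookup that routes an incoming key to its owner. The sketch below is purely illustrative: the class `PartitionMap` and its methods are hypothetical and are not described in the source, and the summaries themselves are left out.

```python
import bisect


class PartitionMap:
    """Routes integer keys to partitions covering consecutive key intervals."""

    def __init__(self, lo, hi):
        self.lo = lo
        self.upper_bounds = [hi]  # one partition covering [lo, hi] initially

    def route(self, key):
        """Return the index of the partition responsible for key."""
        if key < self.lo or key > self.upper_bounds[-1]:
            raise KeyError(key)
        return bisect.bisect_left(self.upper_bounds, key)

    def split(self, split_key):
        """Split the partition containing split_key into [.., split_key] and the rest."""
        idx = self.route(split_key)
        if self.upper_bounds[idx] == split_key:
            raise ValueError("split key is already a partition boundary")
        self.upper_bounds.insert(idx, split_key)


pm = PartitionMap(1, 16)
pm.split(12)  # e.g. triggered by load imbalance; 12 is the median key of the example
print(pm.upper_bounds, pm.route(12), pm.route(13))
```

After the split at key 12, keys 1 to 12 are routed to the first partition and keys 13 to 16 to the second, mirroring the q1 and q2 intervals of the embodiment.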

Claims (7)

1. A summary data splitting method for quantile queries, comprising the steps of:
1) sampling the written data items and building q-digit summary data;
2) querying the intermediate point of the data items in the q-digit summary data according to the q-digit post-order traversal quantile query rule;
3) traversing the q-digit summary data in reverse from the intermediate point to establish a split path, and splitting the q-digit summary data along the split path into two summary data structures of approximately equal data volume.
2. The method of claim 1, characterized in that the data organization structure of the q-digit summary data in step 1) is one of the following: a tree structure, an array, a linked list.
3. The method of claim 2, characterized in that the data organization structure of the q-digit summary data is a tree structure, and splitting it comprises the following steps:
a) according to the required split point, finding the intermediate point with the q-digit post-order traversal quantile query rule and using it as the split point;
b) starting from the split point, walking back up the tree structure through the parent nodes until the root node is reached, thereby obtaining the split path; based on this split path, dividing the nodes of the q-digit summary data into a left and a right subtree, with the nodes on the split path kept in both the left subtree and the right subtree;
c) on the left and right subtrees respectively, revising the range of the data space value domain that each internal node is responsible for, and merging intermediate nodes that become responsible for the same range.
4. The method of claim 3, characterized in that the nodes of the tree structure of the q-digit summary data are divided into the root node, leaf nodes, and internal nodes, wherein internal nodes satisfy the following two conditions:
(1) $\mathrm{count}(v) \le \lfloor n/k \rfloor$
(2) $\mathrm{count}(v) + \mathrm{count}(v_p) + \mathrm{count}(v_s) > \lfloor n/k \rfloor$
where count(v) is the value of node v, $v_p$ is the parent node of v, $v_s$ is the sibling node of v, n is the L1 norm of all data items, and k is the configured compression parameter.
5. The method of claim 3, characterized in that the tree structure is a binary tree structure.
6. The method of claim 1, characterized in that the data items written in step 1) are of key-value type.
7. A dynamic maintenance method for quantile-query-oriented summary data, characterized in that, when the load becomes unbalanced or when new processing equipment needs to be added, the method of any one of claims 1 to 6 is used to split the summary data so that part of the data is shared out to other processing nodes, and the summary data after the split independently support data queries over the data interval after the split.
CN201510304691.9A 2015-06-04 2015-06-04 A kind of summary data Dynamic Division and maintaining method towards quantile inquiry Expired - Fee Related CN105045806B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510304691.9A CN105045806B (en) 2015-06-04 2015-06-04 A kind of summary data Dynamic Division and maintaining method towards quantile inquiry

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510304691.9A CN105045806B (en) 2015-06-04 2015-06-04 A kind of summary data Dynamic Division and maintaining method towards quantile inquiry

Publications (2)

Publication Number Publication Date
CN105045806A true CN105045806A (en) 2015-11-11
CN105045806B CN105045806B (en) 2019-04-09

Family

ID=54452353

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510304691.9A Expired - Fee Related CN105045806B (en) 2015-06-04 2015-06-04 A kind of summary data Dynamic Division and maintaining method towards quantile inquiry

Country Status (1)

Country Link
CN (1) CN105045806B (en)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20190409
