CN106250565A - Fragmental relational database-based query method and system - Google Patents

Publication number
CN106250565A
Authority
CN
China
Prior art keywords
column
name
value
query
data
Prior art date
Application number
CN201610771058.5A
Other languages
Chinese (zh)
Other versions
CN106250565B (en
Inventor
刘德建
邱宗铭
陈霖
吴拥民
陈宏展
Original Assignee
福建天晴数码有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 福建天晴数码有限公司
Priority to CN201610771058.5A
Publication of CN106250565A
Application granted
Publication of CN106250565B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing

Abstract

The invention provides a query method and system based on a fragmental (sharded) relational database. It addresses two problems in the prior art: reading a large amount of data into the memory of a central node, where sequential execution on that intermediate node performs very poorly, and failing to meet the needs of online queries when the data volume is large. The method comprises receiving a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G, where column name G is a subset of column name A. With the method and system, excessive memory consumption and even memory overflow on the central node can be avoided.

Description

Query method and system based on a sharded relational database

Technical Field

[0001] The present invention relates to querying distributed data, and in particular to queries that contain count, distinct, and group by.

Background

[0002] In the era of big data, computing statistics over data is a common workload. Statistical workloads today are often built on a relational database such as MySQL. Because of single-machine limits, the statistical performance of such a database drops sharply once the data volume passes the ten-million-row level. The single-machine bottleneck is usually addressed by horizontally sharding the relational database so that the data is distributed across multiple machines.

[0003] After sharding, a statistical SQL statement is executed once on each shard node, and the central node then performs a summary aggregation to preserve the SQL semantics. SQL statements that combine grouping, deduplication, and counting run into both functional and performance problems. Such a statement looks like this: select group_col, count(distinct dist_col) from table group by group_col, where group_col and dist_col may each be one or more field names.

[0004] Typically, if such a statement is sent directly to each shard for execution and the results returned by the shards are aggregated at the central node, the result is wrong. The same combination of group_col and dist_col values may appear in several shards, so executing per shard and then summing counts that combination more than once and inflates the result.
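The double counting described above can be reproduced with a minimal Python sketch. The two shards and their rows are hypothetical values chosen for illustration, and count_distinct_by_group is an assumed helper name, not something defined by the patent.

```python
# Hypothetical rows (group_col, dist_col) held by two shards; the values are
# illustrative only and do not come from the patent figures.
shard_1 = [("2016-08-01", 1), ("2016-08-01", 2)]
shard_2 = [("2016-08-01", 1), ("2016-08-01", 3)]


def count_distinct_by_group(rows):
    """Evaluate select group_col, count(distinct dist_col) ... group by group_col."""
    seen = {}
    for group_value, dist_value in rows:
        seen.setdefault(group_value, set()).add(dist_value)
    return {group_value: len(values) for group_value, values in seen.items()}


# Naive plan: run the statement on each shard, then add the per-shard counts.
naive = {}
for shard in (shard_1, shard_2):
    for group_value, count in count_distinct_by_group(shard).items():
        naive[group_value] = naive.get(group_value, 0) + count

# Reference answer: the same statement evaluated over the union of all rows.
correct = count_distinct_by_group(shard_1 + shard_2)

print(naive)    # dist_col value 1 is counted once per shard
print(correct)
```

In this sketch the value 1 exists on both shards, so the naive sum reports 4 distinct values for the group while the reference answer is 3.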

[0005] Alternatively, to guarantee a correct result, such a statement may be executed as follows.

[0006] 1. Read the field values covered by group_col and dist_col from each shard node. The SQL is: select group_col, dist_col from table;

[0007] 2. Traverse all the data sequentially: first determine from group_col the group each row belongs to, then deduplicate within that group by dist_col.

[0008] 3. After all the data has been traversed, count the deduplicated dist_col values in each group.

[0009] The problem with this method is that it reads a large amount of data into the memory of the central node, and sequential execution on that intermediate node performs very poorly; when the data volume is large, it cannot meet the needs of online queries.

Summary of the Invention

[0010] The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an exhaustive overview of all contemplated aspects; it is intended neither to identify key or critical elements of all aspects nor to delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in simplified form as a prelude to the more detailed description given later.

[0011] The present invention provides a query method and system based on a sharded relational database. It addresses the prior-art problem that a large amount of data is read into the memory of the central node, that sequential execution on that intermediate node performs very poorly, and that the needs of online queries cannot be met when the data volume is large.

[0012] To achieve the above object, the inventors provide a query method based on a sharded relational database, comprising the steps of:

[0013] S101: receive a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A;

[0014] S102: execute SELECT column name C FROM table name T on each shard node, where column name C is the union of column name A and column name B; process every record in the query results of the shard nodes, where processing means taking the hash of the b value of each record and placing records with the same hash value into the same data pipe; the b value is the value in the record that corresponds to column name B;

[0015] S103: group the records in each data pipe by the value of the G column and, within each group, compute count, the number of distinct b values that occur in that group; each group yields one correspondence (g, count);

[0016] S104: merge the correspondences (G, COUNT) computed for the groups in all pipes, that is, merge the records with the same g value according to column name G, where the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.
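Steps S102 to S104 can be sketched in a few lines of single-threaded Python. This is only an illustration under stated assumptions: the function names, the accessor parameters g_of and b_of, and the use of Python's built-in hash are not taken from the patent. The sketch relies on the property that equal b values always land in the same pipe, so each (g, b) pair is counted in exactly one pipe and the per-pipe counts can simply be added.

```python
from collections import defaultdict


def distribute(records, b_of, num_pipes):
    """S102: route each record to the pipe chosen by hash(b) mod num_pipes."""
    pipes = [[] for _ in range(num_pipes)]
    for record in records:
        pipes[hash(b_of(record)) % num_pipes].append(record)
    return pipes


def count_per_group(pipe, g_of, b_of):
    """S103: inside one pipe, count the distinct b values of each group g."""
    distinct = defaultdict(set)
    for record in pipe:
        distinct[g_of(record)].add(b_of(record))
    return {g: len(values) for g, values in distinct.items()}


def merge(partials):
    """S104: add up the counts of equal g across all pipes."""
    result = defaultdict(int)
    for partial in partials:
        for g, count in partial.items():
            result[g] += count
    return dict(result)


def sharded_count_distinct(shards, g_of, b_of, num_pipes=4):
    """Run S102 to S104 over the rows returned by every shard."""
    records = (record for shard in shards for record in shard)
    pipes = distribute(records, b_of, num_pipes)
    return merge(count_per_group(pipe, g_of, b_of) for pipe in pipes)
```

With records shaped as (g, b) tuples, a call such as sharded_count_distinct(shards, lambda r: r[0], lambda r: r[1]) returns a dictionary from each g value to its distinct count. Python's built-in hash is stable within one process, which is all this sketch needs; a multi-process deployment would need a hash that is stable across processes.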

[0017] Further, column name G is equal to column name A.

[0018] Further, the number N of data pipes is twice the number of CPU cores of the machine, and the step "taking the hash of the b value of each record and placing records with the same hash value into the same data pipe" consists of computing the hash value of the b value of each record, taking that hash value modulo N, and assigning the record to the data pipe that corresponds to the result; the values 0 to N-1 each correspond uniquely to one data pipe.
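The pipe assignment in the paragraph above might look as follows in Python. The patent does not name a hash function; zlib.crc32 over the value's repr is an assumption made here so that the result is deterministic, and pipe_index and NUM_PIPES are hypothetical names.

```python
import os
import zlib

# N = twice the number of CPU cores, as in the paragraph above.
# os.cpu_count() can return None, so fall back to 1 core in that case.
NUM_PIPES = 2 * (os.cpu_count() or 1)


def pipe_index(b_value, num_pipes=NUM_PIPES):
    """Map a b value to a pipe number in the range 0 .. num_pipes - 1."""
    digest = zlib.crc32(repr(b_value).encode("utf-8"))  # assumed hash function
    return digest % num_pipes
```

Because the index depends only on the b value, two records with the same b value always reach the same pipe, whichever shard they came from.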

[0019] Further, before the step "receive a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G", the method further comprises the steps of:

[0020] reading a database query statement;

[0021] determining whether the query statement conforms to the semantics;

[0022] if it does not conform, returning;

[0023] if it conforms, executing the above step of receiving the query statement.
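The conformance check in the steps above could be approximated with a regular expression, as in the following sketch. The patent does not say how the check is implemented; the pattern QUERY_SHAPE and the function parse_query are hypothetical, and a production system would more likely use a real SQL parser.

```python
import re

# Assumed statement shape; real SQL would need a proper parser.
QUERY_SHAPE = re.compile(
    r"^\s*select\s+(?P<a>.+?),\s*count\s*\(\s*distinct\s+(?P<b>.+?)\s*\)\s+"
    r"from\s+(?P<t>\w+)\s+group\s+by\s+(?P<g>.+?)\s*;?\s*$",
    re.IGNORECASE,
)


def _names(text):
    """Split a comma-separated column list into trimmed names."""
    return [name.strip() for name in text.split(",")]


def parse_query(sql):
    """Return the A, B, T, and G parts, or None if the statement does not conform."""
    match = QUERY_SHAPE.match(sql)
    if match is None:
        return None
    a, b, g = _names(match["a"]), _names(match["b"]), _names(match["g"])
    if not set(g) <= set(a):  # column name G must be a subset of column name A
        return None
    return {"a": a, "b": b, "table": match["t"], "g": g}
```

A statement that returns None here would take the "returning" branch, and any other statement would proceed to step S101.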

[0024] Further, steps S101 and S102 are executed in batches according to the size of table T on the shard nodes, and step S103 is executed on the shard nodes.

[0025] Also provided herein is a distributed database system comprising at least a first shard node and a second shard node, a receiving module, a first processing module, a second processing module, and a third processing module; the first shard node and the second shard node each store a horizontal slice of the database's data;

[0026] The receiving module receives a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A; the first processing module is configured to send the query statement SELECT column name C FROM table name T to the first shard node and the second shard node respectively, where column name C is the union of column name A and column name B, and to process every record in the query results of the shard nodes, where processing means taking the hash of the b value of each record and placing records with the same hash value into the same data pipe; the b value is the value in the record that corresponds to column name B;

[0027] The second processing module is configured to group the records in each data pipe by the value of the G column and, within each group, to compute count, the number of distinct b values that occur in that group; each group yields one correspondence (g, count);

[0028] The third processing module is configured to merge the correspondences (G, COUNT) computed for the groups in all pipes, that is, to merge the records with the same g value according to column name G, where the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.

[0029] Further, column name G is equal to column name A.

[0030] Further, the number N of data pipes is twice the number of CPU cores of the machine; the first processing module is configured to compute the hash value of the b value of each record, take that hash value modulo N, and assign the record to the data pipe that corresponds to the result; the values 0 to N-1 each correspond uniquely to one data pipe.

[0031] Further, before "receiving a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G", the receiving module is also configured to read a database query statement and determine whether it conforms to the semantics; if it does not conform, it returns; if it conforms, execution continues.

[0032] Further, the first processing module and the second processing module process the data of table T in batches according to the size of table T on the shard nodes; the third processing module is located on a shard node.

[0033] Unlike the prior art, the above technical solution reads data in a streaming fashion (in small batches), pulling data from each shard into memory batch by batch and on demand. This avoids the excessive memory consumption, or even memory overflow, caused by a large amount of data flooding into the central node at once.

[0034] To accomplish the foregoing and related ends, the one or more aspects comprise the features fully described below and particularly pointed out in the appended claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features indicate only a few of the various ways in which the principles of the various aspects may be employed, and this description is intended to cover all such aspects and their equivalents.

Brief Description of the Drawings

[0035] The disclosed aspects are described below with reference to the drawings, which are provided to illustrate and not to limit the disclosed aspects; like reference numerals in the drawings denote like elements, and in which:

[0036] FIG. 1 illustrates, with concrete data, the query method based on a sharded relational database described herein;

[0037] FIG. 2 shows the query method based on a sharded relational database described herein.

[0038] The reference numerals in FIG. 1 denote the tables they point to.

Detailed Description

[0039] To explain in detail the technical content, structural features, objects, and effects of the technical solution, specific embodiments are described below with reference to the drawings. In the following description, numerous specific details are set forth for purposes of explanation in order to provide a thorough understanding of one or more aspects. It will be evident, however, that such aspects may be practiced without these specific details.

[0040] The present invention discloses a query method based on a sharded relational database, where the sharded relational database is a horizontally sharded distributed database; the method comprises the steps of:

[0041] S101: receive a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A;

[0042] S102: execute SELECT column name C FROM table name T on each shard node, where column name C is the union of column name A and column name B; process every record in the query results of the shard nodes, where processing means taking the hash of the b value of each record and placing records with the same hash value into the same data pipe; the b value is the value in the record that corresponds to column name B;

[0043] S103: group the records in each data pipe by the value of the G column and, within each group, compute count, the number of distinct b values that occur in that group; each group yields one correspondence (g, count);

[0044] S104: merge the correspondences (G, COUNT) computed for the groups in all pipes, that is, merge the records with the same g value according to column name G, where the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.

[0045] Steps S101-S103 may be executed on the shard nodes. Alternatively, the data of the shard nodes, or the data produced by the corresponding step, may be read as a data stream into a third node (for example the central node) and the steps executed there. The central node is a node with strong data-processing capability; it may also be the master node, with the other shard nodes acting as slave nodes.

[0046] Preferably, steps S102 and S103 are executed on the shard nodes. This reduces the load on the central node to some extent, and the resulting distributed computation improves efficiency and speed.

[0047] To make step S104 easier to understand, consider the following example. In some embodiments there are 4 data pipes in total. In each data pipe, the records are grouped by the value of the G column, and the number of distinct b values in each group is computed. For the group whose G value is 2, for example, the data pipes compute the correspondences (2,3), (2,4), (2,1), and (2,2). Merging the correspondences whose g value is 2 gives (2, 3+4+1+2), that is, (2,10). Merging the correspondences of all data pipes in this way yields the query result for the query statement entered by the user; the query result has the form (G, COUNT);

[0048] It will be appreciated that in the above steps the data on the shard nodes may be read in small batches: SELECT column name C FROM table name T is executed on the data as it is read, and the query results of each small batch are processed record by record (on the shard node or on the central node). Large amounts of data therefore need not be read or processed at once, which avoids overloading the computation or memory of any one node.

[0049] Unlike the prior art, the above steps read data in a streaming fashion (in small batches), pulling data from each shard into memory batch by batch and on demand. This avoids the excessive memory consumption, or even memory overflow, caused by a large amount of data flooding into the central node at once.
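The small-batch streaming read described above can be illustrated with the standard DB-API call cursor.fetchmany. In this sketch an in-memory SQLite database stands in for one shard node, which is an assumption made purely so the example runs on its own; the table t, its rows, and the function stream_rows are hypothetical.

```python
import sqlite3


def stream_rows(connection, sql, batch_size=1000):
    """Yield rows batch by batch so the full result set is never held in memory."""
    cursor = connection.cursor()
    cursor.execute(sql)
    while True:
        batch = cursor.fetchmany(batch_size)
        if not batch:
            break
        yield from batch


# An in-memory SQLite database stands in for one shard node (assumption).
shard = sqlite3.connect(":memory:")
shard.execute("create table t (id integer, user_id integer, date text)")
shard.executemany(
    "insert into t values (?, ?, ?)",
    [(i, i % 7, "2016-08-01") for i in range(10)],  # hypothetical rows
)

# Consume the stream record by record; only one batch is buffered at a time.
distinct_users = {
    user_id
    for user_id, _ in stream_rows(shard, "select user_id, date from t", batch_size=3)
}
```

The consumer sees one record at a time, so the memory held by the reader is bounded by batch_size rather than by the size of table T.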

[0050] It will be appreciated that the B field in "DISTINCT column name B" need not be numeric; it may be a string or a time, or even a combination of several fields.

[0051] For numeric, string, and time types, taking the hash value yields a numeric result. For the multi-field case, the hash values of the individual fields may be summed to give the final hash result. In short, the purpose of the hash is to produce a numeric result for every case in which the B field is non-numeric or is a combination of several fields, with the guarantee that two identical B values always produce the same hash result.
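A possible reading of the multi-field hash above is sketched below. The per-field hash (zlib.crc32 over the value's string form) is an assumption, since the patent only requires that identical B values hash identically; field_hash and b_hash are hypothetical names.

```python
import zlib
from datetime import date


def field_hash(value):
    """Hash one field value (number, string, or date) to a non-negative integer."""
    return zlib.crc32(str(value).encode("utf-8"))


def b_hash(b_fields):
    """Hash a B value made of one or more fields by summing the field hashes."""
    return sum(field_hash(field) for field in b_fields)


# Identical B values always produce the same result.
same = b_hash(("u1", date(2016, 8, 1))) == b_hash(("u1", date(2016, 8, 1)))
```

Summation ignores field order, so ("a", "b") and ("b", "a") collide. That only affects how evenly records spread over the pipes: step S103 deduplicates on the actual b values, so colliding values in the same pipe are still counted separately.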

[0052] The query method based on a sharded relational database described herein is explained below with concrete data (see FIG. 1 and FIG. 2):

[0053] Step1. The raw data is stored in two shards, 111 and 121. Each shard holds 3 rows, and each row has 3 columns: id, user_id, and date.

[0054] Step2. The SQL to be executed is: select date, count(distinct user_id) from table group by date.

[0055] Step3. Based on the number of shards, the data query module uses 2 threads to read data from the two shards respectively. The SQL used to read the data is: select user_id, date from table, and the read results (the query results of step S102) are 112 and 122 respectively. The query process of Step2 and Step3 corresponds to the data query in FIG. 2;

[0056] Step4. Data distribution traverses the distinct_col field of every record (the user_id field in this example) and takes the hash value of that field modulo 3 (there are 3 data pipes, numbered 0, 1, and 2), which gives one of the three values 0, 1, or 2. The data pipe with the corresponding number is selected and the record is placed in it. In this example the hash algorithm simply takes the value of user_id, so the two records with user_id 1 and 4 are assigned to the data pipe with index = 1, since 1 % 3 = 1 and 4 % 3 = 1. In the figure, the records of 113 and 123 are assigned according to the modulo result to the different pipes 214, 224, and 234. Step4 corresponds to the data distribution in FIG. 2;

[0057] Step5. Three threads are responsible for data computation (matching the number of data pipes), and each computation module corresponds to one data pipe. During computation, each record is grouped by the group_col field (here the date field): in FIG. 1, grouping 214 by the date field gives 216, grouping 224 gives 226, and grouping 234 gives 236 (215, 225, and 235 denote intermediate results of this process). Deduplication is then performed by the distinct_col field (here user_id).

[0058] Step6. Once a computation module has consumed all the data in the data pipe it is responsible for, it holds a mapping from date to a (deduplicated) user_id linked list. Counting each user_id list yields a mapping from date to the (deduplicated) number of user_id values; in FIG. 1, merging the tables labeled 216, 226, and 236 gives the table labeled 300. The process of Step5 and Step6 corresponds to the data computation in FIG. 2;

[0059] Step7. The result merging is done by the main thread: after the results of the data computation processes are collected, the count(distinct user_id) values of records with the same date are added to give the final result. Step7 corresponds to the result merging in FIG. 2.
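Steps Step3 to Step7 can be sketched with threads and one queue per data pipe. The rows below are hypothetical, because the actual values appear only in FIG. 1; the names pipes, partials, DONE, query_and_distribute, and compute are likewise assumptions. As in Step4, the hash is simply the user_id value.

```python
import queue
import threading
from collections import defaultdict

NUM_PIPES = 3
DONE = object()  # sentinel that tells a compute thread its pipe is finished

# Hypothetical (user_id, date) rows for the two shards; the real values are in FIG. 1.
shards = [
    [(1, "08-01"), (2, "08-01"), (4, "08-02")],
    [(1, "08-01"), (3, "08-02"), (4, "08-02")],
]
pipes = [queue.Queue() for _ in range(NUM_PIPES)]
partials = [None] * NUM_PIPES


def query_and_distribute(rows):
    """Step3 and Step4: read one shard and route rows by user_id % NUM_PIPES."""
    for user_id, day in rows:
        pipes[user_id % NUM_PIPES].put((user_id, day))


def compute(index):
    """Step5 and Step6: group by date, deduplicate user_id, then count."""
    users_by_date = defaultdict(set)
    while (item := pipes[index].get()) is not DONE:
        user_id, day = item
        users_by_date[day].add(user_id)
    partials[index] = {day: len(users) for day, users in users_by_date.items()}


readers = [threading.Thread(target=query_and_distribute, args=(rows,)) for rows in shards]
workers = [threading.Thread(target=compute, args=(i,)) for i in range(NUM_PIPES)]
for thread in readers + workers:
    thread.start()
for thread in readers:
    thread.join()
for pipe in pipes:
    pipe.put(DONE)  # all readers are finished, so no more rows will arrive
for thread in workers:
    thread.join()

# Step7: the main thread adds the counts of equal dates.
result = defaultdict(int)
for partial in partials:
    for day, count in partial.items():
        result[day] += count
print(dict(result))
```

One reader thread per shard and one compute thread per pipe mirrors the thread counts in Step3 and Step5; the sentinel is put on each queue only after every reader has joined, so no row can arrive after it.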

[0060] It will be appreciated that FIG. 2 describes, in simplified form, the data query, data distribution, and data computation of the present invention, each of which may be executed by multiple threads.

[0061] The present invention further provides a distributed database system that implements the above method.

[0062] Also provided herein is a distributed database system for implementing the above query method based on a sharded relational database. It comprises at least a first shard node and a second shard node, a receiving module, a first processing module, a second processing module, and a third processing module; the first shard node and the second shard node each store a horizontal slice of the database's data. The receiving module, the first processing module, and the second processing module may be located on a shard node or on the central node.

[0063] The receiving module is connected to the first processing module, the first processing module is connected to the second processing module, and the second processing module is connected to the third processing module.

[0064] The receiving module receives a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G; column name G is a subset of column name A; the first processing module is configured to send the query statement SELECT column name C FROM table name T to the first shard node and the second shard node respectively, where column name C is the union of column name A and column name B, and to process every record in the query results of the shard nodes, where processing means taking the hash of the b value of each record and placing records with the same hash value into the same data pipe; the b value is the value in the record that corresponds to column name B;

[0065] The second processing module is configured to group the records in each data pipe by the value of the G column and, within each group, to compute count, the number of distinct b values that occur in that group; each group yields one correspondence (g, count);

[0066] The third processing module is configured to merge the correspondences (G, COUNT) computed for the groups in all pipes, that is, to merge the records with the same g value according to column name G, where the COUNT value of a merged record is the sum of the COUNT values of all the correspondences merged into it; the merged result is the query result.

[0067] In some embodiments, column name G is equal to column name A.

[0068] In some embodiments, the number N of data pipes is twice the number of CPU cores of the machine; the first processing module is configured to compute the hash value of the b value of each record, take that hash value modulo N, and assign the record to the data pipe that corresponds to the result; the values 0 to N-1 each correspond uniquely to one data pipe.

[0069] In some embodiments, before "receiving a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G;", the receiving module is also configured to read a database query statement and determine whether it conforms to the semantics; if it does not conform, it returns; if it conforms, execution continues.

[0070] In some embodiments, the first processing module and the second processing module process the data of table T in batches according to the size of table T on the shard nodes; the third processing module is located on the central node.

[0071] It should be noted that, herein, relational terms such as first and second are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or terminal device that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or terminal device. Without further limitation, an element introduced by "comprising ..." or "including ..." does not exclude the presence of additional elements in the process, method, article, or terminal device that comprises the element. In addition, herein, "greater than", "less than", "exceeding", and the like are understood to exclude the stated number, while "above", "below", "within", and the like are understood to include it.

[0072] Those skilled in the art will appreciate that the above embodiments may be provided as a method, an apparatus, or a computer program product. These embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. All or some of the steps of the methods in the above embodiments may be carried out by a program that instructs the relevant hardware; the program may be stored in a storage medium readable by a computer device and is used to execute all or some of the steps of the methods of the above embodiments. The computer device includes, but is not limited to: personal computers, servers, general-purpose computers, special-purpose computers, network devices, embedded devices, programmable devices, smart mobile terminals, smart home devices, wearable smart devices, and in-vehicle smart devices. The storage medium includes, but is not limited to: RAM, ROM, magnetic disks, magnetic tape, optical discs, flash memory, USB drives, portable hard drives, memory cards, memory sticks, network server storage, and network cloud storage.

[0073] The above embodiments are described with reference to flowcharts and/or block diagrams of the methods, devices (systems), and computer program products according to the embodiments. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a computer device to produce a machine, such that the instructions executed by the processor of the computer device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

[0074] These computer program instructions may also be stored in a computer-device-readable memory that can direct the computer device to operate in a particular manner, such that the instructions stored in that memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

[0075] These computer program instructions may also be loaded onto a computer device, so that a series of operational steps is performed on the computer device to produce computer-implemented processing, whereby the instructions executed on the computer device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

[0076] Although the above embodiments have been described, those skilled in the art, once they have grasped the basic inventive concept, can make further changes and modifications to these embodiments. The foregoing are therefore merely embodiments of the present invention and do not limit the scope of patent protection of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, or any direct or indirect application in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (10)

1. A query method based on a sharded relational database, characterized by comprising the steps of:
S101: receiving a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G, where column name G is a subset of column name A;
S102: executing SELECT column name C FROM table name T on each shard node, where column name C is the union of column name A and column name B; and processing each record in the query results of the shard nodes, the processing being to compute a hash value from the b value of each record and to place records with the same hash value into the same data pipe, where the b value is the value of column name B in the record;
S103: for the records in the same data pipe, grouping by the value of column G and, within each group, computing the number count of distinct b values appearing in that group, each group's result corresponding to one correspondence (g, count);
S104: merging the correspondences (G, COUNT) obtained from the grouped computation in each pipe, that is, merging records with the same g value according to column name G, the value of the COUNT column of a merged record being the sum of the values of the COUNT columns of all the merged correspondences; the merged result is the query result.
2. The query method based on a sharded relational database according to claim 1, characterized in that column name G is equal to column name A.
3. The query method based on a sharded relational database according to claim 1, characterized in that the number N of data pipes is twice the number of CPU cores of the machine, and the step of "computing a hash value from the b value of each record and placing records with the same hash value into the same data pipe" comprises: computing the hash value corresponding to the b value of each record, taking that hash value modulo N, and assigning the record to the corresponding data pipe according to the result, where each of the values 0 to N-1 uniquely corresponds to one data pipe.
4. The query method based on a sharded relational database according to claim 1, characterized in that, before the step of "receiving a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G", the method further comprises the steps of: reading a database query statement; determining whether the query statement conforms to the semantics; if it does not, returning; if it does, executing the above step of receiving the query statement.
5. The query method based on a sharded relational database according to claim 1, characterized in that steps S101 and S102 are executed in batches according to the size of table T in the shard nodes, and step S103 is executed on the shard nodes.
6. A distributed database system, characterized in that it comprises at least a first shard node and a second shard node, a receiving module, a first processing module, a second processing module, and a third processing module; the first shard node and the second shard node each store horizontally partitioned data of the database; wherein:
the receiving module receives a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G, where column name G is a subset of column name A;
the first processing module is configured to send the query statement SELECT column name C FROM table name T to the first shard node and the second shard node respectively, and to process each record in the query results of the shard nodes, the processing being to compute a hash value from the b value of each record and to place records with the same hash value into the same data pipe, where the b value is the value of column name B in the record;
the second processing module is configured to group the records in the same data pipe by the value of column G and, within each group, to compute the number count of distinct b values appearing in that group, each group's result being one correspondence (g, count);
the third processing module is configured to merge the correspondences (G, COUNT) obtained from the grouped computation in each pipe, that is, to merge records with the same g value according to column name G, the value of the COUNT column of a merged record being the sum of the values of the COUNT columns of all the merged correspondences; the merged result is the query result.
7. The distributed database system according to claim 6, characterized in that column name G is equal to column name A.
8. The distributed database system according to claim 6, characterized in that the number N of data pipes is twice the number of CPU cores of the machine, and the first processing module is configured to compute the hash value corresponding to the b value of each record, take that hash value modulo N, and assign the record to the corresponding data pipe according to the result, where each of the values 0 to N-1 uniquely corresponds to one data pipe.
9. The distributed database system according to claim 6, characterized in that, before "receiving a query statement with the following semantics: SELECT column name A, COUNT(DISTINCT column name B) FROM table name T GROUP BY column name G", the receiving module is further configured to read a database query statement and determine whether the query statement conforms to the semantics; if it does not, to return; if it does, to continue execution.
10. The distributed database system according to claim 6, characterized in that the first processing module and the second processing module process the data of table T in batches according to the size of table T in the shard nodes, and the third processing module is located on a shard node.
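
For illustration only, and not as part of the claims, the flow of steps S102 to S104 together with the hash-modulo-N pipe assignment of claim 3 can be sketched in Python. The sketch rests on several assumptions that do not come from the patent text: shard results are modeled as in-memory lists of dictionaries, `zlib.crc32` stands in for the unspecified hash function, the pipes are processed sequentially rather than in parallel, column name G is taken to equal column name A (as in claim 2), and the names `pipe_index` and `count_distinct_by_group` are hypothetical.

```python
import os
import zlib
from collections import defaultdict


def pipe_index(b_value, n_pipes):
    """Map a b value to a pipe number in 0..n_pipes-1 (hash modulo N)."""
    # Assumption: crc32 over repr() stands in for the unspecified hash function.
    # It is stable across runs, but it requires that equal b values share one
    # representation (for example, do not mix 1 and 1.0 in the same column).
    return zlib.crc32(repr(b_value).encode("utf-8")) % n_pipes


def count_distinct_by_group(shard_results, g_cols, b_col, n_pipes=None):
    """Evaluate SELECT G, COUNT(DISTINCT B) ... GROUP BY G over shard results.

    shard_results: one list of row dictionaries per shard node.
    g_cols: names of the grouping columns G; b_col: name of column B.
    """
    if n_pipes is None:
        # Twice the CPU core count, as described for N.
        n_pipes = 2 * (os.cpu_count() or 1)

    # S102: route every record from every shard into the pipe chosen by hash(b).
    pipes = [[] for _ in range(n_pipes)]
    for rows in shard_results:
        for row in rows:
            pipes[pipe_index(row[b_col], n_pipes)].append(row)

    # S103: inside each pipe, group by G and count the distinct b values.
    per_pipe_counts = []
    for pipe in pipes:
        seen = defaultdict(set)
        for row in pipe:
            g = tuple(row[c] for c in g_cols)
            seen[g].add(row[b_col])
        per_pipe_counts.append({g: len(bs) for g, bs in seen.items()})

    # S104: merge the (g, count) pairs of all pipes by summing counts per g.
    merged = defaultdict(int)
    for counts in per_pipe_counts:
        for g, count in counts.items():
            merged[g] += count
    return dict(merged)
```

Summing the per-pipe counts is valid because every occurrence of a given b value lands in the same pipe, so the sets of distinct b values counted in different pipes never overlap. For example, calling `count_distinct_by_group(shards, ["city"], "user", n_pipes=4)` on two shard lists returns a dictionary keyed by `(city,)` tuples, and the result does not depend on the chosen number of pipes.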
CN201610771058.5A 2016-08-30 2016-08-30 Querying method and system based on fragment relevant database CN106250565B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610771058.5A CN106250565B (en) 2016-08-30 2016-08-30 Querying method and system based on fragment relevant database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610771058.5A CN106250565B (en) 2016-08-30 2016-08-30 Querying method and system based on fragment relevant database

Publications (2)

Publication Number Publication Date
CN106250565A true CN106250565A (en) 2016-12-21
CN106250565B CN106250565B (en) 2019-05-07

Family

ID=58080520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610771058.5A CN106250565B (en) 2016-08-30 2016-08-30 Querying method and system based on fragment relevant database

Country Status (1)

Country Link
CN (1) CN106250565B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093493A (en) * 2006-06-23 2007-12-26 国际商业机器公司 Speech conversion method for database inquiry, converter, and database inquiry system
US20100077107A1 (en) * 2008-09-19 2010-03-25 Oracle International Corporation Storage-side storage request management
CN102663116A (en) * 2012-04-11 2012-09-12 中国人民大学 Multi-dimensional OLAP (On Line Analytical Processing) inquiry processing method facing column storage data warehouse
CN102722531A (en) * 2012-05-17 2012-10-10 北京大学 Query method based on regional bitmap indexes in cloud environment
CN103310023A (en) * 2013-07-05 2013-09-18 深圳中兴网信科技有限公司 Distributed searching system and method
CN104756101A (en) * 2012-10-31 2015-07-01 惠普发展公司,有限责任合伙企业 Executing a query having multiple set operators
CN105335403A (en) * 2014-07-23 2016-02-17 华为技术有限公司 Database access method and device, and database system

Also Published As

Publication number Publication date
CN106250565B (en) 2019-05-07

Similar Documents

Publication Publication Date Title
US8725730B2 (en) Responding to a query in a data processing system
CN103678665B (en) Heterogeneous data integration method and system for large based on data warehouse
US9223829B2 (en) Interdistinct operator
Bajda-Pawlikowski et al. Efficient processing of data warehousing queries in a split execution environment
Yang et al. Druid: A real-time analytical data store
US20130103658A1 (en) Time series data mapping into a key-value database
CN100484017C (en) Method for statistics of mass performance data in network element management system
US9619549B2 (en) Reporting and summarizing metrics in sparse relationships on an OLTP database
CN103049556B (en) Quick Stats query method for mass medical data
JP2016504679A (en) Event processing and integration with MapReduce
Aly et al. M3: Stream processing on main-memory mapreduce
CN103177062B (en) For high-speed memory to accelerate query query and online analytical processing operations
US9390162B2 (en) Management of a database system
Cipar et al. LazyBase: trading freshness for performance in a scalable database
US20170177646A1 (en) Processing time series data from multiple sensors
US10242052B2 (en) Relational database tree engine implementing map-reduce query handling
US9582520B1 (en) Transaction model for data stores using distributed file systems
CN102314460B (en) Data analysis method and system and servers
WO2016183539A1 (en) Data partitioning and ordering
CN102521307A (en) Parallel query processing method for share-nothing database cluster in cloud computing environment
US9892178B2 (en) Systems and methods for interest-driven business intelligence systems including event-oriented data
CN107077476A (en) Enriching events with dynamically typed big data for event processing
CN102750356B (en) Construction and management method for secondary indexes of key value library
CN103778135B (en) Storage and distribution of paging query method for real-time data
CN102521225B (en) Incremental data extraction device and incremental data extraction method

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
GR01 Patent grant