CN115563103A

CN115563103A - Multi-dimensional aggregation method, system, electronic device and storage medium

Info

Publication number: CN115563103A
Application number: CN202211121862.0A
Authority: CN
Inventors: 王帅
Original assignee: Henan Xinghuan Zhongzhi Information Technology Co ltd; Transwarp Technology Shanghai Co Ltd
Current assignee: Henan Xinghuan Zhongzhi Information Technology Co ltd; Transwarp Technology Shanghai Co Ltd
Priority date: 2022-09-15
Filing date: 2022-09-15
Publication date: 2023-01-03
Anticipated expiration: 2042-09-15
Also published as: CN115563103B

Abstract

The invention discloses a multi-dimensional aggregation method, a multi-dimensional aggregation system, an electronic device and a storage medium. The method comprises the following steps: obtaining, by a scheduling component, a plurality of data blocks from a downstream operator; performing single-dimensional aggregation calculation on the plurality of data blocks through a plurality of actuator operators in an aggregation work task component to obtain a plurality of single-dimensional column data, and then persisting the data in the plurality of data blocks to disk; generating, through an aggregation index component, secondary indexes corresponding to the plurality of single-dimensional column data, wherein one single-dimensional column data generates a plurality of secondary indexes; and performing aggregation table merging through the actuator operators based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, wherein an aggregation table is a combination of a series of secondary indexes, and the plurality of secondary indexes generated by one single-dimensional column data form one aggregation table. By linking data partitioning with secondary indexing, the method provides real-time multi-dimensional aggregation that balances memory and central processing unit (CPU) usage.

Description

Multi-dimensional aggregation method, system, electronic device and storage medium

Technical Field

The embodiments of the invention relate to the technical field of multi-dimensional aggregation, and in particular to a multi-dimensional aggregation method, a multi-dimensional aggregation system, an electronic device and a storage medium.

Background

Multi-dimensional aggregation analysis is a common Business Intelligence (BI) requirement for enterprises: dimensions are combined for a number of business metrics, and the performance of the business data is analyzed across different business dimensions. Multi-dimensional aggregation algorithms are mainly implemented in the following ways:

In the first mode, Kylin computes the results in advance while the database is idle and stores them in a temporary table; when a user initiates a multi-dimensional aggregation query request, the results are read directly from the temporary table.

In the second mode, Druid designs the database's underlying data structures and basic interfaces specifically for multi-dimensional aggregation; data is pre-aggregated and persisted when the raw data is ingested, and the pre-aggregated data is collected when a user initiates a multi-dimensional aggregation query request.

In the third mode, MySQL fully sorts the data in the table and then performs streaming aggregation over the multiple dimensions.

The three modes have the following disadvantages. The first mode does not provide real-time data. The second mode is highly restrictive: on the one hand, pre-aggregation makes data insertion latency high; on the other hand, the services to be pre-aggregated must be determined in advance. The third mode is costly, because fully sorting a business analysis table is relatively expensive in an Online Transaction Processing (OLTP) scenario.

Disclosure of Invention

The invention provides a multi-dimensional aggregation method, a multi-dimensional aggregation system, electronic equipment and a storage medium, which are used for solving the problems of non-real-time data, high limitation and high cost of the conventional multi-dimensional aggregation algorithm.

According to an aspect of the present invention, there is provided a multi-dimensional aggregation method, comprising:

obtaining a plurality of data blocks from a downstream operator by a scheduling component;

performing single-dimensional aggregation calculation on the plurality of data blocks through a plurality of actuator operators in an aggregation work task component to obtain a plurality of single-dimensional column data, and then persisting the data in the plurality of data blocks;

generating, through an aggregation index component, secondary indexes corresponding to the plurality of single-dimensional column data, wherein one single-dimensional column data generates a plurality of secondary indexes;

and performing aggregation table merging through the actuator operators based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, wherein an aggregation table is a combination of a series of secondary indexes, and the plurality of secondary indexes generated by one single-dimensional column data form one aggregation table.

According to another aspect of the present invention, a multidimensional aggregation system is provided, which includes a scheduling component, an aggregation work task component, and an aggregation index component, where the aggregation work task component is connected to the scheduling component and the aggregation index component, respectively;

the scheduling component is used for acquiring a plurality of data blocks from a downstream operator;

the aggregation work task component is used for performing single-dimensional aggregation calculation on the data blocks through a plurality of actuator operators to obtain a plurality of single-dimensional column data, and then persisting the data in the data blocks;

the aggregation index component is used for generating secondary indexes corresponding to the plurality of single-dimensional column data, and one single-dimensional column data generates a plurality of secondary indexes;

the aggregation work task component is further configured to perform aggregation table merging through the plurality of actuator operators based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, wherein an aggregation table is a combination of a series of secondary indexes, and the plurality of secondary indexes generated by one single-dimensional column data form one aggregation table.

According to another aspect of the present invention, there is provided an electronic apparatus including: at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores a computer program executable by the at least one processor, the computer program being executable by the at least one processor to enable the at least one processor to perform the multi-dimensional aggregation method of any of the embodiments of the present invention.

According to another aspect of the present invention, there is provided a computer-readable storage medium storing computer instructions for causing a processor to implement a multi-dimensional aggregation method according to any one of the embodiments of the present invention when executed.

According to the technical scheme of the embodiments of the invention, a plurality of data blocks are obtained from a downstream operator through a scheduling component; single-dimensional aggregation calculation is performed on the plurality of data blocks through a plurality of actuator operators in an aggregation work task component to obtain a plurality of single-dimensional column data, and the data in the plurality of data blocks is then persisted; secondary indexes corresponding to the plurality of single-dimensional column data are generated through an aggregation index component, wherein one single-dimensional column data generates a plurality of secondary indexes; and aggregation table merging is performed through the actuator operators based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, wherein an aggregation table is a combination of a series of secondary indexes, and the plurality of secondary indexes generated by one single-dimensional column data form one aggregation table.

It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present invention, nor do they necessarily limit the scope of the invention. Other features of the present invention will become apparent from the following description.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.

Fig. 1 is a schematic flow chart of a multi-dimensional aggregation method according to an embodiment of the present invention;

Fig. 2 is a partial schematic flow chart of a multi-dimensional aggregation method according to an embodiment of the present invention;

Fig. 3 is a diagram illustrating secondary indexes in a multi-dimensional aggregation method according to an embodiment of the present invention;

Fig. 4 is a diagram illustrating a first process of merging aggregation tables in a multi-dimensional aggregation method according to an embodiment of the present invention;

Fig. 5 is a diagram illustrating a second process of merging aggregation tables in a multi-dimensional aggregation method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of a multi-dimensional aggregation system according to a second embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a multi-dimensional aggregation system according to a third embodiment of the present invention;

Fig. 8 is a schematic structural diagram of an electronic device implementing a multi-dimensional aggregation method according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. It should be understood that the various steps recited in method embodiments of the present invention may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the invention is not limited in this respect.

The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Moreover, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

It is noted that the modifiers "a", "an", and "the" in the present invention are intended to be illustrative rather than limiting, and those skilled in the art will understand them to mean "one or more" unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present invention are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

Embodiment One

Fig. 1 is a schematic flowchart of a multidimensional aggregation method provided in an embodiment of the present invention, where the method is applicable to a case where a multidimensional aggregation analysis is performed on a distributed OLTP-type service, and the method may be executed by a multidimensional aggregation apparatus, where the apparatus may be implemented by software and/or hardware and is generally integrated on an electronic device, where the electronic device in this embodiment includes but is not limited to: a computer device.

As shown in fig. 1, a multi-dimensional aggregation method provided in an embodiment of the present invention includes the following steps:

S110, acquiring a plurality of data blocks from a downstream operator through a scheduling component.

The scheduling component may be a software component Dispatcher with a data scheduling function, and the number of the scheduling components may be 1.

In this embodiment, the multiple data blocks may be obtained by uniformly dividing the data by the scheduling component, and the type and the amount of the data are not specifically limited here. The data may include numbers, letters, and combinations thereof, among others. The number of data blocks is not specifically limited either; for example, the downstream operator may divide the data into 3 data blocks. Each data block may comprise a plurality of columns, and each column stores a plurality of data items.

In this embodiment, the process of the scheduling component obtaining the plurality of data blocks from the downstream operator is not particularly limited, and the downstream operator may send the plurality of data blocks to the scheduling component, or the scheduling component may take the plurality of data blocks from the downstream operator.

Illustratively, the scheduling component sends Next() to the downstream operator; in response to Next(), the downstream operator divides the data equally into a plurality of data blocks, and the scheduling component can then fetch the plurality of data blocks from the downstream operator.
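The pull-style exchange described above can be sketched as follows. This is a minimal illustration only: the class names `DownstreamOperator` and `Dispatcher`, the column-wise block layout, and the sample rows are assumptions made for the example, not structures defined in this disclosure.

```python
from typing import Dict, Iterator, List, Optional

Block = Dict[str, list]  # column name -> values of that column


class DownstreamOperator:
    """Hypothetical operator that splits its rows evenly into data blocks."""

    def __init__(self, rows: List[dict], num_blocks: int) -> None:
        size = max(1, -(-len(rows) // num_blocks))  # ceiling division
        chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
        # Store each chunk column-wise, one list per column.
        self._blocks: Iterator[Block] = iter(
            [{col: [row[col] for row in chunk] for col in chunk[0]} for chunk in chunks]
        )

    def next(self) -> Optional[Block]:
        """Return the next data block, or None when the input is exhausted."""
        return next(self._blocks, None)


class Dispatcher:
    """Hypothetical scheduling component that pulls blocks with next()."""

    def __init__(self, operator: DownstreamOperator) -> None:
        self.operator = operator

    def fetch_blocks(self) -> List[Block]:
        blocks = []
        while (block := self.operator.next()) is not None:
            blocks.append(block)
        return blocks


# Illustrative rows (assumed values); six rows are split into three blocks.
rows = [
    {"a": 1, "b": 123, "c": "x"},
    {"a": 2, "b": 123, "c": "x"},
    {"a": 2, "b": 456, "c": "y"},
    {"a": 3, "b": 789, "c": "z"},
    {"a": 3, "b": 789, "c": "z"},
    {"a": 3, "b": 789, "c": "z"},
]
blocks = Dispatcher(DownstreamOperator(rows, num_blocks=3)).fetch_blocks()
```

The push variant mentioned in the preceding paragraph would invert the call direction, with the operator handing blocks to the scheduling component instead.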

S120, performing single-dimensional aggregation calculation on the data blocks through a plurality of actuator operators in the aggregation work task component to obtain a plurality of single-dimensional column data, and then persisting the data in the data blocks.

The aggregation work task component may be a software component, the Grouping Worker Scheduler, for executing aggregation tasks, and the number of aggregation work task components may be 1. The aggregation work task component may comprise a plurality of actuator operators (Workers) whose parameters can be configured automatically; an actuator operator can be understood as a Grouping Worker.

In this embodiment, the aggregation work task component may be responsible for scheduling the overall aggregation computation. After the aggregation work task component places the obtained data blocks into the aggregation work queue (Grouping Worker Queue), the actuator operators can perform single-dimensional aggregation calculation on the data blocks according to the directed acyclic graph generated by the aggregation work task component. It should be noted that the single-dimensional aggregation calculation cannot be performed on a plurality of data blocks at the same time.

The Grouping Worker Queue may include a Grouping Data Queue and a Grouping Table Queue. Specifically, the obtained data blocks may be placed in the Grouping Data Queue of the Grouping Worker Queue.

Further, performing single-dimension aggregation calculation on the plurality of data blocks to obtain a plurality of single-dimension column data, including: putting the data blocks into an aggregation work queue, and determining a plurality of multi-dimensional combinations through the aggregation work task component, wherein one multi-dimensional combination is formed by data corresponding to at least one column; aiming at a target data block, generating a corresponding directed acyclic graph based on a plurality of multi-dimensional combinations determined by the target data block through the aggregation work task component; and performing single-dimension aggregation calculation on data corresponding to the first nodes of the complete links in the directed acyclic graph to obtain a plurality of single-dimension column data.

A multi-dimensional combination can be a combination formed by data of multiple dimensions; it can be understood as a cube in which each edge corresponds to a different dimension. In the present embodiment, a multi-dimensional combination can be understood as a combination made up of the data of a plurality of columns. Each data block may correspond to a plurality of different multi-dimensional combinations; for example, one multi-dimensional combination may include 2 columns of data, and another may include 3 columns of data.

For example, a data block may include three columns of data a, b, c, and the corresponding multidimensional combinations of the data block may be (a, b, c), (a, c), and (c).

In this embodiment, single-dimensional column data is obtained in the same way for every data block, so one data block is taken as an example below. The aggregation work task component may determine a plurality of multi-dimensional combinations based on the target data block and generate a corresponding directed acyclic graph from these multi-dimensional combinations; the specific generation process is not described herein. The aggregation work task component can dispatch the aggregation task to the corresponding directed acyclic graph unit, to be processed as a special group. While traversing the data blocks, the aggregation work task component can perform single-dimensional aggregation calculation on the data, in all the data blocks, of the head node of each complete link in the directed acyclic graph, to obtain a plurality of single-dimensional column data.

The directed acyclic graph may include a plurality of complete links. Each link is formed by a plurality of nodes, one node may represent one column, and the arrows between the nodes mark the direction of data flow.

Fig. 2 is a partial schematic flow chart of a multi-dimensional aggregation method according to an embodiment of the present invention. As shown in fig. 2, the scheduling component obtains data block 0, data block 1 and data block 2 from the downstream operator, where all three data blocks include the three columns a, b and c; in data block 1, column a includes 2 and 3, column b includes 456 and 789, and column c includes y and z. Data block 0, data block 1 and data block 2 are sent to the aggregation work queue, and the aggregation work task component can generate a corresponding directed acyclic graph (DAG) according to the data blocks in the aggregation work queue. The graph comprises three complete links, namely a-b-c, b-c and c, where the first node of the complete link a-b-c is column a, the first node of the complete link b-c is column b, and the first node of the complete link c is column c. When data is traversed from the aggregation work queue, the data of column a, column b and column c in data block 0, data block 1 and data block 2 are traversed, and single-dimensional aggregation calculation is then performed to obtain three single-dimensional column data, namely group by a, group by b and group by c, where group by a includes the data 1, 2 and 3; group by b includes the data 123, 456 and 789; and group by c includes the data x, y and z.
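As a rough illustration of the Fig. 2 walk-through, the sketch below computes the single-dimensional column data for the head node of each complete link. The contents of data block 1 follow the text; the contents of data blocks 0 and 2 are partly inferred from the later figures and partly assumed (column c in particular), and the helper names `head_nodes` and `group_by_column` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Block = Dict[str, list]

blocks: List[Block] = [
    {"a": [1, 2], "b": [123, 123], "c": ["x", "x"]},  # data block 0 (assumed values)
    {"a": [2, 3], "b": [456, 789], "c": ["y", "z"]},  # data block 1 (as in the text)
    {"a": [3, 3], "b": [789, 789], "c": ["z", "z"]},  # data block 2 (assumed values)
]

# The three complete links of the directed acyclic graph described for Fig. 2.
links: List[Tuple[str, ...]] = [("a", "b", "c"), ("b", "c"), ("c",)]


def head_nodes(links: Sequence[Tuple[str, ...]]) -> List[str]:
    """Return the first node of every complete link, without duplicates."""
    return list(dict.fromkeys(link[0] for link in links))


def group_by_column(blocks: Sequence[Block], column: str) -> list:
    """Single-dimensional aggregation: distinct values in first-seen order."""
    seen: dict = {}
    for block in blocks:
        for value in block[column]:
            seen.setdefault(value, None)
    return list(seen)


single_dim = {col: group_by_column(blocks, col) for col in head_nodes(links)}
print(single_dim)
# {'a': [1, 2, 3], 'b': [123, 456, 789], 'c': ['x', 'y', 'z']}
```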

In this embodiment, persisting the data may be understood as storing the data on a local disk. Once a data block has completed the single-dimensional aggregation calculation, its data is staged and persisted to disk, so that the real data does not need to be used in subsequent calculations and memory consumption is greatly reduced.
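A minimal sketch of this persistence step is given below, assuming a block is a plain dictionary of columns. The use of `pickle`, the file naming scheme and the helper names `persist_block` and `load_block` are assumptions for illustration; the disclosure does not specify the on-disk format.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Dict

Block = Dict[str, list]


def persist_block(block: Block, directory: str, block_id: int) -> Path:
    """Write one data block to local disk and return the file path."""
    path = Path(directory) / f"block_{block_id}.pkl"  # hypothetical naming scheme
    with path.open("wb") as fh:
        pickle.dump(block, fh)
    return path


def load_block(path: Path) -> Block:
    """Read a persisted data block back, e.g. when the final result is output."""
    with path.open("rb") as fh:
        return pickle.load(fh)


with tempfile.TemporaryDirectory() as tmp:
    block = {"a": [2, 3], "b": [456, 789], "c": ["y", "z"]}
    path = persist_block(block, tmp, block_id=1)
    del block  # only the path, not the real data, stays in memory
    restored = load_block(path)
```

After this step, later stages can work on (block, row) positions alone and reload a block only when the actual values are required.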

S130, generating secondary indexes corresponding to the plurality of single-dimensional column data through the aggregation index component, wherein one single-dimensional column data generates a plurality of secondary indexes.

The aggregation index component may be a software component having a secondary index function, and there may be multiple aggregation index components. The aggregation index component may build the secondary index from key value pairs, taking the keys as data block indexes and the values as row indexes. From the secondary index, the data block and the row of each data item in the single-dimensional column data can be determined.

Furthermore, one single-dimensional column data generates a plurality of secondary indexes, and each secondary index has a corresponding key value pair, in which the key represents one single-dimensional column data and the value represents one data item in that single-dimensional column data. A secondary index includes a data block index, which identifies the data block in which the data item is located, and a row index, which identifies the row in which the data item is located.

Exemplarily, fig. 3 is an exemplary diagram of secondary indexes in a multi-dimensional aggregation method according to an embodiment of the present invention. As shown in fig. 3, taking the single-dimensional column data a as an example, the single-dimensional column data a may generate 3 secondary indexes. In the key value pair Key_A:1 of the first secondary index, Key_A represents the single-dimensional column data a and 1 represents the data 1 in the single-dimensional column data a; blk_idx in the first secondary index, i.e., the data block index, is 0, indicating that data 1 is in data block 0, and row_idx in the first secondary index, i.e., the row index, is 0, indicating that data 1 is in the 0th row of data block 0. In the key value pair Key_A:2 of the second secondary index, Key_A represents the single-dimensional column data a and 2 represents the data 2 in the single-dimensional column data a; according to blk_idx and row_idx in the second secondary index, the single-dimensional column data a contains two occurrences of data 2, one in the 1st row of data block 0 and the other in the 0th row of data block 1. In the key value pair Key_A:3 of the third secondary index, Key_A represents the single-dimensional column data a and 3 represents the data 3 in the single-dimensional column data a; the single-dimensional column data a contains three occurrences of data 3, the first in the 1st row of data block 1, the second in the 0th row of data block 2, and the third in the 1st row of data block 2.
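The secondary indexes of Fig. 3 can be reproduced with a short sketch. It assumes that each secondary index is held as a sorted list of `(blk_idx, row_idx)` pairs keyed by the column value; this layout and the function name `build_secondary_indexes` are illustrative choices rather than the layout prescribed by the disclosure.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Position = Tuple[int, int]  # (blk_idx, row_idx)


def build_secondary_indexes(blocks: Sequence[dict], column: str) -> Dict[object, List[Position]]:
    """Map every distinct value of `column` to the positions where it occurs."""
    index: Dict[object, List[Position]] = defaultdict(list)
    for blk_idx, block in enumerate(blocks):
        for row_idx, value in enumerate(block[column]):
            # Blocks and rows are visited in order, so each list stays sorted.
            index[value].append((blk_idx, row_idx))
    return dict(index)


# Column a of data blocks 0, 1 and 2, as described for Fig. 3.
column_a_blocks = [{"a": [1, 2]}, {"a": [2, 3]}, {"a": [3, 3]}]
key_a = build_secondary_indexes(column_a_blocks, "a")

print(key_a[1])  # [(0, 0)]
print(key_a[2])  # [(0, 1), (1, 0)]
print(key_a[3])  # [(1, 1), (2, 0), (2, 1)]
```

Keeping the positions sorted is what later makes a binary search over them possible.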

S140, performing aggregation table merging through the actuator operators based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, wherein an aggregation table is a combination of a series of secondary indexes, and the plurality of secondary indexes generated by one single-dimensional column data form one aggregation table.

In this embodiment, all the calculations may be performed by the actuator operators, which may include: computing the values of the group by expressions (exprs) to generate an aggregation table; merging aggregation tables; and, if the aggregation tables do not need to be merged further, computing and outputting the aggregation result.

The aggregation table may be understood as a hash table, and aggregation table merging may be understood as multi-dimensional aggregation.

In this embodiment, the plurality of actuator operators may, following the data flow direction of each complete link in the directed acyclic graph, merge the aggregation tables formed by the secondary indexes generated by each single-dimensional column data in that complete link, so as to obtain a plurality of target aggregation tables, where each complete link corresponds to one target aggregation table. If a complete link includes multiple nodes, multiple rounds of multi-dimensional aggregation can be performed to obtain its target aggregation table.

Further, performing aggregation table merging based on the secondary indexes and binary search to obtain a plurality of target aggregation tables includes: performing aggregation table merging on the nodes of each complete link in the directed acyclic graph based on the secondary indexes and binary search to obtain the plurality of target aggregation tables, where each node is formed by one single-dimensional column data.

Specifically, for one complete link, a merging order is determined according to the data flow direction of the nodes in the complete link; following the merging order, the first aggregation table corresponding to the first node and the second aggregation table corresponding to the second node in the complete link are merged based on the secondary indexes and binary search to obtain an initial target aggregation table; the initial target aggregation table is then merged with the third aggregation table corresponding to the third node based on the secondary indexes and binary search, and so on until the aggregation tables corresponding to all the nodes have been merged, giving the target aggregation table.

For example, if a complete link in the directed acyclic graph is a-b-c, node a and node b are first merged to obtain the initial target aggregation table, and the initial target aggregation table is then merged with node c to obtain the target aggregation table. Node a can be understood as the node formed by the single-dimensional column data of column a, node b as the node formed by the single-dimensional column data of column b, and node c as the node formed by the single-dimensional column data of column c.
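The merging order along a complete link amounts to a left fold over the nodes. The sketch below shows only that ordering: the pairwise merge is a simplified stand-in based on set intersection rather than the block filtering and binary search of this embodiment, the table layout (tuple key mapped to sorted positions) is an assumption, and the values of column c outside data block 1 are assumed.

```python
from functools import reduce
from itertools import product
from typing import Dict, List, Sequence, Tuple

Position = Tuple[int, int]            # (blk_idx, row_idx)
Table = Dict[tuple, List[Position]]   # group key -> positions of matching rows


def merge_tables(left: Table, right: Table) -> Table:
    """Stand-in pairwise merge: keep key pairs that share at least one row."""
    merged: Table = {}
    for (lkey, lrows), (rkey, rrows) in product(left.items(), right.items()):
        shared = sorted(set(lrows) & set(rrows))
        if shared:
            merged[lkey + rkey] = shared
    return merged


def merge_link(link: Sequence[str], tables: Dict[str, Table]) -> Table:
    """Fold the aggregation tables of a complete link in data-flow order."""
    return reduce(merge_tables, (tables[node] for node in link))


tables: Dict[str, Table] = {
    "a": {(1,): [(0, 0)], (2,): [(0, 1), (1, 0)], (3,): [(1, 1), (2, 0), (2, 1)]},
    "b": {(123,): [(0, 0), (0, 1)], (456,): [(1, 0)], (789,): [(1, 1), (2, 0), (2, 1)]},
    # Column c values outside data block 1 are assumed for this example.
    "c": {("x",): [(0, 0), (0, 1)], ("y",): [(1, 0)], ("z",): [(1, 1), (2, 0), (2, 1)]},
}

target_abc = merge_link(("a", "b", "c"), tables)  # (a merged with b), then with c
target_c = merge_link(("c",), tables)             # a single-node link needs no merge
```

With these inputs, `target_abc` groups the six rows into four (a, b, c) combinations, and each complete link yields exactly one target aggregation table.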

Further, merging the first aggregation table corresponding to the first node and the second aggregation table corresponding to the second node based on the secondary indexes and binary search to obtain an initial target aggregation table includes: determining, according to the data block indexes in the plurality of secondary indexes in the first aggregation table corresponding to the first node, the secondary indexes in the second node that do not need to be merged, and filtering them out; in the row index lookup, determining a plurality of reference secondary indexes by using binary search, where, for a secondary index in the first aggregation table and a secondary index in the second aggregation table, the reference secondary index is the one of the two whose corresponding array is smaller; determining the detection secondary indexes, where the detection secondary index is the one of the two whose corresponding array is larger; traversing the reference secondary index and finding, in the detection secondary index, the rows shared with the reference secondary index as common rows; merging the common rows into one secondary index to obtain a merged secondary index; and combining the obtained plurality of merged secondary indexes into the initial target aggregation table.

Determining, according to the data block indexes in the plurality of secondary indexes in the first aggregation table corresponding to the first node, the secondary indexes in the second node that do not need to be merged includes: for one secondary index in the first aggregation table, taking the data block index in that secondary index as a target index, and determining, from the plurality of secondary indexes in the second aggregation table, the secondary indexes that do not include the target index as the secondary indexes that do not need to be merged.
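The block-index filter can be sketched as below. The sketch assumes that each secondary index is a list of `(blk_idx, row_idx)` pairs, and that when a secondary index spans several data blocks, a candidate is kept if it shares any block with it; `blocks_of` and `filter_candidates` are hypothetical helper names.

```python
from typing import Dict, List, Set, Tuple

Position = Tuple[int, int]  # (blk_idx, row_idx)


def blocks_of(positions: List[Position]) -> Set[int]:
    """Collect the data block indexes that a secondary index touches."""
    return {blk for blk, _ in positions}


def filter_candidates(
    first_index: List[Position], second_table: Dict[object, List[Position]]
) -> Dict[object, List[Position]]:
    """Drop secondary indexes of the second table that share no data block."""
    target = blocks_of(first_index)
    return {
        key: positions
        for key, positions in second_table.items()
        if blocks_of(positions) & target
    }


table_b = {123: [(0, 0), (0, 1)], 456: [(1, 0)], 789: [(1, 1), (2, 0), (2, 1)]}

kept_for_a1 = filter_candidates([(0, 0)], table_b)          # value 1 lives in block 0 only
kept_for_a2 = filter_candidates([(0, 1), (1, 0)], table_b)  # value 2 lives in blocks 0 and 1

print(sorted(kept_for_a1))  # [123]
print(sorted(kept_for_a2))  # [123, 456, 789]
```

The filter compares block indexes only, so it can discard whole secondary indexes before any row-level lookup takes place.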

For example, suppose the first aggregation table includes a secondary index A, a secondary index B and a secondary index C, and the second aggregation table includes a secondary index a, a secondary index b and a secondary index c. For the secondary index A, if the secondary index A includes 2 data indexes and the secondary index a includes 1 data index, the array of the secondary index A is considered the larger one, so the secondary index a may be used as the reference secondary index and the secondary index A as the detection secondary index; the rows in the secondary index A that are shared with the secondary index a are taken as common rows, and the common rows are merged into one secondary index to obtain one merged secondary index. The secondary index B and the secondary index C each yield one merged secondary index in the same manner, and the 3 merged secondary indexes obtained are combined into the initial target aggregation table.

It can be understood that, if the arrays corresponding to the two secondary indexes are equal in size, either of the two may be used as the reference secondary index, and the other is used as the detection secondary index.
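One plausible reading of the reference/detection rule is sketched below: the smaller position array is traversed, and each of its entries is looked up in the larger, sorted array by binary search. The function name `shared_rows` and the `(blk_idx, row_idx)` tuple layout are assumptions; on a tie the sketch simply takes the first argument as the reference.

```python
from bisect import bisect_left
from typing import List, Tuple

Position = Tuple[int, int]  # (blk_idx, row_idx), kept in sorted order


def shared_rows(first: List[Position], second: List[Position]) -> List[Position]:
    """Return the rows common to two secondary indexes.

    The smaller array acts as the reference secondary index and the larger
    one as the detection secondary index.
    """
    reference, detection = (first, second) if len(first) <= len(second) else (second, first)
    common: List[Position] = []
    for position in reference:
        i = bisect_left(detection, position)  # binary search in the larger array
        if i < len(detection) and detection[i] == position:
            common.append(position)
    return common


print(shared_rows([(0, 0)], [(0, 0), (0, 1)]))          # [(0, 0)]
print(shared_rows([(0, 1), (1, 0)], [(1, 0)]))          # [(1, 0)]
print(shared_rows([(0, 1), (1, 0)], [(1, 1), (2, 0)]))  # []
```

Traversing the smaller array keeps the cost at roughly m·log n comparisons for arrays of sizes m ≤ n, which is the motivation for choosing the reference by size.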

Exemplarily, fig. 4 is a first example diagram of the process of merging aggregation tables in a multi-dimensional aggregation method according to an embodiment of the present invention. As shown in fig. 4, nodes A and B are merged. Since blk_idx in the secondary index Key_A:1 is 0, those of the three secondary indexes Key_B:123, Key_B:456, and Key_B:789 whose blk_idx does not include 0 are filtered out (the dashed arrows in the figure); therefore, when common rows are merged for Key_A:1, no merging with the filtered Key_B:456 and Key_B:789 in node B is required. When Key_A:1 and Key_B:123 are merged, because Key_A:1 corresponds to 1 piece of data and Key_B:123 corresponds to 2 pieces of data, Key_A:1 can be used as the reference secondary index and Key_B:123 as the detection secondary index. The common row is row 0 of data block 0, so in the merged secondary index Key_A_B[1][123] obtained after merging, blk_idx is 0 and row_idx is 0.
Since blk_idx in the secondary index Key_A:2 includes 0 and 1, while blk_idx in Key_B:123 includes 0, blk_idx in Key_B:456 includes 1, and blk_idx in Key_B:789 includes 1, no secondary index can be filtered out, so Key_A:2 needs to be merged with all secondary indexes in node B. When Key_A:2 is merged with Key_B:123, because Key_A:2 and Key_B:123 each correspond to 2 pieces of data, either of them can be used as the reference secondary index and the other as the detection secondary index. The common row of Key_A:2 and Key_B:123 is row 1 of data block 0, so in the merged secondary index Key_A_B[2][123] obtained after merging, blk_idx is 0 and row_idx is 1. When Key_A:2 and Key_B:456 are merged, because Key_A:2 corresponds to 2 pieces of data and Key_B:456 corresponds to 1 piece of data, Key_B:456 can be used as the reference secondary index and Key_A:2 as the detection secondary index. The common row of Key_A:2 and Key_B:456 is row 0 of data block 1, so in the merged secondary index Key_A_B[2][456] obtained after merging, blk_idx is 1 and row_idx is 0. All the secondary indexes are merged in this way, and the merging of the remaining secondary indexes is not described herein. The final initial target aggregation table is composed of Key_A_B[1][123], Key_A_B[2][123], Key_A_B[2][456], and Key_A_B[3][789].
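
The merging walked through for fig. 4 can be sketched in Python as follows. This is only an illustrative sketch, not the implementation of the embodiment: the function names `common_rows` and `merge_tables`, the representation of a secondary index as a sorted list of (blk_idx, row_idx) pairs, and the positions assumed for Key_A:3 and Key_B:789 are assumptions introduced here for illustration.

```python
from bisect import bisect_left


def common_rows(left, right):
    """Return the (blk_idx, row_idx) pairs present in both sorted lists."""
    # The smaller array is the reference index; the larger one is probed.
    ref, probe = (left, right) if len(left) <= len(right) else (right, left)
    shared = []
    for pos in ref:
        i = bisect_left(probe, pos)  # binary search in the larger array
        if i < len(probe) and probe[i] == pos:
            shared.append(pos)
    return shared


def merge_tables(table_a, table_b):
    """Merge two aggregation tables keyed by single-dimension values."""
    merged = {}
    for key_a, rows_a in table_a.items():
        blocks_a = {blk for blk, _ in rows_a}
        for key_b, rows_b in table_b.items():
            # Block-level filter: skip pairs that share no data block.
            if blocks_a.isdisjoint(blk for blk, _ in rows_b):
                continue
            shared = common_rows(rows_a, rows_b)
            if shared:
                merged[(key_a, key_b)] = shared
    return merged


# Positions for Key_A:3 and Key_B:789 are assumed; the text does not state them.
node_a = {1: [(0, 0)], 2: [(0, 1), (1, 0)], 3: [(1, 1)]}
node_b = {123: [(0, 0), (0, 1)], 456: [(1, 0)], 789: [(1, 1)]}
initial_target = merge_tables(node_a, node_b)
```

Under these assumptions, `initial_target` holds the four merged entries named above, keyed (1, 123), (2, 123), (2, 456), and (3, 789). The block-level filter avoids any row comparison for pairs such as Key_A:1 and Key_B:456.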

In this embodiment, the process of merging the initial target aggregation table with the third aggregation table corresponding to the third node based on the second-level index and the binary search is the same as the above process, and is not described herein again.

Fig. 5 is a second example diagram of the process of merging aggregation tables in a multi-dimensional aggregation method according to an embodiment of the present invention. Fig. 5 shows the process of merging the initial target aggregation table, that is, A-B, with node C; for the specific merging manner, refer to the explanation of fig. 4, which is not repeated herein.
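
Merging node after node along a link amounts to folding a pairwise merge over the aggregation tables in merging order. The following Python sketch illustrates this under stated assumptions: keys are tuples so that merged keys can be concatenated, the names `merge_pair` and `merge_link` are hypothetical, and the contents of node C are invented for illustration because the text does not give them.

```python
from bisect import bisect_left
from functools import reduce


def contains(sorted_rows, pos):
    """Binary search for one (blk_idx, row_idx) pair in a sorted list."""
    i = bisect_left(sorted_rows, pos)
    return i < len(sorted_rows) and sorted_rows[i] == pos


def merge_pair(left, right):
    """Merge two aggregation tables whose keys are tuples of values."""
    merged = {}
    for lkey, lrows in left.items():
        for rkey, rrows in right.items():
            # Traverse the smaller array, probe the larger one.
            ref, probe = sorted((lrows, rrows), key=len)
            shared = [pos for pos in ref if contains(probe, pos)]
            if shared:
                merged[lkey + rkey] = shared
    return merged


def merge_link(tables):
    """Fold the pairwise merge over all tables of one link, in order."""
    return reduce(merge_pair, tables)


node_a = {(1,): [(0, 0)], (2,): [(0, 1), (1, 0)], (3,): [(1, 1)]}
node_b = {(123,): [(0, 0), (0, 1)], (456,): [(1, 0)], (789,): [(1, 1)]}
# Node C is hypothetical illustrative data.
node_c = {("x",): [(0, 0), (0, 1)], ("y",): [(1, 0), (1, 1)]}
target = merge_link([node_a, node_b, node_c])
```

The first fold step yields the initial target aggregation table A-B, and the second merges it with node C, so the same routine serves every step of the link.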

The embodiment of the invention provides a multi-dimensional aggregation method. First, a plurality of data blocks are acquired from a downstream operator through a scheduling component. Then, single-dimensional aggregation calculation is performed on the plurality of data blocks through a plurality of actuator operators in an aggregation work task component to obtain a plurality of single-dimensional column data, after which the data in the plurality of data blocks are persisted to disk. Next, secondary indexes corresponding to the plurality of single-dimensional column data are generated through an aggregation index component, where one single-dimensional column data generates a plurality of secondary indexes. Finally, aggregation table merging is performed through the actuator operators based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, where an aggregation table is a combination of a series of secondary indexes, and the plurality of secondary indexes generated by one single-dimensional column data form one aggregation table. The method relies on the combined effect of data partitioning and secondary indexes: data partitioning greatly reduces the memory used by multi-dimensional aggregation, because data can be written to disk promptly and no extra memory space needs to be occupied; the secondary indexes filter data efficiently while aggregated data are merged, which improves performance and achieves an optimal balance between memory and CPU usage.

Further, the multi-dimensional aggregation method provided by the first embodiment of the present invention further includes: retrieving, through the actuator operator, the persisted data according to the secondary indexes in the target aggregation table, and outputting the data to an upstream operator.

In this embodiment, after the target aggregation table is obtained, corresponding data may be obtained from the local disk according to the secondary index in the target aggregation table, and the obtained data is output to the upstream operator.
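
A minimal sketch of this retrieval step is given below, assuming that each data block is persisted as one JSON file on the local disk. The file layout, the function names `persist_block` and `fetch_rows`, and the sample rows are assumptions made for illustration; the embodiment does not specify a storage format.

```python
import json
import tempfile
from pathlib import Path


def persist_block(directory, blk_idx, rows):
    """Write one data block to disk (assumed layout: one JSON file per block)."""
    path = Path(directory) / f"block_{blk_idx}.json"
    path.write_text(json.dumps(rows))
    return path


def fetch_rows(directory, positions):
    """Read the rows addressed by (blk_idx, row_idx) pairs back from disk."""
    cache = {}  # load each block file at most once
    rows = []
    for blk_idx, row_idx in positions:
        if blk_idx not in cache:
            path = Path(directory) / f"block_{blk_idx}.json"
            cache[blk_idx] = json.loads(path.read_text())
        rows.append(cache[blk_idx][row_idx])
    return rows


def demo():
    """Persist two illustrative blocks, then fetch the rows of two index entries."""
    with tempfile.TemporaryDirectory() as directory:
        persist_block(directory, 0, [{"A": 1, "B": 123}, {"A": 2, "B": 123}])
        persist_block(directory, 1, [{"A": 2, "B": 456}, {"A": 3, "B": 789}])
        return fetch_rows(directory, [(0, 1), (1, 0)])
```

Because the target aggregation table stores only (blk_idx, row_idx) positions, the rows themselves stay on disk until output, which is the memory saving the method aims for.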

Example two

Fig. 6 is a schematic structural diagram of a multidimensional aggregation system according to a second embodiment of the present invention, where the system is applicable to a case where multidimensional aggregation analysis is performed on a distributed OLTP-type service, and the system is generally integrated on an electronic device as a software system.

As shown in fig. 6, the system includes: a scheduling component 110, an aggregate work task component 120, and an aggregate index component 130, the aggregate work task component 120 being coupled to the scheduling component 110 and the aggregate index component 130, respectively.

A scheduling component 110 for obtaining a plurality of data blocks from a downstream operator;

the aggregation work task component 120 is configured to perform single-dimensional aggregation calculation on the plurality of data blocks through a plurality of actuator operators to obtain a plurality of single-dimensional column data, and then persist the data in the plurality of data blocks to disk;

the aggregation index component 130 is configured to generate secondary indexes corresponding to the plurality of single-dimensional column data, where one single-dimensional column data generates a plurality of secondary indexes;

the aggregation work task component 120 is further configured to perform aggregation table merging through the plurality of actuator operators based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, where an aggregation table is a combination of a series of secondary indexes, and the plurality of secondary indexes generated by one single-dimensional column data form one aggregation table.

In this embodiment, the system first obtains a plurality of data blocks from a downstream operator through the scheduling component 110; then, the aggregation work task component 120 performs single-dimensional aggregation calculation on the plurality of data blocks through a plurality of actuator operators to obtain a plurality of single-dimensional column data, and persists the data in the plurality of data blocks to disk; next, the aggregation index component 130 generates secondary indexes corresponding to the plurality of single-dimensional column data, where one single-dimensional column data generates a plurality of secondary indexes; finally, the plurality of actuator operators in the aggregation work task component 120 perform aggregation table merging based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, where an aggregation table is a combination of a series of secondary indexes, and the plurality of secondary indexes generated by one single-dimensional column data form one aggregation table.

This embodiment provides a multi-dimensional aggregation system that can perform multi-dimensional aggregation in real time with an optimal balance between the use of memory and the central processing unit.

Further, the aggregate work task component 120 includes a computing unit to: putting the data blocks into an aggregation work queue, and determining a plurality of multi-dimensional combinations through the aggregation work task component, wherein one multi-dimensional combination is formed by data corresponding to at least one column; aiming at a target data block, generating a corresponding directed acyclic graph based on a plurality of multi-dimensional combinations determined by the target data block through the aggregation work task component; and performing single-dimension aggregation calculation on data corresponding to the first nodes of the complete links in the directed acyclic graph to obtain a plurality of single-dimension column data.

Further, one single-dimensional column data generates a plurality of secondary indexes, and each secondary index has a corresponding key-value pair, where the key in a key-value pair represents one single-dimensional data and the value in a key-value pair represents one data in one single-dimensional column data; a secondary index includes a data block index, which identifies the data block in which the one single-dimensional column data is located, and a row index, which identifies the row in which the one single-dimensional column data is located.
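
As an illustration of this structure, the following Python sketch builds the secondary indexes of one column from a list of data blocks. It is a sketch under assumptions: a data block is modeled as a list of row dictionaries, the function name `build_secondary_indexes` is hypothetical, and the sample rows are invented to match the Key_A and Key_B values used in the example of fig. 4.

```python
from collections import defaultdict


def build_secondary_indexes(blocks, column):
    """Map each value of one column to its sorted (blk_idx, row_idx) positions."""
    index = defaultdict(list)
    for blk_idx, block in enumerate(blocks):
        for row_idx, row in enumerate(block):
            # Appending in scan order keeps every position list sorted.
            index[row[column]].append((blk_idx, row_idx))
    return dict(index)


# Illustrative data: two data blocks of two rows each (assumed, not from the source).
blocks = [
    [{"A": 1, "B": 123}, {"A": 2, "B": 123}],
    [{"A": 2, "B": 456}, {"A": 3, "B": 789}],
]
table_a = build_secondary_indexes(blocks, "A")
table_b = build_secondary_indexes(blocks, "B")
```

Each entry of `table_a` plays the role of one secondary index such as Key_A:2, and the dictionary as a whole plays the role of the aggregation table of that column. Keeping the positions sorted is what allows binary search during merging.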

Further, the aggregate work task component 120 includes a merging unit configured to: perform aggregation table merging on each node in each complete link in the directed acyclic graph based on the secondary indexes and binary search to obtain a plurality of target aggregation tables, where each node is formed by single-dimensional column data.

Further, the merging unit is specifically configured to: for a complete link, determine a merging order according to the data flow direction of each node in the complete link; according to the merging order, merge a first aggregation table corresponding to a first node and a second aggregation table corresponding to a second node in the complete link based on the secondary indexes and binary search to obtain an initial target aggregation table; and merge the initial target aggregation table with a third aggregation table corresponding to a third node based on the secondary indexes and binary search, until the aggregation tables corresponding to all the nodes are merged, to obtain the target aggregation table.

Further, the apparatus further comprises an output module configured to: retrieve the persisted data according to the secondary indexes in the target aggregation table, and output the data to an upstream operator.

The multi-dimensional aggregation device can execute the multi-dimensional aggregation method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.

Example three

Fig. 7 is a schematic structural diagram of a multidimensional aggregation system provided in a third embodiment of the present invention, where the third embodiment is taken as an exemplary embodiment and provides a multidimensional aggregation system, where the multidimensional aggregation system can execute the multidimensional aggregation method described in any embodiment of the present invention.

As shown in fig. 7, the Dispatcher, that is, the scheduling component, obtains data blocks from the Child Executor, that is, the downstream operator, and puts the obtained data blocks into the Grouping Worker Queue, that is, the aggregation work queue. The Grouping Worker Scheduler, that is, the aggregation work task component, acquires data blocks from the Grouping Worker Queue and generates the corresponding directed acyclic graph. Any one of work0, work1, work2, and work3 performs the single-dimensional aggregation calculation to obtain a plurality of single-dimensional column data and persists the data in the plurality of data blocks to disk. A plurality of secondary indexes corresponding to the single-dimensional column data are generated by the Grouping Worker Queue; a work then performs aggregation table merging based on the secondary indexes and binary search to obtain the aggregation tables, that is, the plurality of target aggregation tables, and sends them to the Grouping Map Queue in the Grouping Worker Queue. Finally, the work retrieves the persisted data according to the secondary indexes in the aggregation table and outputs the data to the Parent Executor, that is, the upstream operator.
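
The data flow of fig. 7 up to index generation can be pictured with the following Python sketch, in which a queue stands in for the aggregation work queue and a thread pool stands in for the workers. All names (`dispatch`, `index_block`, `run`), the use of threads, and the sample data are assumptions for illustration only; the sketch omits the directed acyclic graph, persistence, and merging steps.

```python
import queue
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def dispatch(blocks):
    """Dispatcher stand-in: put numbered data blocks into the work queue."""
    work_queue = queue.Queue()
    for blk_idx, rows in enumerate(blocks):
        work_queue.put((blk_idx, rows))
    return work_queue


def index_block(task, columns):
    """Worker stand-in: build per-column partial indexes for one data block."""
    blk_idx, rows = task
    partial = {column: defaultdict(list) for column in columns}
    for row_idx, row in enumerate(rows):
        for column in columns:
            partial[column][row[column]].append((blk_idx, row_idx))
    return partial


def run(blocks, columns, workers=4):
    """Drain the queue, index blocks in parallel, combine the partial results."""
    work_queue = dispatch(blocks)
    tasks = []
    while not work_queue.empty():
        tasks.append(work_queue.get())
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda task: index_block(task, columns), tasks))
    tables = {column: defaultdict(list) for column in columns}
    for partial in partials:
        for column in columns:
            for value, positions in partial[column].items():
                tables[column][value].extend(positions)
    # Sort positions so that later merging can rely on binary search.
    return {c: {v: sorted(p) for v, p in t.items()} for c, t in tables.items()}


blocks = [
    [{"A": 1, "B": 123}, {"A": 2, "B": 123}],
    [{"A": 2, "B": 456}, {"A": 3, "B": 789}],
]
tables = run(blocks, ["A", "B"])
```

Each data block is processed independently, which is why the work can be spread over several workers and why a block can be written to disk as soon as its index entries exist.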

The multi-dimensional aggregation system provided by the third embodiment of the invention can perform multi-dimensional aggregation in real time with an optimal balance between the use of memory and the central processing unit.

Further, the components included in the multidimensional aggregation system and the number of functions of each component are shown in table 1:

TABLE 1

Example four

FIG. 8 illustrates a block diagram of an electronic device 10 that may be used to implement an embodiment of the invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the inventions described and/or claimed herein.

As shown in fig. 8, the electronic device 10 includes at least one processor 11, and a memory communicatively connected to the at least one processor 11, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, and the like, wherein the memory stores a computer program executable by the at least one processor, and the processor 11 can perform various suitable actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from a storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data necessary for the operation of the electronic apparatus 10 can also be stored. The processor 11, the ROM 12, and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to the bus 14.

A number of components in the electronic device 10 are connected to the I/O interface 15, including: an input unit 16 such as a keyboard, a mouse, or the like; an output unit 17 such as various types of displays, speakers, and the like; a storage unit 18 such as a magnetic disk, an optical disk, or the like; and a communication unit 19 such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.

The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, or the like. The processor 11 performs the various methods and processes described above, such as a multi-dimensional aggregation method.

In some embodiments, the multidimensional aggregation method may be implemented as a computer program tangibly embodied in a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the multi-dimensional aggregation method described above may be performed. Alternatively, in other embodiments, the processor 11 may be configured to perform the multi-dimensional aggregation method by any other suitable means (e.g., by means of firmware).

Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.

Computer programs for implementing the methods of the present invention can be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be performed. A computer program can execute entirely on a machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain, or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer readable storage medium may be a machine readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.

The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), blockchain networks, and the internet.

The computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, and is a host product in a cloud computing service system, so that the defects of high management difficulty and weak service expansibility in the traditional physical host and VPS service are overcome.

It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired result of the technical solution of the present invention can be achieved.

The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A multi-dimensional aggregation method, the method comprising:

2. The method of claim 1, wherein performing a single-dimension aggregation calculation on the plurality of data blocks to obtain a plurality of single-dimension column data comprises:

putting the data blocks into an aggregation work queue, and determining a plurality of multi-dimensional combinations through the aggregation work task component, wherein one multi-dimensional combination is formed by data corresponding to at least one column;

aiming at a target data block, generating a corresponding directed acyclic graph based on a plurality of multi-dimensional combinations determined by the target data block through the aggregation work task component;

and performing single-dimension aggregation calculation on data corresponding to the first nodes of the complete links in the directed acyclic graph to obtain a plurality of single-dimension column data.

3. The method of claim 1, wherein one single-dimensional column data generates a plurality of secondary indexes, each secondary index having a corresponding key-value pair, the key in a key-value pair representing one single-dimensional data, and the value in a key-value pair representing one data in one single-dimensional column data; one secondary index includes a data block index, which identifies the data block in which the one single-dimensional column data is located, and a row index, which identifies the row in which the one single-dimensional column data is located.

4. The method of claim 1, wherein performing the aggregation table merging based on the two-level index and the binary search to obtain a plurality of target aggregation tables comprises:

and performing aggregation table merging on each node in each complete link in the directed acyclic graph based on two-level index and dichotomy search to obtain a plurality of target aggregation tables, wherein each node is formed by single-dimensional column data.

5. The method of claim 4,

for a complete link, determining a merging sequence according to the data flow direction of each node in the complete link;

merging a first aggregation table corresponding to a first node and a second aggregation table corresponding to a second node in the complete link according to the merging sequence based on a two-level index and dichotomy search to obtain an initial target aggregation table;

and merging the initial target aggregation table with a third aggregation table corresponding to a third node based on two-level index and dichotomy search until all the aggregation tables corresponding to all the nodes are merged to obtain the target aggregation table.

6. The method of claim 5, wherein merging the first aggregation table corresponding to the first node and the second aggregation table corresponding to the second node based on the two-level index and the binary search to obtain the initial target aggregation table comprises:

determining a secondary index which does not need to be merged in the second node according to a data block index in a plurality of secondary indexes in a first aggregation table corresponding to the first node, and filtering the secondary index which does not need to be merged;

in the search of the row index, determining a plurality of reference secondary indexes by using binary search, wherein a reference secondary index is whichever of a secondary index in the first aggregation table and a secondary index in the second aggregation table has the smaller corresponding array;

determining a detection secondary index, wherein the detection secondary index is whichever of the secondary index in the first aggregation table and the secondary index in the second aggregation table has the larger corresponding array;

traversing the reference secondary index, and finding, in the detection secondary index, a row shared with the reference secondary index as a common row;

merging the common rows into one secondary index to obtain a merged secondary index;

and combining the obtained plurality of merged secondary indexes into an initial target aggregation table.

7. The method of claim 1, further comprising:

and retrieving, through the actuator operator, the persisted data according to the secondary indexes in the target aggregation table, and outputting the data to an upstream operator.

8. A multi-dimensional aggregation system is characterized by comprising a scheduling component, an aggregation work task component and an aggregation index component, wherein the aggregation work task component is respectively connected with the scheduling component and the aggregation index component;

9. An electronic device, characterized in that the electronic device comprises:

at least one processor; and

the memory stores a computer program executable by the at least one processor to enable the at least one processor to perform the multi-dimensional aggregation method of any one of claims 1-7.

10. A computer-readable storage medium storing computer instructions for causing a processor to perform the multidimensional aggregation method of any one of claims 1-7 when executed.