WO2023273183A1

WO2023273183A1 - Hybrid engine-based multidimensional data query method and apparatus

Info

Publication number: WO2023273183A1
Application number: PCT/CN2021/137140
Authority: WO
Inventors: 鄂海红; 宋美娜; 田川
Original assignee: 北京邮电大学
Priority date: 2021-06-30
Filing date: 2021-12-10
Publication date: 2023-01-05
Also published as: CN113641669B; CN113641669A

Abstract

A hybrid engine-based multidimensional data query method and apparatus. The method comprises: constructing a data cube spanning tree, the data cube spanning tree comprising a plurality of sub-data cubes, and each sub-data cube corresponding to a pre-aggregation result of a dimension combination (S10); establishing a dimension dictionary, the dimension dictionary being used for representing mapping relationships between dimensions and bits (S20); constructing a bitmap index on the basis of the dimension dictionary, and introducing the bitmap index into the cube spanning tree to obtain a bitmap index-based bitmap tree, the bitmap index being composed of a bit array, and a bit value in the bit array representing whether the dimension corresponding to the bit is pre-computed (S30); and obtaining a query requirement, generating a structured query language according to the query requirement, and obtaining a query result according to the structured query language and the bitmap tree (S40).

Description

Multidimensional data query method and device based on hybrid engine

Cross References to Related Applications

This application is based on a Chinese patent application with application number 202110733535.X and a filing date of June 30, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated by reference into this application.

Technical field

The present invention relates to the technical field of information technology and data services, and in particular to a multidimensional data query method based on a hybrid engine.

Background

In the era of big data, owning data means having value. However, mining the value of big data is not an easy task, and multidimensional data analysis is one of the important means of doing so. Multidimensional analysis of big data observes and mines massive data from multiple angles and aspects, supported by analysis operations such as rolling up, drilling down, slicing, dicing, and rotating (pivoting). After integration and analysis, the final output is visualized data or charts that help analysts and business users gain insight into the information and meaning contained in the data.

On-Line Analytical Processing (OLAP) is the mainstream technology for multidimensional analysis of big data. It includes traditional relational OLAP (ROLAP), which is based on a data warehouse, and multidimensional OLAP (MOLAP), which uses pre-computation. ROLAP is relatively mature, but with the explosive growth of data volume its query time inevitably grows linearly with the data size, and multi-table, multi-dimension join operations add further overhead, so ROLAP has a serious performance bottleneck in large-scale multidimensional data analysis scenarios. MOLAP uses pre-computation to address this problem: after pre-computation, a query only needs to scan the materialized views to obtain its result, which avoids scanning the ever-growing original records. However, a MOLAP query must be based on the pre-computed set of materialized views (also known as a data cube, or Cube). Whether the modeling is manual or automatic, the pre-computation process lasts for tens of minutes or even hours, which adds considerable up-front waiting time to multidimensional data analysis. MOLAP can therefore bring its advantages into play only within a hybrid engine system.

In general, the ROLAP and MOLAP engines each have their own advantages and disadvantages and are suited to different usage scenarios. In a system based on a hybrid engine, it is difficult for the system to choose quickly and reasonably between the two. Query routing is urgently needed to solve the problem of query engine selection, and a multidimensional model index needs to be established to support that query routing.

Traditional OLAP index technology includes four types of indexes: B-tree index, R-tree index, hash index and bitmap index. The first three are widely used, but they are difficult to meet the requirements of fast query of high-dimensional scientific data, for example: B-tree index and R-tree index are only valid for data sets with dimensions less than 15. The bitmap index can better adapt to the characteristics of high-dimensional data.

Contents of the invention

The present invention aims to solve one of the technical problems in the related art at least to a certain extent.

For this reason, the first object of the present invention is to propose a multi-dimensional data query method based on a hybrid engine, so that a bitmap (Bitmap) index of the Cube can be established in the background, a suitable query engine can be intelligently selected to perform query tasks, and the overall performance of the system can be improved.

The second object of the present invention is to propose a multi-dimensional data query device based on a hybrid engine.

A third object of the present invention is to provide a non-transitory computer-readable storage medium.

A fourth object of the present invention is to provide an electronic device.

A fifth object of the present invention is to provide a computer program product.

In order to achieve the above purpose, the embodiment of the first aspect of the present invention proposes a multi-dimensional data query method based on a hybrid engine, including the following steps:

Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;

Establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;

Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;

Obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.

Optionally, in an embodiment of the present application, obtaining query results according to the structured query language and the bitmap tree includes:

When determining the bitmap tree matching the structured query language, a depth-first search is performed from the root node of the bitmap tree;

When it is determined that the sub-data cube that is accurately hit is found through the depth-first search, the MOLAP engine is used to query the pre-aggregated data, and a query result is generated according to the pre-aggregated data.

Optionally, in an embodiment of the present application, obtaining the query result according to the structured query language and the bitmap tree further includes:

When it is determined that the sub-data cube of the exact hit is not found through the depth-first search, backtracking to find the sub-data cube of the fuzzy hit;

When it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, the MOLAP engine is used to query the pre-aggregated data, and the query result is generated according to the pre-aggregated data;

When it is determined that the backtracking search does not find a sub-data cube with a fuzzy hit, the ROLAP engine is used to calculate the query result online.

Optionally, in one embodiment of the present application, after building the data cube spanning tree, it also includes:

The data cube spanning tree is pruned by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, the respective sub-data cubes are materialized starting from each aggregation group as a root node, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.

Optionally, in one embodiment of the present application, the establishment of a dimension dictionary includes:

Obtain the pregnancy result table, basic information table, and mother's physical examination table, generate a star schema according to the pregnancy result table, basic information table, and mother's physical examination table, and flatten the star schema to obtain the dimension dictionary, wherein each dimension corresponds to a bit.

In order to achieve the above purpose, the embodiment of the second aspect of the present application proposes a multi-dimensional data query device based on a hybrid engine of the present invention, which includes the following modules:

A construction module for building a data cube spanning tree, the data cube spanning tree comprising a plurality of sub-data cubes, each sub-data cube corresponding to a pre-aggregated result of a combination of dimensions;

An establishing module for establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;

A generating module, configured to construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;

The query module is used to obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.

Optionally, in one embodiment of the present application, the query module includes:

The first query unit is configured to perform a depth-first search from the root node of the bitmap tree when determining the bitmap tree matching the structured query language;

The first generation unit is configured to use the MOLAP engine to query the pre-aggregated data and generate a query result according to the pre-aggregated data when it is determined that an accurately hit sub-data cube is found through the depth-first search.

Optionally, in one embodiment of the present application, the query module further includes:

The second query unit is used to backtrack to search for a sub-data cube with a fuzzy hit when it is determined that no sub-data cube with an exact hit has been found through the depth-first search;

The second generating unit is used to query the pre-aggregated data with the MOLAP engine when it is determined that a sub-data cube with a fuzzy hit is found during backtracking, and to generate the query result according to the pre-aggregated data.

Optionally, in an embodiment of the present application, it also includes:

The pruning module is used to prune the data cube spanning tree by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, the respective sub-data cubes are materialized starting from each aggregation group as a root node, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.

The technical effect of the present application: the Bitmap index of the Cube can be established in the background, and the appropriate query engine can be intelligently selected to perform the query task, and the overall performance of the system can be improved.

In order to achieve the above purpose, the embodiment of the third aspect of the present application proposes a non-transitory computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the hybrid engine-based multidimensional data query method described in the embodiment of the first aspect of the present application is implemented.

To achieve the above purpose, the embodiment of the fourth aspect of the present application proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions, so as to implement the hybrid engine-based multidimensional data query method described in the embodiment of the first aspect of the present application.

To achieve the above purpose, the embodiment of the fifth aspect of the present application proposes a computer program product, including a computer program, and when the computer program is executed by a processor, the hybrid engine-based multidimensional data query method described in the embodiment of the first aspect of the present application is implemented.

Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.

Description of drawings

The above and/or additional aspects and advantages of the present application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, wherein:

FIG. 1 is a flowchart of a multidimensional data query method based on a hybrid engine according to an embodiment of the present application.

Fig. 2 is the schematic diagram of the Cube spanning tree (full amount) of the embodiment of the present application;

Fig. 3 is the schematic diagram of the Cube spanning tree (optimized) of the embodiment of the present application;

FIG. 4 is a schematic diagram of converting a star schema into a dimension dictionary according to an embodiment of the present application;

Fig. 5 is the schematic diagram that the Cuboid of the embodiment of the present application constructs the Bitmap index;

Fig. 6 is the schematic diagram of the Bitmap index based on Cube spanning tree of the embodiment of the present application;

Fig. 7 is the schematic diagram of 4 kinds of matching situations when the Bitmap retrieval of the embodiment of the present application;

FIG. 8 is a schematic diagram of the overall flow of multi-dimensional data query routing based on a hybrid engine according to an embodiment of the present application;

FIG. 9 is a system overall architecture diagram of an embodiment of the present application;

FIG. 10 is a system technical architecture diagram of an embodiment of the present application;

FIG. 11 is a schematic structural diagram of an apparatus for searching multi-dimensional data based on a hybrid engine according to an embodiment of the present application.

Detailed description

Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.

The hybrid engine-based multidimensional data query method of the embodiment of the present invention will be described below with reference to the accompanying drawings.

As shown in Figure 1, in order to achieve the above purpose, the embodiment of the first aspect of the present invention proposes a multi-dimensional data query method based on a hybrid engine, including the following steps:

S10: Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, and each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;

S20: Establish a dimension dictionary, the dimension dictionary is used to represent the mapping relationship between dimensions and bits;

S30: Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;

S40: Obtain a query requirement, generate a structured query language according to the query requirement, and obtain a query result according to the structured query language and the bitmap tree.

In an embodiment of the present application, further, obtaining query results according to the structured query language and the bitmap tree includes:

In an embodiment of the present application, further, obtaining query results according to the structured query language and the bitmap tree further includes:

In one embodiment of the present application, further, after constructing the data cube spanning tree, it also includes:

In an embodiment of the present application, further specifically, a data cube (Cube) is actually a collection of a series of materialized views, and each materialized view is a result of a pre-aggregation calculation. For the Cube and its generation process, we can visualize it as a Cube spanning tree, taking the 4-dimensional Cube as an example, as shown in Figure 2.

Figure 2 is a full Cube spanning tree, which shows the process by which a Cube is calculated step by step. Each block in the figure represents the pre-aggregated result of one combination of dimensions, which we call a sub-Cube (or Cuboid). The solid arrows indicate the aggregation calculations, and the dotted lines indicate repeated calculations that are omitted.

It can be seen that the vertex Cuboid (ABCD) is calculated first. It is aggregated from a large wide table (a detail table formed by joining the fact table and the dimension tables) and is the basis for subsequent calculations. The second layer includes 4 Cuboids, namely ABC, ABD, ACD, and BCD, all of which are calculated from ABCD. Similarly, the Cuboids in the remaining layers are aggregated step by step until the 1-dimensional leaf nodes are reached.
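As an illustration of the layer structure just described, the following minimal sketch enumerates the Cuboids of a full 4-dimensional Cube, layer by layer. The function name `cuboid_layers` and the representation of a Cuboid as a string of dimension letters are assumptions made for this example and are not taken from the application.

```python
from itertools import combinations


def cuboid_layers(dimensions):
    """Return the Cuboids of a full Cube grouped by layer, vertex layer first.

    Each Cuboid is written as the string of its dimension names,
    e.g. "ABD" is the pre-aggregation over dimensions A, B and D.
    """
    layers = []
    for size in range(len(dimensions), 0, -1):
        layers.append(["".join(combo) for combo in combinations(dimensions, size)])
    return layers


layers = cuboid_layers("ABCD")
for depth, layer in enumerate(layers, start=1):
    print(depth, layer)
# The first layer is the vertex Cuboid ABCD and the second layer is
# ABC, ABD, ACD, BCD, matching the description of Figure 2.
```

For n dimensions the full tree holds 2^n - 1 Cuboids (15 for the 4-dimensional example), which is why the full Cube becomes too large as the number of dimensions grows.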

Figure 2 shows a fully precomputed Cube, which can cover all aggregation query requirements, but when the number of dimensions increases, such a Cube takes up too much space, so tools such as aggregation groups need to be used for pruning. The present invention mainly adopts aggregation group pruning (the optimization algorithm runs automatically in the background) and also supports some finer-grained pruning tools. These pruning tools and their applicable scenarios are shown in the table below.

After pruning, the scale of the Cube spanning tree is further reduced, as shown in Figure 3. This kind of spanning tree is closer to the Cube models used in practical applications, and this tree structure is the basic data structure used to build the Bitmap index in this application.
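The effect of pruning on the number of Cuboids can be sketched as follows. The application does not define the three dimension types in this section, so the sketch assumes the semantics commonly used in Kylin-style aggregation groups: a mandatory dimension must appear in every Cuboid, a lower level of a hierarchy may appear only together with all of its higher levels, and joint dimensions appear either all together or not at all. The function name `prune_cuboids` and its parameters are hypothetical.

```python
from itertools import combinations


def prune_cuboids(dimensions, mandatory=(), hierarchy=(), joint=()):
    """Return the Cuboids that survive pruning (assumed semantics, see above).

    mandatory: dimensions that every kept Cuboid must contain.
    hierarchy: dimensions ordered from highest to lowest level.
    joint:     dimensions that must appear together or not at all.
    """
    kept = []
    for size in range(len(dimensions), 0, -1):
        for combo in combinations(dimensions, size):
            chosen = set(combo)
            if not set(mandatory) <= chosen:
                continue  # a mandatory dimension is missing
            if any(level in chosen and not set(hierarchy[:i]) <= chosen
                   for i, level in enumerate(hierarchy)):
                continue  # a lower level appears without its higher levels
            if chosen & set(joint) and not set(joint) <= chosen:
                continue  # joint dimensions are only partially present
            kept.append("".join(combo))
    return kept


pruned = prune_cuboids("ABCD", mandatory=("A",), joint=("C", "D"))
print(pruned)
```

In this toy example the 15 Cuboids of the full 4-dimensional Cube shrink to 4 (ABCD, ACD, AB, A), which illustrates why pruning matters for high-dimensional models.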

In one embodiment of the present application, further, the establishment of a dimension dictionary includes:

In an embodiment of the present application, further, specifically, after the overall data structure is determined, a dimension dictionary must be established before the Bitmap index is constructed. The dimension dictionary represents the mapping relationship between dimensions and Bitmap bits, which greatly reduces the storage space occupied by the Bitmap. Taking a star model composed of three tables as an example, the model is flattened to form a dimension dictionary, as shown in Figure 4.

The upper part of Figure 4 is a simple star model composed of three tables, including the fact table GD_PREGENCY_RESULT (pregnancy result table), the dimension table GD_BASIC_INFO_DETAIL (basic information table), and the dimension table GD_PYSICAL_EXAM_W (mother's physical examination table). The blue fields in the figure represent dimensions, and the purple fields represent measures. After the dimension part of the star model is flattened, the dimension dictionary shown in the lower part is obtained. Each dimension corresponds to one specific bit.
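The flattening step can be sketched as follows. The three table names and the column FEDU_LEVEL come from the text; the other column names (DIM_1, DIM_2, DIM_3, DIM_5) are placeholders, because the actual fields of Figure 4 are not reproduced here. The function name `build_dimension_dictionary` is likewise an assumption.

```python
def build_dimension_dictionary(star_schema):
    """Flatten {table: [dimension columns]} into {"TABLE.COLUMN": bit position}.

    Bit positions are assigned in the order in which the dimensions are
    encountered, so every dimension receives exactly one bit.
    """
    dictionary = {}
    for table, columns in star_schema.items():
        for column in columns:
            dictionary[f"{table}.{column}"] = len(dictionary)
    return dictionary


# Placeholder dimension columns; only FEDU_LEVEL is named in the text.
star_schema = {
    "GD_PREGENCY_RESULT": ["DIM_1", "DIM_2"],
    "GD_BASIC_INFO_DETAIL": ["DIM_3", "FEDU_LEVEL"],
    "GD_PYSICAL_EXAM_W": ["DIM_5"],
}

dimension_dictionary = build_dimension_dictionary(star_schema)
for name, bit in dimension_dictionary.items():
    print(bit, name)
```

Measures are deliberately left out of the dictionary, since only dimensions are mapped to bits.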

In one embodiment of the present application, further, specifically, a Bitmap index is constructed:

The Bitmap index is composed of bit arrays. Taking the 8-dimensional array in Figure 3 as an example, each bit position represents a dimension, and the bit value indicates whether that dimension is pre-calculated. For example, if bit[3] is 1, the dimension GD_BASIC_INFO_DETAIL.FEDU_LEVEL is pre-computed in this Cuboid; 0 means the opposite. In the MOLAP analysis scenario, the Bitmap index has significant advantages: (1) in the high-dimensional case, the Bitmap index needs little memory, which means that the system can load more Bitmap indexes into memory when retrieving the index, greatly improving the speed of index retrieval; (2) the Bitmap index supports logical operations, such as bitwise AND, OR, and exclusive OR (XOR), which require no complex format conversion and execute quickly.

Figure 5 shows an example of Cuboid's Bitmap index. Each Cuboid is a bit array, representing which dimensions have been pre-calculated.
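A Cuboid's bit array can be derived from the dimension dictionary as in the sketch below. It assumes eight dimensions named A to H and that the leftmost character of the printed bit string corresponds to the first dimension, which is consistent with the text's statement that 00110000 denotes the combination CD. The helper names `encode_cuboid`, `to_bits`, and `decode_cuboid` are hypothetical.

```python
def encode_cuboid(dims, dictionary):
    """Encode a dimension combination as an integer bitmask.

    The dimension at dictionary position 0 is stored in the most
    significant of the n bits, so it prints as the leftmost character.
    """
    n = len(dictionary)
    mask = 0
    for dim in dims:
        mask |= 1 << (n - 1 - dictionary[dim])
    return mask


def to_bits(mask, n):
    """Render a bitmask as an n-character string of 0s and 1s."""
    return format(mask, f"0{n}b")


def decode_cuboid(mask, dictionary):
    """Recover the dimension names whose bits are set in the mask."""
    n = len(dictionary)
    return [dim for dim, pos in dictionary.items() if mask >> (n - 1 - pos) & 1]


dictionary = {name: position for position, name in enumerate("ABCDEFGH")}

cd = encode_cuboid("CD", dictionary)
bef = encode_cuboid("BEF", dictionary)
print(to_bits(cd, 8), to_bits(bef, 8))
print(decode_cuboid(bef, dictionary))
```

Storing each Cuboid as a single machine integer is what keeps the index small and lets the retrieval step rely on plain bitwise instructions.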

The Cube spanning tree is a tree structure composed of multiple Cuboids. Next, we insert the Bitmap index code of each Cuboid into the Cube spanning tree to form a Bitmap index based on the Cube spanning tree, referred to as the Bitmap tree. Again taking the 8-dimensional data model as an example, the Bitmap tree after pruning optimization is shown in Figure 6.

So far, the basic structure of the Bitmap index has been formed, which can provide index support for MOLAP. It is worth noting that metrics are not part of the index structure of the Bitmap tree. Instead, each metric corresponds to its own index tree. For example, for the metric AVG(BIRTH_NUM), all the Cuboids contained in its Bitmap tree are pre-computations of AVG(BIRTH_NUM) for different dimension combinations.

In an embodiment of the present application, further, specifically, the hybrid engine query routing based on the Bitmap index:

The first step, the logical operation of the Bitmap index:

Each Cuboid in the Bitmap index is an independent bit array that represents a combination of pre-calculated dimensions, in a form such as 01001100. No two Cuboids share the same bit array, and the bit array of a Cuboid is referred to as Cid. Retrieval of the Bitmap index relies on operations on Cid. We call the target dimension combination Ctarget, with a bit array of, for example, 01001000. Logical operations can be performed on Cid and Ctarget. Such binary logical operations require no complex format conversion, are well supported by computer hardware, and execute quickly. Three bitwise logical operations are briefly introduced below; retrieval of the Bitmap index can be realized by combining these simple operations.

(1) Bitwise AND:

Cid AND Ctarget＝01001100 AND 01001000＝01001000

(2) Bitwise OR:

Cid OR Ctarget＝01001100 OR 01001000＝01001100

(3) Bitwise XOR:

Cid XOR Ctarget＝01001100 XOR 01001000＝00000100
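The three operations can be checked directly with integer bit operations. The sketch below uses the Cid and Ctarget values from the text; the variable names are chosen for this example only.

```python
cid = 0b01001100      # currently retrieved Cuboid
ctarget = 0b01001000  # target dimension combination

and_result = cid & ctarget  # bits set in both operands
or_result = cid | ctarget   # bits set in either operand
xor_result = cid ^ ctarget  # bits in which the operands differ

print(format(and_result, "08b"))
print(format(or_result, "08b"))
print(format(xor_result, "08b"))
```

The XOR result marks exactly the dimensions in which the two combinations differ, here a single bit, which the retrieval step uses to tell an exact hit from a fuzzy hit.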

The second step is to retrieve the Bitmap index:

Retrieving the Bitmap index essentially means finding a Cuboid in the Cube spanning tree that meets the query requirements. When the dimension combination of a Cuboid exactly matches the query requirements, we call it an "exact hit". Although exact hits give queries near-perfect performance, it is not advisable to pursue them excessively. On the one hand, doing so leads to rapid expansion of the Cube, which increases the storage and computation burden. On the other hand, since the Cube spanning tree is built layer by layer, if a child Cuboid cannot be hit, its parent Cuboid can be found and the query result can be obtained from it by online aggregation. We call this kind of online aggregation "reckoning", and the situation in which a query is completed by reckoning is called a "fuzzy hit".

In actual Cube applications, reckoning performance in the case of a fuzzy hit is affected by many factors, such as the size of the data set, the size of the result set, the layer difference between the reckoning start point and the target node, and the cardinality of the fields involved in the reckoning. The hybrid engine system in this application adopts a "precise-fuzzy hit model", which takes as query hit conditions both exact hits and fuzzy hits in which the layer difference between the reckoning start point and the target node is 1. Next, the Bitmap index is retrieved to find a Cuboid that meets the query requirements.

Below we summarize several situations of Cid calculation during the retrieval process. The currently retrieved Cuboid is recorded as Cid, and the combination of target dimensions is Ctarget:

(1) Exact hit

Cid AND Ctarget == Ctarget, Cid XOR Ctarget == 0

(2) Fuzzy hit

Cid AND Ctarget == Ctarget, Cid XOR Ctarget = R, R AND (R-1) == 0

(3) Cuboid not found, need to continue searching (neither case (1) nor case (2) applies)

Cid AND Ctarget == Ctarget

(4) Cube cannot meet the query requirements, so quit the search

Cid AND Ctarget != Ctarget

The above four cases are those that can occur when the currently retrieved Cuboid (Cid) is matched against the query requirement (Ctarget) during Bitmap index retrieval, and all of them use only logical operations. It is worth noting that R in case (2) is the result of XORing Cid and Ctarget, and the subsequent R AND (R-1) == 0 means that there is only one 1 in the bit array of R; that is, in the case of a fuzzy hit, the reckoning start point and the target differ by only 1 dimension. To illustrate the four matching cases more vividly, take Ctarget as 00110000 (the queried dimension combination is CD); the corresponding matches are marked in Figure 7.
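The four cases can be expressed as one small classification function. This is a sketch under the conventions used in the text (Cid and Ctarget as bit arrays, here Python integers); the function name `classify` and the returned labels are assumptions. The cases are tested in an order that makes them mutually exclusive: case (4) first, then (1), then (2), with (3) as the remainder.

```python
def classify(cid, ctarget):
    """Classify how a Cuboid bitmask relates to the target bitmask."""
    if cid & ctarget != ctarget:
        return "miss"      # case (4): the Cuboid lacks a required dimension
    r = cid ^ ctarget      # dimensions present in the Cuboid but not queried
    if r == 0:
        return "exact"     # case (1): identical dimension combination
    if r & (r - 1) == 0:
        return "fuzzy"     # case (2): exactly one extra dimension
    return "continue"      # case (3): covers the target, keep searching


ctarget = 0b00110000  # dimension combination CD
for cid in (0b11111111, 0b00111000, 0b00110000, 0b11000000):
    print(format(cid, "08b"), classify(cid, ctarget))
```

The expression `r & (r - 1)` clears the lowest set bit of `r`, so it equals zero only when `r` has a single bit set; `r == 0` has to be excluded beforehand, which the exact-hit test does.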

The third step is to query the overall process of routing:

The search for a Cuboid is a depth-first search (DFS). It starts from the vertex of the Cuboid tree and continues to the child nodes whenever case (3) is met, until it ends with case (1) (exact hit). If no exact hit can be found, the search goes back to the parent node to check whether the fuzzy-hit condition of case (2) is met. If neither kind of hit exists, the search reaches case (4) and the DFS exits. Combining this with the DFS retrieval of Cuboids, this application sets out the overall process of hybrid engine-based multi-dimensional data query routing, as shown in Figure 8.

For a query requirement (data overview, multidimensional analysis, data visualization, etc.), the system first organizes it into a unified SQL statement at the front end and submits it to the back end. After the back end parses the SQL into a logical syntax tree, it first matches the metrics and operators, and enters the Bitmap search over dimension combinations only when a qualifying Bitmap tree exists. Other queries (unsupported operators, metrics not pre-computed, etc.) are pushed directly to the ROLAP engine. In the retrieval of the Bitmap index, a Cuboid with an exact hit is searched for first, and if none exists, the search backtracks to look for a Cuboid with a fuzzy hit. The entire Bitmap retrieval process uses logical operations. If a target Cuboid is found, the MOLAP engine is used to query the pre-aggregated data directly; otherwise the ROLAP engine is used for online calculation. Finally, the query result is returned, ending the entire query routing process.
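The routing decision can be sketched as a depth-first search over a small Bitmap tree. The `Node` class, the 4-bit example tree, and the engine labels are illustrative assumptions; the sketch also prunes only the failing branch in case (4) rather than ending the whole search, which is one possible reading of the procedure. Metric and operator matching, which happens before the Bitmap search, is omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    bitmap: int                       # Cid of this Cuboid
    children: list = field(default_factory=list)


def search(node, target):
    """DFS that returns (exact_node, fuzzy_node) for the subtree of node."""
    if node.bitmap & target != target:
        return None, None             # case (4): this branch cannot answer
    r = node.bitmap ^ target
    if r == 0:
        return node, None             # case (1): exact hit
    fuzzy = node if r & (r - 1) == 0 else None  # case (2) candidate
    for child in node.children:       # case (3): keep descending
        exact, child_fuzzy = search(child, target)
        if exact is not None:
            return exact, None
        if fuzzy is None:
            fuzzy = child_fuzzy
    return None, fuzzy


def route(root, target):
    """Choose the engine: MOLAP on an exact or fuzzy hit, otherwise ROLAP."""
    exact, fuzzy = search(root, target)
    hit = exact if exact is not None else fuzzy
    if hit is not None:
        return "MOLAP", hit.bitmap
    return "ROLAP", None


# Pruned 4-dimensional example tree: ABCD -> (ABC -> AB), BCD
root = Node(0b1111, [Node(0b1110, [Node(0b1100)]), Node(0b0111)])

print(route(root, 0b1100))  # AB is materialized: exact hit
print(route(root, 0b0110))  # BC is reckoned from a parent: fuzzy hit
print(route(root, 0b0001))  # D has no close Cuboid: online calculation
```

An exact hit is always preferred; a fuzzy candidate is only used once the subtree has been exhausted without finding one.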

Technical effect of the application: the present invention proposes and implements a multi-dimensional data analysis system with hybrid engine query routing based on a Bitmap index. The system combines the advantages of the ROLAP and MOLAP query engines to provide multi-dimensional data analysis services in flexible scenarios. At the same time, it provides users with automatic modeling, model lifecycle monitoring, and automatic model optimization services, so that MOLAP pre-computing technology can be put to good use in the system and the dependence on data experts is relieved. Finally, it generates rich visual charts and cockpits (dashboards) from the query results, allowing users to mine information from big data at a glance.

As shown in Figure 11, in order to achieve the above purpose, the embodiment of the second aspect of the present application proposes a multi-dimensional data query device based on a hybrid engine of the present invention, including the following modules:

Construction module 10, used for constructing a data cube spanning tree, the data cube spanning tree comprising a plurality of sub-data cubes, each sub-data cube corresponding to a pre-aggregated result of a combination of dimensions;

Establishment module 20, used for establishing a dimension dictionary, the dimension dictionary being used to represent the mapping relationship between dimensions and bits;

A generating module 30, configured to construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;

The query module 40 is configured to obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.

In an embodiment of the present application, further, the query module 40 includes:

In an embodiment of the present application, further, the query module 40 further includes:

In one embodiment of the present application, it further includes:

In one embodiment of the present application, further, specifically, the multidimensional data analysis system based on Bitmap indexed hybrid engine query routing:

The first step, the overall structure of the system:

As shown in Figure 9, the logic of the overall system architecture starts from the data source: the original data is read from multiple sources such as HDFS, Hive, and RDBMS. The Cube is built in the background by the automatic modeling module, and the information required for designing and building the Cube is mined from historically executed SQL by the SQL parsing module. The Cube needs model optimization during initial construction and during use, and the MMO algorithm related to the present invention supports this optimization. After construction and optimization are completed, the Cube is stored in HBase in the storage module, and other information such as metadata and historical SQL is stored in MySQL. When a query task is performed, the hybrid engine-based multi-dimensional query method of the present invention is used: the query routing module routes the query by parsing out the logical syntax tree and retrieving the Bitmap index to intelligently select the most suitable query engine. The final query results are rendered by the multidimensional analysis and visualization module into rich charts and dashboards for users to view.

The second step, technical architecture design:

As shown in Figure 10, from a technical point of view, the system is divided into five layers: data layer, data model layer, data calculation layer, business logic layer and page display layer. The bottom data layer is the original data. After modeling and optimization by the modeling layer, the data calculation layer executes the query tasks. Finally, the page display layer invokes the API of the business logic layer to present the visualized results to the user. The following will describe The technical details of these layers are presented.

Data layer: This layer stores the original data and supports multiple sources such as Hadoop's HDFS, MySQL, and Hive. HDFS and Hive serve as the big data warehouses for analysis, while MySQL mainly stores the data tables and metadata required by system functions.

Data model layer: This layer includes three modules: automatic modeling, model optimization, and task scheduling. Automatic modeling uses an SQL Parser to analyze the historical SQL collection, extracts metadata, and combines it into a star model. The Fast Cube algorithm is used to materialize the aggregated data into Cubes, and the pre-aggregation calculations are performed by the Spark engine. Model optimization uses the MMO algorithm and Aggregation Group related to the present invention. Task scheduling uses the Quartz scheduler and Linux Crontab and is responsible for running the optimization and materialization calculations according to the optimization strategies.

Data calculation layer: This layer executes multidimensional data query tasks and is divided into two modules: the calculation engine and the multidimensional index. The calculation engine is a hybrid engine comprising Spark SQL and Kylin. Based on the data pre-calculated by the system, query routing retrieves the multidimensional index and selects the appropriate query engine to perform the task. The multidimensional index uses the Cube spanning tree and the Bitmap index.

Business logic layer: This layer sits between front-end requests and back-end query tasks and encapsulates lower-layer services into functions that upper layers can call. The system as a whole is written with Spring Boot. Communication between the front end and the back end, and between back-end functional modules, takes the form of RESTful APIs, and the information returned from the back end to the front end is organized in JSON format.

Page display layer: This layer is responsible for displaying query results to users. The front end is written with the Vue framework, and ECharts is used to generate the various chart components.
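As a non-limiting illustration of the multidimensional index used by the data calculation layer, the following minimal Python sketch shows how a dimension dictionary can map each dimension to one bit, and how a combination of dimensions can then be encoded as a bit array. The function names, the dimension names, and the use of a Python integer as the bit array are assumptions made for illustration and are not taken from the embodiments.

```python
from typing import Dict, Iterable, List


def build_dimension_dictionary(dimensions: Iterable[str]) -> Dict[str, int]:
    """Assign each distinct dimension a bit position, in first-seen order."""
    dictionary: Dict[str, int] = {}
    for name in dimensions:
        if name not in dictionary:
            dictionary[name] = len(dictionary)
    return dictionary


def encode(dimensions: Iterable[str], dictionary: Dict[str, int]) -> int:
    """Encode a dimension combination as a bit array held in an integer.

    Raises KeyError for a dimension that is missing from the dictionary.
    """
    bits = 0
    for name in dimensions:
        bits |= 1 << dictionary[name]
    return bits


def decode(bits: int, dictionary: Dict[str, int]) -> List[str]:
    """Return the dimensions whose bit is set, in dictionary order."""
    return [name for name, pos in dictionary.items() if (bits >> pos) & 1]


# Hypothetical dimensions obtained by flattening a star schema.
dictionary = build_dimension_dictionary(["region", "year", "age_group"])
cuboid_bits = encode(["region", "age_group"], dictionary)  # 0b101
```

With this encoding, comparing two dimension combinations reduces to bitwise operations on integers, which is what makes the Bitmap index inexpensive to retrieve.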

The third step, the query routing module:

The function of this module is to perform multi-dimensional data query tasks. Based on the hybrid engine, combined with Bitmap index retrieval, it can choose between Spark SQL and Kylin to achieve optimal query performance. The overall process of this module is shown in Figure 7. According to these functional processes, the interfaces required by this module can be summarized, as shown in the following table:

query is the main interface of this module, and its execution logic is transparent to users. The query interface calls several services: the multidimensional query routing service (queryRoute) retrieves the Bitmap index and selects the query engine (SQLParse parses the logical syntax tree, and BitmapIndex then matches the query task); the Spark SQL query execution service (sparkSql) executes push-down query tasks; and Kylin's query execution service (kylinQuery) queries the pre-calculated results when the Cube is hit. Each called service is also exposed as an interface of its own, so that an individual engine can be called separately for debugging and testing.
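As a non-limiting illustration of the routing decision made by queryRoute, the following minimal Python sketch searches a bitmap tree depth-first and returns the name of the service that would execute the query. It is a sketch under stated assumptions rather than the implementation of the embodiments: an exact hit is taken to be a sub-data cube whose bitmap equals the bitmap of the query, a fuzzy hit is taken to be the smallest sub-data cube whose bitmap covers every dimension of the query, and the class and function names other than the service names kylinQuery and sparkSql are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BitmapNode:
    """One sub-data cube in the bitmap tree; bits marks its pre-computed dimensions."""
    bits: int
    children: List["BitmapNode"] = field(default_factory=list)


def covers(node_bits: int, query_bits: int) -> bool:
    """True if every dimension requested by the query is set in the node."""
    return node_bits & query_bits == query_bits


def popcount(bits: int) -> int:
    return bin(bits).count("1")


def search(node: BitmapNode, query_bits: int) -> Tuple[Optional[BitmapNode], Optional[BitmapNode]]:
    """Depth-first search returning (exact_hit, fuzzy_hit)."""
    if not covers(node.bits, query_bits):
        return None, None  # prune: this subtree cannot answer the query
    if node.bits == query_bits:
        return node, None
    best = node  # a covering node is a fuzzy-hit candidate
    for child in node.children:
        exact, fuzzy = search(child, query_bits)
        if exact is not None:
            return exact, None
        # On backtracking, keep the covering node with the fewest dimensions.
        if fuzzy is not None and popcount(fuzzy.bits) < popcount(best.bits):
            best = fuzzy
    return None, best


def query_route(root: BitmapNode, query_bits: int) -> Tuple[str, str]:
    """Return (service name, hit type) for a query bitmap."""
    exact, fuzzy = search(root, query_bits)
    if exact is not None:
        return "kylinQuery", "exact"
    if fuzzy is not None:
        return "kylinQuery", "fuzzy"
    return "sparkSql", "miss"  # push the query down to the ROLAP engine


# Hypothetical tree over bits A=0b001, B=0b010, C=0b100.
root = BitmapNode(0b111, [
    BitmapNode(0b011, [BitmapNode(0b001)]),
    BitmapNode(0b101),
])
```

In this example, a query on dimension A alone is an exact hit, a query on dimension B alone is a fuzzy hit answered from the sub-data cube that holds A and B, and a query on a dimension outside the tree is routed to sparkSql.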

In order to implement the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium storing a computer program; when the computer program is executed by a processor, the hybrid engine-based multidimensional data query method of the embodiments of the present application is implemented.

In order to implement the above embodiments, the present invention also proposes an electronic device, including: a processor; and a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the hybrid engine-based multidimensional data query method of the embodiments of the present application.

In order to implement the above embodiments, the present invention further proposes a computer program product, including a computer program, and when the computer program is executed by a processor, the method for querying multi-dimensional data based on a hybrid engine in the embodiment of the present application is implemented.

While the present application has been disclosed in detail with reference to the accompanying drawings, it should be understood that these descriptions are illustrative only and are not intended to limit the application of the present application. The protection scope of the present application is defined by the appended claims, and may include various changes, modifications and equivalent solutions for the invention without departing from the protection scope and spirit of the present application.

In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific examples", or "some examples" mean that specific features, structures, materials, or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the described specific features, structures, materials, or characteristics may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine the different embodiments or examples described in this specification, and the features of the different embodiments or examples, provided they do not conflict with each other.

In addition, the terms "first" and "second" are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features. Thus, the features defined as "first" and "second" may explicitly or implicitly include at least one of these features. In the description of the present invention, "plurality" means at least two, such as two, three, etc., unless specifically defined otherwise.

Any process or method description in a flowchart or otherwise described herein may be understood to represent a module, segment, or portion of code comprising one or more executable instructions for implementing custom logical functions or steps of the process, and the scope of the preferred embodiments of the invention includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention pertain.

The logic and/or steps represented in the flowcharts or otherwise described herein, for example, a sequenced listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by, or in conjunction with, an instruction execution system, apparatus, or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from the instruction execution system, apparatus, or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in conjunction with, an instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of computer-readable media include the following: an electrical connection with one or more wires (electronic device), a portable computer disk case (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), fiber optic devices, and portable compact disc read-only memory (CD-ROM). In addition, the computer-readable medium may even be paper or another suitable medium on which the program can be printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise processing it in a suitable manner if necessary, and then stored in computer memory.

It should be understood that various parts of the present invention can be realized by hardware, software, firmware, or a combination thereof. In the embodiments described above, various steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques known in the art: discrete logic circuits, application-specific integrated circuits (ASICs) with suitable combinational logic gates, programmable gate arrays (PGA), field-programmable gate arrays (FPGA), etc.

Those of ordinary skill in the art can understand that all or part of the steps of the methods of the above embodiments can be completed by a program instructing the related hardware, and that the program can be stored in a computer-readable storage medium. When executed, the program performs one or a combination of the steps of the method embodiments.

In addition, each functional unit in each embodiment of the present invention may be integrated into one processing module, each unit may exist separately physically, or two or more units may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. If the integrated modules are realized in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like. Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; those skilled in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present invention.

Claims

A method for querying multidimensional data based on a hybrid engine, characterized in that it includes:

building a data cube spanning tree, wherein the data cube spanning tree includes a plurality of sub-data cubes, and each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;

establishing a dimension dictionary, wherein the dimension dictionary is used to represent the mapping relationship between dimensions and bits;

constructing a bitmap index according to the dimension dictionary, and introducing the bitmap index into the data cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and a bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated; and

obtaining a query requirement, generating a structured query language according to the query requirement, and obtaining a query result according to the structured query language and the bitmap tree.
The hybrid engine-based multidimensional data query method according to claim 1, wherein obtaining query results according to the structured query language and the bitmap tree comprises:

when matching the structured query language against the bitmap tree, performing a depth-first search starting from the root node of the bitmap tree; and

when it is determined that an exact-hit sub-data cube is found through the depth-first search, using a MOLAP engine to query the pre-aggregated data, and generating the query result according to the pre-aggregated data.
The hybrid engine-based multidimensional data query method according to claim 2, wherein obtaining query results according to the structured query language and the bitmap tree also includes:

when it is determined that no exact-hit sub-data cube is found through the depth-first search, backtracking to search for a fuzzy-hit sub-data cube;

when it is determined that a fuzzy-hit sub-data cube is found during the backtracking, using the MOLAP engine to query the pre-aggregated data, and generating the query result according to the pre-aggregated data; and

when it is determined that no fuzzy-hit sub-data cube is found during the backtracking, using a ROLAP engine to calculate the query result online.
The hybrid engine-based multidimensional data query method according to any one of claims 1 to 3, wherein, after building the data cube spanning tree, further comprising:

pruning the data cube spanning tree by means of aggregation group pruning, wherein the dimensions are divided into a plurality of aggregation groups according to the degree of association between the dimensions, each aggregation group serves as a root node from which its respective sub-data cubes are materialized, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
The hybrid engine-based multidimensional data query method according to claim 1, wherein said establishment of a dimension dictionary includes:

obtaining a pregnancy result table, a basic information table, and a mother's physical examination table, generating a star schema according to the pregnancy result table, the basic information table, and the mother's physical examination table, and flattening the star schema to obtain the dimension dictionary, wherein each dimension corresponds to one bit.
A multi-dimensional data query device based on a hybrid engine, characterized in that it comprises:

a building module, configured to build a data cube spanning tree, wherein the data cube spanning tree includes a plurality of sub-data cubes, and each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;

an establishing module, configured to establish a dimension dictionary, wherein the dimension dictionary is used to represent the mapping relationship between dimensions and bits;

a generating module, configured to construct a bitmap index according to the dimension dictionary, and introduce the bitmap index into the data cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and a bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated; and

a query module, configured to obtain a query requirement, generate a structured query language according to the query requirement, and obtain a query result according to the structured query language and the bitmap tree.
The multi-dimensional data query device based on hybrid engine according to claim 6, wherein said query module comprises:

a first query unit, configured to perform a depth-first search starting from the root node of the bitmap tree when matching the structured query language against the bitmap tree; and

a first generating unit, configured to use a MOLAP engine to query the pre-aggregated data and generate the query result according to the pre-aggregated data when it is determined that an exact-hit sub-data cube is found through the depth-first search.
The multi-dimensional data query device based on hybrid engine as claimed in claim 7, wherein said query module further comprises:

a second query unit, configured to backtrack to search for a fuzzy-hit sub-data cube when it is determined that no exact-hit sub-data cube is found through the depth-first search; and

a second generating unit, configured to use the MOLAP engine to query the pre-aggregated data and generate the query result according to the pre-aggregated data when it is determined that a fuzzy-hit sub-data cube is found during the backtracking;

wherein, when it is determined that no fuzzy-hit sub-data cube is found during the backtracking, a ROLAP engine is used to calculate the query result online.
The hybrid engine-based multidimensional data query device according to any one of claims 6 to 8, further comprising:

a pruning module, configured to prune the data cube spanning tree by means of aggregation group pruning, wherein the dimensions are divided into a plurality of aggregation groups according to the degree of association between the dimensions, each aggregation group serves as a root node from which its respective sub-data cubes are materialized, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program implements the following steps when executed by a processor:

building a data cube spanning tree, wherein the data cube spanning tree includes a plurality of sub-data cubes, and each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;

establishing a dimension dictionary, wherein the dimension dictionary is used to represent the mapping relationship between dimensions and bits;

constructing a bitmap index according to the dimension dictionary, and introducing the bitmap index into the data cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and a bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated; and

obtaining a query requirement, generating a structured query language according to the query requirement, and obtaining a query result according to the structured query language and the bitmap tree.
An electronic device, characterized in that it comprises:

processor;

memory for storing said processor-executable instructions;

Wherein, the processor is configured to execute the instructions to achieve the following steps:

building a data cube spanning tree, wherein the data cube spanning tree includes a plurality of sub-data cubes, and each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;

establishing a dimension dictionary, wherein the dimension dictionary is used to represent the mapping relationship between dimensions and bits;

constructing a bitmap index according to the dimension dictionary, and introducing the bitmap index into the data cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and a bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated; and

obtaining a query requirement, generating a structured query language according to the query requirement, and obtaining a query result according to the structured query language and the bitmap tree.
A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the following steps are implemented:

building a data cube spanning tree, wherein the data cube spanning tree includes a plurality of sub-data cubes, and each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;

establishing a dimension dictionary, wherein the dimension dictionary is used to represent the mapping relationship between dimensions and bits;

constructing a bitmap index according to the dimension dictionary, and introducing the bitmap index into the data cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, and a bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated; and

obtaining a query requirement, generating a structured query language according to the query requirement, and obtaining a query result according to the structured query language and the bitmap tree.