WO2023273183A1 - Hybrid engine-based multidimensional data query method and apparatus - Google Patents

Hybrid engine-based multidimensional data query method and apparatus Download PDF

Info

Publication number
WO2023273183A1
WO2023273183A1 PCT/CN2021/137140 CN2021137140W WO2023273183A1 WO 2023273183 A1 WO2023273183 A1 WO 2023273183A1 CN 2021137140 W CN2021137140 W CN 2021137140W WO 2023273183 A1 WO2023273183 A1 WO 2023273183A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
data
bitmap
tree
dimensions
Prior art date
Application number
PCT/CN2021/137140
Other languages
French (fr)
Chinese (zh)
Inventor
鄂海红
宋美娜
田川
Original Assignee
北京邮电大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京邮电大学 filed Critical 北京邮电大学
Publication of WO2023273183A1 publication Critical patent/WO2023273183A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2237Vectors, bitmaps or matrices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22Indexing; Data structures therefor; Storage structures
    • G06F16/2228Indexing structures
    • G06F16/2246Trees, e.g. B+trees
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2425Iterative querying; Query formulation based on the results of a preceding query
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present invention relates to the technical field of information technology and data business, and in particular to the field of a hybrid engine multi-dimensional data query method.
  • On-Line Analytical Processing is the mainstream technology for multidimensional analysis of big data, which includes traditional relational OLAP (Relational OLAP, ROLAP) based on data warehouse and multidimensional OLAP (Multidimensional OLAP) using precomputing technology.
  • ROLAP Relational OLAP
  • MOLAP multidimensional OLAP
  • ROLAP has developed relatively maturely, but with the explosive growth of data volume, its query time inevitably increases linearly with the data size, and the join operation of multi-table and multi-dimensional will bring greater overhead, so in large In the multi-dimensional data analysis scenario, ROLAP has a serious performance bottleneck.
  • MOLAP uses pre-computing technology to solve this problem.
  • the MOLAP query must be based on the pre-calculated materialized view set (also known as data cube, Cube). Whether it is manual modeling or automatic modeling, the pre-calculation process will last for tens of minutes or even hours. Multidimensional data analysis increases a lot of lead-time waiting time, so MOLAP can play its advantage only in the mixed engine system.
  • ROLAP and MOLAP engines have their own advantages and disadvantages, so they are good at different usage scenarios.
  • a system based on a hybrid engine it is difficult for the system to make a fast and reasonable choice between the two, and a query routing is urgently needed to solve the problem of query engine selection, and to establish a multi-dimensional model index to provide support for query routing.
  • B-tree index Traditional OLAP index technology includes four types of indexes: B-tree index, R-tree index, hash index and bitmap index.
  • the first three are widely used, but they are difficult to meet the requirements of fast query of high-dimensional scientific data, for example: B-tree index and R-tree index are only valid for data sets with dimensions less than 15.
  • the bitmap index can better adapt to the characteristics of high-dimensional data.
  • the present invention aims to solve one of the technical problems in the related art at least to a certain extent.
  • the first purpose of the present invention is to propose a multi-dimensional data query method based on a hybrid engine, so as to realize that the bitmap Bitmap index of Cube can be established in the background, intelligently select a suitable query engine to perform query tasks, and improve the overall system performance.
  • the second object of the present invention is to propose a multi-dimensional data query device based on a hybrid engine.
  • a third object of the present invention is to provide a non-transitory computer-readable storage medium.
  • a fourth object of the present invention is to provide an electronic device.
  • a fifth object of the present invention is to provide a computer program product.
  • the embodiment of the first aspect of the present invention proposes a multi-dimensional data query method based on a hybrid engine, including the following steps:
  • the data cube spanning tree includes a plurality of sub-data cubes, each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;
  • bitmap index Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, the The bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
  • Obtain query requirements generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
  • obtaining query results according to the structured query language and the bitmap tree includes:
  • a depth-first search is performed from the root node of the bitmap tree
  • the MOLAP engine is used to query the pre-aggregated data, and a query result is generated according to the pre-aggregated data.
  • obtaining the query result according to the structured query language and the bitmap tree further includes:
  • the MOLAP engine is used to query the pre-aggregated data, and the query result is generated according to the pre-aggregated data;
  • the ROLAP engine is used to calculate the query result online.
  • the data cube spanning tree after building the data cube spanning tree, it also includes:
  • the data cube spanning tree is pruned by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, and each aggregation group is used as the root node to materialize the respective child
  • the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
  • the establishment of a dimension dictionary includes:
  • Obtain the pregnancy result table, basic information table, and mother's physical examination table generate a star schema according to the pregnancy result table, basic information table, and mother's physical examination table, and flatten the star schema to obtain the dimension dictionary, wherein , each dimension corresponds to a bit.
  • the embodiment of the second aspect of the present application proposes a multi-dimensional data query device based on a hybrid engine of the present invention, which includes the following modules:
  • a generating module configured to construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index consists of bits Composed of an array, the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
  • the query module is used to obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
  • the query module includes:
  • the first query unit is configured to perform a depth-first search from the root node of the bitmap tree when determining the bitmap tree matching the structured query language;
  • the first generation unit is configured to use the MOLAP engine to query the pre-aggregated data and generate a query result according to the pre-aggregated data when it is determined that an accurately hit sub-data cube is found through the depth-first search.
  • the query module further includes:
  • the second query unit is used to retroactively search for sub-data cubes with fuzzy hits when it is determined that no sub-data cubes with precise hits have been found through the depth-first search;
  • the second generating unit is used to use the MOLAP engine to query the pre-aggregated data when it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, and the query result is generated according to the pre-aggregated data;
  • the ROLAP engine is used to calculate the query result online.
  • it also includes:
  • the pruning module is used to prune the data cube generation tree by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, with each aggregation group as the root Nodes start to materialize their respective child data cubes, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
  • the Bitmap index of the Cube can be established in the background, and the appropriate query engine can be intelligently selected to perform the query task, and the overall performance of the system can be improved.
  • the embodiment of the third aspect of the present application proposes a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implemented in the embodiment of the first aspect of the application
  • the hybrid engine-based multidimensional data query method described above proposes a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implemented in the embodiment of the first aspect of the application.
  • the embodiment of the fourth aspect of the present application proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions , so as to implement the hybrid engine-based multidimensional data query method described in the embodiment of the first aspect of the present application.
  • the embodiment of the fifth aspect of the present application proposes a computer program product, including a computer program, when the computer program is executed by a processor, it realizes the multi-dimensional data based on the hybrid engine described in the embodiment of the first aspect of the application. Query method.
  • FIG. 1 is a flowchart of a multidimensional data query method based on a hybrid engine according to an embodiment of the present application.
  • Fig. 2 is the schematic diagram of the Cube spanning tree (full amount) of the embodiment of the present application
  • Fig. 3 is the schematic diagram of the Cube spanning tree (optimized) of the embodiment of the present application.
  • FIG. 4 is a schematic diagram of converting a star schema into a dimension dictionary according to an embodiment of the present application
  • Fig. 5 is the schematic diagram that the Cuboid of the embodiment of the present application constructs the Bitmap index
  • Fig. 6 is the schematic diagram of the Bitmap index based on Cube spanning tree of the embodiment of the present application.
  • Fig. 7 is the schematic diagram of 4 kinds of matching situations when the Bitmap retrieval of the embodiment of the present application.
  • FIG. 8 is a schematic diagram of the overall flow of multi-dimensional data query routing based on a hybrid engine according to an embodiment of the present application.
  • FIG. 9 is a system overall architecture diagram of an embodiment of the present application.
  • FIG. 10 is a system technical architecture diagram of an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an apparatus for searching multi-dimensional data based on a hybrid engine according to an embodiment of the present application.
  • the embodiment of the first aspect of the present invention proposes a multi-dimensional data query method based on a hybrid engine, including the following steps:
  • S10 Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, and each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;
  • S30 Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, The bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
  • S40 Obtain a query requirement, generate a structured query language according to the query requirement, and obtain a query result according to the structured query language and the bitmap tree.
  • obtaining query results according to the structured query language and the bitmap tree includes:
  • a depth-first search is performed from the root node of the bitmap tree
  • the MOLAP engine is used to query the pre-aggregated data, and a query result is generated according to the pre-aggregated data.
  • obtaining query results according to the structured query language and the bitmap tree further includes:
  • the MOLAP engine is used to query the pre-aggregated data, and the query result is generated according to the pre-aggregated data;
  • the ROLAP engine is used to calculate the query result online.
  • the data cube spanning tree further, after constructing the data cube spanning tree, it also includes:
  • the data cube spanning tree is pruned by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, and each aggregation group is used as the root node to materialize the respective child
  • the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
  • a data cube is actually a collection of a series of materialized views, and each materialized view is a result of a pre-aggregation calculation.
  • a data cube For the Cube and its generation process, we can visualize it as a Cube spanning tree, taking the 4-dimensional Cube as an example, as shown in Figure 2.
  • Figure 2 is a full Cube spanning tree, which shows the process of a Cube being calculated step by step.
  • Each block in the figure represents a pre-aggregated result of a combination of dimensions, which we can call a sub-Cube (or Cuboid) .
  • the solid line arrows indicate the process of aggregation calculation, and the dotted line is the omitted repeated calculation.
  • the vertex Cuboid (ABCD) is calculated first, and it is aggregated through a large wide table (a detail table connected by a fact table and a dimension table), which is the basis for subsequent calculations.
  • the second layer includes 4 Cuboids, which are ABC, ABD, ACD, and BCD. They are all calculated by ABCD.
  • the Cuboids in the remaining layers are aggregated step by step until they finally reach the end of the 1-dimensional leaf node.
  • Figure 2 shows a fully precomputed Cube, which can cover all aggregation query requirements, but when the number of dimensions increases, such a Cube takes up too much space, so tools such as aggregation groups need to be used for pruning.
  • the present invention mainly adopts aggregation group pruning (optimization algorithm is performed automatically in the background), and supports some finer-grained pruning tools at the same time. These pruning tools and their applicable scenarios are shown in the table below.
  • the establishment of a dimension dictionary includes:
  • Obtain the pregnancy result table, basic information table, and mother's physical examination table generate a star schema according to the pregnancy result table, basic information table, and mother's physical examination table, and flatten the star schema to obtain the dimension dictionary, wherein , each dimension corresponds to a bit.
  • a Bitmap index is to be constructed, and a dimension dictionary is also established for it.
  • the dimension dictionary represents the mapping relationship between dimensions and Bitmap bits, which greatly saves the storage space occupied by Bitmap. Let's take the star model composed of three tables as an example, and flatten it to form a dimension dictionary, as shown in Figure 4.
  • the upper part of Figure 4 is a simple star model composed of three tables, including fact table GD_PREGENCY_RESULT (pregnancy result table), dimension table GD_BASIC_INFO_DETAIL (basic information table), and dimension table GD_PYSICAL_EXAM_W (mother's physical examination table).
  • the blue fields in the figure represent dimensions, and the purple fields represent measures.
  • the dimension dictionary shown in the lower part is obtained. Each dimension corresponds to a specific bits.
  • Bitmap index is constructed:
  • the Bitmap index is composed of bit arrays. Taking the 8-dimensional array in Figure 3 as an example, each bit represents a dimension, and the bit in the array represents whether the dimension is pre-calculated. For example, bit [3] is 1. Dimension GD_BASIC_INFO_DETAIL.FEDU_LEVEL is precomputed in this Cuboid, 0 is the opposite.
  • the Bitmap index has significant advantages: (1) In the case of high dimensions, the Bitmap index only needs less memory usage, which means that the system can load more Bitmap indexes into the memory when retrieving the index Among them, the speed of index retrieval is greatly improved; (2) Bitmap index can perform logical operations, such as bitwise logical AND (AND), or (OR), exclusive OR (XOR), etc., eliminating the need for complex format conversion and speed of operation quick.
  • logical operations such as bitwise logical AND (AND), or (OR), exclusive OR (XOR), etc.
  • FIG. 5 shows an example of Cuboid's Bitmap index.
  • Each Cuboid is a bit array, representing which dimensions have been pre-calculated.
  • the Cube spanning tree is a tree structure composed of multiple Cuboids.
  • Bitmap tree a Cube spanning tree-based Bitmap index
  • Figure 6 Take the dimension data model as an example, the Bitmap tree after pruning optimization is shown in Figure 6.
  • the basic structure of the Bitmap index has been formed, which can provide index support for MOLAP. It is worth noting that the metric is not in the index structure of the Bitmap tree. Each metric corresponds to an index tree. For example, the metric AVG(BIRTH_NUM), all the Cuboids contained in the Bitmap tree under its name are for the AVG(BIRTH_NUM ) for precomputation of different dimension combinations.
  • the first step the logical operation of the Bitmap index:
  • the Cuboid in the Bitmap index is an independent bit array, which represents a combination of dimensions that have been precalculated, in the form of 01001100, and will not repeat each other, referred to as Cid.
  • the Bitmap index is inseparable from the operation on Cid.
  • Logical operations can be performed on Cid and Ctarget. This kind of binary logic operation does not require complex format conversion, and can obtain better computer hardware support, and the operation speed is faster. The following will briefly introduce three bitwise logical operation operations, and the retrieval of the Bitmap index can be realized through the combination of such simple operations.
  • the second step is to retrieve the Bitmap index:
  • Retrieving the Bitmap index is essentially to find a Cuboid that meets the query requirements in the Cube spanning tree.
  • exact hits can bring near-perfect performance to queries, it is not advisable to pursue exact hits too much.
  • it will lead to the rapid expansion of the Cube, which will increase the burden of storage and calculation.
  • the spanning tree of the Cube is progressive layer by layer, if the child Cuboid cannot be hit, it can find its parent Cuboid and calculate it by its online aggregation. Get the query result.
  • this kind of online aggregation calculation “reckoning”
  • fuzzy hit the corresponding situation where the query is completed by reckoning
  • the calculation performance in the case of fuzzy hits is affected by many factors, such as the size of the data set, the size of the result set, the layer difference between the calculation start point and the target node, and the cardinality of the fields involved in the calculation.
  • the hybrid engine system in this paper designs a "precise-fuzzy hit model", which adopts precise hits and fuzzy hits in which the layer difference between the calculated starting point and the target node is 1 as the query hit condition.
  • the Bitmap index will be retrieved to find the Cuboid that meets the query requirements.
  • Cid The currently retrieved Cuboid is recorded as Cid, and the combination of target dimensions is Ctarget:
  • the third step is to query the overall process of routing:
  • Cuboid's search is a depth-first search (Depth First Search, DFS), starting from the vertex of the Cuboid tree, and continuing to search for child nodes if the case (3) is met, until the end of the case (1) (accurate hit). If the precise hit cannot be met, then go back to the parent node to check whether the condition of the fuzzy hit is met (2). If neither hit exists, search to case (4) and exit DFS.
  • this application sorts out the overall process of multi-dimensional data query routing based on hybrid engine, as shown in Figure 8.
  • the system in this paper first organizes it into a unified SQL at the front end and submits it to the back-end query. After parsing the logical syntax tree of the SQL in the background, first match the metrics and operators, and only enter the Bitmap search for dimension combinations when the qualified Bitmap tree exists. child, metrics not precomputed, etc.) are pushed directly to the ROLAP engine.
  • the Cuboid with the exact hit is searched first, and if it does not exist, the Cuboid with the fuzzy hit is searched backwards.
  • the entire Bitmap retrieval process uses logical operations. If the target Cuboid is found, the MOLAP engine can be used to directly query the pre-aggregated data, otherwise the ROLAP engine can be used for online calculation. Finally, the query result is returned, ending the entire process of query routing.
  • the present invention proposes and implements a multi-dimensional data analysis system based on Bitmap index-based hybrid engine query routing.
  • This system comprehensively utilizes the advantages of ROLAP and MOLAP two query engines to provide multi-dimensional data analysis services in flexible scenarios
  • MOLAP pre-computing technology can be well implemented in the system, relieve the dependence on data experts, and finally generate rich visualization with query results Charts and cockpits that allow users to mine information from big data at a glance.
  • the embodiment of the second aspect of the present application proposes a multi-dimensional data query device based on a hybrid engine of the present invention, including the following modules:
  • Construction module 10 is used for constructing data cube spanning tree, and described data cube spanning tree comprises a plurality of sub-data cubes, and each sub-data cube corresponds to the pre-aggregation result of a combination of dimensions;
  • Establishment module 20 is used for establishing dimension dictionary, and described dimension dictionary is used for representing the mapping relation of dimension and bit;
  • a generating module 30 configured to construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index consists of Composed of a bit array, the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
  • the query module 40 is configured to obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
  • the query module 40 includes:
  • the first query unit is configured to perform a depth-first search from the root node of the bitmap tree when determining the bitmap tree matching the structured query language;
  • the first generation unit is configured to use the MOLAP engine to query the pre-aggregated data and generate a query result according to the pre-aggregated data when it is determined that an accurately hit sub-data cube is found through the depth-first search.
  • the query module 40 further includes:
  • the second query unit is used to retroactively search for sub-data cubes with fuzzy hits when it is determined that no sub-data cubes with precise hits have been found through the depth-first search;
  • the second generating unit is used to use the MOLAP engine to query the pre-aggregated data when it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, and the query result is generated according to the pre-aggregated data;
  • the ROLAP engine is used to calculate the query result online.
  • it further includes:
  • the pruning module is used to prune the data cube generation tree by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, with each aggregation group as the root Nodes start to materialize their respective child data cubes, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
  • the multidimensional data analysis system based on Bitmap indexed hybrid engine query routing:
  • the first step the overall structure of the system:
  • the logic of the overall system architecture starts from the data source, and the original data is read from multiple sources such as HDFS, Hive, and RDBMS.
  • the Cube is built in the background by the automatic modeling module, and the information required for designing and building the Cube is mined from the historically executed SQL, which is completed by the SQL parsing module. Cube needs model optimization during initial construction and use, and the MMO algorithm related to the present invention provides support for optimization. After the construction and optimization are completed, the Cube is stored in the HBase of the storage module, and other metadata, historical SQL and other information are stored in Mysql.
  • the hybrid engine-based multi-dimensional query method of the present invention is used when performing query tasks, and the query routing module routes it, parses out the logic syntax tree, and retrieves the Bitmap to intelligently obtain the most suitable query engine.
  • the final query results are rendered by the multidimensional analysis and visualization module into rich charts and dashboards for users to view.
  • the system is divided into five layers: data layer, data model layer, data calculation layer, business logic layer and page display layer.
  • the bottom data layer is the original data.
  • the data calculation layer executes the query tasks.
  • the page display layer invokes the API of the business logic layer to present the visualized results to the user.
  • This layer is the storage of original data, supporting multi-source data such as Hadoop's HDFS, Mysql, and Hive.
  • HDFS and Hive are used as big data warehouses for analysis, while Mysql stores more data tables and metadata required by system functions.
  • Data model layer This layer includes three modules: automatic modeling, model optimization, and task scheduling.
  • Automatic modeling uses SQL Parser to analyze historical SQL collections, extracts metadata, and combines them into a star model.
  • the Fast Cube algorithm is used to materialize the aggregated data into Cubes, and the pre-aggregated calculations are performed by the Spark engine.
  • Model optimization uses MMO algorithm and Aggregation Group related to the present invention.
  • Task scheduling is performed using the Quartz scheduler and Linux Crontab, which is responsible for performing optimization and materialized calculations based on optimization strategies.
  • Data calculation layer This layer executes multidimensional data query tasks, and is divided into two modules: calculation engine and multidimensional index.
  • the calculation engine is a hybrid engine, including Spark SQL and Kylin.
  • the query routing is based on the data pre-calculated by the system, retrieves the multi-dimensional index and selects the appropriate query engine to perform the task.
  • the multidimensional index uses the technology of Cube spanning tree and Bitmap index.
  • This layer is the middle layer between front-end requests and back-end query tasks, and encapsulates lower-layer services into functions for upper-layer calls.
  • the overall architecture of the system is written using Spring Boot, and the communication between the front-end and the back-end and the communication between the back-end functional modules is carried out in the form of RESTful API, and the information returned from the back-end to the front-end is organized in JSON format.
  • Page display layer responsible for displaying query results to users, the front end is written with Vue framework, and E-charts are used to generate various chart components.
  • the third step is to query the routing module:
  • This module is to perform multi-dimensional data query tasks. Based on the hybrid engine, combined with Bitmap index retrieval, it can choose between Spark SQL and Kylin to achieve optimal query performance.
  • the overall process of this module is shown in Figure 7. According to these functional processes, the interfaces required by this module can be summarized, as shown in the following table:
  • the query is the main interface of this module, and its execution logic is transparent to users.
  • the query interface calls several services: the multi-dimensional query route (queryRoute) is used to retrieve the Bitmap index, select the query engine (use SQLParse to parse the logical syntax tree, and then perform the query task matching by BitmapIndex), and the query execution service of SparkSQL (sparkSql) is used to Execute the push-down query task, and Kylin's query execution service (kylinQuery) is used to execute the pre-calculated result query when hitting the Cube.
  • the called services are encapsulated into interfaces at the same time, which can be debugged and tested by calling an engine separately.
  • the Bitmap index of the Cube can be established in the background, and the appropriate query engine can be intelligently selected to perform the query task, and the overall performance of the system can be improved.
  • the present invention also proposes a non-transitory computer-readable storage medium, the computer stores a computer program, and when the computer program is executed by a processor, the multi-dimensional data based on the hybrid engine of the embodiment of the present application is realized. Query method.
  • the present invention also proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the present invention
  • the multi-dimensional data query method based on the hybrid engine of the embodiment of the application.
  • the present invention further proposes a computer program product, including a computer program, and when the computer program is executed by a processor, the method for querying multi-dimensional data based on a hybrid engine in the embodiment of the present application is implemented.
  • first and second are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features.
  • the features defined as “first” and “second” may explicitly or implicitly include at least one of these features.
  • “plurality” means at least two, such as two, three, etc., unless specifically defined otherwise.
  • a "computer-readable medium” may be any device that can contain, store, communicate, propagate or transmit a program for use in or in conjunction with an instruction execution system, device, or device.
  • computer-readable media include the following: electrical connection with one or more wires (electronic device), portable computer disk case (magnetic device), random access memory (RAM), Read Only Memory (ROM), Erasable and Editable Read Only Memory (EPROM or Flash Memory), Fiber Optic Devices, and Portable Compact Disc Read Only Memory (CDROM).
  • the computer-readable medium may even be paper or other suitable medium on which the program can be printed, since the program can be read, for example, by optically scanning the paper or other medium, followed by editing, interpretation or other suitable processing if necessary.
  • the program is processed electronically and stored in computer memory.
  • each functional unit in each embodiment of the present invention may be integrated into one processing module, each unit may exist separately physically, or two or more units may be integrated into one module.
  • the above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. If the integrated modules are realized in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
  • the storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disk, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A hybrid engine-based multidimensional data query method and apparatus. The method comprises: constructing a data cube spanning tree, the data cube spanning tree comprising a plurality of sub-data cubes, and each sub-data cube corresponding to a pre-aggregation result of a dimension combination (S10); establishing a dimension dictionary, the dimension dictionary being used for representing mapping relationships between dimensions and bits (S20); constructing a bitmap index on the basis of the dimension dictionary, and introducing the bitmap index into the cube spanning tree to obtain a bitmap index-based bitmap tree, the bitmap index being composed of a bit array, and a bit value in the bit array representing whether the dimension corresponding to the bit is pre-computed (S30); and obtaining a query requirement, generating a structured query language according to the query requirement, and obtaining a query result according to the structured query language and the bitmap tree (S40).

Description

基于混合引擎的多维数据查询方法及装置Multidimensional data query method and device based on hybrid engine
相关申请的交叉引用Cross References to Related Applications
本申请基于申请号为202110733535.X、申请日为2021年06月30日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。This application is based on a Chinese patent application with application number 202110733535.X and a filing date of June 30, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated by reference into this application.
技术领域technical field
本发明涉及信息技术及数据业务领域技术领域,尤其涉及一种混合引擎的多维数据查询方法领域。The present invention relates to the technical field of information technology and data business, and in particular to the field of a hybrid engine multi-dimensional data query method.
背景技术Background technique
在大数据时代,拥有数据就意味着拥有了价值,然而大数据价值的挖掘并不是一件容易的事,多维数据分析就是其中的一项重要手段。大数据的多维分析是要从多个角度、多个侧面观察和挖掘海量数据,辅以上卷、下钻、切片、切块、旋转等各种分析操作,经过专业的整合与分析,最后输出可视化数据或图表,帮助分析师和企业用户深入了解包含在数据中的信息和内涵。In the era of big data, owning data means having value. However, mining the value of big data is not an easy task, and multidimensional data analysis is one of the important means. The multi-dimensional analysis of big data is to observe and mine massive data from multiple angles and sides, supplemented by various analysis operations such as scrolling, drilling down, slicing, dicing, and rotating. After professional integration and analysis, the final output is visualized Data or charts that help analysts and business users gain insight into the information and meaning contained in the data.
联机分析处理技术(On-Line Analytical Processing,OLAP)是进行大数据多维分析的主流技术,它包括基于数据仓库的传统关系型OLAP(Relational OLAP,ROLAP)和采用预计算技术的多维OLAP(Multidimensional OLAP,MOLAP)。其中ROLAP已经发展的相对成熟,但随着数据量的爆发式增长,其查询时间不可避免地随着数据规模线性增加,多表多维度的联接操作又会带来更大的开销,所以在大数据的多维分析场景ROLAP出现了严重的性能瓶颈。MOLAP采用预计算技术解决了这一问题,经过预计算,查询时只需扫描物化视图即可得出结果,避免了对规模不断增长的原始记录的扫描。然而,MOLAP的查询必须基于已经预计算完毕的物化视图集(又称数据立方体,Cube),无论是手动建模还是自动建模,预计算的过程都会持续几十分钟甚至数小时之久,为多维数据分析增加了大量的前置等待时间,因此只有在混合引擎的系统中MOLAP才能发挥它的优势。On-Line Analytical Processing (OLAP) is the mainstream technology for multidimensional analysis of big data, which includes traditional relational OLAP (Relational OLAP, ROLAP) based on data warehouse and multidimensional OLAP (Multidimensional OLAP) using precomputing technology. ,MOLAP). Among them, ROLAP has developed relatively maturely, but with the explosive growth of data volume, its query time inevitably increases linearly with the data size, and the join operation of multi-table and multi-dimensional will bring greater overhead, so in large In the multi-dimensional data analysis scenario, ROLAP has a serious performance bottleneck. MOLAP uses pre-computing technology to solve this problem. After pre-computing, it only needs to scan the materialized view to get the result during the query, avoiding the scanning of the original records with increasing scale. However, the MOLAP query must be based on the pre-calculated materialized view set (also known as data cube, Cube). Whether it is manual modeling or automatic modeling, the pre-calculation process will last for tens of minutes or even hours. Multidimensional data analysis increases a lot of lead-time waiting time, so MOLAP can play its advantage only in the mixed engine system.
总的来说,ROLAP和MOLAP引擎各有其优点与不足,故它们所擅长的使用场景不同。在基于混合引擎的系统中,系统难以在二者之间做出快速、合理的选择,亟需一种查询路由解决查询引擎的选择问题,并建立多维模型索引对查询路由提供支持。In general, ROLAP and MOLAP engines have their own advantages and disadvantages, so they are good at different usage scenarios. In a system based on a hybrid engine, it is difficult for the system to make a fast and reasonable choice between the two, and a query routing is urgently needed to solve the problem of query engine selection, and to establish a multi-dimensional model index to provide support for query routing.
传统的OLAP索引技术包括四种类型的索引:B-tree索引,R-tree索引,哈希索引和位图索引。前三个应用较为广泛,但是它们难以满足对高维科学数据进行快速查询的要求,例如:B-tree索引和R-tree索引仅对维数小于15的数据集有效。而位图索引则可以较好地适应高维度数据的特点。Traditional OLAP index technology includes four types of indexes: B-tree index, R-tree index, hash index and bitmap index. The first three are widely used, but they are difficult to meet the requirements of fast query of high-dimensional scientific data, for example: B-tree index and R-tree index are only valid for data sets with dimensions less than 15. The bitmap index can better adapt to the characteristics of high-dimensional data.
发明内容Contents of the invention
本发明旨在至少在一定程度上解决相关技术中的技术问题之一。The present invention aims to solve one of the technical problems in the related art at least to a certain extent.
为此,本发明的第一个目的在于提出一种基于混合引擎的多维数据查询方法,以实现可以在后台建立Cube的位图Bitmap索引,智能地选择合适的查询引擎执行查询任务,提升系统整体性能。For this reason, the first purpose of the present invention is to propose a multi-dimensional data query method based on a hybrid engine, so as to realize that the bitmap Bitmap index of Cube can be established in the background, intelligently select a suitable query engine to perform query tasks, and improve the overall system performance.
本发明的第二个目的在于提出一种基于混合引擎的多维数据查询装置。The second object of the present invention is to propose a multi-dimensional data query device based on a hybrid engine.
本发明的第三个目的在于提出一种非临时性计算机可读存储介质。A third object of the present invention is to provide a non-transitory computer-readable storage medium.
本发明的第四个目的在于提出一种电子设备。A fourth object of the present invention is to provide an electronic device.
本发明的第五个目的在于提出一种计算机程序产品。A fifth object of the present invention is to provide a computer program product.
为达上述目的,本发明第一方面实施例提出了一种基于混合引擎的多维数据查询方法,包括以下步骤:In order to achieve the above purpose, the embodiment of the first aspect of the present invention proposes a multi-dimensional data query method based on a hybrid engine, including the following steps:
构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果;Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;
建立维度字典,所述维度字典用于表征维度与比特位的映射关系;Establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;
根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组中的比特值表征比特位对应的维度是否被预计算;Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, the The bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。Obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
可选地,在本申请的一个实施例中,根据所述结构化查询语言和所述位图树获取查询结果,包括:Optionally, in an embodiment of the present application, obtaining query results according to the structured query language and the bitmap tree includes:
在确定与所述结构化查询语言匹配的位图树时,从位图树的根节点开始进行深度优先搜索;When determining the bitmap tree matching the structured query language, a depth-first search is performed from the root node of the bitmap tree;
在确定通过所述深度优先搜索查找到精确命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果。When it is determined that the sub-data cube that is accurately hit is found through the depth-first search, the MOLAP engine is used to query the pre-aggregated data, and a query result is generated according to the pre-aggregated data.
可选地,在本申请的一个实施例中,根据所述结构化查询语言和所述位图树获取查询结果,还包括:Optionally, in an embodiment of the present application, obtaining the query result according to the structured query language and the bitmap tree further includes:
在确定通过所述深度优先搜索未查找到精确命中的子数据立方体时,回溯查找模糊命中的子数据立方体;When it is determined that the sub-data cube of the exact hit is not found through the depth-first search, backtracking to find the sub-data cube of the fuzzy hit;
在确定回溯查找到模糊命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果;When it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, the MOLAP engine is used to query the pre-aggregated data, and the query result is generated according to the pre-aggregated data;
在确定回溯查找未找到模糊命中的子数据立方体时,使用ROLAP引擎在线计算查询结果。When it is determined that the backtracking search does not find a sub-data cube with a fuzzy hit, the ROLAP engine is used to calculate the query result online.
可选地,在本申请的一个实施例中,在构建数据立方体生成树之后,还包括:Optionally, in one embodiment of the present application, after building the data cube spanning tree, it also includes:
采用聚合组剪枝的方式对所述数据立方体生成树进行剪枝处理,其中,根据维度间的关联程度,将维度划分成多个聚合组,以每个聚合组为根节点开始物化各自的子数据立方体,所述维度包括强制维度、层级维度和联合维度中的一种或者多种。The data cube spanning tree is pruned by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, and each aggregation group is used as the root node to materialize the respective child In a data cube, the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
可选地,在本申请的一个实施例中,所述建立维度字典,包括:Optionally, in one embodiment of the present application, the establishment of a dimension dictionary includes:
获取妊娠结果表、基础信息表和母亲体检表,根据所述妊娠结果表、基础信息表和母亲体检表生成星型模型,对所述星型模型进行扁平化处理以得到所述维度字典,其中,每个维度对应一个比特位。Obtain the pregnancy result table, basic information table, and mother's physical examination table, generate a star schema according to the pregnancy result table, basic information table, and mother's physical examination table, and flatten the star schema to obtain the dimension dictionary, wherein , each dimension corresponds to a bit.
为达上述目的,本申请第二方面实施例提出本发明一种基于混合引擎的多维数据查询装置,包括以下模块:In order to achieve the above purpose, the embodiment of the second aspect of the present application proposes a multi-dimensional data query device based on a hybrid engine of the present invention, which includes the following modules:
构建模块,用于构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果;A building block for building a data cube spanning tree, the data cube spanning tree comprising a plurality of sub-data cubes, each sub-data cube corresponding to a pre-aggregated result of a combination of dimensions;
建立模块,用于建立维度字典,所述维度字典用于表征维度与比特位的映射关系;Establishing a module for establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;
生成模块,用于根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组中的比特值表征比特位对应的维度是否被预计算;A generating module, configured to construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index consists of bits Composed of an array, the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
查询模块,用于获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。The query module is used to obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
可选地,在本申请的一个实施例中,所述查询模块,包括:Optionally, in one embodiment of the present application, the query module includes:
第一查询单元,用于在确定与所述结构化查询语言匹配的位图树时,从位图树的根节点开始进行深度优先搜索;The first query unit is configured to perform a depth-first search from the root node of the bitmap tree when determining the bitmap tree matching the structured query language;
第一生成单元,用于在确定通过所述深度优先搜索查找到精确命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果。The first generation unit is configured to use the MOLAP engine to query the pre-aggregated data and generate a query result according to the pre-aggregated data when it is determined that an accurately hit sub-data cube is found through the depth-first search.
可选地,在本申请的一个实施例中,所述查询模块,还包括:Optionally, in one embodiment of the present application, the query module further includes:
第二查询单元,用于在确定通过所述深度优先搜索未查找到精确命中的子数据立方体时,回溯查找模糊命中的子数据立方体;The second query unit is used to retroactively search for sub-data cubes with fuzzy hits when it is determined that no sub-data cubes with precise hits have been found through the depth-first search;
第二生成单元,用于在确定回溯查找到模糊命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果;The second generating unit is used to use the MOLAP engine to query the pre-aggregated data when it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, and the query result is generated according to the pre-aggregated data;
在确定回溯查找未找到模糊命中的子数据立方体时,使用ROLAP引擎在线计算查询结果。When it is determined that the backtracking search does not find a sub-data cube with a fuzzy hit, the ROLAP engine is used to calculate the query result online.
可选地,在本申请的一个实施例中,还包括:Optionally, in an embodiment of the present application, it also includes:
剪枝模块,用于采用聚合组剪枝的方式对所述数据立方体生成树进行剪枝处理,其中,根据维度间的关联程度,将维度划分成多个聚合组,以每个聚合组为根节点开始物化各自的子数据立方体,所述维度包括强制维度、层级维度和联合维度中的一种或者多种。The pruning module is used to prune the data cube generation tree by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, with each aggregation group as the root Nodes start to materialize their respective child data cubes, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
本申请的技术效果:可以在后台建立Cube的位图Bitmap索引,智能地选择合适的查询引擎执行查询任务,提升系统整体性能。The technical effect of the present application: the Bitmap index of the Cube can be established in the background, and the appropriate query engine can be intelligently selected to perform the query task, and the overall performance of the system can be improved.
为达上述目的,本申请第三方面实施例提出了一种非临时性计算机可读存储介质,其上存储有计算机程序,所述计算机程序被处理器执行时实现本申请第一方面实施例所述的基于混合引擎的多维数据查询方法。In order to achieve the above purpose, the embodiment of the third aspect of the present application proposes a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implemented in the embodiment of the first aspect of the application The hybrid engine-based multidimensional data query method described above.
为达上述目的,本申请第四方面实施例提出了一种电子设备,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令,以实现本申请第一方面实施例所述的基于混合引擎的多维数据查询方法。To achieve the above purpose, the embodiment of the fourth aspect of the present application proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions , so as to implement the hybrid engine-based multidimensional data query method described in the embodiment of the first aspect of the present application.
为达上述目的,本申请第五方面实施例提出了一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现本申请第一方面实施例所述的基于混合引擎的多维数据查询方法。To achieve the above purpose, the embodiment of the fifth aspect of the present application proposes a computer program product, including a computer program, when the computer program is executed by a processor, it realizes the multi-dimensional data based on the hybrid engine described in the embodiment of the first aspect of the application. Query method.
本发明附加的方面和优点将在下面的描述中部分给出,部分将从下面的描述中变得明显,或通过本发明的实践了解到。Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
附图说明Description of drawings
本申请上述的和/或附加的方面和优点从下面结合附图对实施例的描述中将变得明显和容易理解,其中:The above and/or additional aspects and advantages of the present application will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, wherein:
图1为本申请实施例的基于混合引擎的多维数据查询方法的流程图。FIG. 1 is a flowchart of a multidimensional data query method based on a hybrid engine according to an embodiment of the present application.
图2是本申请实施例的Cube生成树(全量)的示意图;Fig. 2 is the schematic diagram of the Cube spanning tree (full amount) of the embodiment of the present application;
图3是本申请实施例的Cube生成树(优化后)的示意图;Fig. 3 is the schematic diagram of the Cube spanning tree (optimized) of the embodiment of the present application;
图4是本申请实施例的由星型模型转化为维度字典的示意图;FIG. 4 is a schematic diagram of converting a star schema into a dimension dictionary according to an embodiment of the present application;
图5是本申请实施例的Cuboid构建Bitmap索引的示意图;Fig. 5 is the schematic diagram that the Cuboid of the embodiment of the present application constructs the Bitmap index;
图6是本申请实施例的基于Cube生成树的Bitmap索引的示意图;Fig. 6 is the schematic diagram of the Bitmap index based on Cube spanning tree of the embodiment of the present application;
图7是本申请实施例的Bitmap检索时的4种匹配情况的示意图;Fig. 7 is the schematic diagram of 4 kinds of matching situations when the Bitmap retrieval of the embodiment of the present application;
图8是本申请实施例的基于混合引擎的多维数据查询路由整体流程的示意图;FIG. 8 is a schematic diagram of the overall flow of multi-dimensional data query routing based on a hybrid engine according to an embodiment of the present application;
图9是本申请实施例的系统整体架构图;FIG. 9 is a system overall architecture diagram of an embodiment of the present application;
图10是本申请实施例的系统技术架构图;FIG. 10 is a system technical architecture diagram of an embodiment of the present application;
图11为本申请实施例的基于混合引擎的多维数据查装置的结构示意图。FIG. 11 is a schematic structural diagram of an apparatus for searching multi-dimensional data based on a hybrid engine according to an embodiment of the present application.
具体实施方式detailed description
下面详细描述本发明的实施例,所述实施例的示例在附图中示出,其中自始至终相同或类似的标号表示相同或类似的元件或具有相同或类似功能的元件。下面通过参考附图描述的实施例是示例性的,旨在用于解释本发明,而不能理解为对本发明的限制。Embodiments of the present invention are described in detail below, examples of which are shown in the drawings, wherein the same or similar reference numerals designate the same or similar elements or elements having the same or similar functions throughout. The embodiments described below by referring to the figures are exemplary and are intended to explain the present invention and should not be construed as limiting the present invention.
下面参考附图描述本发明实施例的基于混合引擎的多维数据查询方法。The hybrid engine-based multidimensional data query method of the embodiment of the present invention will be described below with reference to the accompanying drawings.
如图1所示,为达上述目的,本发明第一方面实施例提出了一种基于混合引擎的多维数据查询方法,包括以下步骤:As shown in Figure 1, in order to achieve the above purpose, the embodiment of the first aspect of the present invention proposes a multi-dimensional data query method based on a hybrid engine, including the following steps:
S10:构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果;S10: Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, and each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;
S20:建立维度字典,所述维度字典用于表征维度与比特位的映射关系;S20: Establish a dimension dictionary, the dimension dictionary is used to represent the mapping relationship between dimensions and bits;
S30:根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组 中的比特值表征比特位对应的维度是否被预计算;S30: Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, The bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
S40:获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。S40: Obtain a query requirement, generate a structured query language according to the query requirement, and obtain a query result according to the structured query language and the bitmap tree.
在本申请的一个实施例中,进一步地,根据所述结构化查询语言和所述位图树获取查询结果,包括:In an embodiment of the present application, further, obtaining query results according to the structured query language and the bitmap tree includes:
在确定与所述结构化查询语言匹配的位图树时,从位图树的根节点开始进行深度优先搜索;When determining the bitmap tree matching the structured query language, a depth-first search is performed from the root node of the bitmap tree;
在确定通过所述深度优先搜索查找到精确命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果。When it is determined that the sub-data cube that is accurately hit is found through the depth-first search, the MOLAP engine is used to query the pre-aggregated data, and a query result is generated according to the pre-aggregated data.
在本申请的一个实施例中,进一步地,根据所述结构化查询语言和所述位图树获取查询结果,还包括:In an embodiment of the present application, further, obtaining query results according to the structured query language and the bitmap tree further includes:
在确定通过所述深度优先搜索未查找到精确命中的子数据立方体时,回溯查找模糊命中的子数据立方体;When it is determined that the sub-data cube of the exact hit is not found through the depth-first search, backtracking to find the sub-data cube of the fuzzy hit;
在确定回溯查找到模糊命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果;When it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, the MOLAP engine is used to query the pre-aggregated data, and the query result is generated according to the pre-aggregated data;
在确定回溯查找未找到模糊命中的子数据立方体时,使用ROLAP引擎在线计算查询结果。When it is determined that the backtracking search does not find a sub-data cube with a fuzzy hit, the ROLAP engine is used to calculate the query result online.
在本申请的一个实施例中,进一步地,在构建数据立方体生成树之后,还包括:In one embodiment of the present application, further, after constructing the data cube spanning tree, it also includes:
采用聚合组剪枝的方式对所述数据立方体生成树进行剪枝处理,其中,根据维度间的关联程度,将维度划分成多个聚合组,以每个聚合组为根节点开始物化各自的子数据立方体,所述维度包括强制维度、层级维度和联合维度中的一种或者多种。The data cube spanning tree is pruned by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, and each aggregation group is used as the root node to materialize the respective child In a data cube, the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
在本申请的一个实施例中,进一步地,具体而言,数据立方体(Cube)事实上是一系列物化视图的集合,每一个物化视图都是一种预聚合计算的结果。对于Cube以及它的的生成过程,我们可以把它形象地画成一个Cube生成树,以4维的Cube为例,如图2所示。In an embodiment of the present application, further specifically, a data cube (Cube) is actually a collection of a series of materialized views, and each materialized view is a result of a pre-aggregation calculation. For the Cube and its generation process, we can visualize it as a Cube spanning tree, taking the 4-dimensional Cube as an example, as shown in Figure 2.
图2是一个全量的Cube生成树,它展示了一个Cube被逐级计算出来的过程,图中每一个方块代表了一种维度组合的预聚合结果,我们可以称之为子Cube(或Cuboid)。实线箭头表示聚合计算的过程,虚线则是被省略的重复计算。Figure 2 is a full Cube spanning tree, which shows the process of a Cube being calculated step by step. Each block in the figure represents a pre-aggregated result of a combination of dimensions, which we can call a sub-Cube (or Cuboid) . The solid line arrows indicate the process of aggregation calculation, and the dotted line is the omitted repeated calculation.
可以看到,顶点Cuboid(ABCD)是最先被计算出来的,它是经由大宽表(事实表、维度表连接成的明细表)聚合出的,是后续计算的基础。第二层包括4个Cuboid,分别是ABC、ABD、ACD、BCD,它们都是由ABCD计算得到,同理,其余层的Cuboid逐级聚合直至最终达到1维的叶子节点结束。It can be seen that the vertex Cuboid (ABCD) is calculated first, and it is aggregated through a large wide table (a detail table connected by a fact table and a dimension table), which is the basis for subsequent calculations. The second layer includes 4 Cuboids, which are ABC, ABD, ACD, and BCD. They are all calculated by ABCD. Similarly, the Cuboids in the remaining layers are aggregated step by step until they finally reach the end of the 1-dimensional leaf node.
图2展示的是一个全量预计算的Cube,它能涵盖所有的聚合查询需求,但当维度数量增加时,这样的Cube占用空间太大,所以需要用聚合组等工具进行剪枝。本发明主要采用聚合组剪枝(在后台自动地进行优化算法),同时支持一些更加细粒度的剪枝工具。这些剪枝工具及其适用场景如下表所示。Figure 2 shows a fully precomputed Cube, which can cover all aggregation query requirements, but when the number of dimensions increases, such a Cube takes up too much space, so tools such as aggregation groups need to be used for pruning. The present invention mainly adopts aggregation group pruning (optimization algorithm is performed automatically in the background), and supports some finer-grained pruning tools at the same time. These pruning tools and their applicable scenarios are shown in the table below.
Figure PCTCN2021137140-appb-000001
Figure PCTCN2021137140-appb-000001
经过剪枝后,Cube生成树的规模被进一步地缩减,如图3所示。这种生成树更贴合实际应用情况下的Cube模型,这种树状结构则为本文构建Bitmap索引所采用的基本数据结构。After pruning, the scale of the Cube spanning tree is further reduced, as shown in Figure 3. This kind of spanning tree is more suitable for the Cube model in practical applications, and this kind of tree structure is the basic data structure used to build the Bitmap index in this paper.
在本申请的一个实施例中,进一步地,所述建立维度字典,包括:In one embodiment of the present application, further, the establishment of a dimension dictionary includes:
获取妊娠结果表、基础信息表和母亲体检表,根据所述妊娠结果表、基础信息表和母亲体检表生成星型模型,对所述星型模型进行扁平化处理以得到所述维度字典,其中,每个维度对应一个比特位。Obtain the pregnancy result table, basic information table, and mother's physical examination table, generate a star schema according to the pregnancy result table, basic information table, and mother's physical examination table, and flatten the star schema to obtain the dimension dictionary, wherein , each dimension corresponds to a bit.
在本申请的一个实施例中,进一步地,具体而言,确定了整体的数据结构以后,要构建Bitmap索引,还要为其建立维度字典。维度字典代表了维度与Bitmap位的映射关系,大大节省了Bitmap占用的存储空间。我们以三张表组成的星型模型为例,将其扁平化形成维度字典,如图4所示。In an embodiment of the present application, further, specifically, after the overall data structure is determined, a Bitmap index is to be constructed, and a dimension dictionary is also established for it. The dimension dictionary represents the mapping relationship between dimensions and Bitmap bits, which greatly saves the storage space occupied by Bitmap. Let's take the star model composed of three tables as an example, and flatten it to form a dimension dictionary, as shown in Figure 4.
图4中上半部分是由三张表组成的简单星型模型,包括事实表GD_PREGENCY_RESULT(妊娠结果表)、维表GD_BASIC_INFO_DETAIL(基础信息表)、维表GD_PYSICAL_EXAM_W(母亲体检表)。图中蓝色的字段代表维度(dimensions),紫色的字段代表度量(measures),将星型模型中的维度部分扁平化处理,就得到了下半部分所示的维度字典,每个维度对应一个特定的比特位。The upper part of Figure 4 is a simple star model composed of three tables, including fact table GD_PREGENCY_RESULT (pregnancy result table), dimension table GD_BASIC_INFO_DETAIL (basic information table), and dimension table GD_PYSICAL_EXAM_W (mother's physical examination table). The blue fields in the figure represent dimensions, and the purple fields represent measures. After flattening the dimension part in the star model, the dimension dictionary shown in the lower part is obtained. Each dimension corresponds to a specific bits.
在本申请的一个实施例中,进一步地,具体而言,构建Bitmap索引:In one embodiment of the present application, further, specifically, a Bitmap index is constructed:
Bitmap索引由bit数组组成,以图3中的8维数组为例,每一个bit位代表了一个维度,数组中的bit代表了该维度是否被预计算,如第[3]位为1代表了维度GD_BASIC_INFO_DETAIL.FEDU_LEVEL在该Cuboid中被预计算,0则相反。在MOLAP分析场景中,Bitmap索引有着显著的优势:(1)高维情况下,Bitmap索引仅需较少的内存占用,这意味着系统在检索索引时,可以更多地把Bitmap索引加载到内存中,极 大地提升了索引检索速度;(2)Bitmap索引可以进行逻辑运算,例如按位逻辑与(AND)、或(OR)、异或(XOR)等,免除了复杂的格式转换,运算速度快。The Bitmap index is composed of bit arrays. Taking the 8-dimensional array in Figure 3 as an example, each bit represents a dimension, and the bit in the array represents whether the dimension is pre-calculated. For example, bit [3] is 1. Dimension GD_BASIC_INFO_DETAIL.FEDU_LEVEL is precomputed in this Cuboid, 0 is the opposite. In the MOLAP analysis scenario, the Bitmap index has significant advantages: (1) In the case of high dimensions, the Bitmap index only needs less memory usage, which means that the system can load more Bitmap indexes into the memory when retrieving the index Among them, the speed of index retrieval is greatly improved; (2) Bitmap index can perform logical operations, such as bitwise logical AND (AND), or (OR), exclusive OR (XOR), etc., eliminating the need for complex format conversion and speed of operation quick.
图5展示了Cuboid的Bitmap索引示例,每个Cuboid都是一个bit数组,表征了哪些维度已经被预计算完成。Figure 5 shows an example of Cuboid's Bitmap index. Each Cuboid is a bit array, representing which dimensions have been pre-calculated.
Cube生成树是由多个Cuboid组成的树形结构,接下来我们把这种Cuboid的Bitmap索引编码套入Cube生成树中,就可以形成基于Cube生成树Bitmap索引,简称Bitmap tree,同样以8个维度的数据模型为例,剪枝优化后的Bitmap tree如图6所示。The Cube spanning tree is a tree structure composed of multiple Cuboids. Next, we insert the Bitmap index code of this Cuboid into the Cube spanning tree to form a Cube spanning tree-based Bitmap index, referred to as Bitmap tree, also with 8 Take the dimension data model as an example, the Bitmap tree after pruning optimization is shown in Figure 6.
至此就构成了Bitmap索引的基本结构,它可以为MOLAP提供索引支持。值得注意的是,度量并不在Bitmap tree的索引结构里,每颗度量对应了一颗索引树,例如度量AVG(BIRTH_NUM),它名下的Bitmap tree中包含的所有Cuboid,均是针对AVG(BIRTH_NUM)所做的不同维度组合的预计算。So far, the basic structure of the Bitmap index has been formed, which can provide index support for MOLAP. It is worth noting that the metric is not in the index structure of the Bitmap tree. Each metric corresponds to an index tree. For example, the metric AVG(BIRTH_NUM), all the Cuboids contained in the Bitmap tree under its name are for the AVG(BIRTH_NUM ) for precomputation of different dimension combinations.
在本申请的一个实施例中,进一步地,具体而言,基于Bitmap索引的混合引擎查询路由:In an embodiment of the present application, further, specifically, the hybrid engine query routing based on the Bitmap index:
第一步、Bitmap索引的逻辑运算:The first step, the logical operation of the Bitmap index:
Bitmap索引中的Cuboid是独立的bit数组,它代表了已被预计算的维度组合,形如01001100,彼此不会出现重复,简称Cid。Bitmap索引离不开对Cid的操作,我们把目标维度组合称为Ctarget,bit数组记为01001000,可以对Cid和Ctarget进行逻辑运算。这种二进制的逻辑运算无需复杂的格式转换,且能获得更好的计算机硬件支持,运算速度较快。下面将简要介绍三种按位逻辑运算操作,通过这种简单操作的组合即可实现Bitmap索引的检索。The Cuboid in the Bitmap index is an independent bit array, which represents a combination of dimensions that have been precalculated, in the form of 01001100, and will not repeat each other, referred to as Cid. The Bitmap index is inseparable from the operation on Cid. We call the target dimension combination Ctarget, and the bit array is recorded as 01001000. Logical operations can be performed on Cid and Ctarget. This kind of binary logic operation does not require complex format conversion, and can obtain better computer hardware support, and the operation speed is faster. The following will briefly introduce three bitwise logical operation operations, and the retrieval of the Bitmap index can be realized through the combination of such simple operations.
(1)按位与AND:(1) Bitwise AND:
Cid AND Ctarget=01001100 AND 01001000=01001000Cid AND Ctarget=01001100 AND 01001000=01001000
(2)按位或OR:(2) Bitwise OR:
Cid OR Ctarget=01001100 OR 01001000=01001100Cid OR Ctarget=01001100 OR 01001000=01001100
(3)按位异或XOR:(3) Bitwise XOR:
Cid XOR Ctarget=01001100 XOR 01001000=11111011Cid XOR Ctarget=01001100 XOR 01001000=11111011
第二步、检索Bitmap索引:The second step is to retrieve the Bitmap index:
检索Bitmap索引,本质上是要在Cube生成树里找到一个符合查询需求的Cuboid,我们称这种Cuboid的维度组合完全符合查询需求的情况为“精确命中”。虽然精确命中可以为查询带来近乎完美的性能,然而过度追求精确命中也是不可取的。它一方面会导致Cube的急剧膨胀,加重存储和计算负担,另一方面,由于Cube的生成树是逐层递进的,如果子Cuboid无法命中,可以寻找其父Cuboid并由其在线聚合计算得出查询结果。我们把这种在线的聚合计算称为“推算”,相应由推算完成查询的情况叫做“模糊命中”。Retrieving the Bitmap index is essentially to find a Cuboid that meets the query requirements in the Cube spanning tree. We call the combination of dimensions of the Cuboid that fully meets the query requirements an "exact hit". Although exact hits can bring near-perfect performance to queries, it is not advisable to pursue exact hits too much. On the one hand, it will lead to the rapid expansion of the Cube, which will increase the burden of storage and calculation. On the other hand, since the spanning tree of the Cube is progressive layer by layer, if the child Cuboid cannot be hit, it can find its parent Cuboid and calculate it by its online aggregation. Get the query result. We call this kind of online aggregation calculation "reckoning", and the corresponding situation where the query is completed by reckoning is called "fuzzy hit".
在实际的Cube应用过程中,模糊命中情况下的推算所耗性能受多方因素影响,如数据集大小、结果集的规模、推算起点与目标节点的层数差、推算涉及字段的基数等。 本文的混合引擎系统设计了一种“精确——模糊命中模型”,采纳精确命中和推算起点与目标节点的层数差为1的模糊命中作为查询命中的条件。下面将检索Bitmap索引,找到符合查询需求的Cuboid。In the actual Cube application process, the calculation performance in the case of fuzzy hits is affected by many factors, such as the size of the data set, the size of the result set, the layer difference between the calculation start point and the target node, and the cardinality of the fields involved in the calculation. The hybrid engine system in this paper designs a "precise-fuzzy hit model", which adopts precise hits and fuzzy hits in which the layer difference between the calculated starting point and the target node is 1 as the query hit condition. Next, the Bitmap index will be retrieved to find the Cuboid that meets the query requirements.
下面我们总结了检索过程中Cid运算的几种情况,当前检索的Cuboid记为Cid,目标维度组合为Ctarget:Below we summarize several situations of Cid calculation during the retrieval process. The currently retrieved Cuboid is recorded as Cid, and the combination of target dimensions is Ctarget:
(1)精确命中(1) Accurate hit
Cid AND Ctarget==Ctarget,Cid XOR Ctarget==0Cid AND Ctarget==Ctarget, Cid XOR Ctarget==0
(2)模糊命中(2) Fuzzy hit
Cid AND Ctarget==Ctarget,Cid XOR Ctarget=R,R AND(R-1)==0Cid AND Ctarget==Ctarget, Cid XOR Ctarget=R,R AND(R-1)==0
(3)未找到Cuboid,需要继续检索(3) Cuboid not found, need to continue searching
Cid AND Ctarget==CtargetCid AND Ctarget==Ctarget
(4)Cube无法满足查询需求,退出检索(4) Cube cannot meet the query requirements, so quit the search
Cid AND Ctarget!=CtargetCid AND Ctarget! =Ctarget
上述4种情况即为Bitmap索引检索过程中当前检索的Cuboid(Cid)与查询需求(Ctarget)匹配时所会出现的情况,均采用逻辑运算。值得注意的是,第(2)种情况中的R是Cid和Ctarget异或的结果,而后续的R AND(R-1)==0则代表着R的bit数组中只有一个1,也即模糊命中情况下,推算的起点和终点只相差1个维度。为了更加形象地说明上述的4种匹配情况,本文拟定了Ctarget为00110000(查询需求维度组合为CD),在图7中对这些匹配进行了标注。The above four situations are the situations that will occur when the currently retrieved Cuboid (Cid) matches the query requirement (Ctarget) during the Bitmap index retrieval process, and logic operations are all used. It is worth noting that R in case (2) is the result of XOR of Cid and Ctarget, and the subsequent R AND(R-1)==0 means that there is only one 1 in the bit array of R, that is In the case of a fuzzy hit, the calculated start and end points differ only by 1 dimension. In order to illustrate the above four matching situations more vividly, this paper proposes that Ctarget is 00110000 (the query requirement dimension combination is CD), and these matches are marked in Figure 7.
第三步、查询路由整体流程:The third step is to query the overall process of routing:
Cuboid的查找是一种深度优先的搜索(Depth First Search,DFS),从Cuboid tree的顶点开始搜索,符合情况(3)则继续查找子节点,直至符合情况(1)结束(精确命中)。如果无法满足精确命中,则回溯至父节点检查是否满足模糊命中的条件即情况(2)。如果两种命中均不存在,检索至情况(4)退出DFS。结合Cuboid的DFS检索,本申请整理出基于混合引擎的多维数据查询路由整体流程如图8所示。Cuboid's search is a depth-first search (Depth First Search, DFS), starting from the vertex of the Cuboid tree, and continuing to search for child nodes if the case (3) is met, until the end of the case (1) (accurate hit). If the precise hit cannot be met, then go back to the parent node to check whether the condition of the fuzzy hit is met (2). If neither hit exists, search to case (4) and exit DFS. Combined with Cuboid's DFS retrieval, this application sorts out the overall process of multi-dimensional data query routing based on hybrid engine, as shown in Figure 8.
对于一个查询需求(数据概览、多维分析、数据可视化等),本文系统首先在前端将其组织成为统一的SQL交由后端查询。后台对该SQL进行逻辑语法树解析后,首先匹配度量和算子,只有当符合条件的Bitmap tree存在时才会进入针对维度组合的Bitmap检索,其他情况(例如数据明细查询、不支持的聚合算子、度量未经过预计算等)则会直接下压至ROLAP引擎。在Bitmap索引的检索过程中,优先查找精确命中的Cuboid,若不存在则回溯查找模糊命中的Cuboid,整个Bitmap的检索过程均使用逻辑运算。如果目标Cuboid被找到,即可使用MOLAP引擎直接查询预聚合的数据,否则使用ROLAP引擎在线计算。最后返回查询结果,结束整个查询路由的流程。For a query requirement (data overview, multidimensional analysis, data visualization, etc.), the system in this paper first organizes it into a unified SQL at the front end and submits it to the back-end query. After parsing the logical syntax tree of the SQL in the background, first match the metrics and operators, and only enter the Bitmap search for dimension combinations when the qualified Bitmap tree exists. child, metrics not precomputed, etc.) are pushed directly to the ROLAP engine. In the retrieval process of the Bitmap index, the Cuboid with the exact hit is searched first, and if it does not exist, the Cuboid with the fuzzy hit is searched backwards. The entire Bitmap retrieval process uses logical operations. If the target Cuboid is found, the MOLAP engine can be used to directly query the pre-aggregated data, otherwise the ROLAP engine can be used for online calculation. Finally, the query result is returned, ending the entire process of query routing.
本申请的技术效果:本发明提出并实现了一种基于Bitmap索引的混合引擎查询路由的多维数据分析系统,本系统综合利用ROLAP和MOLAP两种查询引擎的优点,提供灵活场景的多维数据分析服务,同时为用户提供了自动建模、模型全生命周期监控、 自动模型优化的服务,使MOLAP预计算技术在系统中很好地落地、解除对数据专家的依赖,最后以查询结果生成丰富的可视化图表和驾驶舱,使用户可以一目了然地从大数据中挖掘信息。Technical effect of the application: the present invention proposes and implements a multi-dimensional data analysis system based on Bitmap index-based hybrid engine query routing. This system comprehensively utilizes the advantages of ROLAP and MOLAP two query engines to provide multi-dimensional data analysis services in flexible scenarios At the same time, it provides users with automatic modeling, model lifecycle monitoring, and automatic model optimization services, so that MOLAP pre-computing technology can be well implemented in the system, relieve the dependence on data experts, and finally generate rich visualization with query results Charts and cockpits that allow users to mine information from big data at a glance.
如图11所示,为达上述目的,本申请第二方面实施例提出本发明一种基于混合引擎的多维数据查询装置,包括以下模块:As shown in Figure 11, in order to achieve the above purpose, the embodiment of the second aspect of the present application proposes a multi-dimensional data query device based on a hybrid engine of the present invention, including the following modules:
构建模块10,用于构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果; Construction module 10, is used for constructing data cube spanning tree, and described data cube spanning tree comprises a plurality of sub-data cubes, and each sub-data cube corresponds to the pre-aggregation result of a combination of dimensions;
建立模块20,用于建立维度字典,所述维度字典用于表征维度与比特位的映射关系; Establishment module 20, is used for establishing dimension dictionary, and described dimension dictionary is used for representing the mapping relation of dimension and bit;
生成模块30,用于根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组中的比特值表征比特位对应的维度是否被预计算;A generating module 30, configured to construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index consists of Composed of a bit array, the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
查询模块40,用于获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。The query module 40 is configured to obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
在本申请的一个实施例中,进一步地,所述查询模块40,包括:In an embodiment of the present application, further, the query module 40 includes:
第一查询单元,用于在确定与所述结构化查询语言匹配的位图树时,从位图树的根节点开始进行深度优先搜索;The first query unit is configured to perform a depth-first search from the root node of the bitmap tree when determining the bitmap tree matching the structured query language;
第一生成单元,用于在确定通过所述深度优先搜索查找到精确命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果。The first generation unit is configured to use the MOLAP engine to query the pre-aggregated data and generate a query result according to the pre-aggregated data when it is determined that an accurately hit sub-data cube is found through the depth-first search.
在本申请的一个实施例中,进一步地,所述查询模块40,还包括:In an embodiment of the present application, further, the query module 40 further includes:
第二查询单元,用于在确定通过所述深度优先搜索未查找到精确命中的子数据立方体时,回溯查找模糊命中的子数据立方体;The second query unit is used to retroactively search for sub-data cubes with fuzzy hits when it is determined that no sub-data cubes with precise hits have been found through the depth-first search;
第二生成单元,用于在确定回溯查找到模糊命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果;The second generating unit is used to use the MOLAP engine to query the pre-aggregated data when it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, and the query result is generated according to the pre-aggregated data;
在确定回溯查找未找到模糊命中的子数据立方体时,使用ROLAP引擎在线计算查询结果。When it is determined that the backtracking search does not find a sub-data cube with a fuzzy hit, the ROLAP engine is used to calculate the query result online.
在本申请的一个实施例中,进一步地,还包括:In one embodiment of the present application, it further includes:
剪枝模块,用于采用聚合组剪枝的方式对所述数据立方体生成树进行剪枝处理,其中,根据维度间的关联程度,将维度划分成多个聚合组,以每个聚合组为根节点开始物化各自的子数据立方体,所述维度包括强制维度、层级维度和联合维度中的一种或者多种。The pruning module is used to prune the data cube generation tree by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, with each aggregation group as the root Nodes start to materialize their respective child data cubes, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
在本申请的一个实施例中,进一步地,具体而言,基于Bitmap索引的混合引擎查询路由的多维数据分析系统:In one embodiment of the present application, further, specifically, the multidimensional data analysis system based on Bitmap indexed hybrid engine query routing:
第一步、系统整体架构:The first step, the overall structure of the system:
如图9所示,系统整体架构的逻辑从数据源开始,原始数据从HDFS、Hive、RDBMS等多源读取,对于未经预计算的数据模型,查询任务临时下压至Spark SQL执行,并由自动建模模块在后台进行Cube的构建,设计和构建Cube所需的信息是从历史执行过的 SQL中挖掘的,这部分由SQL解析模块完成。Cube在初建和使用过程中都需要进行模型优化,与本发明相关的MMO算法为优化提供了支撑。构建和优化完成后的Cube存储至存储模块的HBase中,其他的元数据、历史SQL等信息存至Mysql。执行查询任务时采用本发明的基于混合引擎的多维查询方法,由查询路由模块对其进行路由,解析出逻辑语法树以后检索Bitmap即可智能地得出最合适的查询引擎。最终查询结果被多维分析可视化模块渲染成为丰富的图表和仪表盘,供用户查看。As shown in Figure 9, the logic of the overall system architecture starts from the data source, and the original data is read from multiple sources such as HDFS, Hive, and RDBMS. The Cube is built in the background by the automatic modeling module, and the information required for designing and building the Cube is mined from the historically executed SQL, which is completed by the SQL parsing module. Cube needs model optimization during initial construction and use, and the MMO algorithm related to the present invention provides support for optimization. After the construction and optimization are completed, the Cube is stored in the HBase of the storage module, and other metadata, historical SQL and other information are stored in Mysql. The hybrid engine-based multi-dimensional query method of the present invention is used when performing query tasks, and the query routing module routes it, parses out the logic syntax tree, and retrieves the Bitmap to intelligently obtain the most suitable query engine. The final query results are rendered by the multidimensional analysis and visualization module into rich charts and dashboards for users to view.
第二步、技术架构设计:The second step, technical architecture design:
如图10所示,从技术角度划分,本系统分为5层:数据层、数据模型层、数据计算层、业务逻辑层和页面展示层。最底层的数据层为原始数据,经由建模层建模、优化后,由数据计算层执行查询任务,最后页面展示层调取业务逻辑层的API,将可视化的结果呈现给用户,下面将对这些层的技术详细介绍。As shown in Figure 10, from a technical point of view, the system is divided into five layers: data layer, data model layer, data calculation layer, business logic layer and page display layer. The bottom data layer is the original data. After modeling and optimization by the modeling layer, the data calculation layer executes the query tasks. Finally, the page display layer invokes the API of the business logic layer to present the visualized results to the user. The following will describe The technical details of these layers are presented.
数据层:本层为原始数据的存储,支持Hadoop的HDFS、Mysql、Hive等多源的数据。其中HDFS、Hive存储的作为分析使用的大数据仓库使用,而Mysql则更多地存储系统功能所需数据表、元数据等。Data layer: This layer is the storage of original data, supporting multi-source data such as Hadoop's HDFS, Mysql, and Hive. Among them, HDFS and Hive are used as big data warehouses for analysis, while Mysql stores more data tables and metadata required by system functions.
数据模型层:本层包括自动建模、模型优化、任务调度三个模块。自动建模使用SQL Parser对历史SQL集合进行解析,提取元数据,组合成为星型模型,使用Fast Cube算法将聚合数据物化成为Cube,预聚合的计算由Spark引擎执行。模型优化使用与本发明相关的MMO算法和Aggregation Group。任务调度使用Quartz调度器和Linux的Crontab进行,负责根据优化策略执行优化、物化计算。Data model layer: This layer includes three modules: automatic modeling, model optimization, and task scheduling. Automatic modeling uses SQL Parser to analyze historical SQL collections, extracts metadata, and combines them into a star model. The Fast Cube algorithm is used to materialize the aggregated data into Cubes, and the pre-aggregated calculations are performed by the Spark engine. Model optimization uses MMO algorithm and Aggregation Group related to the present invention. Task scheduling is performed using the Quartz scheduler and Linux Crontab, which is responsible for performing optimization and materialized calculations based on optimization strategies.
数据计算层:本层执行多维数据查询任务,分为计算引擎和多维索引两个模块。计算引擎为混合引擎,包括Spark SQL和Kylin,查询路由根据系统预计算完成数据的情况,检索多维索引选择合适的查询引擎执行任务。多维索引则用到了Cube生成树和Bitmap索引的技术。Data calculation layer: This layer executes multidimensional data query tasks, and is divided into two modules: calculation engine and multidimensional index. The calculation engine is a hybrid engine, including Spark SQL and Kylin. The query routing is based on the data pre-calculated by the system, retrieves the multi-dimensional index and selects the appropriate query engine to perform the task. The multidimensional index uses the technology of Cube spanning tree and Bitmap index.
业务逻辑层:本层为前端请求和后端查询任务的中间层,将下层服务封装成功能,供上层调用。系统整体架构使用Spring Boot编写,前后端通信、以及后端功能模块之间的通信使用RESTful API的方式进行,后端返回给前端的信息则组织为JSON的格式。Business logic layer: This layer is the middle layer between front-end requests and back-end query tasks, and encapsulates lower-layer services into functions for upper-layer calls. The overall architecture of the system is written using Spring Boot, and the communication between the front-end and the back-end and the communication between the back-end functional modules is carried out in the form of RESTful API, and the information returned from the back-end to the front-end is organized in JSON format.
页面展示层:负责将查询结果展示给用户,前端采用Vue框架编写,各类图表组件的生成采用了E-charts。Page display layer: responsible for displaying query results to users, the front end is written with Vue framework, and E-charts are used to generate various chart components.
第三步、查询路由模块:The third step is to query the routing module:
本模块的功能是执行多维数据的查询任务,基于混合引擎,结合Bitmap索引的检索,在Spark SQL和Kylin之间做出选择,以达到最优的查询性能。本模块的整体流程如图7所示。根据这些功能流程,可以归纳出本模块需要的接口,如下表所示:The function of this module is to perform multi-dimensional data query tasks. Based on the hybrid engine, combined with Bitmap index retrieval, it can choose between Spark SQL and Kylin to achieve optimal query performance. The overall process of this module is shown in Figure 7. According to these functional processes, the interfaces required by this module can be summarized, as shown in the following table:
Figure PCTCN2021137140-appb-000002
Figure PCTCN2021137140-appb-000002
Figure PCTCN2021137140-appb-000003
Figure PCTCN2021137140-appb-000003
query是本模块的主接口,其执行逻辑对用户透明。query接口调用了几个Service:多维查询路由(queryRoute)用来检索Bitmap索引、选择查询引擎(使用SQLParse解析逻辑语法树,然后由BitmapIndex执行查询任务匹配),SparkSQL的查询执行服务(sparkSql)用来执行下压查询任务,Kylin的查询执行服务(kylinQuery)用来执行命中Cube时的预计算结果查询。所调用的这些服务同时被封装成了接口,可以调试和单独调用某引擎测试使用。query is the main interface of this module, and its execution logic is transparent to users. The query interface calls several services: the multi-dimensional query route (queryRoute) is used to retrieve the Bitmap index, select the query engine (use SQLParse to parse the logical syntax tree, and then perform the query task matching by BitmapIndex), and the query execution service of SparkSQL (sparkSql) is used to Execute the push-down query task, and Kylin's query execution service (kylinQuery) is used to execute the pre-calculated result query when hitting the Cube. The called services are encapsulated into interfaces at the same time, which can be debugged and tested by calling an engine separately.
本申请的技术效果:可以在后台建立Cube的位图Bitmap索引,智能地选择合适的查询引擎执行查询任务,提升系统整体性能。The technical effect of the present application: the Bitmap index of the Cube can be established in the background, and the appropriate query engine can be intelligently selected to perform the query task, and the overall performance of the system can be improved.
为了实现上述实施例,本发明还提出一种非临时性计算机可读存储介质,该计算机存储有计算机程序,所述计算机程序被处理器执行时,实现本申请实施例的基于混合引擎的多维数据查询方法。In order to achieve the above embodiments, the present invention also proposes a non-transitory computer-readable storage medium, the computer stores a computer program, and when the computer program is executed by a processor, the multi-dimensional data based on the hybrid engine of the embodiment of the present application is realized. Query method.
为了实现上述实施例,本发明还提出一种电子设备,包括:处理器;用于存储所述处理器可执行指令的存储器;其中,所述处理器被配置为执行所述指令,以实现本申请实施例的基于混合引擎的多维数据查询方法。In order to achieve the above embodiments, the present invention also proposes an electronic device, including: a processor; a memory for storing instructions executable by the processor; wherein the processor is configured to execute the instructions to implement the present invention The multi-dimensional data query method based on the hybrid engine of the embodiment of the application.
为了实现上述实施例,本发明还提出一种计算机程序产品,包括计算机程序,所述计算机程序被处理器执行时实现本申请实施例的基于混合引擎的多维数据查询方法。In order to implement the above embodiments, the present invention further proposes a computer program product, including a computer program, and when the computer program is executed by a processor, the method for querying multi-dimensional data based on a hybrid engine in the embodiment of the present application is implemented.
尽管参考附图详地公开了本申请,但应理解的是,这些描述仅仅是示例性的,并非用来限制本申请的应用。本申请的保护范围由附加权利要求限定,并可包括在不脱离本申请保护范围和精神的情况下针对发明所作的各种变型、改型及等效方案。While the present application has been disclosed in detail with reference to the accompanying drawings, it should be understood that these descriptions are illustrative only and are not intended to limit the application of the present application. The protection scope of the present application is defined by the appended claims, and may include various changes, modifications and equivalent solutions for the invention without departing from the protection scope and spirit of the present application.
在本说明书的描述中,参考术语“一个实施例”、“一些实施例”、“示例”、“具体示例”、或“一些示例”等的描述意指结合该实施例或示例描述的具体特征、结构、材料或者特点包含于本发明的至少一个实施例或示例中。在本说明书中,对上述术语的示意性表述不必须针对的是相同的实施例或示例。而且,描述的具体特征、结构、材料或者特点可以在任一个或多个实施例或示例中以合适的方式结合。此外,在不相互矛盾的情况下,本领域的技术人员可以将本说明书中描述的不同实施例或示例以及不同实施例或示例的特征进行结合和组合。In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "example", "specific examples", or "some examples" mean that specific features described in connection with the embodiment or example , structure, material or feature is included in at least one embodiment or example of the present invention. In this specification, the schematic representations of the above terms are not necessarily directed to the same embodiment or example. Furthermore, the described specific features, structures, materials or characteristics may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art can combine and combine different embodiments or examples and features of different embodiments or examples described in this specification without conflicting with each other.
此外,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括至少一个该特征。在本发明的描述中,“多个”的含义是至少两个,例如两个,三个等,除非另有明确具体的限定。In addition, the terms "first" and "second" are used for descriptive purposes only, and cannot be interpreted as indicating or implying relative importance or implicitly specifying the quantity of indicated technical features. Thus, the features defined as "first" and "second" may explicitly or implicitly include at least one of these features. In the description of the present invention, "plurality" means at least two, such as two, three, etc., unless specifically defined otherwise.
流程图中或在此以其他方式描述的任何过程或方法描述可以被理解为,表示包括一个或更多个用于实现定制逻辑功能或过程的步骤的可执行指令的代码的模块、片段或部 分,并且本发明的优选实施方式的范围包括另外的实现,其中可以不按所示出或讨论的顺序,包括根据所涉及的功能按基本同时的方式或按相反的顺序,来执行功能,这应被本发明的实施例所属技术领域的技术人员所理解。Any process or method descriptions in flowcharts or otherwise described herein may be understood to represent a module, segment or portion of code comprising one or more executable instructions for implementing custom logical functions or steps of a process , and the scope of preferred embodiments of the invention includes alternative implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, which shall It is understood by those skilled in the art to which the embodiments of the present invention pertain.
在流程图中表示或在此以其他方式描述的逻辑和/或步骤,例如,可以被认为是用于实现逻辑功能的可执行指令的定序列表,可以具体实现在任何计算机可读介质中,以供指令执行系统、装置或设备(如基于计算机的系统、包括处理器的系统或其他可以从指令执行系统、装置或设备取指令并执行指令的系统)使用,或结合这些指令执行系统、装置或设备而使用。就本说明书而言,"计算机可读介质"可以是任何可以包含、存储、通信、传播或传输程序以供指令执行系统、装置或设备或结合这些指令执行系统、装置或设备而使用的装置。计算机可读介质的更具体的示例(非穷尽性列表)包括以下:具有一个或多个布线的电连接部(电子装置),便携式计算机盘盒(磁装置),随机存取存储器(RAM),只读存储器(ROM),可擦除可编辑只读存储器(EPROM或闪速存储器),光纤装置,以及便携式光盘只读存储器(CDROM)。另外,计算机可读介质甚至可以是可在其上打印所述程序的纸或其他合适的介质,因为可以例如通过对纸或其他介质进行光学扫描,接着进行编辑、解译或必要时以其他合适方式进行处理来以电子方式获得所述程序,然后将其存储在计算机存储器中。The logic and/or steps represented in the flowcharts or otherwise described herein, for example, can be considered as a sequenced listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium, For use with instruction execution systems, devices, or devices (such as computer-based systems, systems including processors, or other systems that can fetch instructions from instruction execution systems, devices, or devices and execute instructions), or in conjunction with these instruction execution systems, devices or equipment for use. For the purposes of this specification, a "computer-readable medium" may be any device that can contain, store, communicate, propagate or transmit a program for use in or in conjunction with an instruction execution system, device, or device. More specific examples (non-exhaustive list) of computer-readable media include the following: electrical connection with one or more wires (electronic device), portable computer disk case (magnetic device), random access memory (RAM), Read Only Memory (ROM), Erasable and Editable Read Only Memory (EPROM or Flash Memory), Fiber Optic Devices, and Portable Compact Disc Read Only Memory (CDROM). In addition, the computer-readable medium may even be paper or other suitable medium on which the program can be printed, since the program can be read, for example, by optically scanning the paper or other medium, followed by editing, interpretation or other suitable processing if necessary. The program is processed electronically and stored in computer memory.
应当理解,本发明的各部分可以用硬件、软件、固件或它们的组合来实现。在上述实施方式中,多个步骤或方法可以用存储在存储器中且由合适的指令执行系统执行的软件或固件来实现。如,如果用硬件来实现和在另一实施方式中一样,可用本领域公知的下列技术中的任一项或他们的组合来实现:具有用于对数据信号实现逻辑功能的逻辑门电路的离散逻辑电路,具有合适的组合逻辑门电路的专用集成电路,可编程门阵列(PGA),现场可编程门阵列(FPGA)等。It should be understood that various parts of the present invention can be realized by hardware, software, firmware or their combination. In the embodiments described above, various steps or methods may be implemented by software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware as in another embodiment, it can be implemented by any one or a combination of the following techniques known in the art: a discrete Logic circuits, ASICs with suitable combinational logic gates, Programmable Gate Arrays (PGA), Field Programmable Gate Arrays (FPGA), etc.
本技术领域的普通技术人员可以理解实现上述实施例方法携带的全部或部分步骤是可以通过程序来指令相关的硬件完成,所述的程序可以存储于一种计算机可读存储介质中,该程序在执行时,包括方法实施例的步骤之一或其组合。Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by instructing related hardware through a program, and the program can be stored in a computer-readable storage medium. During execution, one or a combination of the steps of the method embodiments is included.
此外,在本发明各个实施例中的各功能单元可以集成在一个处理模块中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个模块中。上述集成的模块既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。所述集成的模块如果以软件功能模块的形式实现并作为独立的产品销售或使用时,也可以存储在一个计算机可读取存储介质中。In addition, each functional unit in each embodiment of the present invention may be integrated into one processing module, each unit may exist separately physically, or two or more units may be integrated into one module. The above-mentioned integrated modules can be implemented in the form of hardware or in the form of software function modules. If the integrated modules are realized in the form of software function modules and sold or used as independent products, they can also be stored in a computer-readable storage medium.
上述提到的存储介质可以是只读存储器,磁盘或光盘等。尽管上面已经示出和描述了本发明的实施例,可以理解的是,上述实施例是示例性的,不能理解为对本发明的限制,本领域的普通技术人员在本发明的范围内可以对上述实施例进行变化、修改、替换和变型。The storage medium mentioned above may be a read-only memory, a magnetic disk or an optical disk, and the like. Although the embodiments of the present invention have been shown and described above, it can be understood that the above embodiments are exemplary and should not be construed as limiting the present invention, those skilled in the art can make the above-mentioned The embodiments are subject to changes, modifications, substitutions and variations.

Claims (12)

  1. 一种基于混合引擎的多维数据查询方法,其特征在于,包括:A method for querying multidimensional data based on a hybrid engine, characterized in that it includes:
    构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果;Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;
    建立维度字典,所述维度字典用于表征维度与比特位的映射关系;Establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;
    根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组中的比特值表征比特位对应的维度是否被预计算;Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, the The bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
    获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。Obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
  2. 如权利要求1所述的基于混合引擎的多维数据查询方法,其特征在于,根据所述结构化查询语言和所述位图树获取查询结果,包括:The hybrid engine-based multidimensional data query method according to claim 1, wherein obtaining query results according to the structured query language and the bitmap tree comprises:
    在确定与所述结构化查询语言匹配的位图树时,从位图树的根节点开始进行深度优先搜索;When determining the bitmap tree matching the structured query language, a depth-first search is performed from the root node of the bitmap tree;
    在确定通过所述深度优先搜索查找到精确命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果。When it is determined that the sub-data cube that is accurately hit is found through the depth-first search, the MOLAP engine is used to query the pre-aggregated data, and a query result is generated according to the pre-aggregated data.
  3. 如权利要求2所述的基于混合引擎的多维数据查询方法,其特征在于,根据所述结构化查询语言和所述位图树获取查询结果,还包括:The hybrid engine-based multidimensional data query method according to claim 2, wherein obtaining query results according to the structured query language and the bitmap tree also includes:
    在确定通过所述深度优先搜索未查找到精确命中的子数据立方体时,回溯查找模糊命中的子数据立方体;When it is determined that the sub-data cube of the exact hit is not found through the depth-first search, backtracking to find the sub-data cube of the fuzzy hit;
    在确定回溯查找到模糊命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果;When it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, the MOLAP engine is used to query the pre-aggregated data, and the query result is generated according to the pre-aggregated data;
    在确定回溯查找未找到模糊命中的子数据立方体时,使用ROLAP引擎在线计算查询结果。When it is determined that the backtracking search does not find a sub-data cube with a fuzzy hit, the ROLAP engine is used to calculate the query result online.
  4. 如权利要求1至3任一项所述的基于混合引擎的多维数据查询方法,其特征在于,在构建数据立方体生成树之后,还包括:The hybrid engine-based multidimensional data query method according to any one of claims 1 to 3, wherein, after building the data cube spanning tree, further comprising:
    采用聚合组剪枝的方式对所述数据立方体生成树进行剪枝处理,其中,根据维度间的关联程度,将维度划分成多个聚合组,以每个聚合组为根节点开始物化各自的子数据立方体,所述维度包括强制维度、层级维度和联合维度中的一种或者多种。The data cube spanning tree is pruned by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, and each aggregation group is used as the root node to materialize the respective child In a data cube, the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
  5. 如权利要求1所述的基于混合引擎的多维数据查询方法,其特征在于,所述建立维度字典,包括:The hybrid engine-based multidimensional data query method according to claim 1, wherein said establishment of a dimension dictionary includes:
    获取妊娠结果表、基础信息表和母亲体检表,根据所述妊娠结果表、基础信息表和 母亲体检表生成星型模型,对所述星型模型进行扁平化处理以得到所述维度字典,其中,每个维度对应一个比特位。Obtain the pregnancy result table, basic information table, and mother's physical examination table, generate a star schema according to the pregnancy result table, basic information table, and mother's physical examination table, and flatten the star schema to obtain the dimension dictionary, wherein , each dimension corresponds to a bit.
  6. 一种基于混合引擎的多维数据查询装置,其特征在于,包括:A multi-dimensional data query device based on a hybrid engine, characterized in that it comprises:
    构建模块,用于构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果;A building block for building a data cube spanning tree, the data cube spanning tree comprising a plurality of sub-data cubes, each sub-data cube corresponding to a pre-aggregated result of a combination of dimensions;
    建立模块,用于建立维度字典,所述维度字典用于表征维度与比特位的映射关系;Establishing a module for establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;
    生成模块,用于根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组中的比特值表征比特位对应的维度是否被预计算;A generating module, configured to construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index consists of bits Composed of an array, the bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
    查询模块,用于获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。The query module is used to obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
  7. 如权利要求6所述的基于混合引擎的多维数据查询装置,其特征在于,所述查询模块,包括:The multi-dimensional data query device based on hybrid engine according to claim 6, wherein said query module comprises:
    第一查询单元,用于在确定与所述结构化查询语言匹配的位图树时,从位图树的根节点开始进行深度优先搜索;The first query unit is configured to perform a depth-first search from the root node of the bitmap tree when determining the bitmap tree matching the structured query language;
    第一生成单元,用于在确定通过所述深度优先搜索查找到精确命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果。The first generation unit is configured to use the MOLAP engine to query the pre-aggregated data and generate a query result according to the pre-aggregated data when it is determined that an accurately hit sub-data cube is found through the depth-first search.
  8. 如权利要求7所述的基于混合引擎的多维数据查询装置,其特征在于,所述查询模块,还包括:The multi-dimensional data query device based on hybrid engine as claimed in claim 7, wherein said query module further comprises:
    第二查询单元,用于在确定通过所述深度优先搜索未查找到精确命中的子数据立方体时,回溯查找模糊命中的子数据立方体;The second query unit is used to retroactively search for sub-data cubes with fuzzy hits when it is determined that no sub-data cubes with precise hits have been found through the depth-first search;
    第二生成单元,用于在确定回溯查找到模糊命中的子数据立方体时,使用MOLAP引擎查询预聚合的数据,根据预聚合的数据生成查询结果;The second generating unit is used to use the MOLAP engine to query the pre-aggregated data when it is determined that the sub-data cube of the fuzzy hit is found in the backtracking, and the query result is generated according to the pre-aggregated data;
    在确定回溯查找未找到模糊命中的子数据立方体时,使用ROLAP引擎在线计算查询结果。When it is determined that the backtracking search does not find a sub-data cube with a fuzzy hit, the ROLAP engine is used to calculate the query result online.
  9. 如权利要求6至8任一项所述的基于混合引擎的多维数据查询装置,其特征在于,还包括:The hybrid engine-based multidimensional data query device according to any one of claims 6 to 8, further comprising:
    剪枝模块,用于采用聚合组剪枝的方式对所述数据立方体生成树进行剪枝处理,其中,根据维度间的关联程度,将维度划分成多个聚合组,以每个聚合组为根节点开始物化各自的子数据立方体,所述维度包括强制维度、层级维度和联合维度中的一种或者多种。The pruning module is used to prune the data cube generation tree by means of aggregation group pruning, wherein, according to the degree of association between dimensions, the dimensions are divided into multiple aggregation groups, with each aggregation group as the root Nodes start to materialize their respective child data cubes, and the dimensions include one or more of mandatory dimensions, hierarchical dimensions, and joint dimensions.
  10. 一种非临时性计算机可读存储介质,其上存储有计算机程序,其特征在于,所述计算机程序被处理器执行时实现以下步骤:A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program implements the following steps when executed by a processor:
    构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果;Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;
    建立维度字典,所述维度字典用于表征维度与比特位的映射关系;Establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;
    根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组中的比特值表征比特位对应的维度是否被预计算;Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, the The bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
    获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。Obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
  11. 一种电子设备,其特征在于,包括:An electronic device, characterized in that it comprises:
    处理器;processor;
    用于存储所述处理器可执行指令的存储器;memory for storing said processor-executable instructions;
    其中,所述处理器被配置为执行所述指令,以实现以下步骤:Wherein, the processor is configured to execute the instructions to achieve the following steps:
    构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果;Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;
    建立维度字典,所述维度字典用于表征维度与比特位的映射关系;Establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;
    根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组中的比特值表征比特位对应的维度是否被预计算;Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, the The bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
    获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。Obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
  12. 一种计算机程序产品,包括计算机程序,其特征在于,所述计算机程序被处理器执行时实现以下步骤:A computer program product, comprising a computer program, characterized in that, when the computer program is executed by a processor, the following steps are implemented:
    构建数据立方体生成树,所述数据立方体生成树包括多个子数据立方体,每个子数据立方体对应一种维度组合的预聚合结果;Build a data cube spanning tree, the data cube spanning tree includes a plurality of sub-data cubes, each sub-data cube corresponds to a pre-aggregated result of a combination of dimensions;
    建立维度字典,所述维度字典用于表征维度与比特位的映射关系;Establishing a dimension dictionary, which is used to represent the mapping relationship between dimensions and bits;
    根据所述维度字典构建位图索引,并将所述位图索引套入所述立方体生成树中,得到基于位图索引的位图树,其中,所述位图索引由比特数组组成,所述比特数组中的比特值表征比特位对应的维度是否被预计算;Construct a bitmap index according to the dimension dictionary, and insert the bitmap index into the cube spanning tree to obtain a bitmap tree based on the bitmap index, wherein the bitmap index is composed of a bit array, the The bit value in the bit array indicates whether the dimension corresponding to the bit is pre-calculated;
    获取查询需求,根据所述查询需求生成结构化查询语言,并根据所述结构化查询语言和所述位图树获取查询结果。Obtain query requirements, generate a structured query language according to the query requirements, and obtain query results according to the structured query language and the bitmap tree.
PCT/CN2021/137140 2021-06-30 2021-12-10 Hybrid engine-based multidimensional data query method and apparatus WO2023273183A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110733535.XA CN113641669B (en) 2021-06-30 2021-06-30 Multi-dimensional data query method and device based on hybrid engine
CN202110733535.X 2021-06-30

Publications (1)

Publication Number Publication Date
WO2023273183A1 true WO2023273183A1 (en) 2023-01-05

Family

ID=78416417

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/137140 WO2023273183A1 (en) 2021-06-30 2021-12-10 Hybrid engine-based multidimensional data query method and apparatus

Country Status (2)

Country Link
CN (1) CN113641669B (en)
WO (1) WO2023273183A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113641669B (en) * 2021-06-30 2023-08-01 北京邮电大学 Multi-dimensional data query method and device based on hybrid engine
CN114547380B (en) * 2022-01-25 2022-11-15 北京元年科技股份有限公司 Data traversal query method and device, electronic equipment and readable storage medium
TWI778924B (en) * 2022-02-25 2022-09-21 國立高雄大學 Method for big data retrieval and system thereof
CN115309947B (en) * 2022-08-15 2023-03-21 北京欧拉认知智能科技有限公司 Method and system for realizing online analysis engine based on graph

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160048572A1 (en) * 2014-08-14 2016-02-18 Intellicus Technologies Pvt. Ltd. Building a Distributed Dwarf Cube using Mapreduce Technique
CN106484875A (en) * 2016-10-13 2017-03-08 广州视源电子科技股份有限公司 Data processing method based on MOLAP and device
CN110110165A (en) * 2019-04-01 2019-08-09 跬云(上海)信息科技有限公司 Dynamic routing method and device for query engine in precomputation system
CN113641669A (en) * 2021-06-30 2021-11-12 北京邮电大学 Multi-dimensional data query method and device based on hybrid engine

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6385604B1 (en) * 1999-08-04 2002-05-07 Hyperroll, Israel Limited Relational database management system having integrated non-relational multi-dimensional data store of aggregated data elements
US8150850B2 (en) * 2008-01-07 2012-04-03 Akiban Technologies, Inc. Multiple dimensioned database architecture
CN112286953B (en) * 2020-09-25 2023-02-24 北京邮电大学 Multidimensional data query method and device and electronic equipment

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160048572A1 (en) * 2014-08-14 2016-02-18 Intellicus Technologies Pvt. Ltd. Building a Distributed Dwarf Cube using Mapreduce Technique
CN106484875A (en) * 2016-10-13 2017-03-08 广州视源电子科技股份有限公司 Data processing method based on MOLAP and device
CN110110165A (en) * 2019-04-01 2019-08-09 跬云(上海)信息科技有限公司 Dynamic routing method and device for query engine in precomputation system
CN113641669A (en) * 2021-06-30 2021-11-12 北京邮电大学 Multi-dimensional data query method and device based on hybrid engine

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DHANASREE K., SHOBABINDU C.: "A survey on OLAP", 2016 IEEE INTERNATIONAL CONFERENCE ON COMPUTATIONAL INTELLIGENCE AND COMPUTING RESEARCH (ICCIC), IEEE, 1 December 2016 (2016-12-01) - 17 December 2016 (2016-12-17), pages 1 - 9, XP093019274, ISBN: 978-1-5090-0612-0, DOI: 10.1109/ICCIC.2016.7919545 *

Also Published As

Publication number Publication date
CN113641669B (en) 2023-08-01
CN113641669A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
WO2023273183A1 (en) Hybrid engine-based multidimensional data query method and apparatus
US10558659B2 (en) Techniques for dictionary based join and aggregation
US9798772B2 (en) Using persistent data samples and query-time statistics for query optimization
US9569506B2 (en) Uniform search, navigation and combination of heterogeneous data
US7505958B2 (en) Metadata management for a data abstraction model
US7730059B2 (en) Cube faceted data analysis
US8606794B2 (en) Adaptive processing of top-k queries in nested-structure arbitrary markup language such as XML
US9507825B2 (en) Techniques for partition pruning based on aggregated zone map information
US5899986A (en) Methods for collecting query workload based statistics on column groups identified by RDBMS optimizer
US9836519B2 (en) Densely grouping dimensional data
US7324991B1 (en) Sampling in a multidimensional database
US10331712B2 (en) Efficient spatial queries in large data tables
US7945561B1 (en) Methods and apparatus for processing a query joining tables stored at different data sources
CN108292315A (en) Data in storage and retrieval data cube
US20230401227A1 (en) Interactive data exploration
EP3232342B1 (en) Methods and systems for bidirectional indexing summary
TWI436222B (en) Real - time multi - dimensional analysis system and method on cloud
US9075799B1 (en) Methods and apparatus for query formulation
CN116501758B (en) NetCDF data query method and terminal
Padhy et al. A quantitative performance analysis between Mongodb and Oracle NoSQL
US8112385B2 (en) Ready to render business intelligence result sets
WO2023086322A1 (en) Late materialization of queried data in database cache
JP2005018751A (en) System and method for expressing and calculating relationship between measures
Adyatma et al. Library Development for Join Operation in Cassandra Database Management System
Porter-Brown Accelerating Aggregation Efficiency: Using Postgres as a Cache with MongoDB

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21948108

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE