WO2022247443A1 - Data query method, device, equipment and storage medium - Google Patents

Data query method, device, equipment and storage medium

Info

Publication number
WO2022247443A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
data
dimension
auxiliary
metric
Prior art date
Application number
PCT/CN2022/083616
Other languages
English (en)
French (fr)
Inventor
刘鹤
刘文政
李扬
韩卿
Original Assignee
跬云(上海)信息科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 跬云(上海)信息科技有限公司
Priority to EP22761040.9A (EP4116838A4)
Publication of WO2022247443A1
Priority to US18/092,330 (US20230153298A1)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/244 Grouping and aggregation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of computer technology, in particular, to a data query method, device, equipment and storage medium.
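  • OLAP (online analytical processing) systems have become an indispensable component of big-data analysis thanks to their multidimensional analysis capability; the query language OLAP typically uses is MDX, but MDX cannot yet analyze big data efficiently.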
  • the main purpose of the present application is to provide a data query method, device, equipment and storage medium to solve the above problems.
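  • a data query method, including: receiving an MDX query statement; obtaining related information of the measures and dimensions in the MDX query statement; and querying according to the related information of the measures and dimensions to obtain a query result.
  • querying according to the related information of the measures and dimensions includes: determining a row dimension data range and a column dimension data range; building a multidimensional data table from the two ranges; and, for any coordinate node in the multidimensional data table, determining the measure data of the node by querying according to the row dimension and the column dimension of the coordinate node.
  • the method also includes: obtaining an operator of the MDX query statement;
  • for the coordinate node, if the measure of the coordinate node cannot be found, determining the auxiliary dimension and/or auxiliary measure of the node according to the row and column dimensions of the coordinate node and the expression of the measure;
  • calculating the measure of the node from the auxiliary dimension and/or auxiliary measure and the operator.
  • receiving the MDX query statement includes: receiving the MDX query statement sent by a report tool;
  • formatting the query result, and sending the formatted query result to the report tool.
  • querying according to the related information of the measures and dimensions to obtain the query result includes: querying a distributed storage system according to that information to obtain the query result;
  • the distributed storage system includes multiple batches of mutually isolated data, and each batch of data stores a set of dimensions and measures.
  • if the direct measure of the node cannot be found, querying the auxiliary dimension and/or auxiliary measure of the node includes: determining the auxiliary dimension and/or auxiliary measure to be used according to the measure expression; determining the data batch in which the auxiliary dimension and/or auxiliary measure is located; and
  • obtaining the auxiliary dimension and/or auxiliary measure from the data batch.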
  • a data query device, including:
  • a receiving module configured to receive an MDX query statement;
  • an acquisition module configured to acquire related information of the measures and dimensions in the MDX query statement;
  • a query module configured to query according to the related information of the measures and dimensions to obtain a query result.
  • the query module is also used to determine the row dimension data range and the column dimension data range; build a multidimensional data table from the two ranges; and, for any coordinate node in the multidimensional data table,
  • determine the measure data of the node by querying according to the row dimension and the column dimension of the coordinate node.
  • the query module is also used to obtain the operator of the MDX query statement;
  • for the coordinate node, if the measure of the coordinate node cannot be found, determine the auxiliary dimension and/or auxiliary measure of the coordinate node by querying according to the row and column dimensions of the coordinate node and the measure expression;
  • calculate the measure of the node from the auxiliary dimension and/or auxiliary measure and the operator.
  • the receiving module is also used to receive the MDX query statement sent by the report tool;
  • the device also includes a formatting module, which is used to format the query result and send the formatted query result to the report tool.
  • the query module is further configured to query the distributed storage system according to the related information of the measures and dimensions to obtain the query result;
  • the distributed storage system includes multiple batches of mutually isolated data;
  • each batch of data stores a set of dimensions and measures.
  • the query module is also used to determine the auxiliary dimension and/or auxiliary measure to be used according to the measure expression; determine the data batch in which the auxiliary dimension and/or auxiliary measure is located; and
  • obtain the auxiliary dimension and/or auxiliary measure from the data batch.
  • an electronic device, including at least one processor and at least one memory; the memory is used to store one or more program instructions; the processor is used to execute the one or more program instructions to perform any of the methods described above.
  • a computer-readable storage medium containing one or more program instructions, the one or more program instructions being used to perform any of the above-mentioned methods.
  • the above-mentioned technical solution of the present invention improves the efficiency of data query and analysis by extracting the measurement and dimension information in the query and calculating the MDX expression through the distributed computing framework. It can deal with various business analysis scenarios with large data volume and complex logic.
  • FIG. 1 is a flowchart of a data query method according to an embodiment of the present application;
  • FIG. 2 is a schematic structural diagram of a data query device according to an embodiment of the present application;
  • FIG. 3 is a schematic structural diagram of another data query device according to an embodiment of the present application;
  • FIG. 4 is a schematic structural diagram of data query equipment according to an embodiment of the present application.
  • OLAP: Online Analytical Processing, a technology that lets analysts quickly gain insight into data from multiple dimensions.
  • Aggregation Query: the query content of an MDX query at a certain aggregation level.
  • Aggregation Query Result: the query result of an MDX query at a certain aggregation level, organized in a specific form.
  • Dimension: the dimension in the MDX language concept, generally corresponding to a dimension table in the data source.
  • Hierarchy: the hierarchical structure in the MDX language concept, which may consist of multiple levels.
  • Level: the level in the MDX language concept, generally corresponding to a specific field of the dimension table.
  • the present application proposes a data query method; see the flowchart shown in FIG. 1. The method includes:
  • Step S102: receiving an MDX query statement; the MDX query statement may be one sent by a report analysis tool.
  • specifically, MDX query statements from various report analysis tools are analyzed according to the pattern in which each tool organizes MDX statements, and the information required for the queries is extracted, organized, and sent to the query execution module.
  • for example, after an MDX query statement is received from a report analysis tool, the user's query intent is extracted from it: the dimensions, measures, and filter conditions to be queried, and where they sit (on which axes of the multidimensional data model). Because an MDX query often needs data at several aggregation levels, and the calculations at those levels are independent of each other, several corresponding Aggregation Queries can be generated from the extracted information and sent to the query execution module for parallel execution.
  • supported report analysis tools include, but are not limited to: Excel, Tableau, PowerBI.
  • for example, the query statement may be "query the total sales of this quarter".
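The generation and parallel execution of Aggregation Queries described for step S102 can be illustrated with a minimal Python sketch. It is not the applicant's implementation: the `AggregationQuery` class, the `ROWS` sample data, and the use of a thread pool in place of a distributed framework are all assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationQuery:
    level: str    # aggregation level, e.g. "year" or "month" (hypothetical)
    measure: str  # measure to aggregate, e.g. "sales"


# Toy fact rows standing in for the data source (hypothetical values).
ROWS = [
    {"year": 2021, "month": 1, "sales": 10},
    {"year": 2021, "month": 2, "sales": 20},
    {"year": 2022, "month": 1, "sales": 5},
]


def build_aggregation_queries(levels, measure):
    """Create one independent query per requested aggregation level."""
    return [AggregationQuery(level, measure) for level in levels]


def execute(query):
    """Sum the measure grouped by the query's aggregation level."""
    totals = {}
    for row in ROWS:
        key = row[query.level]
        totals[key] = totals.get(key, 0) + row[query.measure]
    return query.level, totals


def run_parallel(queries):
    # The levels do not depend on each other, so they can run concurrently.
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(execute, queries))


results = run_parallel(build_aggregation_queries(["year", "month"], "sales"))
```

Each level is computed without reference to the others, which is the property the text relies on to run the queries in parallel.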
  • Step S104: obtaining related information of the measures and dimensions in the MDX query statement.
  • the related information of the measures and dimensions includes, but is not limited to, one or more of the following: measure expressions, the operators in the measure expressions, row dimensions, column dimensions, and dimension level information.
  • for example, the measure can be sales, and
  • the dimensions can include store name and date.
  • a two-dimensional table can then be created whose horizontal axis is the store name and whose vertical axis is the date; the sales of each store on each day is the measure.
  • as another example, if the query statement is "query the average grade of a student in a quarter",
  • the measure is the test score, and
  • the dimensions include the student's name and the date.
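As a small illustration of the store-by-date table described above, the following Python sketch builds a two-dimensional table whose columns are store names and whose rows are dates. The record values and the `build_table` helper are hypothetical and only show the shape of the data.

```python
from collections import defaultdict

# Hypothetical fact records: (store name, date, sales).
records = [
    ("Store A", "2022-01-01", 100),
    ("Store A", "2022-01-02", 120),
    ("Store B", "2022-01-01", 80),
    ("Store A", "2022-01-01", 50),
]


def build_table(rows):
    """Return {date: {store: total sales}}, summing records that share a cell."""
    table = defaultdict(lambda: defaultdict(int))
    for store, date, sales in rows:
        table[date][store] += sales
    return {date: dict(cells) for date, cells in table.items()}


table = build_table(records)
```

Each (date, store) pair is one coordinate node, and the value stored there is the sales measure for that node.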
  • Step S106: performing a query according to the related information of the measures and dimensions to obtain a query result.
  • the data can be stored in a distributed storage system.
  • the distributed storage system is a unified whole in which multiple batches of mutually isolated data are stored, and each batch of data includes a set of dimension and measure data.
  • using a distributed storage system can improve the security and backup of data storage. Different measure data can be stored in different data batches.
  • the above-mentioned method of the present invention improves the efficiency of data query and analysis by extracting the measurement and dimension information in the query and calculating the MDX expression through the distributed computing framework. It can deal with various business analysis scenarios with large data volume and complex logic.
  • querying according to the related information of the measures and dimensions includes: determining a row dimension data range and a column dimension data range; building a multidimensional data table from the two ranges; and, for any coordinate node in the multidimensional data table,
  • determining the measure data of the node by querying according to the row dimension and the column dimension of the coordinate node.
  • the query statement is a query over multidimensional data.
  • for example, the query involves three dimensions (city, store number, and year) and the sales measure. Assume that, in the report tool, the city and store number dimensions are placed on rows, and the year dimension and the measure are placed on columns. See Table 1 for the presentation in the report:
  • the display form is a two-dimensional data table, but the data is actually a three-dimensional data cube whose x, y, and z axes are city, store number, and year.
  • the method also includes: obtaining an operator of the MDX query statement;
  • if the measure of the coordinate node cannot be found, determining the auxiliary dimension and/or auxiliary measure of the coordinate node according to the row and column dimensions of the coordinate node and the measure expression;
  • calculating the measure of the node from the auxiliary dimension and/or auxiliary measure and the operator.
  • the expression includes auxiliary dimensions and/or auxiliary measures, and the auxiliary dimensions and/or auxiliary measures to be used are determined from the measure expression. After they are obtained, the measure value of the node is calculated.
  • measures are divided into two categories: basic measures and calculated measures.
  • the former can be obtained directly from the data batches of the distributed storage system, while the latter are obtained by calculation from an expression over the dimensions and/or measures they depend on. Calculated measures are therefore the reason auxiliary dimensions and auxiliary measures are needed:
  • for example, the query statement is "query the profit of each quarter";
  • the measure expression is: profit equals sales minus cost;
  • the relevant operator is subtraction.
  • the dimensions include the row dimension and the column dimension.
  • the horizontal axis is the row dimension
  • the vertical axis is the column dimension.
  • the horizontal and vertical axes serve as basic data blocks.
  • the daily sales of each store is a node in the abstract syntax tree. Operators can be used to calculate the node.
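The fallback from a calculated measure to its auxiliary measures can be sketched as a tiny expression-tree evaluator in Python. This is an illustrative sketch only: the `BASIC`, `CALCULATED`, and `OPERATORS` tables and their values are hypothetical, and a real engine would evaluate over distributed data blocks rather than dictionaries.

```python
import operator

# Basic measures that can be read directly (hypothetical values per quarter).
BASIC = {
    "sales": {"Q1": 100, "Q2": 150},
    "cost": {"Q1": 60, "Q2": 90},
}

# Operators that may appear in a measure expression.
OPERATORS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}

# Calculated measures as small expression trees: (operator, left, right).
CALCULATED = {"profit": ("-", "sales", "cost")}


def evaluate(measure, cell):
    """Return the measure at a cell, falling back to auxiliary measures."""
    if measure in BASIC:
        return BASIC[measure][cell]
    # Not directly available: resolve the auxiliary measures, then apply the operator.
    op, left, right = CALCULATED[measure]
    return OPERATORS[op](evaluate(left, cell), evaluate(right, cell))


profit = {quarter: evaluate("profit", quarter) for quarter in ("Q1", "Q2")}
```

Here `sales` and `cost` play the role of auxiliary measures for `profit`, and subtraction is the operator taken from the measure expression.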
  • the method also includes: acquiring the row and column dimension level information in the MDX query statement;
  • H (L1, L2, L3)
  • for example, H (year, month, day): the year is L1, the month is L2, and the day is L3.
  • the level of H is the highest, and the level of a single year, month, or day is the lowest.
  • the levels, ordered from high to low, are: year level, month level, and day level.
  • the sales for a specific year, month, and day can be queried, as can the sales for a given year, a given month, or a given day.
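To make the hierarchy H (year, month, day) concrete, the following Python sketch rolls daily sales up to any level of H. The sample values and the `roll_up` helper are assumptions for illustration, not part of the application.

```python
from collections import defaultdict

LEVELS = ("year", "month", "day")  # L1, L2, L3 of the hierarchy H

# Hypothetical daily sales keyed by (year, month, day).
daily_sales = {
    (2022, 1, 1): 10,
    (2022, 1, 2): 20,
    (2022, 2, 1): 30,
    (2023, 1, 1): 40,
}


def roll_up(level):
    """Aggregate daily sales to the given level of H."""
    depth = LEVELS.index(level) + 1
    totals = defaultdict(int)
    for key, value in daily_sales.items():
        # Keep only the leading part of the key down to the requested level.
        totals[key[:depth]] += value
    return dict(totals)


by_year = roll_up("year")
by_month = roll_up("month")
```

Rolling up to a higher level simply truncates the key, so the same daily data answers queries at the year, month, or day level.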
  • traversing the abstract syntax tree includes: for any node of the abstract syntax tree, if the direct measure or dimension of the node cannot be found, querying the auxiliary measure or auxiliary dimension of the node;
  • the direct measure or dimension is then calculated from the auxiliary measure and/or auxiliary dimension.
  • each batch of data includes a set of dimension and measure data.
  • for example, data batch 1 stores measure 1,
  • which is sales.
  • data batch 2 stores measure 2, which is cost.
  • sales and cost are both measures.
  • the report tool has format requirements: it can only recognize a specific format.
  • the query result is therefore formatted into a format the report tool can recognize, and the formatted query result is sent to the report tool.
  • the final returned result is constructed from multiple Aggregation Query Results.
  • first, the dimension and measure information on the rows and columns is obtained from the different Aggregation Queries to determine the framework of the MDX query result (the dimensions on rows and columns, and the distribution of the measures); the dimension values on the rows and columns and the corresponding measure values are then extracted from each data unit of the data block returned by the query execution module;
  • finally, the extraction results of the different Aggregation Queries are organized according to the high-low relationship of their aggregation levels and returned to the report analysis tool in a specific format.
  • querying according to the related information of the measures and dimensions to obtain the query result includes: querying a distributed storage system according to that information to obtain the query result;
  • the distributed storage system includes multiple batches of mutually isolated data; each batch of data stores a set of dimensions and measures.
  • querying the auxiliary dimension and/or auxiliary measure of the node includes: determining the auxiliary dimension and/or auxiliary measure to be used according to the measure expression; determining the data batch in which it is located; and
  • obtaining the auxiliary dimension and/or auxiliary measure from the data batch.
  • the correspondence between data batches and the auxiliary dimensions and/or auxiliary measures they store is saved in advance; the corresponding data batch is determined from this correspondence, and the specific dimension and measure values are then obtained from it.
  • for example, the measure is sales;
  • the pre-stored correspondence says that sales is stored in data batch 4, so sales is obtained from data batch 4.
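The pre-stored correspondence between measures and data batches can be sketched as a simple lookup index. The batch names, contents, and helper functions below are hypothetical; the sketch only illustrates resolving a measure to its batch before reading it.

```python
# Hypothetical isolated data batches, each holding its own set of measures.
BATCHES = {
    "batch_1": {"sales": {"Store A": 100, "Store B": 80}},
    "batch_2": {"cost": {"Store A": 60, "Store B": 50}},
}


def build_index(batches):
    """Precompute which batch stores each measure."""
    return {
        measure: batch_id
        for batch_id, content in batches.items()
        for measure in content
    }


# Stored in advance, before any query arrives.
BATCH_INDEX = build_index(BATCHES)


def fetch(measure):
    """Resolve the batch from the index, then read the measure from it."""
    batch_id = BATCH_INDEX[measure]
    return BATCHES[batch_id][measure]


cost_by_store = fetch("cost")
```

Because the index is built ahead of time, a query only needs one dictionary lookup to find the batch that holds an auxiliary measure.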
  • the present invention provides a higher-performance MDX execution engine, which greatly improves the execution speed of MDX queries on large data volumes and the overall query performance; it provides a scheme for connecting to a distributed storage system, enabling the user's storage system to handle larger data volumes; and it provides a distributed computing scheme that lets users adjust resource allocation flexibly according to actual needs, which greatly improves the flexibility of the system and lowers the cost of use for users.
  • the present invention also provides a data processing device; as shown in FIG. 2, the device includes:
  • a receiving module 21, used to receive an MDX query statement;
  • an acquisition module 22, configured to acquire related information of the measures and dimensions in the MDX query statement;
  • a query module 23, configured to query according to the related information of the measures and dimensions to obtain a query result.
  • the query module 23 is also used to determine the data range of the row dimension and the data range of the column dimension; build a multidimensional data table from the two ranges; and, for any coordinate node in the multidimensional data table,
  • determine the measure data of the node by querying according to the row dimension and the column dimension of the coordinate node.
  • the query module 23 is also used to obtain the operator of the MDX query statement;
  • for the coordinate node, if the measure of the coordinate node cannot be found, determine the auxiliary dimension and/or auxiliary measure of the coordinate node by querying according to the row dimension, column dimension, and measure expression of the coordinate node;
  • calculate the measure of the node from the auxiliary dimension and/or auxiliary measure and the operator.
  • the receiving module 21 is also used to receive the MDX query statement sent by the report tool;
  • the device also includes a formatting module, which is used to format the query result and send the formatted query result to the report tool.
  • the query module 23 is further configured to query the distributed storage system according to the related information of the measures and dimensions to obtain the query result;
  • the distributed storage system includes multiple batches of mutually isolated data, wherein each batch of data stores a set of dimensions and measures.
  • the query module 23 is also used to determine the auxiliary dimension and/or auxiliary measure to be used according to the measure expression; determine the data batch in which it is located; and
  • obtain the auxiliary dimension and/or auxiliary measure from the data batch.
  • this device comprises: an MDX statement parsing module 31, a query execution module 32, a data providing module 33, a result construction module 34, and a distributed computing module 35.
  • the MDX statement parsing module 31 is used to analyze the MDX query statements from various report analysis tools according to the pattern in which each tool organizes MDX statements, and to extract, organize, and send the information required for the queries to the query execution module.
  • for example, after an MDX query statement is received from a report analysis tool, the user's query intent is extracted from it: the dimensions, measures, and filter conditions to be queried, and where they sit (on which axes of the multidimensional data model). Because an MDX query often needs data at several aggregation levels, and the calculations at those levels are independent of each other, several corresponding Aggregation Queries can be generated from the extracted information and sent to the query execution module for parallel execution.
  • since the report analysis tool follows certain rules when organizing query statements, pattern and semantic analysis of the statement shows that the above sample query contains data at two aggregation levels, total and detail, under the hierarchy H; it is therefore converted into two Aggregation Queries, which calculate (L1, M) and (L2, M) respectively and are executed in parallel;
  • the query execution module 32 is used to convert each Aggregation Query into a distributed execution plan and submit it to the distributed computing module 35, receive the calculation result sent back by the distributed computing module 35, and finally return the result to the result construction module 34.
  • each Aggregation Query first queries the dimension data of the H level it contains (L1 or L2) and builds it into a basic data block, and then traverses the abstract syntax tree of M, which is a calculated measure: according to the semantics of the different operators and functions involved in the query statement, distributed operators are applied to the basic data block to add, remove, compute, and modify data (if data must be added, it is queried through the data providing module), and finally the data content corresponding to M is added to the basic data block.
  • the data results of L1 and L2 and the data results of M on L1 and L2 are extracted, converted into the Aggregation Query Result data structure, and forwarded to the result construction module.
  • the data providing module 33 is used to receive the dimension and measure requests sent by the query execution module 32, adjust the requested dimension and measure ranges according to specific rules, query the distributed data storage service, package the returned dimension and measure results, and return the packaged dimension and measure data blocks to the query execution module 32.
  • the result construction module 34 is used to construct the final returned result from multiple Aggregation Query Results. First, the dimension and measure information on the rows and columns is obtained from the different Aggregation Queries to determine the framework of the MDX query result (the dimensions on rows and columns, and the distribution of the measures); the dimension values on the rows and columns and the corresponding measure values are then extracted from each data unit of the data block returned by the query execution module; finally, the extraction results of the different Aggregation Queries are organized according to the high-low relationship of their aggregation levels and returned to the report analysis tool in a specific format.
  • the distributed computing module 35 is used to execute the distributed computation plan and send the computing results to the query execution module.
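The ordering step performed by the result construction module can be illustrated with a short Python sketch. The result layout (`level`, `cells`), the `LEVEL_ORDER` table, and the sample values are assumptions made for illustration; the actual Aggregation Query Result structure is not specified in this text.

```python
# Aggregation levels ordered from high to low (hypothetical names).
LEVEL_ORDER = {"year": 0, "month": 1}


def construct_result(aggregation_results):
    """Flatten per-level results into rows ordered from high to low level."""
    rows = []
    ordered = sorted(aggregation_results, key=lambda r: LEVEL_ORDER[r["level"]])
    for result in ordered:
        # Within one level, list the members in a stable, sorted order.
        for member, value in sorted(result["cells"].items()):
            rows.append((result["level"], member, value))
    return rows


# Results may arrive in any order because the queries ran in parallel.
aggregation_results = [
    {"level": "month", "cells": {"2022-02": 30, "2022-01": 30}},
    {"level": "year", "cells": {"2022": 60}},
]
rows = construct_result(aggregation_results)
```

The total (year) row comes first and the detail (month) rows follow, mirroring the high-low organization of aggregation levels described above.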
  • data query equipment; refer to the schematic structural diagram shown in FIG. 4. It includes at least one processor 41 and at least one memory 42; the memory 42 is used to store one or more program instructions; the processor 41 is configured to run the one or more program instructions to execute any one of the above-mentioned methods.
  • the present application also proposes a computer-readable storage medium containing one or more program instructions, which are used to execute the method described in any one of the above.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the methods disclosed in the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a storage medium that is mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • the processor reads the information in the storage medium, and completes the steps of the above method in combination with its hardware.
  • a storage medium may be a memory, which may be, for example, volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory.
  • the non-volatile memory can be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM), or flash memory.
  • the volatile memory may be Random Access Memory (RAM for short), which acts as an external cache.
  • many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DRRAM).
  • the storage media described in the embodiments of the present invention are intended to include, but are not limited to, these and any other suitable types of memory.
  • Computer-readable media includes both computer storage media and communication media including any medium that facilitates transfer of a computer program from one place to another.
  • a storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Operations Research (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

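The present application discloses a data query method, device, equipment and storage medium. A data query method includes: receiving an MDX query statement; obtaining related information of the measures and dimensions in the MDX query statement; and querying according to the related information of the measures and dimensions to obtain a query result. By extracting the related information of the measures and dimensions and by using a distributed computing framework to evaluate MDX expressions, the present invention greatly improves the efficiency of data analysis.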

Description

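Data query method, device, equipment and storage medium
Technical Field
The present application relates to the field of computer technology, and in particular to a data query method, device, equipment and storage medium.
Background
With the arrival of the big-data era, the volume of data that people collect and analyze keeps growing, and analyzing massive data and making decisions from it is a hard problem. Thanks to their excellent multidimensional analysis capability, OLAP (online analytical processing) systems have become an indispensable component of big-data analysis. The query language OLAP typically uses is MDX. However, the MDX language cannot yet be used to analyze big data efficiently.
Summary of the Invention
The main purpose of the present application is to provide a data query method, device, equipment and storage medium to solve the above problem.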
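To achieve the above purpose, according to one aspect of the present application, a data query method is provided, including:
receiving an MDX query statement;
obtaining related information of the measures and dimensions in the MDX query statement;
querying according to the related information of the measures and dimensions to obtain a query result.
In one embodiment, querying according to the related information of the measures and dimensions includes:
determining a row dimension data range and a column dimension data range;
building a multidimensional data table according to the row dimension data range and the column dimension data range;
for any coordinate node in the multidimensional data table,
determining the measure data of the node by querying according to the row dimension and the column dimension of the coordinate node.
In one embodiment, the method further includes: obtaining an operator of the MDX query statement;
for the coordinate node, if the measure of the coordinate node cannot be found, determining the auxiliary dimension and/or auxiliary measure of the node according to the row and column dimensions of the coordinate node and the expression of the measure;
calculating the measure of the node from the auxiliary dimension and/or auxiliary measure and the operator.
In one embodiment, receiving the MDX query statement includes: receiving the MDX query statement sent by a report tool;
formatting the query result, and sending the formatted query result to the report tool.
In one embodiment, querying according to the related information of the measures and dimensions to obtain the query result includes:
querying a distributed storage system according to the related information of the measures and dimensions to obtain the query result;
the distributed storage system includes multiple batches of mutually isolated data;
wherein each batch of data stores a set of dimensions and measures.
In one embodiment, if the direct measure of the node cannot be found, querying the auxiliary dimension and/or auxiliary measure of the node includes:
determining the auxiliary dimension and/or auxiliary measure to be used according to the measure expression;
determining, from the auxiliary dimension and/or auxiliary measure, the data batch in which it is located;
obtaining the auxiliary dimension and/or auxiliary measure from the data batch.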
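To achieve the above purpose, according to a second aspect of the present application, a data query device is provided; the device includes:
a receiving module, used to receive an MDX query statement;
an acquisition module, used to obtain related information of the measures and dimensions in the MDX query statement;
a query module, used to query according to the related information of the measures and dimensions to obtain a query result.
In one embodiment, the query module is further used to determine a row dimension data range and a column dimension data range;
build a multidimensional data table according to the row dimension data range and the column dimension data range;
for any coordinate node in the multidimensional data table,
determine the measure data of the node by querying according to the row dimension and the column dimension of the coordinate node.
In one embodiment, the query module is further used to obtain an operator of the MDX query statement;
for the coordinate node, if the measure of the coordinate node cannot be found, determine the auxiliary dimension and/or auxiliary measure of the coordinate node by querying according to the row and column dimensions of the coordinate node and the measure expression;
calculate the measure of the node from the auxiliary dimension and/or auxiliary measure and the operator.
In one embodiment, the receiving module is further used to receive the MDX query statement sent by a report tool;
the device further includes a formatting module, used to format the query result and send the formatted query result to the report tool.
In one embodiment, the query module is further used to query a distributed storage system according to the related information of the measures and dimensions to obtain the query result;
the distributed storage system includes multiple batches of mutually isolated data;
wherein each batch of data stores a set of dimensions and measures.
In one embodiment, the query module is further used to determine the auxiliary dimension and/or auxiliary measure to be used according to the measure expression;
determine, from the information of the auxiliary dimension and/or auxiliary measure, the data batch in which it is located;
obtain the auxiliary dimension and/or auxiliary measure from the data batch.
To achieve the above purpose, according to a third aspect of the present application, an electronic device is provided, including at least one processor and at least one memory; the memory is used to store one or more program instructions; the processor is used to run the one or more program instructions to execute any one of the methods described above.
According to a fourth aspect of the present application, a computer-readable storage medium is provided; the computer-readable storage medium contains one or more program instructions, which are used to execute any one of the methods described above.
The above technical solution of the present invention improves the efficiency of data query and analysis by extracting the measure and dimension information in the query and by evaluating MDX expressions through a distributed computing framework. It can handle business analysis scenarios with large data volumes and complex logic.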
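Brief Description of the Drawings
The drawings, which form part of the present application, provide a further understanding of the application and make its other features, purposes and advantages more apparent. The drawings of the illustrative embodiments and their descriptions explain the application and do not unduly limit it. In the drawings:
Figure 1 is a flowchart of a data query method according to an embodiment of the present application;
Figure 2 is a schematic structural diagram of a data query apparatus according to an embodiment of the present application;
Figure 3 is a schematic structural diagram of another data query apparatus according to an embodiment of the present application;
Figure 4 is a schematic structural diagram of a data query device according to an embodiment of the present application.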
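Detailed Description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the application without creative effort fall within the scope of protection of the application.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the application described herein can be implemented. In addition, the terms "comprise" and "have", and any variants of them, are intended to cover a non-exclusive inclusion: a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, and may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
It should be noted that, where no conflict arises, the embodiments of the present application and the features of those embodiments may be combined with one another. The application is described in detail below with reference to the drawings and the embodiments.
The technical terms used in the present application are introduced first:
OLAP: Online Analytical Processing, a technology that lets analysts gain insight into data quickly and from multiple dimensions.
Aggregation Query: the content that an MDX query requests at a given aggregation level.
Aggregation Query Result: the result of an MDX query at a given aggregation level, organized in a specific form.
Dimension: a dimension in the MDX language; it generally corresponds to one dimension table in the data source.
Hierarchy: a hierarchy in the MDX language; it may consist of several levels.
Level: a level in the MDX language; it generally corresponds to a specific field of a dimension table.
It should be noted that the steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.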
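The present application proposes a data query method. Referring to the flowchart of a data query method shown in Figure 1, the method comprises:
Step S102: receive an MDX query statement. The MDX query statement may be sent by a reporting and analysis tool.
Specifically, MDX query statements from the various reporting and analysis tools are analyzed according to the patterns in which those tools compose MDX statements, and the information needed for querying is extracted, organized and sent to the query execution module.
Illustratively, after an MDX query statement sent by a reporting and analysis tool is received, the user's query intent is first extracted from it: the dimensions, measures and filter conditions to be queried, and where they sit (on which axes of the multidimensional data model). Because an MDX query often needs data at several aggregation levels, and the computations for those levels are independent of one another, several corresponding Aggregation Queries can be generated from the extracted information and sent to the query execution module for parallel execution.
The reporting and analysis tools include, but are not limited to, Excel, Tableau and Power BI.
Illustratively, the query statement may express the request "query the total sales for this quarter".
Step S104: obtain information about the measures and dimensions in the MDX query statement.
Specifically, the information about the measures and dimensions includes, but is not limited to, one or more of the following: the measure expression, the operators in the measure expression, the row dimensions, the column dimensions and the dimension level information.
Illustratively, when the query statement is "query the total sales for this quarter", the measure may be sales, and the dimensions include the store name and the date. A two-dimensional table can be built whose horizontal axis is the store name and whose vertical axis is the date. The sales of each store on each day is the measure.
Illustratively, when the query statement is "query the average score of a student over one quarter", the measure is the exam score, and the dimensions include the student's name and the date.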
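The store-by-date example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rows, the store names and the helper names `build_table` and `quarter_total` are hypothetical and do not come from the application. It shows a two-dimensional table keyed by the two dimensions, with the measure held in each cell and then summed over one quarter.

```python
from collections import defaultdict
from datetime import date

# Hypothetical fact rows: (store, day, sales). Values are illustrative only.
ROWS = [
    ("Store A", date(2021, 1, 5), 100.0),
    ("Store A", date(2021, 2, 9), 150.0),
    ("Store B", date(2021, 3, 1), 200.0),
    ("Store B", date(2021, 4, 2), 999.0),  # falls in Q2, outside the queried quarter
]


def build_table(rows):
    """Build a 2-D table keyed by (store, day); each cell holds the sales measure."""
    table = defaultdict(float)
    for store, day, sales in rows:
        table[(store, day)] += sales
    return table


def quarter_total(table, year, quarter):
    """Sum the sales measure over every cell whose date lies in the given quarter."""
    first_month = 3 * (quarter - 1) + 1
    months = range(first_month, first_month + 3)
    return sum(
        value
        for (_, day), value in table.items()
        if day.year == year and day.month in months
    )


table = build_table(ROWS)
q1_total = quarter_total(table, 2021, 1)
```

Keying the cells by the dimension tuple keeps the measure lookup independent of how a reporting tool later arranges rows and columns.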
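Step S106: perform a query according to the information about the measures and dimensions to obtain a query result.
The data may be stored in a distributed storage system. The distributed storage system is a unified whole that stores multiple batches of data isolated from one another, and each batch contains one group of dimension and measure data. Using a distributed storage system improves the safety and redundancy of data storage. Different measure data may be stored in different data batches.
With the above method, extracting the measure and dimension information from the query and computing MDX expressions with a distributed computing framework improves the efficiency of data query and analysis, and the method can handle business-analysis scenarios with large data volumes and complex logic.
In one embodiment, performing the query according to the information about the measures and dimensions comprises:
determining a row-dimension data range and a column-dimension data range;
constructing a multidimensional data table according to the row-dimension data range and the column-dimension data range; and
for any coordinate node in the multidimensional data table,
determining the measure data of the node by querying according to the row dimension and column dimension of the coordinate node.
Specifically, although a reporting tool displays data as a two-dimensional table, the query statement logically queries multidimensional data. For example, a query for the sales of each store in each city for each year involves three dimensions (city, store number and year) and one measure (sales). Assuming that the city and store-number dimensions are placed on rows and the year dimension and the measure on columns in the reporting tool, the report is presented as shown in Table 1:
Figure PCTCN2022083616-appb-000001
Table 1
As shown in Table 1, the values of the three dimensions city, store number and year determine the value of sales. The presentation is a two-dimensional table, but it is in fact a three-dimensional data cube whose x, y and z axes are city, store number and year respectively.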
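The relation between the three-dimensional cube and its two-dimensional presentation can be sketched as follows. The cell values, city names and the function name `to_report` are hypothetical; Table 1 itself is available only as an image, so this is not a reproduction of it. The sketch places (city, store number) on rows and year on columns, as the text describes.

```python
# Hypothetical cube cells: (city, store_no, year) -> sales.
CUBE = {
    ("Beijing", 1, 2019): 10.0,
    ("Beijing", 1, 2020): 12.0,
    ("Beijing", 2, 2019): 7.0,
    ("Shanghai", 1, 2020): 9.0,
}


def to_report(cube):
    """Flatten a 3-D cube into a report: (city, store_no) on rows, year on columns."""
    row_keys = sorted({(city, store) for city, store, _ in cube})
    col_keys = sorted({year for _, _, year in cube})
    # A cell with no data in the cube is rendered as None.
    body = [[cube.get((city, store, year)) for year in col_keys]
            for city, store in row_keys]
    return row_keys, col_keys, body


rows, cols, body = to_report(CUBE)
```

Each report cell is still addressed by all three dimension values; only the layout is two-dimensional.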
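In one embodiment, the method further comprises: obtaining the operators of the MDX query statement;
for the coordinate node, if the measure of the coordinate node cannot be found by the query, determining an auxiliary dimension and/or an auxiliary measure of the coordinate node according to the row and column dimensions of the coordinate node and the measure expression; and
computing the measure of the node from the auxiliary dimension and/or auxiliary measure and the operators.
Specifically, the expression contains auxiliary dimensions and/or auxiliary measures, and the ones that are needed are determined from the measure expression. Once they have been obtained, the measure value of the node is computed.
Measures fall into two classes: base measures and calculated measures. The former can be read directly from the data batches of the distributed storage system; the latter are computed, according to an expression, from dimensions and/or measures. This is why auxiliary dimensions and auxiliary measures are needed:
Illustratively, 1. Suppose there are two base measures, "Sales" and "Cost". The calculated measure "Net Profit" ("Sales" - "Cost") then depends only on two auxiliary measures;
2. Suppose there is a dimension "Product ID" with values 1-20, and that, by policy, products 1-10 receive, in addition to "Sales" and "Cost", an extra "Policy Subsidy" income with a fixed value of 10. "Net Profit" should then be:
if ("Product ID" in 1-10)
    "Sales" - "Cost" + 10
else
    "Sales" - "Cost"
"Net Profit" thus depends on an auxiliary dimension ("Product ID") and on auxiliary measures ("Sales", "Cost").
Because the policy may change, in business scenarios the condition is usually written as a separate calculated measure so that it can be changed easily. In practice there are therefore two calculated measures:
1. Policy Subsidy: if "Product ID" in 1-10;
2. Product Net Profit: if ("Policy Subsidy") then "Sales" - "Cost" + 10 else "Sales" - "Cost".
"Policy Subsidy" depends only on an auxiliary dimension, whereas "Product Net Profit" by itself depends only on auxiliary measures. Since it depends on "Policy Subsidy", however, it in effect depends on both auxiliary dimensions and auxiliary measures.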
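The transitive dependency described here can be sketched as a small resolver. The catalogue below and the function name `resolve` are assumptions made for illustration; the application does not specify how the dependencies are recorded. The sketch walks from a calculated measure down to the base measures and dimensions it ultimately needs.

```python
# Hypothetical catalogue. Base measures and dimensions can be read from storage;
# each calculated measure lists the names its expression refers to.
BASE_MEASURES = {"sales", "cost"}
DIMENSIONS = {"product_id"}
CALCULATED = {
    "policy_subsidy": ["product_id"],
    "product_net_profit": ["policy_subsidy", "sales", "cost"],
}


def resolve(measure, seen=None):
    """Return (auxiliary dimensions, auxiliary measures) needed by a calculated measure."""
    seen = set() if seen is None else seen
    dims, measures = set(), set()
    for name in CALCULATED[measure]:
        if name in DIMENSIONS:
            dims.add(name)
        elif name in BASE_MEASURES:
            measures.add(name)
        elif name in CALCULATED:
            # Follow nested calculated measures once; `seen` guards against cycles.
            if name not in seen:
                seen.add(name)
                sub_dims, sub_measures = resolve(name, seen)
                dims |= sub_dims
                measures |= sub_measures
        else:
            raise KeyError(f"unknown name in expression: {name}")
    return dims, measures


subsidy_deps = resolve("policy_subsidy")
profit_deps = resolve("product_net_profit")
```

Here `policy_subsidy` resolves to a dimension only, while `product_net_profit` picks up that dimension indirectly in addition to its two base measures, matching the reasoning in the text.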
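Illustratively, for the query "query the profit for each quarter", the measure expression is profit = sales - cost, and the operator involved is subtraction.
An abstract syntax tree is built from the measure expression and then traversed; for any node of the abstract syntax tree, the operator is applied to compute over the row dimension and column dimension of the node.
Specifically, the dimensions comprise row dimensions and column dimensions, so a two-dimensional table can be laid out with the row dimension on the horizontal axis and the column dimension on the vertical axis. The horizontal and vertical axes form the basic data block.
Illustratively, the sales of each store on each day is one node in the abstract syntax tree, and the operator can be applied to compute on that node.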
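A minimal sketch of traversing an expression tree over a data block is given below. It uses Python's standard `ast` module as a stand-in for an MDX expression parser, which is an assumption made purely for illustration; the data block layout (a dict of equal-length columns) and the function name `evaluate` are likewise hypothetical.

```python
import ast
import operator

# Map AST operator node types to column-wise arithmetic operators.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node, block):
    """Evaluate an expression tree against a data block (dict of equal-length columns)."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, block)
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left = evaluate(node.left, block)
        right = evaluate(node.right, block)
        return [OPS[type(node.op)](a, b) for a, b in zip(left, right)]
    if isinstance(node, ast.Name):
        return block[node.id]  # a measure column already present in the block
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        length = len(next(iter(block.values())))
        return [node.value] * length  # broadcast a numeric literal to a column
    raise ValueError(f"unsupported node: {ast.dump(node)}")


# One entry per coordinate node, e.g. one (store, day) cell per position.
block = {"sales": [100.0, 80.0], "cost": [60.0, 50.0]}
tree = ast.parse("sales - cost", mode="eval")
block["profit"] = evaluate(tree, block)
```

Working column by column means one pass over the tree computes the calculated measure for every coordinate node in the block at once.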
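In one embodiment, the method further comprises: obtaining the level information of the row and column dimensions in the MDX query statement;
Specifically, H = (L1, L2, L3);
Illustratively, H = (year, month, day), where year is L1, month is L2 and day is L3. The hierarchy H as a whole ranks highest, and the individual levels year, month and day rank below it. Within a single dimension, the levels are ordered from high to low as: year level, month level, day level.
The corresponding data is queried from the distributed data storage system according to the level information;
Illustratively, sales data can be looked up for a specific year, month and day, and also for a given year, a given month or a given day. For example, the sales on May 1, 2020 can be found; so can the sales for May, including May of each year, such as the sales in May 2019 and in May 2020; and so can the sales on May 1 across several years. This allows a side-by-side comparison that shows the May 1 sales of each year, and their trend, more directly.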
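Querying at different levels of the (year, month, day) hierarchy can be sketched as grouping the same facts by different keys. The daily figures and the function name `aggregate` are hypothetical and serve only to illustrate the idea.

```python
from collections import defaultdict
from datetime import date

# Hypothetical daily sales facts: (day, sales).
FACTS = [
    (date(2019, 5, 1), 10.0),
    (date(2019, 5, 2), 5.0),
    (date(2020, 5, 1), 20.0),
    (date(2020, 6, 1), 7.0),
]


def aggregate(facts, key):
    """Sum sales per group, where `key` maps a date to the members of a level."""
    totals = defaultdict(float)
    for day, sales in facts:
        totals[key(day)] += sales
    return dict(totals)


# Year level (L1) and month level (L2) of the hierarchy.
by_year = aggregate(FACTS, lambda d: d.year)
by_year_month = aggregate(FACTS, lambda d: (d.year, d.month))

# May 1 compared across years, as in the side-by-side comparison described above.
may_first = [(d, s) for d, s in FACTS if (d.month, d.day) == (5, 1)]
may_first_by_year = aggregate(may_first, lambda d: d.year)
```

The level information taken from the query decides which key function is used, so the same stored facts can serve every level of the hierarchy.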
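Traversing the abstract syntax tree comprises: for any node of the abstract syntax tree, if the direct measure or dimension of the node cannot be found by the query, querying the auxiliary measure or auxiliary dimension of the node;
and computing the direct measure or dimension from the auxiliary measure and/or auxiliary dimension.
Specifically, the distributed system stores multiple batches of data isolated from one another, and each batch contains one group of dimension and measure data. For example, data batch 1 may store measure 1, which is sales, and data batch 2 may store measure 2, which is cost.
Illustratively, to compute the profit for a given day when profit cannot be found directly, it must be calculated: first obtain sales, then obtain cost, and subtract cost from sales to get profit. Sales and cost are both measures.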
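The fallback from a direct lookup to a computation over auxiliary measures can be sketched as follows. The batch layout, the day key and the names `lookup` and `measure_value` are assumptions for illustration; only the idea (try the stored measure first, otherwise compute it from sales and cost) comes from the text.

```python
# Hypothetical isolated batches; each stores one measure keyed by day.
BATCHES = {
    1: {"measure": "sales", "data": {"2021-05-01": 100.0}},
    2: {"measure": "cost", "data": {"2021-05-01": 60.0}},
}

# Calculated measures: name -> (auxiliary measure names, function over their values).
CALCULATED = {
    "profit": (("sales", "cost"), lambda sales, cost: sales - cost),
}


def lookup(measure, day):
    """Return a stored value, or None when no batch holds that measure for that day."""
    for batch in BATCHES.values():
        if batch["measure"] == measure:
            return batch["data"].get(day)
    return None


def measure_value(measure, day):
    """Use the direct measure when stored; otherwise compute it from auxiliary measures."""
    direct = lookup(measure, day)
    if direct is not None:
        return direct
    # Raises KeyError if the measure is neither stored nor defined as calculated.
    aux_names, compute = CALCULATED[measure]
    return compute(*(measure_value(name, day) for name in aux_names))


profit = measure_value("profit", "2021-05-01")
```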
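Reporting tools impose format requirements, and only specific formats can be recognized by them. In one embodiment, the query result is therefore formatted into a format the reporting tool can recognize, and the formatted query result is sent to the reporting tool.
Specifically, the final result is constructed from multiple Aggregation Query Results. First, the dimension and measure information on the rows and columns is obtained from the different Aggregation Queries to determine the frame of the MDX query result (the distribution of dimensions and measures on rows and columns). The dimension values on rows and columns, and the corresponding measure values, are then extracted from each data cell of the data blocks returned by the query execution module. Finally, the results extracted for the different Aggregation Queries are organized according to the order of their aggregation levels and returned to the reporting and analysis tool in a specific format.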
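One way to picture the result construction is a merge of per-level partial results into a single grid, highest aggregation level first. The tuple layout of the partial results, the indentation used to mark depth and the function name `build_result` are all hypothetical; the actual return format expected by a reporting tool is not described in the text.

```python
# Hypothetical Aggregation Query Results: (aggregation level, rows of (member, value)).
# Level 1 is the total (higher) level and level 2 the detail level.
RESULTS = [
    (2, [("Store A", 40.0), ("Store B", 60.0)]),
    (1, [("All Stores", 100.0)]),
]


def build_result(results, measure_name):
    """Order partial results from the highest level down and emit one grid."""
    header = ["Member", measure_name]
    body = []
    for level, rows in sorted(results, key=lambda item: item[0]):
        for member, value in rows:
            # Indent members by depth so the level relationship stays visible.
            body.append(["  " * (level - 1) + member, value])
    return [header] + body


grid = build_result(RESULTS, "Sales")
```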
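In one embodiment, performing the query according to the information about the measures and dimensions to obtain the query result comprises:
querying a distributed storage system according to the information about the measures and dimensions to obtain the query result;
wherein the distributed storage system holds multiple batches of data that are isolated from one another, and each batch of data stores one group of dimensions and measures.
In one embodiment, querying the auxiliary dimension and/or auxiliary measure of the node when the direct measure of the node cannot be found comprises:
determining, according to the measure expression, the auxiliary dimension and/or auxiliary measure that is needed;
determining, according to the auxiliary dimension and/or auxiliary measure, the data batch in which it is stored; and
obtaining the auxiliary dimension and/or auxiliary measure from that data batch.
Specifically, the correspondence between data batches and the auxiliary dimensions and/or auxiliary measures they store is recorded in advance; the data batch is determined from this correspondence, and the specific dimension and measure values are then obtained.
Illustratively, if the measure is sales and the pre-stored correspondence says that sales is stored in data batch 4, sales is obtained from data batch 4.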
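The pre-stored correspondence can be sketched as a simple index from names to batch identifiers. Apart from "sales is stored in data batch 4", which follows the example in the text, every name, batch number and value below is hypothetical, as is the function name `fetch`.

```python
# Hypothetical pre-stored correspondence: measure or dimension name -> batch id.
BATCH_INDEX = {"sales": 4, "cost": 5, "product_id": 4}

# Hypothetical batch contents, isolated from one another.
BATCH_DATA = {
    4: {"sales": [100.0, 80.0], "product_id": [1, 2]},
    5: {"cost": [60.0, 50.0]},
}


def fetch(names):
    """Group the requested names by batch, then read each batch once."""
    by_batch = {}
    for name in names:
        by_batch.setdefault(BATCH_INDEX[name], []).append(name)
    columns = {}
    for batch_id, wanted in by_batch.items():
        batch = BATCH_DATA[batch_id]
        for name in wanted:
            columns[name] = batch[name]
    return columns


columns = fetch(["sales", "cost"])
```

Resolving names to batches before reading means each batch is touched once per request, however many of its dimensions or measures are needed.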
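The invention provides a higher-performance MDX execution engine, which greatly increases the execution speed of MDX queries over large data volumes and improves overall query performance. It provides an integration with distributed storage systems, so that users can process larger volumes of data through a distributed storage system. It also provides a distributed computing solution, so that users can adjust resource allocation flexibly according to actual needs, which greatly improves the flexibility of the system and lowers the user's cost of use.
In a second aspect, the invention further provides a data query apparatus. As shown in Figure 2, the apparatus comprises:
a receiving module 21, configured to receive an MDX query statement;
an obtaining module 22, configured to obtain information about the measures and dimensions in the MDX query statement; and
a query module 23, configured to perform a query according to the information about the measures and dimensions to obtain a query result.
In one embodiment, the query module 23 is further configured to determine a row-dimension data range and a column-dimension data range;
construct a multidimensional data table according to the row-dimension data range and the column-dimension data range; and
for any coordinate node in the multidimensional data table,
determine the measure data of the node by querying according to the row dimension and column dimension of the coordinate node.
In one embodiment, the query module 23 is further configured to obtain the operators of the MDX query statement;
for the coordinate node, if the measure of the coordinate node cannot be found by the query, determine an auxiliary dimension and/or an auxiliary measure of the coordinate node by querying according to the row dimension, the column dimension and the measure expression of the coordinate node; and
compute the measure of the node from the auxiliary dimension and/or auxiliary measure and the operators.
In one embodiment, the receiving module 21 is further configured to receive an MDX query statement sent by a reporting tool;
and the apparatus further comprises a formatting module, configured to format the query result and send the formatted query result to the reporting tool.
In one embodiment, the query module 23 is further configured to query a distributed storage system according to the information about the measures and dimensions to obtain the query result;
wherein the distributed storage system holds multiple batches of data that are isolated from one another, and each batch of data stores one group of dimensions and measures.
In one embodiment, the query module 23 is further configured to determine, according to the measure expression, the auxiliary dimension and/or auxiliary measure that is needed;
determine, according to information about the auxiliary dimension and/or auxiliary measure, the data batch in which it is stored; and
obtain the auxiliary dimension and/or auxiliary measure from that data batch.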
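Another data query apparatus is described in detail below with reference to the schematic structural diagram shown in Figure 3. The apparatus comprises: an MDX statement parsing module 31, a query execution module 32, a data provision module 33, a result construction module 34 and a distributed computing module 35.
These modules are described below using a simple sample query, an MDX query over a single dimension and a single calculated measure: select [D].[H].members from [Catalog] where ([Measures].[M]), where D and H denote an MDX dimension (Dimension) and hierarchy (Hierarchy) respectively. H is assumed to have only two levels (Level), L1 and L2, representing the total and the detail respectively, and M denotes a calculated measure.
The MDX statement parsing module 31 is configured to analyze MDX query statements from the various reporting and analysis tools according to the patterns in which those tools compose MDX statements, and to extract, organize and send the information needed for querying to the query execution module.
After an MDX query statement sent by a reporting and analysis tool is received, the user's query intent is first extracted from it: the dimensions, measures and filter conditions to be queried, and where they sit (on which axes of the multidimensional data model). Because an MDX query often needs data at several aggregation levels, and the computations for those levels are independent of one another, several corresponding Aggregation Queries can be generated from the extracted information and sent to the query execution module for parallel execution.
Because reporting and analysis tools follow certain patterns when composing query statements, analysis of the statement pattern and its semantics shows that the sample query above contains data at two aggregation levels under the hierarchy H, the total and the detail. It is therefore converted into two Aggregation Queries, which compute (L1, M) and (L2, M) respectively and are executed in parallel.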
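The split of the sample query into two independent Aggregation Queries can be sketched as below. Local threads stand in for the distributed computing module, which is an assumption made for illustration; the `AggregationQuery` class, the `plan` and `execute` functions and the detail data are hypothetical as well.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationQuery:
    level: str    # level of hierarchy H: "L1" (total) or "L2" (detail)
    measure: str  # name of the measure to compute, e.g. "M"


# Hypothetical detail data at level L2: member -> value of measure M.
DETAIL = {"a": 1.0, "b": 2.0, "c": 3.0}


def plan(levels, measure):
    """Generate one Aggregation Query per aggregation level found in the statement."""
    return [AggregationQuery(level, measure) for level in levels]


def execute(query):
    """Compute the measure at the query's level; L1 is the total over all members."""
    if query.level == "L1":
        return query.level, {"All": sum(DETAIL.values())}
    return query.level, dict(DETAIL)


queries = plan(["L1", "L2"], "M")
# The two queries do not depend on each other, so they can run concurrently.
with ThreadPoolExecutor() as pool:
    results = dict(pool.map(execute, queries))
```

Because neither query reads the other's output, the same plan could be handed to any parallel or distributed executor without changing the per-level logic.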
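The query execution module 32 is configured to convert an Aggregation Query into a distributed execution plan and submit it to the distributed computing module 35, to receive the computation result sent by the distributed computing module 35, and finally to return the result to the result construction module 34.
After the queries for the different aggregation levels are obtained from the MDX statement parsing module, a basic data block corresponding to the elements on the row and column axes is first built from the row and column dimension information in the queries and the filters on those dimensions. The abstract syntax tree of the queried measure expression is then traversed. Nodes of different types are mapped, by fixed rules, to different distributed operators that add to, remove from, compute on or modify the basic data block, so that the abstract syntax tree of the measure expression is parsed and executed step by step. When this is complete, the data block that holds the full query result for the aggregation level is converted into an Aggregation Query Result and returned to the result construction module.
For the sample query above, each Aggregation Query first queries the dimension data for the level of H that it covers (L1 or L2) and builds it into a basic data block. It then traverses the abstract syntax tree of the calculated measure M: for the different algorithms and functions involved in the query statement, and according to their semantics, distributed operators add, remove, compute or modify data on the basic data block column by column (when data must be added, it is queried through the data provision module), and the data for M is finally added to the basic data block. From the finished data block, according to the level and measure position information in the Aggregation Query, the results for L1 and L2 and the results of M on L1 and L2 are extracted, converted into the Aggregation Query Result data structure and forwarded to the result construction module.
The data provision module 33 is configured to receive requests for dimensions and measures from the query execution module 32, to adjust the requested dimension and measure ranges further according to specific rules, to issue a query to the distributed data storage service, to wrap the dimension and measure results once they are obtained, and to return the wrapped dimension and measure data blocks to the query execution module 32.
For the sample query above, when the initial data block is first built, a suitable query is composed according to the level information used on the rows and columns of the query, and the corresponding data is fetched from the distributed data storage system. Then, while the abstract syntax tree of M is traversed, if additional data turns out to be needed, for example because the calculated measure depends on other base measures or dimensions, a query is composed from the required dimension and measure information, and the data is fetched and returned to the execution module.
The result construction module 34 is configured to construct the final result from multiple Aggregation Query Results. First, the dimension and measure information on the rows and columns is obtained from the different Aggregation Queries to determine the frame of the MDX query result (the distribution of dimensions and measures on rows and columns). The dimension values on rows and columns, and the corresponding measure values, are then extracted from each data cell of the data blocks returned by the query execution module. Finally, the results extracted for the different Aggregation Queries are organized according to the order of their aggregation levels and returned to the reporting and analysis tool in a specific format.
The distributed computing module 35 is configured to execute the distributed execution plan and send the computation result to the query execution module.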
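According to a third aspect of the present application, a data query device is provided. Referring to the schematic structural diagram of a data query device shown in Figure 4, the device comprises at least one processor 41 and at least one memory 42. The memory 42 is configured to store one or more program instructions, and the processor 41 is configured to run the one or more program instructions to perform any one of the methods described above.
In a fourth aspect, the present application further proposes a computer-readable storage medium. The computer-readable storage medium contains one or more program instructions, and the one or more program instructions are used to perform any one of the methods described above.
The processor can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the invention may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may reside in a storage medium that is well established in the art, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory or registers. The processor reads the information in the storage medium and completes the steps of the above methods in combination with its hardware.
The storage medium may be a memory, for example a volatile memory or a non-volatile memory, or it may include both volatile and non-volatile memory.
The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM) or a flash memory.
The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM) and direct Rambus random access memory (Direct Rambus RAM, DRRAM).
The storage media described in the embodiments of the invention are intended to include, but are not limited to, these and any other suitable types of memory.
Those skilled in the art will appreciate that, in one or more of the above examples, the functions described in the invention may be implemented with a combination of hardware and software. When software is used, the corresponding functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium that a general-purpose or special-purpose computer can access.
The above are merely preferred embodiments of the present application and do not limit it. Those skilled in the art may make various changes and variations to the application. Any modification, equivalent replacement or improvement made within the spirit and principles of the application shall fall within its scope of protection.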

Claims (11)

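  1. A data query method, characterized by comprising:
    receiving an MDX query statement;
    obtaining information about the measures and dimensions in the MDX query statement; and
    performing a query according to the information about the measures and dimensions to obtain a query result.
  2. The data query method according to claim 1, characterized in that performing the query according to the information about the measures and dimensions comprises:
    determining a row-dimension data range and a column-dimension data range;
    constructing a multidimensional data table according to the row-dimension data range and the column-dimension data range; and
    for any coordinate node in the multidimensional data table,
    determining the measure data of the node by querying according to the row dimension and column dimension of the coordinate node.
  3. The data query method according to claim 2, characterized by further comprising:
    building an abstract syntax tree from the measure expression;
    traversing the abstract syntax tree; and
    for any node of the abstract syntax tree, applying the operator to compute over the row dimension and column dimension of the node.
  4. The data query method according to claim 2, characterized by further comprising:
    obtaining the operators of the MDX query statement;
    for the coordinate node, if the measure of the coordinate node cannot be found by the query, determining an auxiliary dimension and/or an auxiliary measure of the node according to the row and column dimensions of the coordinate node and the measure expression; and
    computing the measure of the node from the auxiliary dimension and/or auxiliary measure and the operators.
  5. The data query method according to claim 1, characterized in that receiving the MDX query statement comprises: receiving an MDX query statement sent by a reporting tool;
    and in that the query result is formatted and the formatted query result is sent to the reporting tool.
  6. The data query method according to claim 4, characterized in that performing the query according to the information about the measures and dimensions to obtain the query result comprises:
    querying a distributed storage system according to the information about the measures and dimensions to obtain the query result;
    wherein the distributed storage system holds multiple batches of data that are isolated from one another,
    and each batch of data stores one group of dimensions and measures.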
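  7. The data query method according to claim 6, characterized in that querying the auxiliary dimension and/or auxiliary measure of the node when the direct measure of the node cannot be found comprises:
    determining, according to the measure expression, the auxiliary dimension and/or auxiliary measure that is needed;
    determining, according to the auxiliary dimension and/or auxiliary measure, the data batch in which it is stored; and
    obtaining the auxiliary dimension and/or auxiliary measure from that data batch.
  8. A data query apparatus, characterized by comprising:
    a receiving module, configured to receive an MDX query statement;
    an obtaining module, configured to obtain information about the measures and dimensions in the MDX query statement; and
    a query module, configured to perform a query according to the information about the measures and dimensions to obtain a query result.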
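  8. The data query apparatus according to claim 8, characterized in that the query module is further configured to:
    determine a row-dimension data range and a column-dimension data range;
    construct a multidimensional data table according to the row-dimension data range and the column-dimension data range; and
    for any coordinate node in the multidimensional data table,
    determine the measure data of the node by querying according to the row dimension and column dimension of the coordinate node.
  9. A data query apparatus, characterized by comprising:
    an MDX statement parsing module, configured to analyze MDX query statements from various reporting and analysis tools according to the patterns in which those tools compose MDX statements, and to extract, organize and send the information needed for querying to a query execution module;
    the query execution module, configured to convert an Aggregation Query into a distributed execution plan and submit it to a distributed computing module, to receive the computation result sent by the distributed computing module, and finally to return the result to a result construction module;
    a data provision module, configured to receive requests for dimensions and measures from the query execution module, to adjust the requested dimension and measure ranges further according to specific rules, to issue a query to a distributed data storage service, to wrap the dimension and measure results once they are obtained, and to return the wrapped dimension and measure data blocks to the query execution module;
    the result construction module, configured to construct the final result from multiple Aggregation Query Results; and
    the distributed computing module, configured to execute the distributed execution plan and send the computation result to the query execution module.
  10. A data query device, characterized by comprising: at least one processor and at least one memory; the memory is configured to store one or more program instructions; and the processor is configured to run the one or more program instructions to perform the method according to any one of claims 1-7.
  11. A computer-readable storage medium, characterized in that the computer-readable storage medium contains one or more program instructions, and the one or more program instructions are used to perform the method according to any one of claims 1-7.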
PCT/CN2022/083616 2021-05-24 2022-03-29 Data query method, apparatus, device and storage medium WO2022247443A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22761040.9A EP4116838A4 (en) 2021-05-24 2022-03-29 DATA QUERY METHOD AND APPARATUS, DEVICE AND RECORDING MEDIUM
US18/092,330 US20230153298A1 (en) 2021-05-24 2023-01-01 Data query method, device and equipment and a storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110569913.5 2021-05-24
CN202110569913.5A CN113220728B (zh) 2021-05-24 2021-05-24 Data query method, apparatus, device and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/092,330 Continuation US20230153298A1 (en) 2021-05-24 2023-01-01 Data query method, device and equipment and a storage medium

Publications (1)

Publication Number Publication Date
WO2022247443A1 true WO2022247443A1 (zh) 2022-12-01

Family

ID=77098209

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/083616 WO2022247443A1 (zh) 2021-05-24 2022-03-29 Data query method, apparatus, device and storage medium

Country Status (4)

Country Link
US (1) US20230153298A1 (zh)
EP (1) EP4116838A4 (zh)
CN (1) CN113220728B (zh)
WO (1) WO2022247443A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
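CN113220728B (zh) * 2021-05-24 2023-11-28 跬云(上海)信息科技有限公司 Data query method, apparatus, device and storage medium
CN115729926A (zh) * 2021-08-30 2023-03-03 易保网络技术(上海)有限公司 Data processing method and device, storage medium, program product and computer device
CN117807108B (zh) * 2024-02-28 2024-06-11 广州思迈特软件有限公司 Data query method based on dual query engines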

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6189004B1 (en) * 1998-05-06 2001-02-13 E. Piphany, Inc. Method and apparatus for creating a datamart and for creating a query structure for the datamart
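CN111159221A (zh) * 2019-12-31 2020-05-15 北京恒泰实达科技股份有限公司 Method for data processing or querying by dynamically constructing a cube
CN112559567A (zh) * 2020-12-10 2021-03-26 跬云(上海)信息科技有限公司 Query method and apparatus suitable for an OLAP query engine
CN113220728A (zh) * 2021-05-24 2021-08-06 跬云(上海)信息科技有限公司 Data query method, apparatus, device and storage medium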

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100437589C (zh) * 2007-01-30 2008-11-26 金蝶软件(中国)有限公司 Method and apparatus for caching multidimensional expression data in an online analytical processing system
US7779031B2 (en) * 2007-02-15 2010-08-17 International Business Machines Corporation Multidimensional query simplification using data access service having local calculation engine
US8359305B1 (en) * 2011-10-18 2013-01-22 International Business Machines Corporation Query metadata engine
CN105488045A (zh) * 2014-09-16 2016-04-13 中兴通讯股份有限公司 Data presentation method and apparatus
CN104933115B (zh) * 2015-06-05 2019-05-03 北京京东尚科信息技术有限公司 Multidimensional analysis method and system
CN105404608B (zh) * 2015-10-27 2018-07-20 中通服公众信息产业股份有限公司 Method and system for computing complex indicator sets based on formula parsing
CN106933845B (zh) * 2015-12-30 2020-07-24 阿里巴巴集团控股有限公司 Method and apparatus for achieving MDX query results using SQL
US9396248B1 (en) * 2016-01-04 2016-07-19 International Business Machines Corporation Modified data query function instantiations
CN110222124A (zh) * 2019-05-08 2019-09-10 跬云(上海)信息科技有限公司 OLAP-based multidimensional data processing method and system
CN111597237B (zh) * 2020-05-22 2024-03-29 北京明略昭辉科技有限公司 Method and apparatus for generating data query results, electronic device and storage medium
CN111949658A (zh) * 2020-08-06 2020-11-17 浙江工业大学 Method for constructing an operable graphical pivot table for data cubes
CN112418721A (zh) * 2020-12-08 2021-02-26 中国建设银行股份有限公司 Indicator determination method and apparatus
CN112561642B (zh) * 2020-12-16 2024-04-09 中国平安人寿保险股份有限公司 Multidimensional product comparison and analysis method, apparatus, computer device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4116838A4 *

Also Published As

Publication number Publication date
EP4116838A1 (en) 2023-01-11
US20230153298A1 (en) 2023-05-18
CN113220728B (zh) 2023-11-28
CN113220728A (zh) 2021-08-06
EP4116838A4 (en) 2023-09-27

Similar Documents

Publication Publication Date Title
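WO2022247443A1 (zh) Data query method, apparatus, device and storage medium
CN110199273B (zh) System and method for loading, aggregating and batch computing in one scan in a multidimensional database environment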
US20230084389A1 (en) System and method for providing bottom-up aggregation in a multidimensional database environment
US10817534B2 (en) Systems and methods for interest-driven data visualization systems utilizing visualization image data and trellised visualizations
US11741059B2 (en) System and method for extracting a star schema from tabular data for use in a multidimensional database environment
US20190138523A1 (en) Processing database queries using format conversion
US8983914B2 (en) Evaluating a trust value of a data report from a data processing tool
US7743071B2 (en) Efficient data handling representations
WO2023273073A1 (zh) Report configuration method, apparatus, device and computer storage medium
US20180137180A1 (en) Systems and methods for interest-driven data visualization systems utilized in interest-driven business intelligence systems
US9934299B2 (en) Systems and methods for interest-driven data visualization systems utilizing visualization image data and trellised visualizations
US20130166498A1 (en) Model Based OLAP Cube Framework
US7720831B2 (en) Handling multi-dimensional data including writeback data
US9153051B2 (en) Visualization of parallel co-ordinates
US20170031980A1 (en) Visual Aggregation Modeler System and Method for Performance Analysis and Optimization of Databases
US20130238549A1 (en) Using Dimension Substitutions in OLAP Cubes
US11803865B2 (en) Graph based processing of multidimensional hierarchical data
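CN107729500B (zh) Data processing method and apparatus for online analytical processing, and back-end device
CN115081414A (zh) Spreadsheet generation method, apparatus, device and medium based on a data model
TWI480754B (zh) Pivot analysis method using conditional groups
JP2005018751A (ja) System and method for expressing and calculating relationships between measures
CN114860759A (zh) Data processing method, apparatus, device and readable storage medium
CN115062133A (zh) Data query method, apparatus, computer device and medium based on a data model
CN115733787A (zh) Network identification method, apparatus, server and storage medium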
US9128908B2 (en) Converting reports between disparate report formats

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022761040

Country of ref document: EP

Effective date: 20220908

NENP Non-entry into the national phase

Ref country code: DE