WO2023029854A1 - Data query method and apparatus, storage medium, and electronic device - Google Patents

Data query method and apparatus, storage medium, and electronic device

Info

Publication number
WO2023029854A1
Authority
WO
WIPO (PCT)
Prior art keywords
statement
engine
query
query language
structured query
Prior art date
Application number
PCT/CN2022/109468
Other languages
French (fr)
Chinese (zh)
Inventor
孙科
韩帅
郭俊
Original Assignee
北京火山引擎科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京火山引擎科技有限公司
Publication of WO2023029854A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution

Abstract

The present disclosure relates to a data query method and apparatus, a storage medium, and an electronic device, for use in reducing the use cost during data query of a user, and improving the data query efficiency. The method comprises: obtaining a structured query language statement determined on the basis of a unified structured query language standard; determining a query feature corresponding to the structured query language statement, the query feature being used for representing the query semantics of the structured query language statement; determining, from a plurality of calculation engines, a target calculation engine according to the query feature of the structured query language statement, and converting the structured query language statement into a target data query statement which can be executed by the target calculation engine; and executing the target data query statement by means of the target calculation engine.

Description

Data query method and apparatus, storage medium, and electronic device
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202111032755.6, filed on September 3, 2021 and entitled "Data query method and apparatus, storage medium, and electronic device", the entire content of which is incorporated herein by reference.
Technical field
The present disclosure relates to the field of data technology, and in particular to a data query method and apparatus, a storage medium, and an electronic device.
Background
OLAP (Online Analytical Processing) is online data access and analysis directed at specific problems. Its goal is to meet decision-support needs or the query and reporting needs specific to a multidimensional environment.
In the related art, an OLAP process may first obtain data from a data source through a computing engine and then analyze it. However, computing engines differ in architecture, so using several heterogeneous engines requires the user to master the usage techniques of each one, including its SQL (Structured Query Language) syntax, function definitions, and parameter tuning. For example, a user of the Presto engine must write SQL according to Presto's syntax specification, whereas a user of the Spark engine must write SQL according to Spark's syntax specification. This greatly increases the user's cost during data query and therefore reduces OLAP efficiency.
Summary
This Summary is provided to introduce, in simplified form, concepts that are described in detail in the Detailed Description below. It is not intended to identify key features or essential features of the claimed technical solution, nor is it intended to limit the scope of the claimed technical solution.
In a first aspect, the present disclosure provides a data query method, the method comprising:
obtaining a structured query language statement determined on the basis of a unified structured query language standard;
determining a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
determining, according to the query feature of the structured query language statement, a target computing engine from among a plurality of computing engines, and converting the structured query language statement into a target data query statement executable by the target computing engine; and
executing the target data query statement by means of the target computing engine.
In a second aspect, the present disclosure provides a data query apparatus, the apparatus comprising:
an obtaining module, configured to obtain a structured query language statement determined on the basis of a unified structured query language standard;
a first determining module, configured to determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
a second determining module, configured to determine, according to the query feature of the structured query language statement, a target computing engine from among a plurality of computing engines, and to convert the structured query language statement into a target data query statement executable by the target computing engine; and
a query module, configured to execute the target data query statement by means of the target computing engine.
In a third aspect, the present disclosure provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processing device, implements the steps of the method of the first aspect.
In a fourth aspect, the present disclosure provides an electronic device, comprising:
a storage device storing a computer program; and
a processing device configured to execute the computer program in the storage device to implement the steps of the method of the first aspect.
With the above technical solution, a user does not need to write separate SQL statements for computing engines of different architectures. Instead, the user writes a structured query language statement based on the unified structured query language standard, and a target computing engine is then automatically selected according to the query feature of that statement to perform the corresponding data query operation. This reduces the user's cost during data query and improves data query efficiency; in an online analytical processing (OLAP) scenario, it improves the efficiency of the online analytical processing.
Other features and advantages of the present disclosure are described in detail in the Detailed Description that follows.
Brief description of the drawings
The above and other features, advantages, and aspects of the embodiments of the present disclosure will become more apparent from the following Detailed Description taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numerals denote the same or similar elements. It should be understood that the drawings are schematic and that components and elements are not necessarily drawn to scale. In the drawings:
Fig. 1 is a flowchart of a data query method according to an exemplary embodiment of the present disclosure;
Fig. 2 is a schematic diagram of the processing procedure of a data query method according to an exemplary embodiment of the present disclosure;
Fig. 3 is a block diagram of a data query apparatus according to an exemplary embodiment of the present disclosure;
Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed description
Embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although certain embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be implemented in various forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the present disclosure are for exemplary purposes only and are not intended to limit the scope of protection of the present disclosure.
It should be understood that the steps described in the method embodiments of the present disclosure may be performed in a different order and/or in parallel. In addition, method embodiments may include additional steps and/or omit illustrated steps. The scope of the present disclosure is not limited in this respect.
As used herein, the term "include" and its variants are open-ended, that is, "including but not limited to". The term "based on" means "based at least in part on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one further embodiment"; the term "some embodiments" means "at least some embodiments". Definitions of other terms are given in the description below.
It should be noted that concepts such as "first" and "second" mentioned in the present disclosure are used only to distinguish different devices, modules, or units, and are not used to limit the order of, or the interdependence between, the functions performed by these devices, modules, or units. It should also be noted that the modifiers "a" and "a plurality of" mentioned in the present disclosure are illustrative rather than restrictive; those skilled in the art should understand that, unless the context clearly indicates otherwise, they should be read as "one or more".
The names of messages or information exchanged between devices in the embodiments of the present disclosure are used for illustrative purposes only and are not intended to limit the scope of these messages or information.
As stated in the Background, in the related art an OLAP process may first obtain data from a data source through a big data computing engine and then analyze it. However, computing engines differ in architecture, so using several heterogeneous engines requires the user to master the usage techniques of each one, including its SQL (Structured Query Language) syntax, function definitions, and parameter tuning. For example, a user of the Presto engine must write SQL according to Presto's syntax specification, whereas a user of the Spark engine must write SQL according to Spark's syntax specification. This greatly increases the user's cost during data query and therefore reduces OLAP efficiency.
In view of this, the present disclosure provides a data query method that automatically adapts to various computing engines on the basis of a unified SQL syntax, thereby reducing the user's cost during data query.
Fig. 1 is a flowchart of a data query method according to an exemplary embodiment of the present disclosure. Referring to Fig. 1, the data query method includes:
Step 101: obtain a structured query language statement written on the basis of a unified structured query language standard.
Step 102: determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement.
Step 103: according to the query feature of the structured query language statement, determine a target computing engine from among a plurality of computing engines, and convert the structured query language statement into a target data query statement executable by the target computing engine.
Step 104: execute the target data query statement by means of the target computing engine.
In this way, a user does not need to write separate SQL statements for computing engines of different architectures. Instead, the user writes a structured query language statement based on the unified structured query language standard, and a target computing engine is then automatically selected according to the query feature of that statement to perform the corresponding data query operation. This reduces the user's cost during data query and improves data query efficiency; in an online analytical processing (OLAP) scenario, it improves the efficiency of the online analytical processing.
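Steps 101 to 104 can be read as a small pipeline. The following Python sketch is only an illustration under stated assumptions: the function `run_query`, the engine names `engine_a` and `engine_b`, and the stub feature extractor, translators, and executors are all hypothetical and do not come from the disclosure.

```python
from typing import Any, Callable, Dict

Features = Dict[str, Any]


def run_query(
    sql: str,
    extract_features: Callable[[str], Features],
    select_engine: Callable[[Features], str],
    translators: Dict[str, Callable[[str], str]],
    executors: Dict[str, Callable[[str], Any]],
) -> Any:
    """Run one unified-SQL statement through steps 102 to 104.

    Step 101 (obtaining the statement) is represented by the `sql` argument.
    """
    features = extract_features(sql)             # step 102: query feature
    engine = select_engine(features)             # step 103: choose target engine
    target_statement = translators[engine](sql)  # step 103: convert statement
    return executors[engine](target_statement)   # step 104: execute


def demo() -> Any:
    """Wire the pipeline with hypothetical stubs for two made-up engines."""
    return run_query(
        "SELECT 1",
        extract_features=lambda s: {"join_count": s.upper().count(" JOIN ")},
        select_engine=lambda f: "engine_b" if f["join_count"] > 1 else "engine_a",
        translators={
            "engine_a": lambda s: s + " /* a */",
            "engine_b": lambda s: s + " /* b */",
        },
        executors={
            "engine_a": lambda s: ("engine_a", s),
            "engine_b": lambda s: ("engine_b", s),
        },
    )
```

Passing each step in as a callable keeps the sketch independent of any particular engine; a real system would replace the stubs with actual parsing, routing, translation, and execution.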
To help those skilled in the art better understand the data query method provided by the present disclosure, each of the above steps is described in detail below with examples.
For example, the SQL ANSI-2011 standard may be chosen as the basis, supplemented by some Hive-style DDL (Data Definition Language) and Flink-style streaming syntax, to form the unified structured query language standard of the embodiments of the present disclosure. Of course, in other possible implementations, the SQL syntax supported by any one computing engine may instead be chosen as the unified structured query language standard, which is not limited in the embodiments of the present disclosure.
The embodiments of the present disclosure define the unified structured query language standard in advance, so during data query a user can write SQL statements against that standard without having to learn the usage techniques of each computing engine and write SQL statements specific to it. In addition, the embodiments of the present disclosure may provide a unified calling interface through which the structured query language statement written against the unified standard is obtained. In other words, the embodiments of the present disclosure provide a unified SQL entry point and a unified SQL standard. This reduces the user's cost during data query and improves data query efficiency.
After the structured query language statement written against the unified standard is obtained, the query feature of the statement may first be determined so that a computing engine can be selected accurately and automatically and data query efficiency improved. Then, according to the query feature, the target computing engine better suited to the data query can be determined from among the plurality of computing engines.
For example, the query feature may characterize the query semantics of the structured query language statement, and may include a complexity feature of the statement and/or a data source feature of the data to be queried in the statement. The plurality of computing engines may include pure-computation big data computing engines such as Spark, Hive, Presto, and Flink, and may also include big data computing engines that integrate computation and storage, such as ClickHouse, Druid, and ElasticSearch, which is not limited in the embodiments of the present disclosure.
In a possible implementation, determining the query feature corresponding to the structured query language statement may be: determining the complexity feature corresponding to the statement and/or the data source feature of the data to be queried in the statement. Correspondingly, determining the target computing engine from among the plurality of computing engines according to the query feature may be: determining the target computing engine from among the plurality of computing engines according to the complexity feature and/or the data source feature of the statement.
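The disclosure does not say how the complexity feature or the data source feature is computed. As one possible illustration, the sketch below assumes that complexity can be approximated by counting JOIN keywords and nested SELECTs, and that tables are written as `source.table` so the prefix identifies the data source. Both assumptions, and the names `QueryFeatures` and `extract_features`, are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class QueryFeatures:
    complexity: int               # assumed metric: joins + nested SELECTs
    data_sources: FrozenSet[str]  # assumed: prefixes of `source.table` names


# Matches "FROM source.table" or "JOIN source.table" and captures the source.
_TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+([A-Za-z_]\w*)\.[A-Za-z_]\w*", re.IGNORECASE
)


def extract_features(sql: str) -> QueryFeatures:
    """Derive a crude complexity score and the set of referenced data sources."""
    joins = len(re.findall(r"\bJOIN\b", sql, re.IGNORECASE))
    selects = len(re.findall(r"\bSELECT\b", sql, re.IGNORECASE))
    subqueries = max(selects - 1, 0)  # every SELECT beyond the first is nested
    sources = frozenset(name.lower() for name in _TABLE_REF.findall(sql))
    return QueryFeatures(complexity=joins + subqueries, data_sources=sources)
```

A keyword count is a deliberately rough proxy (it ignores string literals and comments); a production system would more likely derive these features from a parsed syntax tree.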
For example, a preconfigured engine adaptation rule may be obtained. The engine adaptation rule characterizes the correspondence between query features of structured query language statements and the plurality of computing engines, so that the target computing engine can be determined from among the plurality of computing engines according to the query feature of the statement and the engine adaptation rule.
For instance, a preconfigured first engine adaptation rule and a preconfigured second engine adaptation rule may be obtained. The first engine adaptation rule characterizes the correspondence between the complexity of structured query language statements and the plurality of computing engines, and the second engine adaptation rule characterizes the correspondence between data sources and the plurality of computing engines. The target computing engine can then be determined from among the plurality of computing engines according to the complexity feature of the statement together with the first engine adaptation rule, and the data source feature of the statement together with the second engine adaptation rule.
As one example, the first engine adaptation rule is preconfigured as: engine A executes simple SQL statements, and engine B executes complex SQL statements. Determining the target computing engine according to the complexity feature of the SQL statement may then be: if the complexity of the SQL statement is greater than a preset complexity, the target computing engine is determined to be engine B; if the complexity of the SQL statement is less than or equal to the preset complexity, the target computing engine is determined to be engine A.
As another example, the second engine adaptation rule is preconfigured as: computing engine A performs data queries on data source a, and computing engine B performs data queries on data source b. Therefore, if the data source of the data to be queried in the SQL statement is data source a, the target computing engine is determined to be computing engine A; if the data source of the data to be queried is data source b, the target computing engine is determined to be computing engine B.
As a further example, the first engine adaptation rule is preconfigured as: computing engine A1 executes simple SQL statements and computing engine A2 executes complex SQL statements; and the second engine adaptation rule is preconfigured as: computing engines A1 and A2 perform data queries on data source a. In this case, if the data source of the data to be queried in the SQL statement is data source a, computing engines A1 and A2 can be determined as candidate target computing engines based on the data source feature and the second engine adaptation rule. One target computing engine can then be selected from between A1 and A2 according to the complexity feature of the SQL statement and the first engine adaptation rule.
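The two-rule example can be sketched as a pair of lookup tables: the data source narrows the candidates, and the complexity class picks one of them. In this Python sketch the threshold value, the table layout, and the treatment of engine B as accepting both simple and complex statements are assumptions made for illustration only.

```python
from typing import Dict, List, Set

# Hypothetical preset complexity; the disclosure does not give a value.
COMPLEXITY_THRESHOLD = 3

# Second engine adaptation rule: data source -> candidate engines.
SOURCE_RULES: Dict[str, List[str]] = {"a": ["A1", "A2"], "b": ["B"]}

# First engine adaptation rule: complexity class -> engines that handle it.
# Assumption: engine B accepts both classes, since it is the only engine for b.
COMPLEXITY_RULES: Dict[str, Set[str]] = {
    "simple": {"A1", "B"},
    "complex": {"A2", "B"},
}


def select_engine(complexity: int, data_source: str) -> str:
    """Apply the data source rule first, then the complexity rule."""
    candidates = SOURCE_RULES.get(data_source, [])
    level = "complex" if complexity > COMPLEXITY_THRESHOLD else "simple"
    for engine in candidates:
        if engine in COMPLEXITY_RULES[level]:
            return engine
    raise LookupError(
        f"no engine configured for source {data_source!r} at level {level!r}"
    )
```

For instance, `select_engine(1, "a")` yields `"A1"` and `select_engine(5, "a")` yields `"A2"`, mirroring the A1/A2 example. Keeping the rules as data rather than code means they can be reconfigured without changing the routing logic.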
It should be understood that the above examples are only possible ways of determining the target computing engine according to the query feature of the SQL statement and are not intended to limit the present disclosure. In a specific application, the target computing engine may be determined from the query feature of the SQL statement in other ways, which is not limited in the embodiments of the present disclosure.
In a possible implementation, to improve the execution efficiency of the SQL statement, the structured query language statement may first be optimized according to its query feature and a preset statement optimization strategy to obtain an optimized query statement, and the target computing engine is then determined from among the plurality of computing engines according to the optimized query statement.
It should be understood that, in the related art, a user of heterogeneous computing engines not only has to write a separate SQL statement for each engine but, because the engines have different execution characteristics, also has to set a separate SQL optimization strategy for each engine, which increases the user's cost and reduces data query efficiency. In the embodiments of the present disclosure, by contrast, once the SQL statement has been written against the unified SQL standard it can be optimized in a unified manner, which reduces the user's cost and improves data query efficiency.
For example, the preset statement optimization strategy may include general optimization techniques from the related art, such as a materialized view selection strategy, an expression merging strategy, an advanced constant inference strategy, and a built-in function optimization strategy (that is, a strategy for replacing inefficient functions with efficient ones), which is not limited in the embodiments of the present disclosure.
For example, suppose the SQL statement queries data from tables A, B, and C and contains the expression "A join B join C". From the query feature of the SQL statement it can be determined that the operation A join B join C needs to be performed. Whether to perform A join B first and then join C, to perform B join C first and then join A, or to perform A join C first and then join B can be decided according to the preset statement optimization strategy with the goal of improving the execution efficiency of the SQL statement. That is, the SQL statement can be optimized through the expression merging strategy to obtain the optimized query statement. The target computing engine can then be determined from among the plurality of computing engines according to the optimized query statement.
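The disclosure leaves open how the join order is chosen. One common approach, substituted here purely for illustration, is to enumerate the possible orders and keep the one with the smallest estimated intermediate results. The cost model below (each join keeps one row in `selectivity_denominator` of the cross product) and the name `best_join_order` are assumptions, not the disclosed strategy.

```python
from itertools import permutations
from typing import Dict, Tuple


def best_join_order(
    row_counts: Dict[str, int], selectivity_denominator: int = 100
) -> Tuple[str, ...]:
    """Pick the left-deep join order with the smallest total intermediate size.

    Assumed cost model: joining an intermediate result of `rows` rows with a
    table of `n` rows produces `rows * n // selectivity_denominator` rows.
    """

    def cost(order: Tuple[str, ...]) -> int:
        rows = row_counts[order[0]]
        total = 0
        for name in order[1:]:
            rows = rows * row_counts[name] // selectivity_denominator
            total += rows  # accumulate the size of every intermediate result
        return total

    # Exhaustive enumeration is fine for a handful of tables; it grows
    # factorially, so real optimizers prune the search space.
    return min(permutations(sorted(row_counts)), key=cost)
```

With row counts of 1000 for A, 10 for B, and 100000 for C, this model joins A and B first and brings in the large table C last, because that keeps the first intermediate result small.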
In this way, SQL statements can be optimized in a unified manner, which reduces the user's cost during data query and in turn improves data query efficiency.
After the target computing engine is determined, the obtained SQL statement can be converted into a target data query statement executable by the target computing engine. For example, the UDFs (User Defined Functions) and UDAFs (User Defined Aggregation Functions) used by the original SQL statement can be converted, so that the target computing engine can perform the corresponding data query operation according to the converted target data query statement.
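One simple way to picture the function conversion is a per-engine lookup table that renames function calls in the statement. The sketch below is a minimal illustration: the engine names, the function names `MY_LEN`, `length_a`, and `length_b`, and the regex-based rewriting are all hypothetical and are not taken from the disclosure or from any real engine.

```python
import re
from typing import Dict

# Hypothetical mapping: unified function name -> engine-specific name.
FUNCTION_MAP: Dict[str, Dict[str, str]] = {
    "engine_a": {"MY_LEN": "length_a"},
    "engine_b": {"MY_LEN": "length_b"},
}

# An identifier immediately followed by "(" is treated as a function call.
_CALL = re.compile(r"\b([A-Za-z_]\w*)(\s*\()")


def rewrite_functions(sql: str, engine: str) -> str:
    """Rename mapped function calls for the given engine; leave others as-is.

    Limitation of this sketch: it does not skip string literals or comments.
    """
    mapping = FUNCTION_MAP.get(engine, {})

    def replace(match: "re.Match[str]") -> str:
        name = match.group(1)
        return mapping.get(name.upper(), name) + match.group(2)

    return _CALL.sub(replace, sql)
```

For example, `rewrite_functions("SELECT MY_LEN(name), COUNT(*) FROM t", "engine_b")` renames only the mapped call and leaves `COUNT(*)` untouched. A real converter would operate on a parsed expression tree and would also need to handle argument order and type differences, not just names.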
In a possible implementation, one engine may first be selected from among the plurality of computing engines as a standard engine, and the structured query language statement is first converted into an intermediate query statement based on the data format of structured query language statements that the standard engine can process. Correspondingly, converting the structured query language statement into a target data query statement executable by the target computing engine may be: determining whether the target computing engine is the standard engine and, if it is not, converting the intermediate query statement into a target data query statement executable by the target computing engine.
In other possible implementations, if the target computing engine is the standard engine, the intermediate query statement can be executed directly by the target computing engine.
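The branch on whether the target engine is the standard engine can be sketched as follows. This is an illustration only: the dictionary used as the "intermediate query statement", the whitespace normalization standing in for real parsing, and the names `STANDARD_ENGINE`, `to_intermediate`, and `prepare_statement` are all assumptions.

```python
from typing import Any, Callable, Dict

STANDARD_ENGINE = "standard"  # hypothetical name for the chosen standard engine

Intermediate = Dict[str, str]


def to_intermediate(sql: str) -> Intermediate:
    """Stand-in for converting unified SQL into the standard engine's format.

    Here it only normalizes whitespace; a real system would build a plan tree.
    """
    return {"form": "intermediate", "sql": " ".join(sql.split())}


def prepare_statement(
    sql: str,
    target_engine: str,
    converters: Dict[str, Callable[[Intermediate], Any]],
) -> Any:
    """Always build the intermediate statement; convert only when needed."""
    intermediate = to_intermediate(sql)
    if target_engine == STANDARD_ENGINE:
        # The standard engine can execute the intermediate statement directly.
        return intermediate
    # Any other engine needs one more conversion from the intermediate form.
    return converters[target_engine](intermediate)
```

The design benefit is that each additional engine needs only one converter from the intermediate form, rather than one converter per pair of dialects.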
示例地,标准引擎可以是SQL语句规则具有通用性、且转换为其他引擎能够执行的SQL语句的转换代价较小的计算引擎。由此,可以提高SQL语句的转换效率,进而提高数据查询效率。比如,考虑到Calcite引擎的通用性,且Calcite引擎与Spark引擎、Presto引擎之间的SQL语句转换代价较小,可以将Calcite引擎作为标准引擎。当然,在其他可能的方式中,也可以选择其他任意计算引擎作为标准引擎,比如选择Spark引擎或Presto引擎作为标准引擎等,本公开实施例对此不作限定。Exemplarily, the standard engine may be a computing engine with universal SQL statement rules and relatively low conversion costs for converting to SQL statements executable by other engines. In this way, the conversion efficiency of the SQL statement can be improved, thereby improving the data query efficiency. For example, considering the versatility of the Calcite engine and the low cost of SQL statement conversion between the Calcite engine, the Spark engine, and the Presto engine, the Calcite engine can be used as the standard engine. Of course, in other possible manners, any other computing engine may also be selected as the standard engine, for example, the Spark engine or the Presto engine may be selected as the standard engine, which is not limited in this embodiment of the present disclosure.
由此,可以在自动适配各种计算引擎的同时,进一步提高数据查询效率。As a result, data query efficiency can be further improved while automatically adapting to various computing engines.
In a possible implementation, if the standard engine is the Calcite engine and the target computing engine is the Spark engine, converting the structured query language statement into the intermediate query statement may include: converting the structured query language statement into a RelNode statement based on the format of structured query language statements that the standard engine can process. Correspondingly, converting the intermediate query statement into the target data query statement executable by the target computing engine may include: converting the RelNode statement into a DataFrame statement executable by the target computing engine.
That is to say, in the embodiments of the present disclosure, in order to adapt to the computing engines automatically, the input SQL statement may first be converted into a RelNode statement. If the target computing engine is determined to be the Spark engine, the RelNode statement may be further converted into a DataFrame statement, and the DataFrame statement is then executed by the Spark engine to perform the data query. If the target computing engine is determined to be the Calcite engine, the RelNode statement may be executed directly by the Calcite engine.
It should be understood that, compared with converting the RelNode statement into a LogicalPlan statement or a PhysicalPlan statement that the Spark engine can process, the DataFrame API is relatively stable, which ensures that the corresponding statement can be invoked reliably through the API and thus that the data query executes normally. Another alternative would be to convert the RelNode statement back into a corresponding SQL statement, but this requires maintaining two sets of SQL parsing. Because the SQL parser of the Spark engine differs considerably from that of the Calcite engine, the time and resource overhead of parsing is large in complex scenarios, which affects data query efficiency. Therefore, converting the RelNode statement into a DataFrame statement, as in the embodiments of the present disclosure, can further improve data query efficiency in multi-engine scenarios.
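To illustrate the idea of mapping a relational tree onto DataFrame calls, the sketch below renders a toy plan as a chain of PySpark-style calls (`spark.table`, `filter`, `select`). The node classes `Scan`, `Filter` and `Project` are hypothetical and far simpler than Calcite's RelNode; the output is only a string and nothing is executed against Spark.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: "Node"
    condition: str


@dataclass(frozen=True)
class Project:
    child: "Node"
    columns: Tuple[str, ...]


Node = Union[Scan, Filter, Project]


def to_dataframe_code(node: Node) -> str:
    """Render a toy relational tree as a chain of DataFrame calls."""
    if isinstance(node, Scan):
        return f"spark.table({node.table!r})"
    if isinstance(node, Filter):
        return f"{to_dataframe_code(node.child)}.filter({node.condition!r})"
    if isinstance(node, Project):
        cols = ", ".join(repr(c) for c in node.columns)
        return f"{to_dataframe_code(node.child)}.select({cols})"
    raise TypeError(f"unsupported node: {node!r}")


plan = Project(Filter(Scan("orders"), "amount > 100"), ("id", "amount"))
print(to_dataframe_code(plan))
# prints: spark.table('orders').filter('amount > 100').select('id', 'amount')
```

Each tree node maps to exactly one method call, so the conversion does not have to pass through a second SQL parser.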
In a possible implementation, if the standard engine is the Calcite engine and the target computing engine is the Presto engine, converting the structured query language statement into the intermediate query statement may include: converting the structured query language statement into a RelNode statement based on the format of structured query language statements that the standard engine can process. Correspondingly, converting the intermediate query statement into the target data query statement executable by the target computing engine may include: converting the RelNode statement into a structured query language statement executable by the target computing engine.
It should be understood that the Calcite engine has no native support for the Presto engine, and the Presto engine has no stable API. One possible implementation is to first convert the SQL statement into a RelNode statement that the Calcite engine can process, and then convert the RelNode statement into a Node structure that the Presto engine can process. However, because the Presto engine lacks an efficient API, the integration cost of this approach is very high and it is difficult to apply quickly.
The inventors found that the data query statements executable by the Presto engine and the Calcite engine differ less from the unified structured query language standard than the data query statements executable by the Spark engine do. Therefore, in order to adapt to the Calcite engine and the Presto engine in a multi-engine scenario, the embodiments of the present disclosure may first convert the SQL statement into a RelNode statement that the Calcite engine can process, and then convert the RelNode statement into an SQL statement that the Presto engine can execute. Although two sets of SQL parsing need to be maintained, the SQL parsing of the Presto engine and that of the Calcite engine are relatively similar, so the time and resource overhead of parsing is reduced and data query efficiency is maintained.
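The Presto path converts the relational tree back into SQL text. A minimal sketch of such an "unparse" step is shown below, assuming the same kind of toy tree as a stand-in for RelNode; the classes and the `to_sql` function are hypothetical, the generated SQL is deliberately naive (one nested subquery per node), and this is not Calcite's own converter.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Scan:
    table: str


@dataclass(frozen=True)
class Filter:
    child: "Node"
    condition: str


@dataclass(frozen=True)
class Project:
    child: "Node"
    columns: Tuple[str, ...]


Node = Union[Scan, Filter, Project]


def to_sql(node: Node) -> str:
    """Render a toy relational tree as nested SQL text."""
    if isinstance(node, Scan):
        return f"SELECT * FROM {node.table}"
    if isinstance(node, Filter):
        return f"SELECT * FROM ({to_sql(node.child)}) AS t WHERE {node.condition}"
    if isinstance(node, Project):
        cols = ", ".join(node.columns)
        return f"SELECT {cols} FROM ({to_sql(node.child)}) AS t"
    raise TypeError(f"unsupported node: {node!r}")


plan = Project(Filter(Scan("orders"), "amount > 100"), ("id", "amount"))
print(to_sql(plan))
```

A production converter would also have to handle dialect differences such as quoting, functions and types, which this sketch ignores.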
In the above manner, automatic adaptation to multiple engines can be realized based on the unified SQL standard, which reduces the user's cost of use during data query and thereby improves data query efficiency.
In practical applications, if the data to be queried spans multiple data sources, the data of the other data sources must first be synchronized to a target data source, and the data query is then performed on the target data source after synchronization. Here, the target data source is any one of the multiple data sources, and the other data sources are the remaining data sources. With this approach, every joint query across data sources requires a data synchronization task, and if a large amount of data needs to be synchronized, data query efficiency is greatly affected.
In the embodiments of the present disclosure, in order to adapt to multiple data sources automatically, support joint queries across data sources, and avoid additional data synchronization tasks, the following may be performed when the data to be queried in the structured query language statement comes from at least two different data sources. According to the data source features corresponding to the structured query language statement, a target computing engine corresponding to each data source is determined among the multiple computing engines, and, based on the data source features, the structured query language statement is converted into target data query statements executable by the corresponding target computing engines. After the target computing engines execute the corresponding target data query statements, any one of the multiple computing engines is determined as a joint processing engine, and the target data that each target computing engine has queried from its corresponding data source is sent to the joint processing engine. Finally, the joint processing engine performs joint processing on the target data.
For example, the joint processing engine may be any one of the target computing engines, or any engine among the multiple computing engines other than the target computing engines, and may be configured according to actual conditions. It should be understood that determining one of the target computing engines as the joint processing engine, rather than an engine other than the target computing engines, reduces data transmission and thereby improves data query efficiency.
For example, referring to FIG. 2, a user triggers a data query operation through a BI (Business Intelligence) tool, and an SQL statement corresponding to the data query operation is generated based on the unified structured query language standard. The SQL statement may then be sent to the federated query layer of the database through a JDBC (Java Database Connectivity) or REST interface. Next, the engine adaptation module of the federated query layer may determine, according to the data source features corresponding to the SQL statement, a target computing engine corresponding to each data source among the multiple computing engines preset in the engine layer, and convert the SQL statement, based on the data source features, into target data query statements executable by the corresponding target computing engines. The target data query statements may then be sent to the engine layer, and the target computing engines in the engine layer query the corresponding data sources in the data source layer. Finally, the target computing engines may send the target data queried from the data sources to any computing engine in the engine layer (that is, the joint processing engine), which performs joint processing across data sources; in other words, one computing engine associates the target data queried by the target computing engines and returns the result to the user. In this way, joint queries across data sources are supported and additional data synchronization tasks are avoided, thereby improving data query efficiency.
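The joint processing step can be sketched as follows, assuming each target computing engine has already returned its rows as a list of dictionaries. The function names, the row representation and the choice of a hash join are illustrative assumptions; the disclosure does not prescribe how the joint processing engine associates the data.

```python
from typing import Dict, List

Row = Dict[str, object]


def pick_joint_engine(target_engines: List[str]) -> str:
    # Assumption: pick one of the target engines, so that its own result
    # does not have to be transferred to another engine.
    return target_engines[0]


def joint_process(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Associate rows from two data sources on a shared key (inner hash join)."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Rows as they might come back from two different data sources.
orders = [{"user_id": 1, "amount": 30}, {"user_id": 2, "amount": 5}]
users = [{"user_id": 1, "name": "a"}]

print(pick_joint_engine(["presto", "clickhouse"]))
print(joint_process(orders, users, "user_id"))
```

Only the per-source results are moved to the joint processing engine, so no data source has to be synchronized into another before the query runs.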
In addition, referring to FIG. 2, the federated query layer may further include an optimization module, a management module, a permission module and a metadata module. The optimization module may uniformly optimize the structured query language statement according to its query features and a preset statement optimization strategy. The management module may manage the submission of the target data query statement to the target computing engine during the data query, and may also perform processes such as log collection and result saving. After the SQL statement determined based on the unified structured query language standard is obtained, the permission module may first determine whether the user who initiated the SQL statement has permission to perform the data operation corresponding to the SQL statement, or may verify the correctness of the SQL statement. The metadata module stores the data permission information of each user and the metadata corresponding to each data source, so that the target computing engine can be determined more accurately. The specific implementations of the management module, the permission module and the metadata module in the federated query layer are similar to those in the related art and are not repeated here.
Continuing to refer to FIG. 2, the multiple computing engines preset in the engine layer include Spark, Hive, Presto, Flink, ClickHouse and ElasticSearch. The data source layer includes HDFS, RDS, Kafka, ClickHouse and ElasticSearch. It should be understood that ClickHouse and ElasticSearch are integrated computing and storage engines, so they appear in both the engine layer and the data source layer.
For example, with the architecture shown in FIG. 2, in an online analytical processing (OLAP) scenario, the data query process may be as follows. An SQL statement determined based on the unified structured query language standard is obtained through a JDBC or REST interface. The correctness of the SQL statement is then verified through the metadata module of the federated query layer, and the permission module determines whether the user has permission to query the data to be queried in the SQL statement. Next, the optimization module of the federated query layer uniformly optimizes the SQL statement to obtain an optimized query statement. The engine adaptation module of the federated query layer then determines the target computing engine according to the query features of the SQL statement, converts the SQL statement into a target data query statement executable by the target computing engine, and sends it to the target computing engine in the engine layer. The target computing engine may then determine an execution strategy for the target data query statement according to the optimization capability of its own physical layer, and execute the target data query statement according to that strategy to query data from the corresponding data source. If there are multiple data sources, the data queried by each target computing engine may additionally be returned after the query, and joint processing is finally performed by the federated query layer.
In the above manner, both multi-engine data query and joint query across multiple data sources can be realized, which greatly reduces the user's cost of use during data query and thereby improves data query efficiency.
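The OLAP flow above can be condensed into a small pipeline sketch. Everything here is an assumption made for illustration: `run_query` and its parameters are hypothetical, validation is reduced to a `SELECT` prefix check, optimization to whitespace normalization, and statement conversion to tagging the statement with the engine name.

```python
from typing import Callable, Dict, List, Set

Rows = List[dict]


def run_query(
    sql: str,
    user: str,
    allowed_users: Set[str],
    choose_engine: Callable[[str], str],
    engines: Dict[str, Callable[[str], Rows]],
) -> Rows:
    """Validate, authorize, optimize, route, convert and execute one statement."""
    # Stand-in for unified optimization: normalize whitespace only.
    statement = " ".join(sql.split())
    # Stand-in for the correctness check done with the metadata module.
    if not statement.upper().startswith("SELECT"):
        raise ValueError("only SELECT statements are accepted in this sketch")
    # Stand-in for the permission module.
    if user not in allowed_users:
        raise PermissionError(f"user {user!r} may not run this query")
    # Engine adaptation: pick the target engine and convert the statement.
    engine_name = choose_engine(statement)
    target_statement = f"/* {engine_name} */ {statement}"
    # The target engine executes the converted statement.
    return engines[engine_name](target_statement)


# Usage with a fake engine that echoes the statement it receives.
fake_engines = {"presto": lambda stmt: [{"stmt": stmt}]}
print(run_query("select   1", "alice", {"alice"}, lambda s: "presto", fake_engines))
```

Passing the engine chooser and the engines in as callables keeps the routing policy separate from the pipeline, which mirrors the split between the federated query layer and the engine layer.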
Based on the same inventive concept, the present disclosure further provides a data query apparatus, which may be implemented as part or all of an electronic device through software, hardware, or a combination of both. Referring to FIG. 3, the data query apparatus 300 may include:
an acquisition module 301, configured to obtain a structured query language statement determined based on a unified structured query language standard;
a first determining module 302, configured to determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement;
a second determining module 303, configured to determine a target computing engine among multiple computing engines according to the query feature of the structured query language statement, and convert the structured query language statement into a target data query statement executable by the target computing engine; and
a query module 304, configured to execute the target data query statement through the target computing engine.
Optionally, the first determining module 302 is configured to:
determine a complexity feature corresponding to the structured query language statement and/or a data source feature of the data to be queried in the structured query language statement;
and the second determining module 303 is configured to:
determine the target computing engine among the multiple computing engines according to the complexity feature and/or the data source feature of the structured query language statement.
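One way to picture the selection by complexity feature and data source feature is the rule-based sketch below. The keyword-counting score, the threshold of 2 and the mapping of complex statements to Spark and simple ones to Presto are assumptions made purely for illustration; only the observation that ClickHouse and ElasticSearch act as both engine and data source comes from the description.

```python
def complexity(sql: str) -> int:
    """Hypothetical complexity score: count joins, groupings and subqueries."""
    upper = sql.upper()
    return sum(upper.count(k) for k in (" JOIN ", " GROUP BY ", "(SELECT "))


def choose_engine(sql: str, data_source: str) -> str:
    """Pick a target engine from the data source and complexity features."""
    # Integrated computing and storage engines query their own data.
    if data_source in ("clickhouse", "elasticsearch"):
        return data_source
    # Assumed policy: heavier statements go to Spark, lighter ones to Presto.
    return "spark" if complexity(sql) >= 2 else "presto"


print(choose_engine("SELECT * FROM t", "hdfs"))
print(choose_engine("SELECT a FROM t JOIN u ON t.id = u.id GROUP BY a", "hdfs"))
```

In practice the features and thresholds would be tuned per deployment; the point is only that the choice is made from the statement itself, without user involvement.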
Optionally, the data to be queried in the structured query language statement comes from at least two different data sources, and the second determining module 303 is configured to:
determine, according to the data source features corresponding to the structured query language statement, a target computing engine corresponding to each data source among the multiple computing engines, and convert, based on the data source features, the structured query language statement into target data query statements executable by the corresponding target computing engines;
and the apparatus 300 further includes:
a joint module, configured to, after the target data query statements are executed by the target computing engines, determine any one of the multiple computing engines as a joint processing engine, send the target data that each target computing engine has queried from its corresponding data source by executing the target data query statement to the joint processing engine, and perform joint processing on the target data through the joint processing engine.
Optionally, the second determining module 303 is configured to:
optimize the structured query language statement according to the query feature of the structured query language statement and a preset statement optimization strategy to obtain an optimized query statement; and
determine the target computing engine among the multiple computing engines according to the optimized query statement.
Optionally, the apparatus 300 further includes:
an intermediate conversion module, configured to select one of the multiple computing engines as a standard engine, and convert the structured query language statement into an intermediate query statement based on the format of structured query language statements that the standard engine can process;
and the second determining module 303 is configured to:
determine whether the target computing engine is the standard engine; and
if the target computing engine is not the standard engine, convert the intermediate query statement into the target data query statement executable by the target computing engine.
Optionally, the standard engine is the Calcite engine, the target computing engine is the Spark engine, and the intermediate conversion module is configured to convert the structured query language statement into a RelNode statement based on the format of structured query language statements that the standard engine can process;
and the second determining module 303 is configured to convert the RelNode statement into a DataFrame statement executable by the target computing engine.
Optionally, the standard engine is the Calcite engine, the target computing engine is the Presto engine, and the intermediate conversion module is configured to convert the structured query language statement into a RelNode statement based on the format of structured query language statements that the standard engine can process;
and the second determining module 303 is configured to convert the RelNode statement into a structured query language statement executable by the target computing engine.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method and is not elaborated here.
Based on the same concept, the present disclosure further provides a non-transitory computer-readable storage medium on which a computer program is stored, and when the program is executed by a processing device, the steps of any one of the above data query methods are implemented.
Based on the same concept, the present disclosure further provides an electronic device, including:
a storage device on which a computer program is stored; and
a processing device, configured to execute the computer program in the storage device to implement the steps of any one of the above data query methods.
Reference is now made to FIG. 4, which shows a schematic structural diagram of an electronic device 400 suitable for implementing the embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players) and vehicle-mounted terminals (for example, vehicle navigation terminals), as well as fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 4 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 4, the electronic device 400 may include a processing device (for example, a central processing unit or a graphics processing unit) 401, which may perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage device 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for the operation of the electronic device 400. The processing device 401, the ROM 402 and the RAM 403 are connected to one another through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
Generally, the following devices may be connected to the I/O interface 405: an input device 406 including, for example, a touch screen, a touchpad, a keyboard, a mouse, a camera, a microphone, an accelerometer and a gyroscope; an output device 407 including, for example, a liquid crystal display (LCD), a speaker and a vibrator; a storage device 408 including, for example, a magnetic tape and a hard disk; and a communication device 409. The communication device 409 may allow the electronic device 400 to communicate with other devices wirelessly or by wire to exchange data. Although FIG. 4 shows the electronic device 400 with various devices, it should be understood that it is not required to implement or provide all of the devices shown. More or fewer devices may alternatively be implemented or provided.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device 409, or installed from the storage device 408, or installed from the ROM 402. When the computer program is executed by the processing device 401, the above functions defined in the methods of the embodiments of the present disclosure are performed.
It should be noted that the computer-readable medium described in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination thereof. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus or device. In the present disclosure, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and may send, propagate or transmit a program for use by or in conjunction with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted by any appropriate medium, including but not limited to: electric wires, optical cables, RF (radio frequency), or any suitable combination of the above.
In some embodiments, communication may use any currently known or future-developed network protocol such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (for example, the Internet) and a peer-to-peer network (for example, an ad hoc peer-to-peer network), as well as any currently known or future-developed network.
The above computer-readable medium may be included in the above electronic device, or may exist independently without being incorporated into the electronic device.
The above computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: obtain a structured query language statement determined based on a unified structured query language standard; determine a query feature corresponding to the structured query language statement, the query feature being used to characterize the query semantics of the structured query language statement; determine a target computing engine among multiple computing engines according to the query feature of the structured query language statement, and convert the structured query language statement into a target data query statement executable by the target computing engine; and execute the target data query statement through the target computing engine.
Computer program code for performing the operations of the present disclosure may be written in one or more programming languages or a combination thereof, including but not limited to object-oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.
The functions described above may be performed, at least in part, by one or more hardware logic components. For example, and without limitation, exemplary types of hardware logic components that may be used include field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs), application-specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), and so on.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by, or in connection with, an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium include an electrical connection based on one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, Example 1 provides a data query method, the method comprising:
obtaining a structured query language statement determined on the basis of a unified structured query language standard;
determining a query feature corresponding to the structured query language statement, the query feature being used to represent the query semantics of the structured query language statement;
determining, according to the query feature of the structured query language statement, a target computing engine from among a plurality of computing engines, and converting the structured query language statement into a target data query statement executable by the target computing engine; and
executing the target data query statement by means of the target computing engine.
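The four steps of Example 1 can be sketched roughly as follows. This is a minimal illustration rather than the disclosed implementation: the `QueryFeature` fields, the join-count rule in `select_engine`, and the `TRANSLATORS` table are all hypothetical stand-ins, and the "execution" step only returns the statement that would be submitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class QueryFeature:
    # Hypothetical features; the text only says they represent query semantics.
    join_count: int
    sources: List[str]


def extract_features(sql: str) -> QueryFeature:
    """Step 2: derive features by naive token scanning (illustrative only)."""
    tokens = sql.split()
    upper = [t.upper() for t in tokens]
    # Treat the "catalog" part of "catalog.table" as the data source name.
    sources = [
        tokens[i + 1].split(".")[0]
        for i, tok in enumerate(upper[:-1])
        if tok in ("FROM", "JOIN")
    ]
    return QueryFeature(join_count=upper.count("JOIN"), sources=sources)


def select_engine(feature: QueryFeature) -> str:
    """Step 3a: assumed rule, multi-join queries go to the batch engine."""
    return "spark" if feature.join_count >= 2 else "presto"


# Step 3b: hypothetical per-engine translators for the unified statement.
TRANSLATORS: Dict[str, Callable[[str], str]] = {
    "presto": lambda sql: sql,
    "spark": lambda sql: f"/* spark */ {sql}",
}


def run_query(sql: str) -> Tuple[str, str]:
    feature = extract_features(sql)
    engine = select_engine(feature)
    target_statement = TRANSLATORS[engine](sql)
    # Step 4: a real system would submit target_statement to the engine here.
    return engine, target_statement
```

Calling `run_query("SELECT * FROM hive.orders")` routes the statement to the engine named `"presto"` under the assumed rule, because the statement contains no joins.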
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, wherein determining the query feature corresponding to the structured query language statement comprises:
determining a complexity feature corresponding to the structured query language statement and/or a data source feature of the data to be queried in the structured query language statement;
and determining, according to the query feature of the structured query language statement, the target computing engine from among the plurality of computing engines comprises:
determining the target computing engine from among the plurality of computing engines according to the complexity feature and/or the data source feature of the structured query language statement.
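As a rough sketch of the "and/or" selection in Example 2, the function below accepts either feature, or both. The threshold, the source-to-engine table, and the priority given to complexity over data source are assumptions made for illustration; the text does not specify them.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Features:
    # Either field may be absent, mirroring the "and/or" wording.
    complexity: Optional[int] = None   # e.g. a count of joins or subqueries
    data_source: Optional[str] = None  # e.g. the catalog being queried


# Hypothetical policy values, not taken from the disclosure.
COMPLEXITY_THRESHOLD = 3
SOURCE_PREFERENCE = {"hive": "spark", "mysql": "presto"}


def choose_engine(f: Features, default: str = "presto") -> str:
    # Assumed priority: a complex statement overrides the source preference.
    if f.complexity is not None and f.complexity >= COMPLEXITY_THRESHOLD:
        return "spark"
    if f.data_source is not None:
        return SOURCE_PREFERENCE.get(f.data_source, default)
    return default
```

Keeping the policy in plain data (`COMPLEXITY_THRESHOLD`, `SOURCE_PREFERENCE`) means the routing rule can change without touching the selection logic.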
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 1, wherein the data sources of the data to be queried in the structured query language statement include at least two different data sources, and determining, according to the query feature of the structured query language statement, the target computing engine from among the plurality of computing engines and converting the structured query language statement into the target data query statement executable by the target computing engine comprises:
determining, according to the data source feature corresponding to the structured query language statement, a target computing engine corresponding to each of the data sources from among the plurality of computing engines, and converting, based on the data source feature, the structured query language statement into a target data query statement executable by the corresponding target computing engine;
and, after executing the target data query statement by means of the target computing engine, the method further comprises:
determining any one of the plurality of computing engines as a joint processing engine; and
sending, to the joint processing engine, the target data that each target computing engine has queried from its corresponding data source by executing the target data query statement, and performing joint processing on the target data by means of the joint processing engine.
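The multi-source flow of Example 3 can be sketched as below, assuming two in-memory "data sources" and an equi-join as the joint processing step. The source contents, the `ENGINE_FOR_SOURCE` mapping, and the choice of a join are hypothetical; the text only requires that per-source results be sent to one engine for joint processing.

```python
from typing import Dict, List

Row = Dict[str, object]

# Hypothetical in-memory stand-ins for two different data sources.
SOURCES: Dict[str, List[Row]] = {
    "mysql": [{"user_id": 1, "name": "ann"}, {"user_id": 2, "name": "bob"}],
    "hive": [{"user_id": 1, "total": 30}, {"user_id": 1, "total": 12}],
}
# Assumed mapping from data source to its target computing engine.
ENGINE_FOR_SOURCE = {"mysql": "presto", "hive": "spark"}


def execute_on_engine(engine: str, source: str) -> List[Row]:
    """Stand-in for running the per-source target statement on `engine`."""
    return list(SOURCES[source])


def joint_process(left: List[Row], right: List[Row], key: str) -> List[Row]:
    """Joint processing on the chosen engine, modelled here as a hash join."""
    index: Dict[object, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def federated_query(sources: List[str], key: str) -> List[Row]:
    # Each source is queried by its own target engine first.
    partials = [execute_on_engine(ENGINE_FOR_SOURCE[s], s) for s in sources]
    left, right = partials
    # The partial results are then handed to the joint processing engine.
    return joint_process(left, right, key)
```

In this sketch the joint processing engine is just a function; in the described method it would be one of the available computing engines.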
According to one or more embodiments of the present disclosure, Example 4 provides the method of any one of Examples 1-3, wherein determining, according to the query feature of the structured query language statement, the target computing engine from among the plurality of computing engines comprises:
optimizing the structured query language statement according to the query feature of the structured query language statement and a preset statement optimization strategy to obtain an optimized query statement; and
determining the target computing engine from among the plurality of computing engines according to the optimized query statement.
According to one or more embodiments of the present disclosure, Example 5 provides the method of any one of Examples 1-3, the method further comprising:
selecting one engine from among the plurality of computing engines as a standard engine, and converting the structured query language statement into an intermediate query statement based on the format of structured query language statements that the standard engine can process;
wherein converting the structured query language statement into the target data query statement executable by the target computing engine comprises:
determining whether the target computing engine is the standard engine; and
if the target computing engine is not the standard engine, converting the intermediate query statement into the target data query statement executable by the target computing engine.
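The standard-engine check of Example 5 can be sketched as follows. The intermediate form here is a plain dictionary and the converters merely tag the statement, both hypothetical placeholders. Returning the intermediate statement unchanged when the target is the standard engine is an inference from the text, which only spells out the "not the standard engine" branch.

```python
from typing import Callable, Dict, Union

STANDARD_ENGINE = "calcite"  # the engine chosen as the standard engine


def to_intermediate(sql: str) -> dict:
    """Hypothetical intermediate query statement: normalized text in a dict."""
    return {"kind": "intermediate", "sql": " ".join(sql.split())}


# Hypothetical converters from the intermediate form to each target engine.
CONVERTERS: Dict[str, Callable[[dict], str]] = {
    "spark": lambda ir: f"spark:{ir['sql']}",
    "presto": lambda ir: f"presto:{ir['sql']}",
}


def to_target_statement(sql: str, target_engine: str) -> Union[dict, str]:
    ir = to_intermediate(sql)
    if target_engine == STANDARD_ENGINE:
        # Assumed: the standard engine can run the intermediate form as-is.
        return ir
    # Otherwise convert the intermediate statement for the target engine.
    return CONVERTERS[target_engine](ir)
```

Routing every statement through one intermediate form means each additional engine needs a single converter rather than one per input dialect.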
According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 5, wherein the standard engine is a Calcite engine and the target computing engine is a Spark engine, and converting the structured query language statement into the intermediate query statement based on the format of structured query language statements that the standard engine can process comprises:
converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process;
and converting the intermediate query statement into the target data query statement executable by the target computing engine comprises:
converting the RelNode statement into a DataFrame statement executable by the target computing engine.
According to one or more embodiments of the present disclosure, Example 7 provides the method of Example 5, wherein the standard engine is a Calcite engine and the target computing engine is a Presto engine, and converting the structured query language statement into the intermediate query statement based on the format of structured query language statements that the standard engine can process comprises:
converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process;
and converting the intermediate query statement into the target data query statement executable by the target computing engine comprises:
converting the RelNode statement into a structured query language statement executable by the target computing engine.
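The two conversions in Examples 6 and 7 both start from the same relational tree and differ only in what they emit. The sketch below illustrates that idea with a toy tree; `Scan`, `Filter`, and `Project` are hypothetical classes and do not use or reproduce Calcite's actual RelNode API. One emitter produces SQL text, the other produces the text of a PySpark-style DataFrame call chain, and neither is executed here.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Scan:
    table: str


@dataclass
class Filter:
    child: "Node"
    condition: str


@dataclass
class Project:
    child: "Node"
    columns: List[str]


# Toy relational tree, standing in for the intermediate query statement.
Node = Union[Scan, Filter, Project]


def to_sql(node: Node) -> str:
    """Emit nested SQL text (the Example 7 direction)."""
    if isinstance(node, Scan):
        return f"SELECT * FROM {node.table}"
    if isinstance(node, Filter):
        return f"SELECT * FROM ({to_sql(node.child)}) AS t WHERE {node.condition}"
    if isinstance(node, Project):
        cols = ", ".join(node.columns)
        return f"SELECT {cols} FROM ({to_sql(node.child)}) AS t"
    raise TypeError(f"unsupported node: {node!r}")


def to_dataframe_code(node: Node) -> str:
    """Emit a DataFrame-style call chain as text (the Example 6 direction)."""
    if isinstance(node, Scan):
        return f'spark.table("{node.table}")'
    if isinstance(node, Filter):
        return f'{to_dataframe_code(node.child)}.filter("{node.condition}")'
    if isinstance(node, Project):
        cols = ", ".join(f'"{c}"' for c in node.columns)
        return f"{to_dataframe_code(node.child)}.select({cols})"
    raise TypeError(f"unsupported node: {node!r}")


plan = Project(Filter(Scan("users"), "age > 18"), ["name"])
```

Because both emitters walk the same tree, supporting a further engine would only require another emitter over the same intermediate form.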
According to one or more embodiments of the present disclosure, Example 8 provides a data query apparatus, the apparatus comprising:
an acquisition module, configured to obtain a structured query language statement determined on the basis of a unified structured query language standard;
a first determining module, configured to determine a query feature corresponding to the structured query language statement, the query feature being used to represent the query semantics of the structured query language statement;
a second determining module, configured to determine, according to the query feature of the structured query language statement, a target computing engine from among a plurality of computing engines, and to convert the structured query language statement into a target data query statement executable by the target computing engine; and
a query module, configured to execute the target data query statement by means of the target computing engine.
According to one or more embodiments of the present disclosure, Example 9 provides a non-transitory computer-readable storage medium storing a computer program which, when executed by a processing device, implements the steps of the method of any one of Examples 1-7.
According to one or more embodiments of the present disclosure, Example 10 provides an electronic device, comprising:
a storage device storing a computer program; and
a processing device, configured to execute the computer program in the storage device to implement the steps of the method of any one of Examples 1-7.
The above description covers only preferred embodiments of the present disclosure and explains the technical principles applied. Those skilled in the art should understand that the scope of the present disclosure is not limited to technical solutions formed by the specific combination of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the above disclosed concept, for example, a technical solution formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present disclosure.
In addition, although the operations are depicted in a particular order, this should not be understood as requiring that the operations be performed in the particular order shown or in sequential order. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, although the above discussion contains several specific implementation details, these should not be construed as limiting the scope of the present disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it should be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are merely example forms of implementing the claims. As for the apparatus in the foregoing embodiments, the specific manner in which each module performs its operations has been described in detail in the method embodiments and is not elaborated here.

Claims (10)

  1. A data query method, characterized in that the method comprises:
    obtaining a structured query language statement determined on the basis of a unified structured query language standard;
    determining a query feature corresponding to the structured query language statement, the query feature being used to represent the query semantics of the structured query language statement;
    determining, according to the query feature of the structured query language statement, a target computing engine from among a plurality of computing engines, and converting the structured query language statement into a target data query statement executable by the target computing engine; and
    executing the target data query statement by means of the target computing engine.
  2. The method according to claim 1, characterized in that determining the query feature corresponding to the structured query language statement comprises:
    determining a complexity feature corresponding to the structured query language statement and/or a data source feature of the data to be queried in the structured query language statement;
    and determining, according to the query feature of the structured query language statement, the target computing engine from among the plurality of computing engines comprises:
    determining the target computing engine from among the plurality of computing engines according to the complexity feature and/or the data source feature of the structured query language statement.
  3. The method according to claim 1, characterized in that the data sources of the data to be queried in the structured query language statement include at least two different data sources, and determining, according to the query feature of the structured query language statement, the target computing engine from among the plurality of computing engines and converting the structured query language statement into the target data query statement executable by the target computing engine comprises:
    determining, according to the data source feature corresponding to the structured query language statement, a target computing engine corresponding to each of the data sources from among the plurality of computing engines, and converting, based on the data source feature, the structured query language statement into a target data query statement executable by the corresponding target computing engine;
    and, after executing the target data query statement by means of the target computing engine, the method further comprises:
    determining any one of the plurality of computing engines as a joint processing engine; and
    sending, to the joint processing engine, the target data that each target computing engine has queried from its corresponding data source by executing the target data query statement, and performing joint processing on the target data by means of the joint processing engine.
  4. The method according to any one of claims 1-3, characterized in that determining, according to the query feature of the structured query language statement, the target computing engine from among the plurality of computing engines comprises:
    optimizing the structured query language statement according to the query feature of the structured query language statement and a preset statement optimization strategy to obtain an optimized query statement; and
    determining the target computing engine from among the plurality of computing engines according to the optimized query statement.
  5. The method according to any one of claims 1-3, characterized in that the method further comprises:
    selecting one engine from among the plurality of computing engines as a standard engine, and converting the structured query language statement into an intermediate query statement based on the format of structured query language statements that the standard engine can process;
    wherein converting the structured query language statement into the target data query statement executable by the target computing engine comprises:
    determining whether the target computing engine is the standard engine; and
    if the target computing engine is not the standard engine, converting the intermediate query statement into the target data query statement executable by the target computing engine.
  6. The method according to claim 5, characterized in that the standard engine is a Calcite engine and the target computing engine is a Spark engine, and converting the structured query language statement into the intermediate query statement based on the format of structured query language statements that the standard engine can process comprises:
    converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process;
    and converting the intermediate query statement into the target data query statement executable by the target computing engine comprises:
    converting the RelNode statement into a DataFrame statement executable by the target computing engine.
  7. The method according to claim 5, characterized in that the standard engine is a Calcite engine and the target computing engine is a Presto engine, and converting the structured query language statement into the intermediate query statement based on the format of structured query language statements that the standard engine can process comprises:
    converting the structured query language statement into a RelNode statement based on the data format of structured query language statements that the standard engine can process;
    and converting the intermediate query statement into the target data query statement executable by the target computing engine comprises:
    converting the RelNode statement into a structured query language statement executable by the target computing engine.
  8. A data query apparatus, characterized in that the apparatus comprises:
    an acquisition module, configured to obtain a structured query language statement determined on the basis of a unified structured query language standard;
    a first determining module, configured to determine a query feature corresponding to the structured query language statement, the query feature being used to represent the query semantics of the structured query language statement;
    a second determining module, configured to determine, according to the query feature of the structured query language statement, a target computing engine from among a plurality of computing engines, and to convert the structured query language statement into a target data query statement executable by the target computing engine; and
    a query module, configured to execute the target data query statement by means of the target computing engine.
  9. A non-transitory computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processing device, implements the steps of the method according to any one of claims 1-7.
  10. An electronic device, characterized in that it comprises:
    a storage device storing a computer program; and
    a processing device, configured to execute the computer program in the storage device to implement the steps of the method according to any one of claims 1-7.
PCT/CN2022/109468 2021-09-03 2022-08-01 Data query method and apparatus, storage medium, and electronic device WO2023029854A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111032755.6A CN113704291A (en) 2021-09-03 2021-09-03 Data query method and device, storage medium and electronic equipment
CN202111032755.6 2021-09-03

Publications (1)

Publication Number Publication Date
WO2023029854A1 true WO2023029854A1 (en) 2023-03-09

Family

ID=78659456

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/109468 WO2023029854A1 (en) 2021-09-03 2022-08-01 Data query method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN113704291A (en)
WO (1) WO2023029854A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113704291A (en) * 2021-09-03 2021-11-26 北京火山引擎科技有限公司 Data query method and device, storage medium and electronic equipment
CN114357276B (en) * 2021-12-23 2023-08-22 北京百度网讯科技有限公司 Data query method, device, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026335A1 (en) * 2017-07-23 2019-01-24 AtScale, Inc. Query engine selection
CN110399388A (en) * 2019-07-29 2019-11-01 中国工商银行股份有限公司 Data query method, system and equipment
CN111061766A (en) * 2019-11-27 2020-04-24 上海钧正网络科技有限公司 Business data processing method and device, computer equipment and storage medium
CN112699141A (en) * 2020-12-29 2021-04-23 医渡云(北京)技术有限公司 Data query method and device for multi-source heterogeneous data, storage medium and equipment
CN113704291A (en) * 2021-09-03 2021-11-26 北京火山引擎科技有限公司 Data query method and device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111221842A (en) * 2018-11-27 2020-06-02 北京奇虎科技有限公司 Big data processing system and method
CN111309751A (en) * 2018-11-27 2020-06-19 北京奇虎科技有限公司 Big data processing method and device
CN110633292B (en) * 2019-09-19 2022-06-21 上海依图网络科技有限公司 Query method, device, medium, equipment and system for heterogeneous database

Also Published As

Publication number Publication date
CN113704291A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
US10311055B2 (en) Global query hint specification
CN109086409B (en) Microservice data processing method and device, electronic equipment and computer readable medium
WO2023029854A1 (en) Data query method and apparatus, storage medium, and electronic device
CN106687955B (en) Simplifying invocation of an import procedure to transfer data from a data source to a data target
WO2023273544A1 (en) Log file storage method and apparatus, device, and storage medium
WO2018196729A1 (en) Query processing method, data source registration method and query engine
WO2023036128A1 (en) Data management method and apparatus, storage medium, and electronic device
WO2023056934A1 (en) Data processing method and apparatus, and electronic device
CN108363741B (en) Big data unified interface method, device, equipment and storage medium
CN111221851A (en) Lucene-based mass data query and storage method and device
US10592506B1 (en) Query hint specification
US11704327B2 (en) Querying distributed databases
US20190258736A1 (en) Dynamic Execution of ETL Jobs Without Metadata Repository
WO2024001756A1 (en) Data storage method and apparatus, and electronic device and storage medium
WO2023231615A1 (en) Materialized-column creation method and data query method based on data lake
WO2023065937A1 (en) Data processing method and apparatus, and readable medium and electronic device
CN111241137B (en) Data processing method, device, electronic equipment and storage medium
WO2023029850A1 (en) Data processing method and apparatus, and electronic device and medium
CN114036107B (en) Medical data query method and device based on hudi snapshot
WO2023001281A1 (en) Table data processing method and apparatus, terminal, and storage medium
CN116127143A (en) Data query method, device, electronic equipment and readable storage medium
WO2022151835A1 (en) Sample message processing method and apparatus
US9275103B2 (en) Optimization of JOIN queries for related data
CN115344688A (en) Business data display method and device, electronic equipment and computer readable medium
CN110879818B (en) Method, device, medium and electronic equipment for acquiring data

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22862995

Country of ref document: EP

Kind code of ref document: A1