WO2018036549A1 - Distributed database query method, apparatus and management system

Info

Publication number
WO2018036549A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
execution information
distributed database
historical
statement
Application number
PCT/CN2017/098886
Other languages
English (en)
French (fr)
Inventor
丁岩
李彦中
陈小强
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date 2016-08-25 (The priority date is an assumption and is not a legal conclusion.)
Filing date 2017-08-24
Publication date 2018-03-01
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2018036549A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Abstract

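A distributed database query method, apparatus and management system. The method includes: performing a retrieval in a query cache of a distributed database according to a received query statement, wherein historical execution information is stored in the query cache; and, when historical execution information corresponding to the query statement is retrieved, performing a data query in the distributed database according to the historical execution information.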

Description

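Distributed Database Query Method, Apparatus and Management System
Technical Field
The present disclosure relates to the field of database technologies, for example, to a distributed database query method, apparatus and management system.
Background
As database technology matures and Internet applications develop rapidly, database applications have become widespread on the Internet. Centralized database systems in the related art, however, have shortcomings. They are usually managed centrally, which easily creates database performance bottlenecks. Improving the hardware performance of a single machine can no longer meet the Internet's demand for highly concurrent queries and writes over large data volumes, and it also incurs substantial hardware maintenance and upgrade costs. In addition, centralized management may couple multiple databases together, so that the crash of one database brings down the entire database system. Distributed database systems have therefore gradually developed.
In a distributed database system, data resides on a large number of different data nodes. Querying and accessing data inevitably increases the performance burden on the system, which raises query latency and reduces concurrency. Related distributed query techniques are generally built on single-machine data and are limited to caching historical query result data, precompiling statements, and the like; they cannot solve the problems of long query latency and low concurrency in a distributed database.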
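Summary
This embodiment provides a distributed database query method, apparatus and management system, so as to at least solve the problems of long query latency and low concurrency of distributed database queries in the related art.
This embodiment provides a distributed database query method, which may include:
performing a retrieval in a query cache of a distributed database according to a received query statement, wherein historical execution information is stored in the query cache; and
when historical execution information corresponding to the query statement is retrieved, performing a data query in the distributed database according to the historical execution information.
Optionally, the historical execution information may include one or more of table data distribution information, destination database information, and constant-replacement position information.
Optionally, performing the retrieval in the query cache of the distributed database according to the received query statement includes: replacing the constants in the query statement with a preset character to obtain a new query statement, and performing a hash calculation on the new query statement to generate a corresponding hash value; and performing the retrieval in the query cache according to the hash value.
Optionally, before the retrieval is performed in the query cache of the distributed database according to the received query statement, the method further includes: collecting historical execution information from historical query processes; and, when the historical execution information conforms to a preset rule, saving the historical execution information into the query cache.
Optionally, saving the historical execution information into the query cache when it conforms to the preset rule includes: determining whether the data queried in the historical query process requires calculation processing by multiple data nodes of the distributed database; and, when the data queried in the historical query process does not require calculation processing by the multiple data nodes, saving the historical execution information into the query cache.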
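This embodiment further provides a distributed database query apparatus, which is arranged in a distributed database management system and may include:
a retrieval module, configured to perform a retrieval in a query cache of a distributed database according to a received query statement, wherein historical execution information is stored in the query cache; and
an execution module, configured to perform a data query in the distributed database according to the historical execution information when the retrieval module retrieves the historical execution information corresponding to the query statement.
Optionally, the historical execution information may include one or more of table data distribution information, destination database information, and constant-replacement position information.
Optionally, the retrieval module includes: a syntax analysis unit, configured to replace the constants in the query statement with a preset character to obtain a new query statement, and perform a hash calculation on the new query statement to generate a corresponding hash value; and a retrieval unit, configured to perform the retrieval in the query cache according to the hash value.
Optionally, the apparatus further includes: a collection module, configured to collect historical execution information from historical query processes before the retrieval is performed in the query cache of the distributed database according to the received query statement; and a cache module, configured to save the historical execution information into the query cache when the historical execution information conforms to a preset rule.
Optionally, the cache module includes: a determining unit, configured to determine whether the data queried in the historical query process requires calculation processing by multiple data nodes of the distributed database; and a cache unit, configured to save the historical execution information into the query cache when the data queried in the historical query process does not require calculation processing by the multiple data nodes.
This embodiment further provides a distributed database management system, including a query cache and any one of the distributed database query apparatuses provided in the foregoing embodiments.
This embodiment further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform any one of the above methods.
This embodiment further provides a server, including one or more processors, a memory, and one or more programs. The one or more programs are stored in the memory and, when executed by the one or more processors, perform any one of the above methods.
This embodiment further provides a computer program product, including a computer program stored on a non-transitory computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, cause the computer to perform any one of the above methods.
This embodiment allows a distributed database to skip part of the execution-tree generation process during a query, thereby reducing query latency and improving concurrency.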
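Brief Description of the Drawings
FIG. 1 is a first flowchart of the distributed database query method provided in this embodiment;
FIG. 2 is a second flowchart of the distributed database query method provided in this embodiment;
FIG. 3 is a third flowchart of the distributed database query method provided in this embodiment;
FIG. 4 is a first structural block diagram of the distributed database query apparatus provided in this embodiment;
FIG. 5 is a second structural block diagram of the distributed database query apparatus provided in this embodiment;
FIG. 6 is a structural block diagram of the distributed database management system provided in this embodiment;
FIG. 7 is a schematic architecture diagram of the distributed database management system provided in this embodiment;
FIG. 8 is a schematic diagram of the hardware structure of a server provided in this embodiment.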
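Detailed Description
An embodiment of the present invention provides a distributed database query method. FIG. 1 is a first flowchart of the distributed database query method provided in this embodiment. As shown in FIG. 1, the method may include the following steps.
In S102, a retrieval is performed in a query cache of the distributed database according to a received query statement, wherein historical execution information is stored in the query cache.
In S104, when historical execution information corresponding to the query statement is retrieved, a data query is performed in the distributed database according to the historical execution information.
The historical execution information may include one or more of table data distribution information, destination database information, and constant-replacement position information. Once this historical execution information is retrieved, the data nodes to be queried can be calculated from it together with the constant values in the query statement, and the query operation can then be performed.
In this embodiment, a retrieval is performed in the query cache according to the received query statement, the query cache storing historical execution information. When historical execution information corresponding to the query statement is retrieved, the query is performed according to that historical execution information. This allows the distributed database to skip part of the execution-tree generation process during a query, thereby reducing query latency and improving concurrency.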
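FIG. 2 is a second flowchart of the distributed database query method provided in this embodiment. As shown in FIG. 2, S102 may be implemented as follows:
constant replacement is performed on the query statement, a hash calculation is performed on the replaced query statement to generate a corresponding hash value, and the retrieval is then performed in the query cache according to the hash value.
The query statement in this embodiment may be a Structured Query Language (SQL) statement, and one SQL statement may contain multiple constant values. As shown in FIG. 2, in S202, lexical analysis may be performed on the SQL statement to obtain its constant values; in S204, the constant values obtained are recorded in order in an array; in S206, the constant values are replaced with placeholders; in S208, a hash calculation is performed on the constant-replaced SQL statement to generate a hash value; and in S210, the retrieval is performed in the query cache using this hash value as the key (KEY).
For example, for the query statement "Select t1.a from t1 where t1.id=30", lexical analysis finds the constant value "30", which can be replaced with the placeholder "?", giving "Select t1.a from t1 where t1.id=?". A hash calculation on the replaced statement then yields an integer value, 0x87653127, which is used as the KEY of the query statement for the retrieval in the query cache. If the same KEY is found, the historical execution information corresponding to that KEY is deemed to correspond to the query statement "Select t1.a from t1 where t1.id=30" of this embodiment, and S104 is then executed, i.e., the query is performed using the historical execution information corresponding to that KEY.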
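The normalization in S202 to S210 can be sketched in Python as follows. This is an illustrative sketch only, not the implementation described in this application. It makes three simplifying assumptions: the only constants are unsigned numeric literals and single-quoted string literals, a regular expression stands in for a full SQL lexer, and the KEY is the first 32 bits of a SHA-256 digest. The key it produces is therefore not expected to equal the example value 0x87653127. The names `normalize` and `cache_key` are hypothetical.

```python
import hashlib
import re
from typing import List, Tuple

# Assumed constant syntax: single-quoted strings ('' escapes a quote) and
# unsigned integer/decimal literals that are not part of an identifier such as t1.
_LITERAL = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(sql: str) -> Tuple[str, List[str]]:
    """Replace each constant with '?' and return (template, constants in order)."""
    constants: List[str] = []

    def _replace(match: re.Match) -> str:
        constants.append(match.group(0))  # S204: record constants in order
        return "?"                        # S206: substitute a placeholder

    template = _LITERAL.sub(_replace, sql)
    return template, constants


def cache_key(template: str) -> int:
    """S208: hash the constant-free statement into a 32-bit integer KEY."""
    digest = hashlib.sha256(template.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


if __name__ == "__main__":
    template, constants = normalize("Select t1.a from t1 where t1.id=30")
    print(template)   # Select t1.a from t1 where t1.id=?
    print(constants)  # ['30']
    # Statements that differ only in their constants share one KEY (S210).
    other, _ = normalize("Select t1.a from t1 where t1.id=42")
    print(cache_key(template) == cache_key(other))  # True
```

Hashing the template rather than the raw text means that statements differing only in their constants map to the same cache entry. That is what lets one cached entry serve many concrete queries.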
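Optionally, the historical execution information may include one or more of table data distribution information, destination database information, and constant-replacement position information. The table data distribution information may indicate the nodes on which the table data is distributed. The destination database information may indicate the database to which the query statement should be dispatched. The constant-replacement position information may indicate the positions of the replaced constants in the query statement. All of this historical execution information may be calculated from query statements during historical query processes and then cached.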
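The role of the constant-replacement position information can be illustrated with a small Python sketch. For illustration only, it assumes that positions are stored as character offsets of the "?" placeholders in the normalized statement and that the template contains no literal "?" other than those placeholders. Re-binding then simply splices a new statement's constants back in. The application does not specify how positions are encoded, and the helper names `placeholder_positions` and `bind` are hypothetical.

```python
from typing import List, Sequence


def placeholder_positions(template: str) -> List[int]:
    """Record the character offset of every '?' placeholder, in order."""
    return [i for i, ch in enumerate(template) if ch == "?"]


def bind(template: str, positions: Sequence[int], constants: Sequence[str]) -> str:
    """Splice constants back into the template at the recorded positions."""
    if len(positions) != len(constants):
        raise ValueError("one constant is required per placeholder")
    parts: List[str] = []
    prev = 0
    for pos, value in zip(positions, constants):
        parts.append(template[prev:pos])  # text before this placeholder
        parts.append(value)               # constant taken from the new statement
        prev = pos + 1                    # skip the '?' itself
    parts.append(template[prev:])         # text after the last placeholder
    return "".join(parts)


if __name__ == "__main__":
    template = "Select t1.a from t1 where t1.id=?"
    positions = placeholder_positions(template)
    print(bind(template, positions, ["42"]))  # Select t1.a from t1 where t1.id=42
```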
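As shown in FIG. 2, in S212, when the corresponding historical execution information is retrieved according to the query statement, the historical execution information is extracted; in S214, the data nodes to be queried may be calculated from the historical execution information and the constant values; and in S216, the query is executed to obtain the corresponding data from the corresponding data nodes.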
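S212 to S216 can be sketched as follows. The sketch assumes a hash-style distribution in which a table's rows are placed on data nodes by the value of an integer distribution key modulo the number of nodes. That distribution policy, the `HistoryExecInfo` field names, and the `route` function are hypothetical illustrations, since the application does not fix how table data distribution information is encoded.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class HistoryExecInfo:
    """Hypothetical shape of one cached historical execution information entry."""
    data_nodes: Tuple[str, ...]  # table data distribution information: nodes holding the table
    destination_db: str          # destination database the statement is dispatched to
    key_constant_index: int      # which replaced constant is the distribution key


def route(info: HistoryExecInfo, constants: Sequence[str]) -> Tuple[str, str]:
    """S214: compute the target data node from cached info plus this statement's constants."""
    key = int(constants[info.key_constant_index])
    node = info.data_nodes[key % len(info.data_nodes)]  # assumed modulo distribution
    return node, info.destination_db


if __name__ == "__main__":
    info = HistoryExecInfo(("dn0", "dn1", "dn2"), "db_orders", 0)
    # On a cache hit, no parsing or execution-tree generation is needed:
    print(route(info, ["30"]))  # ('dn0', 'db_orders')
    print(route(info, ["31"]))  # ('dn1', 'db_orders')
```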
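Optionally, before the retrieval is performed in the query cache according to the received query statement, the method further includes: collecting execution information from historical query processes; and, when the historical execution information conforms to a preset rule, saving the historical execution information into the query cache.
Optionally, after the retrieval is performed in the query cache according to the received query statement, the method further includes:
collecting execution information during the query; and, when it is determined that the execution information conforms to a preset rule, saving the execution information into the query cache.
In other words, the execution information of the current query may also be collected after the retrieval in the query cache according to the received query statement, and it may be collected regardless of whether corresponding historical execution information was retrieved.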
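FIG. 3 is a third flowchart of the distributed database query method provided in this embodiment. As shown in FIG. 3, the method may include the following steps.
In S302, when the query cache has been searched and no corresponding historical execution information has been retrieved, the normal distributed query flow is executed for the query statement (i.e., the SQL statement).
For example, the query statement is parsed, a corresponding execution tree is generated, and the query is performed in the distributed database according to the execution tree to obtain the corresponding query result.
In S304, the execution information of the current query is collected during the query, and it is determined whether the execution information of the query statement can be cached.
The preset rule for determining whether the execution information of a query statement can be cached may be as follows. If executing the query statement does not require the distributed system to participate in computing the result set, for example when the query result is obtained by querying a single data node, the execution information of the query statement may be cached. If the distributed system is required to perform any one of grouping, sorting, aggregation, deduplication, or limiting on the result set, for example when the query result is obtained through the cooperation of multiple data nodes, the execution information of the query statement is not cached.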
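The caching rule in S304 can be expressed as a small predicate. This Python sketch is one possible reading of the rule, not the application's own code. It models a query's execution information as the number of data nodes the query touches plus the set of result-set operations the distributed layer must perform itself. `ExecutionInfo`, `ResultSetOp`, and `is_cacheable` are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import FrozenSet


class ResultSetOp(Enum):
    """Result-set operations that would require the distributed layer to take part."""
    GROUP = auto()
    SORT = auto()
    AGGREGATE = auto()
    DEDUPLICATE = auto()
    LIMIT = auto()


@dataclass(frozen=True)
class ExecutionInfo:
    data_nodes_touched: int
    coordinator_ops: FrozenSet[ResultSetOp] = frozenset()


def is_cacheable(info: ExecutionInfo) -> bool:
    """Cache only when a single data node can produce the final result on its own."""
    return info.data_nodes_touched == 1 and not info.coordinator_ops


if __name__ == "__main__":
    print(is_cacheable(ExecutionInfo(1)))                                 # True
    print(is_cacheable(ExecutionInfo(1, frozenset({ResultSetOp.SORT}))))  # False
    print(is_cacheable(ExecutionInfo(3)))                                 # False
```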
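In S306, if the execution information of the query statement can be cached, the execution information to be cached is returned.
In S308, the execution information is cached in the query cache, so that it can be invoked the next time the same or a similar statement is received.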
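The overall flow of FIG. 3, together with the cache-hit path of FIG. 1, can be sketched as a small Python driver. The KEY, the planner (standing in for parsing and execution-tree generation), the executor, and the caching rule are all passed in as arguments. `QueryCache` and `run_query` are hypothetical names. A plain dictionary stands in for the query cache, with no eviction or invalidation policy.

```python
from typing import Any, Callable, Dict, Tuple


class QueryCache:
    """Minimal stand-in for the query cache: KEY -> cached execution information."""

    def __init__(self) -> None:
        self._entries: Dict[int, Any] = {}

    def get(self, key: int) -> Any:
        return self._entries.get(key)

    def put(self, key: int, info: Any) -> None:
        self._entries[key] = info


def run_query(
    cache: QueryCache,
    key: int,
    plan: Callable[[], Any],
    execute: Callable[[Any], Any],
    cacheable: Callable[[Any], bool],
) -> Tuple[Any, bool]:
    """Return (query result, whether the cache was hit)."""
    info = cache.get(key)
    if info is not None:
        return execute(info), True  # S104: reuse historical execution information
    info = plan()                   # S302: normal distributed flow (parse, build execution tree)
    if cacheable(info):             # S304/S306: apply the preset caching rule
        cache.put(key, info)        # S308: keep it for the same or a similar statement
    return execute(info), False


if __name__ == "__main__":
    cache = QueryCache()
    plans = []

    def plan():
        plans.append("planned")
        return {"node": "dn0"}

    def execute(info):
        return f"rows from {info['node']}"

    def always(info):
        return True

    print(run_query(cache, 0x87653127, plan, execute, always))  # ('rows from dn0', False)
    print(run_query(cache, 0x87653127, plan, execute, always))  # ('rows from dn0', True)
    print(len(plans))  # 1
```

Passing the planner and caching rule in as callables keeps the sketch independent of any particular parser or data layout. The second call returns from the cache without invoking the planner, which corresponds to the execution-tree generation that the application aims to skip.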
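This embodiment further provides a distributed database query apparatus, which is arranged in a distributed database management system and can perform any of the distributed database query methods provided in this embodiment. FIG. 4 is a first structural block diagram of the distributed database query apparatus provided in this embodiment. As shown in FIG. 4, the apparatus may include:
a retrieval module 402, which may be configured to perform a retrieval in a query cache of the distributed database according to a received query statement, wherein historical execution information is stored in the query cache; and
an execution module 404, which may be configured to perform a data query in the distributed database according to the historical execution information when the retrieval module retrieves the historical execution information corresponding to the query statement.
The historical execution information may include one or more of table data distribution information, destination database information, and constant-replacement position information. Once this historical execution information is retrieved, the data nodes to be queried can be calculated from the historical execution information and the constant values, and the query operation can then be performed.
Optionally, the retrieval module 402 is configured to perform a retrieval in the query cache according to the received query statement, the query cache storing historical execution information. The execution module 404 is configured to perform the query according to the historical execution information when historical execution information corresponding to the query statement is retrieved. This allows the distributed database to skip part of the execution-tree generation process during a query, thereby reducing query latency and improving concurrency.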
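FIG. 5 is a second structural block diagram of the distributed database query apparatus provided in this embodiment. As shown in FIG. 5, the retrieval module 402 includes a syntax analysis unit 5002 and a retrieval unit 5004.
The syntax analysis unit 5002 may be configured to replace the constants in the query statement with a preset character to obtain a new query statement, and perform a hash calculation on the new query statement to generate a corresponding hash value.
The retrieval unit 5004 may be configured to perform the retrieval in the query cache according to the hash value. In this embodiment, the query statement may be an SQL statement.
That is, the syntax analysis unit 5002 may be configured to perform lexical analysis on the SQL statement, find its constant values, record them in order in an array, and then replace the constant values found with placeholders. A hash calculation is then performed on the constant-replaced SQL statement to generate a hash value, and the retrieval unit 5004 may be configured to perform the retrieval in the query cache using this hash value as the KEY.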
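Optionally, the historical execution information generally includes one or more of table data distribution information, destination database information, and constant-replacement position information. Optionally, the apparatus further includes a collection module 506 and a cache module 508, where the cache module 508 may further include a determining unit 5006 and a cache unit 5008.
The collection module 506 may be configured to collect historical execution information from historical query processes before the retrieval is performed in the query cache of the distributed database according to the received query statement. The cache module 508 may be configured to save the historical execution information into the query cache when the historical execution information conforms to a preset rule.
The execution information of the current query may also be collected after the retrieval in the query cache according to the received query statement, and it may be collected regardless of whether corresponding historical execution information was retrieved.
The execution module 404 is further configured to start the normal distributed query flow for the query statement (i.e., the SQL statement) when the query cache has been searched and no corresponding historical execution information has been retrieved.
The collection module 506 is configured to collect the execution information of the current query during the query, and the determining unit 5006 in the cache module 508 determines whether the query statement can be cached. The determination rule is as follows. If executing the query statement does not require the distributed system to participate in computing the result set, the execution information may be cached. If the distributed system is required to perform any one of grouping, sorting, aggregation, deduplication, or limiting on the result set, the execution information is not cached.
The determining unit 5006 is configured to determine whether the data queried in the historical query process requires calculation processing by multiple data nodes of the distributed database. When the data queried in the historical query process does not require calculation processing by the multiple data nodes, the cache unit 5008 returns the historical execution information to be cached and saves it into the query cache. The historical execution information can then be invoked for the data query the next time the same or a similar statement is received.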
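This embodiment further provides a distributed database management system. FIG. 6 is a structural block diagram of the distributed database management system provided in this embodiment, and FIG. 7 is a schematic architecture diagram of the distributed database management system provided in this embodiment. As shown in FIG. 6 and FIG. 7, the system includes at least the distributed database query apparatus provided in the above embodiments and a query cache. The distributed database query apparatus includes at least the retrieval module 402 and the execution module 404. The retrieval module 402 may be configured to perform a retrieval in the query cache according to a received query statement, the query cache storing historical execution information. The execution module 404 may be configured to perform the query according to the historical execution information when the retrieval module retrieves historical execution information corresponding to the query statement.
The historical execution information may include one or more of table data distribution information, destination database information, and constant-replacement position information. Once this historical execution information is retrieved, the data nodes to be queried can be calculated from it together with the constant values, and the query operation can then be performed.
In this embodiment, the query statement can be executed according to historical execution information. This allows the distributed database to skip part of the execution-tree generation process during a query, thereby reducing query latency and improving concurrency.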
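This embodiment further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform any one of the above methods.
FIG. 8 is a schematic diagram of the hardware structure of a server provided in this embodiment. As shown in FIG. 8, the server includes a processor 810 and a memory 820, and may further include a communications interface 830 and a bus 840.
The processor 810, the memory 820, and the communications interface 830 may communicate with one another through the bus 840. The communications interface 830 may be used for information transmission. The processor 810 may invoke logic instructions in the memory 820 to perform any of the methods of the above embodiments.
The memory 820 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function, and the data storage area may store data created through use of the server, and the like. In addition, the memory may include volatile memory, such as random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-transitory solid-state storage device.
In addition, when the logic instructions in the memory 820 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. The method provided by the present disclosure may be embodied in the form of a computer software product stored in a storage medium. The product includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the method described in this embodiment.
The storage medium may be a non-transitory storage medium or a transitory storage medium. A non-transitory storage medium may include any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
All or part of the procedures of the methods provided in the above embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a non-transitory computer-readable storage medium and, when executed, may include the procedures of the embodiments of the above methods.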
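Industrial Applicability
The present disclosure provides a distributed database query method, apparatus and management system that allow a distributed database to skip part of the execution-tree generation process during a query, thereby reducing query latency and improving concurrency.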

Claims (11)

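  1. A distributed database query method, comprising:
    performing a retrieval in a query cache of a distributed database according to a received query statement, wherein historical execution information is stored in the query cache; and
    when historical execution information corresponding to the query statement is retrieved, performing a data query in the distributed database according to the historical execution information.
  2. The method according to claim 1, wherein performing the retrieval in the query cache of the distributed database according to the received query statement comprises:
    replacing constants in the query statement with a preset character to obtain a new query statement, and performing a hash calculation on the new query statement to generate a corresponding hash value; and
    performing the retrieval in the query cache according to the hash value.
  3. The method according to claim 1 or 2, further comprising, before performing the retrieval in the query cache of the distributed database according to the received query statement:
    collecting historical execution information from a historical query process; and
    when the historical execution information conforms to a preset rule, saving the historical execution information into the query cache.
  4. The method according to claim 3, wherein saving the historical execution information into the query cache when the historical execution information conforms to the preset rule comprises:
    determining whether data queried in the historical query process requires calculation processing by a plurality of data nodes of the distributed database; and
    when the data queried in the historical query process does not require calculation processing by the plurality of data nodes, saving the historical execution information into the query cache.
  5. The method according to claim 1, 2 or 4, wherein the historical execution information comprises at least one of:
    table data distribution information, destination database information, and position information of the constant replacement.
  6. A distributed database query apparatus, arranged in a distributed database management system, comprising:
    a retrieval module, configured to perform a retrieval in a query cache of a distributed database according to a received query statement, wherein historical execution information is stored in the query cache; and
    an execution module, configured to perform a data query in the distributed database according to the historical execution information when the retrieval module retrieves the historical execution information corresponding to the query statement.
  7. The apparatus according to claim 6, wherein the retrieval module comprises:
    a syntax analysis unit, configured to replace constants in the query statement with a preset character to obtain a new query statement, and to perform a hash calculation on the new query statement to generate a corresponding hash value; and
    a retrieval unit, configured to perform the retrieval in the query cache according to the hash value.
  8. The apparatus according to claim 6 or 7, further comprising:
    a collection module, configured to collect historical execution information from a historical query process before the retrieval is performed in the query cache of the distributed database according to the received query statement; and
    a cache module, configured to save the historical execution information into the query cache when the historical execution information conforms to a preset rule.
  9. The apparatus according to claim 8, wherein the cache module comprises:
    a determining unit, configured to determine whether data queried in the historical query process requires calculation processing by a plurality of data nodes of the distributed database; and
    a cache unit, configured to save the historical execution information into the query cache when the data queried in the historical query process does not require calculation processing by the plurality of data nodes.
  10. A distributed database management system, comprising:
    the distributed database query apparatus according to any one of claims 6 to 9; and
    a query cache, configured to store historical execution information.
  11. A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to perform the method according to any one of claims 1 to 5.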
PCT/CN2017/098886 2016-08-25 2017-08-24 Distributed database query method, apparatus, and management system WO2018036549A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610723578.9A CN107783985B (zh) 2016-08-25 2016-08-25 Distributed database query method, apparatus, and management system
CN201610723578.9 2016-08-25

Publications (1)

Publication Number Publication Date
WO2018036549A1 (zh) 2018-03-01

Family

ID=61245492

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/098886 WO2018036549A1 (zh) 2016-08-25 2017-08-24 Distributed database query method, apparatus, and management system

Country Status (2)

Country Link
CN (1) CN107783985B (zh)
WO (1) WO2018036549A1 (zh)


Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101930472A (zh) * 2010-09-09 2010-12-29 南京中兴特种软件有限责任公司 Method for supporting parallel queries in a distributed database
CN104216894A (zh) * 2013-05-31 2014-12-17 国际商业机器公司 Method and system for data query

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
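CN102646130B (zh) * 2012-03-12 2013-08-14 华中科技大学 Storage and indexing method for massive historical data
CN104216955B (zh) * 2014-08-20 2017-12-26 百度在线网络技术(北京)有限公司 Method, apparatus, and distributed system for operating on data and managing transactions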


Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
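CN110008448B (zh) * 2019-04-02 2023-10-17 中国工商银行股份有限公司 Method and apparatus for automatically converting SQL code into Java code
CN110008448A (zh) * 2019-04-02 2019-07-12 中国工商银行股份有限公司 Method and apparatus for automatically converting SQL code into Java code
CN111309724A (zh) * 2019-12-31 2020-06-19 航天信息股份有限公司 Method and system for processing big data
CN111291054B (zh) * 2020-02-21 2023-05-16 苏宁云计算有限公司 Data processing method and apparatus, computer device, and storage medium
CN111291054A (zh) * 2020-02-21 2020-06-16 苏宁云计算有限公司 Data processing method and apparatus, computer device, and storage medium
CN111639140A (zh) * 2020-06-08 2020-09-08 杭州复杂美科技有限公司 Distributed data storage method, device, and storage medium
CN112597004A (zh) * 2020-12-11 2021-04-02 广州品唯软件有限公司 SQL statement performance testing method and apparatus, computer device, and storage medium
CN113064912B (zh) * 2021-03-24 2023-07-21 西安热工研究院有限公司 Method for fast querying of historical alarm information in a DCS backend
CN113064912A (zh) * 2021-03-24 2021-07-02 西安热工研究院有限公司 Method for fast querying of historical alarm information in a DCS backend
CN113377764A (zh) * 2021-05-07 2021-09-10 北京锐服信科技有限公司 High-speed indexing method and system for pcap data packets
CN113377764B (zh) * 2021-05-07 2024-04-12 北京锐服信科技有限公司 High-speed indexing method and system for pcap data packets
CN114238404A (zh) * 2021-12-15 2022-03-25 建信金融科技有限责任公司 Data query method and apparatus, storage medium, and device
CN116541420A (zh) * 2023-07-07 2023-08-04 上海爱可生信息技术股份有限公司 Vector data query method
CN116541420B (zh) * 2023-07-07 2023-09-15 上海爱可生信息技术股份有限公司 Vector data query method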

Also Published As

Publication number Publication date
CN107783985A (zh) 2018-03-09
CN107783985B (zh) 2021-04-16


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 17842971

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 17842971

Country of ref document: EP

Kind code of ref document: A1