CN110659327A - Method and related device for realizing interactive query of data between heterogeneous databases - Google Patents

Method and related device for realizing interactive query of data between heterogeneous databases

Publication number
CN110659327A
Authority
CN
China
Prior art keywords
query
data
logic execution
heterogeneous database
execution plan
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910759791.9A
Other languages
Chinese (zh)
Inventor
倪程伟
汪涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910759791.9A
Priority to PCT/CN2019/118024 (WO2021031407A1)
Publication of CN110659327A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP

Abstract

The application discloses a method and a device for realizing interactive query of data between heterogeneous databases, and relates to the technical field of big data. The method comprises the following steps: the heterogeneous database system receives a data query request initiated by a query client through the set scheduling node; analyzing query statements in the data query request into a logic execution plan through a scheduling node, and performing distributed processing on the logic execution plan to obtain a plurality of sub-logic execution plans facing the heterogeneous database; performing state detection on working nodes arranged in the heterogeneous database system, and correspondingly allocating the sub-logic execution plans to the working nodes in an idle state, so that the working nodes execute data query of the heterogeneous database according to the allocated sub-logic execution plans; and after the working node returns the result data of executing the data query to the scheduling node for aggregation, the aggregated result data is returned to the query client through the scheduling node. According to the method and the device, efficient query of data between heterogeneous databases can be achieved.

Description

Method and related device for realizing interactive query of data between heterogeneous databases
Technical Field
The present application relates to the field of big data technologies, and in particular, to a method and an apparatus for implementing interactive query of data between heterogeneous databases, an electronic device, and a computer-readable storage medium.
Background
The heterogeneous database system is a collection of related databases, and although the architectures of the databases are different from each other, sharing and transparent access of data among the databases can be realized.
In the existing implementation, because data among the heterogeneous databases are correlated, when data query is performed on the heterogeneous database system, data in each heterogeneous database must be imported into the same database in a data exchange manner, and then interactive data query among the heterogeneous databases is performed in the database, so that the query process is very complicated.
In an actual business scenario, if a business department wants to analyze data from several business systems deployed by a company, it cannot do so efficiently: the technical department must provide support for the data exchange, and additional processes such as information security approval are required.
Therefore, how to realize efficient query of data between heterogeneous databases is a technical problem to be solved urgently in the prior art.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present application and therefore may include information that does not constitute prior art known to a person of ordinary skill in the art.
Disclosure of Invention
Based on the technical problem, the application provides a method and a device for realizing interactive query of data between heterogeneous databases, electronic equipment and a computer-readable storage medium.
The technical scheme disclosed by the application comprises the following steps:
a method for realizing interactive query of data between heterogeneous databases comprises the following steps: a heterogeneous database system receives a data query request initiated by a query client through a set scheduling node, wherein the heterogeneous database system is a set of a plurality of heterogeneous databases;
analyzing query statements in the data query request into a logic execution plan through the scheduling node, and performing distributed processing on the logic execution plan to obtain a plurality of sub-logic execution plans facing the heterogeneous database, wherein the logic execution plan is a general execution logic for executing data query on the heterogeneous database;
performing state detection on working nodes arranged in the heterogeneous database system, and correspondingly allocating the sub-logic execution plans to the working nodes in an idle state, so that the working nodes execute data query of the heterogeneous database according to the allocated sub-logic execution plans;
and after the working node returns the result data of executing the data query to the scheduling node for aggregation, returning the aggregated result data to the query client through the scheduling node.
In an exemplary embodiment, the query statement is a standard ANSI SQL query statement, and the parsing, by the scheduling node, of the query statement in the data query request into a logic execution plan includes: performing semantic analysis on the standard ANSI SQL query statement; after the semantic analysis, performing logic execution plan analysis on the standard ANSI SQL query statement to obtain an initial logic execution plan; and optimizing the initial logic execution plan to obtain the logic execution plan.
In an exemplary embodiment, the obtaining the logic execution plan by optimizing the initial logic execution plan includes: for each logic clause in the initial logic execution plan, rewriting equivalent predicates and simplifying specified conditions to obtain a locally optimized logic clause; eliminating external connection and nested connection between the locally optimized logic clauses to obtain an initial logic execution plan of correlation optimization; and performing semantic optimization on the initial logic execution plan subjected to the associated optimization to obtain the logic execution plan.
In an exemplary embodiment, the allocating the sub-logic execution plan to each working node in an idle state by performing state detection on the working nodes set in the heterogeneous database system correspondingly includes: adding all working nodes in the heterogeneous database system to a thread pool, so that the state of each working node is respectively detected by multiple threads started in the thread pool; and respectively distributing the sub-logic execution plan to each working node in an idle state.
In an exemplary embodiment, after the working node returns the result data of executing the data query to the scheduling node for aggregation, returning the aggregated result data to the query client through the scheduling node includes: and according to the sequence of the result data returned by the working nodes, the scheduling nodes collect the result data, and return the collected result data to the query client according to the original path of the data query request.
In an exemplary embodiment, before the heterogeneous database system receives a data query request initiated by a query client through the set scheduling node, the method further includes: receiving, through the scheduling node, account information sent by the query client, and verifying the account information; and after the account information passes the verification, opening, by the scheduling node, the authority of the query client to access the heterogeneous database system.
In an exemplary embodiment, after the heterogeneous database system receives a data query request initiated by a query client through the set scheduling node, the method further includes: looking up the account information of the query client in a configuration file preset in the heterogeneous database system, acquiring the query authority of the query client with respect to the heterogeneous databases, and executing data query on the heterogeneous database system according to the query authority.
An apparatus for implementing interactive query of data between heterogeneous databases, comprising: the query request receiving module is used for controlling a heterogeneous database system to receive a data query request initiated by a query client through a set scheduling node, wherein the heterogeneous database system is a set of a plurality of heterogeneous databases; a logic execution plan conversion module, configured to parse a query statement in the data query request into a logic execution plan through the scheduling node, and perform distributed processing on the logic execution plan to obtain a plurality of sub-logic execution plans for the heterogeneous database, where the logic execution plan is a general execution logic for executing data query on the heterogeneous database; the logic execution plan execution module is used for correspondingly distributing the sub-logic execution plans to each working node in an idle state by carrying out state detection on the working nodes arranged in the heterogeneous database system, so that the working nodes execute data query of a target heterogeneous database according to the distributed sub-logic execution plans; and the query result acquisition module is used for returning the summarized result data to the query client through the scheduling node after the working node returns the result data for executing the data query to the scheduling node for summarization.
An electronic device, the electronic device comprising:
a processor;
a memory having computer readable instructions stored thereon which, when executed by the processor, implement a method as in any preceding item.
A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements a method as in any preceding claim.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
In the technical scheme, the heterogeneous database system parses the query statement in the data query request, through the scheduling node, into a logic execution plan common to the different heterogeneous databases, and further converts the logic execution plan into a plurality of sub-logic execution plans facing each heterogeneous database, so that the working nodes can execute the data query of the corresponding heterogeneous database according to the sub-logic execution plans. The heterogeneous database system also performs state detection on each working node and allocates each sub-logic execution plan to a working node in an idle state, so that reasonable allocation of the resources of the heterogeneous database system is realized.
Therefore, the heterogeneous database system disclosed by the application carries out conversion and distribution of the logic execution plan through the scheduling node, and carries out data query of the heterogeneous databases through the working node, and data exchange among the heterogeneous databases is not needed, so that efficient query of data among the heterogeneous databases is realized.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram illustrating an implementation environment to which the present application relates, according to an example embodiment;
FIG. 2 is a schematic diagram of a heterogeneous database system, shown in accordance with an exemplary embodiment;
FIG. 3 is a flow diagram illustrating a method for implementing interactive querying of data between heterogeneous databases in accordance with an exemplary embodiment;
FIG. 4 is a flowchart illustrating the description of step 220 according to a corresponding embodiment of FIG. 3;
FIG. 5 is a flowchart illustrating a description of step 230 according to a corresponding embodiment of FIG. 3;
FIG. 6 is a flow diagram illustrating a method for implementing interactive querying of data between heterogeneous databases in accordance with an exemplary embodiment;
FIG. 7 is a block diagram illustrating an apparatus for interactive querying of data between disparate databases, in accordance with an exemplary embodiment;
FIG. 8 is a hardware block diagram of an electronic device shown in accordance with an example embodiment.
While certain embodiments of the present application have been illustrated by the accompanying drawings and described in detail below, such drawings and description are not intended to limit the scope of the inventive concepts in any manner, but are rather intended to explain the concepts of the present application to those skilled in the art by reference to the particular embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
FIG. 1 is a schematic diagram illustrating one implementation environment to which the present application relates, according to an example embodiment. As shown in FIG. 1, the implementation environment includes a query client 100 and a query server 200.
Wherein, a wired or wireless network connection is pre-established between the query client 100 and the query server 200, so as to realize the interaction between the query client 100 and the query server 200.
The query client 100 is configured to provide a user interaction interface for a user to query the query server 200 for data and display a query result. For example, the user interaction interface provided by the query client 100 is provided with an entry for inputting a query instruction, and the user inputs query information such as a query keyword, so that the result data obtained by the query can be correspondingly displayed on the user interaction interface.
By way of example, the query client 100 may be an electronic device such as a smartphone, a tablet, a laptop, a computer, etc., and the number thereof is not limited (only 2 are shown in fig. 1). The user interaction interface provided by the query client 100 may be a browser page or an APP (Application) page, which is not limited herein.
The query server 200 is deployed with a heterogeneous database system, which is a collection of multiple related databases with different architectures. Illustratively, the heterogeneous database system may include common databases such as Oracle, MySQL, PostgreSQL, and the like. Although the databases differ in architecture, the data stored in them are associated, so interactive queries are usually performed across the databases to return the queried data set to the query client 100 as a query result. Therefore, data sharing and transparent access can be realized among the heterogeneous databases.
The query server 200 may be a single server provided with a plurality of related databases, or it may be a server cluster formed by a plurality of servers in which the databases set on different servers have different architectures; this is not limited here.
FIG. 2 is a schematic diagram of a heterogeneous database system, shown in accordance with an exemplary embodiment. As mentioned above, the heterogeneous database system is disposed at the query server 200 to implement sharing and transparent access of data between the databases at the bottom layer.
As shown in fig. 2, the heterogeneous database system exposes an API (Application Programming Interface) of the scheduling node, so that the query client sends a data query request to the query server by calling the API of the scheduling node.
The plurality of related heterogeneous databases form a distributed file system (HDFS) which serves as a data bottom layer of the heterogeneous database system to provide a data source of data query.
The heterogeneous database system is provided with a scheduling node and a plurality of working nodes, wherein the scheduling node is used for analyzing and processing the received data query request, obtaining a logic execution plan common to each heterogeneous database, converting the logic execution plan into a plurality of sub-logic execution plans and distributing the sub-logic execution plans to the working nodes so that the working nodes can execute data query of the heterogeneous databases. It should be understood that a scheduling node and a worker node refer to applications deployed by a heterogeneous database system that can independently perform specified tasks.
In an exemplary embodiment, a backup scheduling node and a backup working node are correspondingly arranged for a scheduling node and each working node arranged in a heterogeneous database system, and a data transfer module is arranged to ensure the consistency of data between each node and the backup node.
For example, if the current scheduling node or a working node malfunctions, the backup scheduling node or the corresponding backup working node is automatically started to continue executing the data query of the heterogeneous database, so that the overall performance of the heterogeneous database system is not affected.
Therefore, the heterogeneous database system performs conversion and distribution of the logic execution plan through the scheduling node, performs data query of the heterogeneous databases through the working node, and does not need data exchange between the heterogeneous databases, so that efficient query of data between the heterogeneous databases is realized.
FIG. 3 is a flow diagram illustrating a method for interactive querying of data between heterogeneous databases in accordance with the heterogeneous database system of FIG. 2, in an exemplary embodiment. As shown in fig. 3, the method comprises at least the following steps:
Step 210, the heterogeneous database system receives a data query request initiated by a query client through the set scheduling node.
The heterogeneous database system performs data interaction with the external device through the scheduling node, and since the API corresponding to the scheduling node is exposed, an external application program (for example, a query client) can query data stored in association with different databases from the heterogeneous database system by calling the API.
After acquiring query information such as query keywords input by a user, a query client processes the query information into a data query request, and sends the data query request to the heterogeneous database system by calling an API (application program interface) exposed in the heterogeneous database system, and a scheduling node in the heterogeneous database system receives the data query request.
Illustratively, the data query request is composed of the name and parameters of the API corresponding to the scheduling node and a specific query statement. The Query statement is an SQL (Structured Query Language) statement.
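The composition of a data query request described above can be sketched as a small data structure. This is only an illustration: the field names, the API name `submit_query`, the `account` parameter, and the JSON encoding are assumptions made here, not details given in the application.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class DataQueryRequest:
    """Hypothetical shape of a data query request sent to the scheduling node."""
    api_name: str                                # name of the scheduling-node API being called
    sql: str                                     # the query statement (SQL)
    params: dict = field(default_factory=dict)   # parameters of the API call

    def to_json(self) -> str:
        # Serialize the request so it can be sent over the network.
        return json.dumps(asdict(self), sort_keys=True)


request = DataQueryRequest(
    api_name="submit_query",                     # assumed name, for illustration only
    sql="SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id",
    params={"account": "analyst01"},
)
payload = request.to_json()
```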
Step 220, analyzing the query statement in the data query request into a logic execution plan through the scheduling node, and performing distributed processing on the logic execution plan to obtain a plurality of sub-logic execution plans facing the heterogeneous database.
Wherein the logic execution plan is a generic execution logic that executes data queries on the underlying database. When different heterogeneous databases perform data query operation, due to the limitation of query language, the existing implementation cannot directly perform interactive query of the heterogeneous databases according to data query requests, and data exchange is necessary. The embodiment resolves the query statement in the data query request into a logic execution plan commonly used among heterogeneous databases.
In an exemplary embodiment, the query statement is a standard ANSI SQL query statement. ANSI SQL is a relatively basic and standard structured query language, so it can be readily converted into a logic execution plan common to the underlying heterogeneous databases.
After the scheduling node receives the data query request, the analysis of the query statement in the data query request at least comprises the processes of semantic analysis and logic plan analysis of the query statement and logic plan optimization.
The semantic analysis comprises performing a syntax check and a semantic check on the query statement. When the scheduling node detects a syntax error or a semantic error in the query statement, it returns error information to the query client. After the query statement passes the semantic analysis, logic plan analysis can be performed on it to obtain an initial logic execution plan. Because the initial logic execution plan contains some redundant information, it can be optimized by the plan optimizer to finally obtain the logic execution plan that is common to the various heterogeneous databases.
It should be noted that the semantic analysis of the query statement may be implemented by a parser, the conversion of the query statement into the initial logic execution plan may be implemented by a logic parser, and the optimization of the initial logic execution plan may be implemented by a plan optimizer, and the parser, the logic parser and the plan optimizer should also be understood as an application program that can independently execute the specified processing tasks on the query statement.
The distributed processing of the logic execution plan is a process of converting the logic execution plan into a plurality of sub-logic execution plans. For example, the logic execution plan may be converted into a corresponding number of sub-logic execution plans according to the number of heterogeneous databases it queries, with each sub-logic execution plan configured to execute the data query of one heterogeneous database. For example, assuming that four heterogeneous databases A, B, C and D are provided in the heterogeneous database system, when the logic execution plan only includes data queries for the heterogeneous databases A and C, the logic execution plan may be converted into two sub-logic execution plans that respectively execute the data query in the heterogeneous database A and the data query in the heterogeneous database C. It should be noted that each sub-logic execution plan may execute a query of at least one type of data on the same heterogeneous database.
Or, because the relevance of the data stored between the heterogeneous databases requires that the query on certain data depends on the query results of other heterogeneous databases, the logic execution plan can be divided into a plurality of sub-logic execution plans according to the relevance of the data query between the heterogeneous databases. For example, a sub-logic execution plan executes related data queries against heterogeneous databases A, B and C.
It should be noted that, since the sub-logic execution plan is obtained by distributed processing of the logic execution plan, the sub-logic execution plan should also be commonly used for each underlying heterogeneous database in the heterogeneous database system.
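The per-database split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the logic execution plan is modeled as a flat list of scan operations, and the `ScanOp` type, its fields, and the table names are hypothetical. The application does not specify the plan representation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ScanOp:
    """Hypothetical plan operation: read one table from one heterogeneous database."""
    database: str          # which heterogeneous database holds the table
    table: str
    predicate: str = ""    # optional filter, kept as plain text for illustration


def split_by_database(plan):
    """Group the operations of a logic execution plan into one sub-plan per database."""
    groups = defaultdict(list)
    for op in plan:
        groups[op.database].append(op)
    return dict(groups)


# The system holds databases A, B, C and D, but this plan only touches A and C.
plan = [
    ScanOp("A", "orders", "amount > 100"),
    ScanOp("C", "customers"),
    ScanOp("A", "refunds"),
]
sub_plans = split_by_database(plan)
```

Here `sub_plans` contains two entries, one for database A (with two operations, matching the remark that a sub-plan may query more than one type of data on the same database) and one for database C.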
Step 230, performing state detection on the working nodes arranged in the heterogeneous database system, and correspondingly allocating the sub-logic execution plans to the working nodes in the idle state, so that the working nodes execute data query of the heterogeneous database according to the allocated sub-logic execution plans.
The working nodes are used for executing data query of the bottom heterogeneous database in the heterogeneous database system. If the working node is executing the data query of the heterogeneous database, the working node is in a working state, otherwise, the working node is in an idle state.
Through the state detection of the working nodes, the currently idle working nodes of the heterogeneous database system can be obtained, and the idle working nodes can be used for executing the sub-logic execution plan to be distributed. By distributing the sub-logic execution plans obtained by distributed processing to the working nodes in the idle state, the working nodes can execute the data query corresponding to the heterogeneous database according to the distributed sub-logic execution plans.
Illustratively, if the sub-logic execution plan obtained in step 220 is to execute related data queries on the heterogeneous databases A, B and C, the working node to which it is assigned executes the related data queries on the heterogeneous databases A, B and C, respectively.
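State detection followed by allocation to idle working nodes can be sketched with Python's standard thread pool. This is a hedged sketch, not the application's implementation: `WorkerNode`, its `is_idle` probe, and the policy of pairing sub-plans with idle nodes in order are all assumptions, and a real probe would contact the node over the network.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class WorkerNode:
    """Hypothetical working node; `busy` stands in for its real runtime state."""
    name: str
    busy: bool = False

    def is_idle(self) -> bool:
        # Stand-in for a real status probe (e.g. a network health check).
        return not self.busy


def detect_idle(workers):
    """Probe every working node concurrently and return those that are idle."""
    with ThreadPoolExecutor(max_workers=max(1, len(workers))) as pool:
        flags = list(pool.map(WorkerNode.is_idle, workers))
    return [w for w, idle in zip(workers, flags) if idle]


def assign(sub_plans, workers):
    """Pair sub-logic execution plans with idle working nodes, one plan per node."""
    assignment = {}
    for sub_plan, worker in zip(sub_plans, detect_idle(workers)):
        worker.busy = True               # the node is now in the working state
        assignment[worker.name] = sub_plan
    return assignment


workers = [WorkerNode("w1"), WorkerNode("w2", busy=True), WorkerNode("w3")]
assignment = assign(["sub-plan A", "sub-plan C"], workers)
```

If there are more sub-plans than idle nodes, this sketch simply leaves the surplus unassigned; how the system queues or retries such plans is not described in the text.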
Step 240, after the working node returns the result data of executing the data query to the scheduling node for aggregation, the scheduling node returns the aggregated result data to the query client.
And the working node executes the distributed sub-logic execution plan to obtain result data, and then returns the result data to the scheduling node. And the scheduling node summarizes the result data according to the sequence of the received result data, and the summarized result data is the query result set corresponding to the data query request sent by the query client.
And the scheduling node returns the summarized result data to the query client according to the original path of the data query request, so that the query client correspondingly displays the summarized result data, and a user can conveniently obtain a query result.
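Summarizing result data in the order in which it is received can be sketched with `concurrent.futures.as_completed`, which yields tasks as they finish. The function names, the simulated delays, and the row values are hypothetical; real working nodes would return rows fetched from their databases.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def run_sub_plan(name, delay, rows):
    """Stand-in for a working node executing a sub-plan; `delay` simulates query time."""
    time.sleep(delay)
    return rows


def gather(sub_plans):
    """Collect result data in the order the working nodes return it."""
    merged = []
    with ThreadPoolExecutor(max_workers=max(1, len(sub_plans))) as pool:
        futures = [pool.submit(run_sub_plan, *plan) for plan in sub_plans]
        for future in as_completed(futures):   # yields each task as it finishes
            merged.extend(future.result())
    return merged


result = gather([("A", 0.2, [1, 2]), ("C", 0.0, [3])])
```

Because the merge follows arrival order, the row order of `result` depends on which node answers first; a query that needs a specific ordering would have to sort at the scheduling node.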
In this embodiment, the heterogeneous database system parses a query statement in the data query request into a logic execution plan common to different heterogeneous databases by setting the scheduling node, and further converts the logic execution plan into a plurality of sub-logic execution plans for each heterogeneous database, so that the working node can execute data query of the heterogeneous databases according to the sub-logic execution plans. Because data exchange among heterogeneous databases is not needed in the heterogeneous database system, the data among the heterogeneous databases can be efficiently inquired.
Moreover, the heterogeneous database system performs state detection on each working node and allocates each sub-logic execution plan to a working node in an idle state, thereby realizing reasonable allocation of the resources of the heterogeneous database system.
In addition, in this embodiment, the interactive query process for data between heterogeneous databases is executed in the memory, which can avoid redundant disk reading and writing and delay, and improve data query performance.
Fig. 4 is a flow chart of step 220 in an exemplary embodiment in the corresponding embodiment of fig. 3. As shown in FIG. 4, the process of optimizing the initial logic execution plan includes at least the following steps:
Step 221, for each logic clause in the initial logic execution plan, rewriting equivalent predicates and simplifying specified conditions to obtain a locally optimized logic clause.
The predicates contained in the logic clauses of the initial logic execution plan can include common predicates such as LIKE, BETWEEN ... AND, IN, OR, and the like.
The rewriting of the equivalent predicate on the logic clause means that the predicate contained in the logic clause is converted into another predicate expression under the condition that a preset predicate conversion rule is satisfied, and the converted new predicate is called the equivalent predicate.
Illustratively, assuming that the logic clause is "sno between 10 and 20", it can be rewritten as "sno >= 10 and sno <= 20" on condition that the between-and rule is satisfied; the rewritten form is called the equivalent predicate of between. Similarly, assuming that the logic clause is "name like 'Abc%'", it can be rewritten as "name >= 'Abc' and name < 'Abd'" on condition that the like rule is satisfied; the rewritten form is called the equivalent predicate of like.
Simplifying the specified conditions of a logic clause means that, when no aggregate function is present in the clause, its HAVING condition and WHERE condition are merged, so that redundant information such as unnecessary parentheses can be removed.
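The two predicate rewrites can be sketched as string transformations. This is an illustration only: the function names are hypothetical, a real optimizer would rewrite a parsed expression tree rather than text, and the LIKE rewrite assumes a case-sensitive comparison under code-point ordering with a pattern that is a literal prefix followed by a single trailing `%`. Note that BETWEEN is inclusive in standard SQL, so the range uses `>=` and `<=`.

```python
def rewrite_between(column, low, high):
    """Rewrite `column BETWEEN low AND high` as an inclusive range comparison."""
    return f"{column} >= {low} AND {column} <= {high}"


def rewrite_prefix_like(column, pattern):
    """Rewrite `column LIKE 'prefix%'` as a half-open range, or return None.

    Only handles a literal prefix followed by one trailing '%'; any other
    wildcard placement is left for the caller to keep as a LIKE predicate.
    """
    prefix = pattern[:-1]
    if not pattern.endswith("%") or not prefix or any(c in prefix for c in "%_"):
        return None
    # Smallest string greater than every string starting with the prefix:
    # bump the last character of the prefix by one code point.
    upper = prefix[:-1] + chr(ord(prefix[-1]) + 1)
    return f"{column} >= '{prefix}' AND {column} < '{upper}'"


between_sql = rewrite_between("sno", 10, 20)
like_sql = rewrite_prefix_like("name", "Abc%")
```

Range comparisons of this form can use an ordinary index on the column, which is the usual motivation for rewriting LIKE and BETWEEN this way.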
Step 222, eliminating external connection and nested connection between the locally optimized logic clauses, and obtaining an initial logic execution plan of the associated optimization.
The external connection among the logic clauses comprises at least one of left external connection, right external connection and full external connection, and the elimination of the external connection refers to the conversion of the external connection among the logic clauses into the internal connection. It should be noted that, the connection relationship between the logic clauses is converted into the internal connection, so that the speed of the query operation corresponding to the logic clauses can be effectively increased.
Nested connections between logical clauses means that the order in which the connection operations are performed between logical clauses is not performed one by one from left to right. It should be noted that eliminating nested connections between logical clauses means eliminating parenthesized information in the nested connections, for example, for a statement "select from a join (b join c on b.b1 ═ c.c1) on a.a1 ═ b.b.b 1 where a.a1> 1; the "removal of parentheses has no effect on the meaning and can be eliminated.
Therefore, the initial logic execution plan with optimized association can be obtained by optimizing the connection relation between the logic clauses.
Step 223, a logic execution plan is obtained by performing semantic optimization on the initial logic execution plan of the association optimization.
The semantic optimization of the association-optimized initial logic execution plan obtained in step 222 may include moving grouping operations up or down.
Moving a grouping operation up means that the grouping operation in the initial logic execution plan is executed after the connection (join) operation. If the join operation can filter out most tuples, performing the join first and the grouping afterwards improves the efficiency of the grouping operation.
Moving a grouping operation down means that the grouping operation in the initial logic execution plan is executed earlier, before the join operation. A grouping operation can greatly reduce the number of tuples in a relation, so if the grouping can be performed first and the join afterwards, the efficiency of the join is improved.
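As a rough illustration of moving a grouping operation down, the following sketch uses Python's built-in sqlite3 module to check that grouping before the join returns the same rows as grouping after it. The tables customers and orders, their columns and their contents are assumptions made for the example only; the equivalence holds here because the aggregate is grouped by the join key and that key is unique in customers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table customers (cust_id integer primary key, name text);
    create table orders (cust_id integer, amount integer);
    insert into customers values (1, 'ann'), (2, 'bob');
    insert into orders values (1, 10), (1, 15), (2, 7), (2, 3), (2, 5);
""")

# Grouping after the join: every order row takes part in the join.
join_then_group = """
    select c.name, sum(o.amount) as total
    from customers c join orders o on c.cust_id = o.cust_id
    group by c.cust_id, c.name
    order by c.name
"""

# Grouping moved down: orders are first reduced to one row per customer,
# so the join processes fewer tuples.
group_then_join = """
    select c.name, g.total
    from customers c
    join (select cust_id, sum(amount) as total
          from orders group by cust_id) g
      on c.cust_id = g.cust_id
    order by c.name
"""

rows_late = conn.execute(join_then_group).fetchall()
rows_early = conn.execute(group_then_join).fetchall()
print(rows_late == rows_early)
```

Whether the earlier or the later grouping is cheaper depends on the data: the earlier grouping pays off when it shrinks the relation considerably, and the later grouping pays off when the join discards most tuples.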
Moreover, semantically optimizing the association-optimized initial logic execution plan may further include eliminating unnecessary sorting operations in the initial logic execution plan, so that sorting operations, and operations caused by sorting, are avoided where they are not needed.
Therefore, the method provided by the embodiment can realize the optimization of the initial logic execution plan, for example, eliminate some redundant information in the initial logic execution plan, optimize the connection relationship between logic clauses and the like, and obtain a better logic execution plan.
Fig. 5 is a flowchart of step 230 in the embodiment corresponding to Fig. 3, according to an exemplary embodiment. As shown in Fig. 5, step 230 includes at least the following steps:
Step 231: add all working nodes in the heterogeneous database system to a thread pool, so that the multiple threads started in the thread pool respectively detect the states of the working nodes.
A thread pool is a form of multi-threaded processing: once the thread pool is started, its threads execute the processing tasks added to a task queue. Adding all working nodes in the heterogeneous database system to the thread pool means adding the state-detection tasks to that task queue, so that the threads in the thread pool respectively carry out the state detection of each working node and the state of every working node is obtained in real time.
In an exemplary embodiment, since too many threads may cause scheduling overhead, thereby affecting the overall performance of the heterogeneous database system, the number of threads may be set according to the number of working nodes set by the heterogeneous database system, so as to meet the requirement of the working nodes on multithreading.
Step 232: allocate the sub-logic execution plans to the working nodes in the idle state.
In an exemplary embodiment, after detecting that the number of working nodes in the idle state matches the number of the sub-logic execution plans, the sub-logic execution plans may be respectively allocated to the working nodes in the idle state.
Alternatively, to allocate the resources of the heterogeneous database system reasonably, a sub-logic execution plan may be allocated as soon as a working node in the idle state is detected, and this continues until the last sub-logic execution plan has been allocated to a working node.
If an execution order exists between different sub-logic execution plans, the sub-logic execution plans can be allocated, in that order, to the working nodes detected to be in the idle state.
Therefore, in this embodiment, all working nodes in the heterogeneous database system are added to the thread pool so that the threads started in the thread pool detect the states of the working nodes in real time, which simplifies allocating the sub-logic execution plans to the working nodes.
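Steps 231 and 232 can be sketched with Python's concurrent.futures.ThreadPoolExecutor as follows. This is a simplified sketch rather than the implementation of this application: WorkerNode, detect_state and assign_plans are hypothetical names, the busy flag stands in for a real state probe sent to the node, and the pool size is simply set to the number of working nodes.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class WorkerNode:
    name: str
    busy: bool = False  # assumption: stands in for a real state probe


def detect_state(node: WorkerNode) -> Tuple[WorkerNode, str]:
    """Hypothetical probe; a real system would query the node over the network."""
    return node, ("busy" if node.busy else "idle")


def assign_plans(nodes: List[WorkerNode],
                 sub_plans: List[str]) -> Tuple[Dict[str, str], List[str]]:
    """Probe all nodes concurrently, then give sub-plans to idle nodes in order.

    Returns the assignment (node name -> sub-plan) and the sub-plans that are
    still waiting because no idle node was left for them.
    """
    if not nodes:
        return {}, list(sub_plans)
    # One thread per working node, as suggested by sizing the pool to the node count.
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        states = list(pool.map(detect_state, nodes))  # map keeps the input order
    idle = [node for node, state in states if state == "idle"]
    assigned = {node.name: plan for node, plan in zip(idle, sub_plans)}
    pending = list(sub_plans[len(assigned):])
    return assigned, pending


nodes = [WorkerNode("w1"), WorkerNode("w2", busy=True), WorkerNode("w3")]
assigned, pending = assign_plans(nodes, ["plan-1", "plan-2", "plan-3"])
print(assigned, pending)
```

Because the sub-plans are consumed in list order, an execution order between sub-logic execution plans is preserved simply by ordering the list; the pending sub-plans would be allocated on a later detection round.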
FIG. 6 is a flowchart illustrating a method of implementing interactive querying of data between heterogeneous databases in accordance with another exemplary embodiment. As shown in fig. 6, before step 210, the method further comprises the steps of:
Step 310: receive, through the scheduling node, the account information sent by the query client, and verify the account information.
The account information sent by the query client may include the user name and password used to log in to the query client interface. Verifying the account information sent by the query client implements access authority control for the heterogeneous database system.
Illustratively, the verification of the account information sent by the query client is implemented through LDAP (Lightweight Directory Access Protocol). When a user logs in to the query client interface, the query client sends the login user name and password to the scheduling node, and after receiving them the scheduling node verifies the user name and password through the configured LDAP service.
In the configured LDAP service, the account information that is allowed to access the heterogeneous database system is stored in a directory tree in advance, and each node in the directory tree is one piece of account information. Because the LDAP service is dynamic, the account information that the heterogeneous database system allows to access it can be updated dynamically.
After receiving the account information sent by the query client, the scheduling node traverses the directory tree to check whether that account information is stored in it. If it is, the account information passes verification; otherwise, the account information fails verification.
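The directory-tree lookup described above can be illustrated with a small in-memory sketch. It does not talk to an LDAP server and is not the verification code of this application: DirNode and verify_account are hypothetical names, the container entries "dc=example" and "ou=people" are invented for the example, and the credentials are kept in plain text only to keep the illustration short (a real directory service would not store or compare them this way).

```python
import hmac
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DirNode:
    name: str
    password: Optional[str] = None  # set only on account entries (illustration only)
    children: List["DirNode"] = field(default_factory=list)


def verify_account(root: DirNode, user: str, password: str) -> bool:
    """Traverse the tree depth-first and check the submitted credentials."""
    stack = [root]
    while stack:
        node = stack.pop()
        if node.password is not None and node.name == user:
            # Constant-time comparison of the stored and submitted passwords.
            return hmac.compare_digest(node.password.encode(), password.encode())
        stack.extend(node.children)
    return False  # the account is not stored in the directory tree


tree = DirNode("dc=example", children=[
    DirNode("ou=people", children=[DirNode("alice", password="s3cret")]),
])
print(verify_account(tree, "alice", "s3cret"))
```

In a deployment backed by a real LDAP service, the traversal would typically be replaced by a search or bind request issued to that service.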
Step 320: after the account information passes verification, the scheduling node opens the authority of the query client to access the heterogeneous database system.
After the account information passes verification, the scheduling node opening the authority of the query client to access the heterogeneous database system means that the scheduling node responds to access by the query client and executes data queries on the underlying heterogeneous databases in the heterogeneous database system according to the data query requests sent by the query client.
If the account information fails verification, the query client does not have the authority to access the heterogeneous database system, and the heterogeneous database system does not open the API corresponding to the scheduling node, so the query client cannot send a data query request to the scheduling node.
Therefore, the embodiment can realize the access authority control of the heterogeneous database system by configuring the LDAP service, and the access security of the heterogeneous database system is improved.
In another exemplary embodiment, after step 210, the method for implementing interactive query of data between heterogeneous databases further comprises the following steps:
querying the account information of the query client in a preset configuration file of the heterogeneous database system, obtaining the query authority of the query client for the heterogeneous database system, and executing the data query on the heterogeneous database system according to the query authority.
In an actual service scenario, different users often have different requirements for obtaining service data; considering also the access security of the service data, role-based control of the query authority over the service data is necessary. For example, assume that heterogeneous databases A, B and C in the heterogeneous database system store common business data and heterogeneous database D stores important business data; ordinary business personnel are then limited to querying the common business data in heterogeneous databases A, B and C, while business managers can query the important business data in heterogeneous database D. Alternatively, heterogeneous databases A and B in the heterogeneous database system store the business data of a first business department, heterogeneous databases C and D store the business data of a second business department, and the staff of each department can access only the business data of their own department.
The preset configuration file refers to a query authority list preset by the heterogeneous database system for different service personnel. For example, the configuration file specifies a target heterogeneous database capable of performing a query operation for each account information, a type of query operation capable of being performed on the target database, and the like.
When the scheduling node obtains the data query request, it looks up the account information of the current user in the configuration file to obtain the query authority of the current user for each heterogeneous database, and thereby obtains the query authority of the current query client for the heterogeneous database system.
When the data query is executed on the heterogeneous database system according to the obtained query authority, the logic execution plan that the scheduling node obtains by parsing the query statement contains only the data query execution logic for the heterogeneous databases that the query client is authorized to query. Heterogeneous databases that the query client is not authorized to query are filtered out of the logic execution plan generated by the scheduling node. Therefore, the sub-logic execution plans executed by the working nodes also target only the heterogeneous databases allowed by the query authority, which implements role control over the data queries that the query client performs on the heterogeneous database system.
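The permission filtering described above can be sketched as a lookup followed by a filter. The sketch is illustrative only: the PERMISSIONS mapping stands in for the preset configuration file, the account names and the shape of a plan step (a dictionary with "database" and "operation" keys) are assumptions, and giving the manager account access to databases A to D is a choice made for the example.

```python
from typing import Dict, List, Set

# Hypothetical permission list: account -> database -> allowed operation types.
PERMISSIONS: Dict[str, Dict[str, Set[str]]] = {
    "staff01": {"A": {"select"}, "B": {"select"}, "C": {"select"}},
    "manager01": {"A": {"select"}, "B": {"select"},
                  "C": {"select"}, "D": {"select"}},
}


def filter_plan(account: str, plan_steps: List[dict]) -> List[dict]:
    """Keep only the plan steps whose database and operation the account may use."""
    allowed = PERMISSIONS.get(account, {})  # unknown accounts get no permissions
    return [
        step for step in plan_steps
        if step["operation"] in allowed.get(step["database"], set())
    ]


plan = [
    {"database": "A", "operation": "select"},
    {"database": "D", "operation": "select"},
]
print(filter_plan("staff01", plan))
```

Filtering at the scheduling node, before the plan is split, means that no sub-logic execution plan for an unauthorized database is ever handed to a working node.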
Fig. 7 is a diagram illustrating an apparatus for implementing interactive query of data between heterogeneous databases, according to an example embodiment. As shown in fig. 7, the apparatus includes a query request receiving module 410, a logic execution plan converting module 420, a logic execution plan executing module 430, and a query result obtaining module 440.
The query request receiving module 410 is configured to control a heterogeneous database system to receive a data query request initiated by a query client through a set scheduling node, where the heterogeneous database system is a collection of several heterogeneous databases.
The logic execution plan conversion module 420 is configured to parse a query statement in the data query request into a logic execution plan through the scheduling node, and perform distributed processing on the logic execution plan to obtain a plurality of sub-logic execution plans for the heterogeneous database, where the logic execution plan is a general execution logic for executing data query on the heterogeneous database.
The logic execution plan executing module 430 is configured to perform state detection on the working nodes set in the heterogeneous database system, and correspondingly allocate the sub-logic execution plans to the working nodes in the idle state, so that the working nodes execute data queries of the target heterogeneous database according to the allocated sub-logic execution plans.
The query result obtaining module 440 is configured to, after the working node returns the result data of executing the data query to the scheduling node for aggregation, return the aggregated result data to the query client through the scheduling node.
In another exemplary embodiment, the logic execution plan conversion module 420 includes a semantic analysis unit, an initial plan acquisition unit, and an initial plan optimization unit.
The semantic analysis unit is used for performing semantic analysis on the standard ANSI SQL query statement.
The initial plan acquisition unit is used for performing logic execution plan parsing on the standard ANSI SQL query statement output by the semantic analysis unit to obtain an initial logic execution plan.
The initial plan optimization unit is used for optimizing the initial logic execution plan to obtain a logic execution plan.
In another exemplary embodiment, the initial plan optimization unit includes a local optimization subunit, an association optimization subunit, and a semantic optimization subunit.
The local optimization subunit is used for rewriting equivalent predicates and simplifying specified conditions for each logic clause in the initial logic execution plan to obtain a locally optimized logic clause.
The association optimization subunit is used for eliminating external connections and nested connections between the locally optimized logic clauses to obtain an association-optimized initial logic execution plan.
The semantic optimization subunit is used for performing semantic optimization on the association-optimized initial logic execution plan to obtain a logic execution plan.
In another exemplary embodiment, the logic execution plan execution module 430 includes a multithreading detection unit and a plan allocation unit.
The multithreading detection unit is used for enabling multithreading started in the thread pool to respectively detect the states of all the working nodes by adding all the working nodes in the heterogeneous database system to the thread pool.
The plan distribution unit is used for allocating the sub-logic execution plans to the working nodes in the idle state.
In another exemplary embodiment, the apparatus further includes an account information verifying module and an access right performing module.
The account information verification module is used for receiving, through the scheduling node, the account information sent by the query client, and verifying the account information.
The access authority execution module is used for having the scheduling node open the authority of the query client to access the heterogeneous database system after the account information passes verification.
In another exemplary embodiment, the apparatus further comprises an access role control module. The access role control module is used for querying the account information of the query client in a preset configuration file of the heterogeneous database system, obtaining the query authority of the query client for the heterogeneous database system, and executing the data query on the heterogeneous database system according to the query authority.
It should be noted that the apparatus provided in the foregoing embodiment and the method provided in the foregoing embodiment belong to the same concept, and the specific manner in which each module performs operations has been described in detail in the method embodiment, and is not described again here.
In an exemplary embodiment, the present application further provides an electronic device comprising:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement a method for interactive querying of data between heterogeneous databases as previously described.
FIG. 8 is a block diagram of an electronic device shown in accordance with an example embodiment. The electronic device may be embodied as the query service 200 in the implementation environment shown in fig. 1.
It should be noted that the electronic device is only an example adapted to the present application and should not be considered as limiting the scope of use of the application in any way. Nor should the electronic device be construed as depending on, or necessarily including, one or more of the components of the exemplary electronic device illustrated in Fig. 8.
The hardware structure of the electronic device may vary considerably with configuration or performance. As shown in Fig. 8, the electronic device includes: a power supply 610, an interface 630, at least one memory 650, and at least one central processing unit (CPU) 670.
The power supply 610 is used for providing an operating voltage for each hardware device on the electronic device.
The interface 630 includes at least one wired or wireless network interface 631, at least one serial-to-parallel conversion interface 633, at least one input/output interface 635, and at least one USB interface 637, etc. for communicating with external devices.
The memory 650 serves as a carrier for resource storage and may be a read-only memory, a random access memory, a magnetic disk, an optical disk, or the like. The resources stored on it include an operating system 651, application programs 653, data 655, and so on, and the storage may be transient or permanent. The operating system 651 manages and controls the hardware devices and the application programs 653 on the electronic device so that the central processing unit 670 can compute and process the mass data 655; it may be Windows Server, Mac OS X, Unix, Linux, or the like. The application programs 653 are computer programs that perform at least one particular task on top of the operating system 651 and may include at least one module (not shown in Fig. 8), each of which may contain a series of computer-readable instructions for the electronic device. The data 655 may be interface metadata or the like stored on disk.
The central processor 670 may include one or more processors and is arranged to communicate with the memory 650 via a bus for computing and processing the mass data 655 in the memory 650.
As described in detail above, an electronic device to which the present application is applied will read a series of computer readable instructions stored in the memory 650 by the central processor 670 to complete the method for implementing interactive query of data between heterogeneous databases as described above.
Furthermore, the present application can also be implemented by hardware circuits or hardware circuits in combination with software instructions, and thus, the implementation of the present application is not limited to any specific hardware circuits, software, or a combination of the two.
In an exemplary embodiment, the present application further provides a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the method for interactive query of data between heterogeneous databases as described above.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method for realizing interactive query of data between heterogeneous databases, which is characterized in that the method comprises the following steps:
a heterogeneous database system receives a data query request initiated by a query client through a set scheduling node, wherein the heterogeneous database system is a set of a plurality of heterogeneous databases;
analyzing query statements in the data query request into a logic execution plan through the scheduling node, and performing distributed processing on the logic execution plan to obtain a plurality of sub-logic execution plans facing the heterogeneous database, wherein the logic execution plan is a general execution logic for executing data query on the heterogeneous database;
performing state detection on working nodes arranged in the heterogeneous database system, and correspondingly allocating the sub-logic execution plans to the working nodes in an idle state, so that the working nodes execute data query of the heterogeneous database according to the allocated sub-logic execution plans;
and after the working node returns the result data of executing the data query to the scheduling node for aggregation, returning the aggregated result data to the query client through the scheduling node.
2. The method of claim 1, wherein the query statement is a standard ANSI SQL query statement, and wherein parsing, by the scheduling node, the query statement in the data query request into a logic execution plan comprises:
performing semantic analysis on the standard ANSI SQL query statement;
after the semantic analysis is carried out, carrying out logic execution plan analysis on the standard ANSI SQL query statement to obtain an initial logic execution plan;
and optimizing the initial logic execution plan to obtain the logic execution plan.
3. The method of claim 2, the obtaining the logic execution plan by optimizing the initial logic execution plan, comprising:
for each logic clause in the initial logic execution plan, rewriting equivalent predicates and simplifying specified conditions to obtain a locally optimized logic clause;
eliminating external connection and nested connection between the locally optimized logic clauses to obtain an initial logic execution plan of correlation optimization;
and performing semantic optimization on the initial logic execution plan subjected to the associated optimization to obtain the logic execution plan.
4. The method according to claim 1, wherein the allocating the sub-logic execution plans to the working nodes in an idle state by performing state detection on the working nodes set in the heterogeneous database system comprises:
adding all working nodes in the heterogeneous database system to a thread pool, so that the state of each working node is respectively detected by multiple threads started in the thread pool;
and respectively distributing the sub-logic execution plan to each working node in an idle state.
5. The method of claim 1, wherein after the working node returns the result data of executing the data query to the scheduling node for aggregation, returning the aggregated result data to the query client through the scheduling node comprises:
the scheduling node aggregates the result data in the order in which the working nodes return it, and returns the aggregated result data to the query client along the original path of the data query request.
6. The method of claim 1, wherein before the heterogeneous database system receives a data query request from a query client according to the set scheduling node, the method further comprises:
receiving account information sent by the query client through the scheduling node, and verifying the account information;
and after the account information passes the verification, the scheduling node opens the authority of the query client to access the heterogeneous database system.
7. The method of claim 6, wherein after the heterogeneous database system receives a data query request from a query client according to the set scheduling node, the method further comprises:
querying the account information of the query client in a configuration file preset in the heterogeneous database system, obtaining the query authority of the query client for the heterogeneous database, and executing the data query on the heterogeneous database system according to the query authority.
8. An apparatus for implementing interactive query of data between heterogeneous databases, the apparatus comprising:
the query request receiving module is used for controlling a heterogeneous database system to receive a data query request initiated by a query client through a set scheduling node, wherein the heterogeneous database system is a set of a plurality of heterogeneous databases;
a logic execution plan conversion module, configured to parse a query statement in the data query request into a logic execution plan through the scheduling node, and perform distributed processing on the logic execution plan to obtain a plurality of sub-logic execution plans for the heterogeneous database, where the logic execution plan is a general execution logic for executing data query on the heterogeneous database;
the logic execution plan execution module is used for correspondingly distributing the sub-logic execution plans to each working node in an idle state by carrying out state detection on the working nodes arranged in the heterogeneous database system, so that the working nodes execute data query of a target heterogeneous database according to the distributed sub-logic execution plans;
and the query result acquisition module is used for returning the summarized result data to the query client through the scheduling node after the working node returns the result data for executing the data query to the scheduling node for summarization.
9. An electronic device, characterized in that the device comprises:
a processor;
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 7.
CN201910759791.9A 2019-08-16 2019-08-16 Method and related device for realizing interactive query of data between heterogeneous databases Pending CN110659327A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201910759791.9A CN110659327A (en) 2019-08-16 2019-08-16 Method and related device for realizing interactive query of data between heterogeneous databases
PCT/CN2019/118024 WO2021031407A1 (en) 2019-08-16 2019-11-13 Method and apparatus for implementing interactive data query between heterogeneous databases, and electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910759791.9A CN110659327A (en) 2019-08-16 2019-08-16 Method and related device for realizing interactive query of data between heterogeneous databases

Publications (1)

Publication Number Publication Date
CN110659327A true CN110659327A (en) 2020-01-07

Family

ID=69037680

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910759791.9A Pending CN110659327A (en) 2019-08-16 2019-08-16 Method and related device for realizing interactive query of data between heterogeneous databases

Country Status (2)

Country Link
CN (1) CN110659327A (en)
WO (1) WO2021031407A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100030758A1 (en) * 2008-07-30 2010-02-04 Oracle International Corporation Hybrid optimization strategies in automatic SQL tuning
CN101694665A (en) * 2009-10-27 2010-04-14 中兴通讯股份有限公司 Method and device for data query of heterogeneous data source
US20150254295A1 (en) * 2014-03-04 2015-09-10 International Business Machines Corporation Regression testing of sql execution plans for sql statements
CN105912624A (en) * 2016-04-07 2016-08-31 北京中安智达科技有限公司 Query method for distributed deployed heterogeneous database
CN106445991A (en) * 2016-06-30 2017-02-22 中国石化销售有限公司 Massive data processing method for SCADA system of gas station
CN106844545A (en) * 2016-12-30 2017-06-13 江苏瑞中数据股份有限公司 A kind of implementation method of the Database Systems with double engines based on stsndard SQL
CN107315790A (en) * 2017-06-14 2017-11-03 腾讯科技(深圳)有限公司 A kind of optimization method and device of irrelevant subquery
CN108052635A (en) * 2017-12-20 2018-05-18 江苏瑞中数据股份有限公司 A kind of heterogeneous data source unifies conjunctive query method
CN109284282A (en) * 2018-10-22 2019-01-29 北京极数云舟科技有限公司 One kind being based on MySQL database O&M method and system
CN110059103A (en) * 2019-04-28 2019-07-26 南京大学 A kind of cross-platform unified big data SQL query method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107329814B (en) * 2017-06-16 2020-05-26 电子科技大学 RDMA (remote direct memory Access) -based distributed memory database query engine system
CN109656968A (en) * 2018-11-15 2019-04-19 中国建设银行股份有限公司 Data query method, apparatus and storage medium under distributed environment

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111625558A (en) * 2020-05-07 2020-09-04 苏州浪潮智能科技有限公司 Server architecture, database query method thereof and storage medium
WO2021254288A1 (en) * 2020-06-14 2021-12-23 Wenfei Fan Querying shared data with security heterogeneity
CN111737284A (en) * 2020-08-18 2020-10-02 北京升鑫网络科技有限公司 Pipeline-based database query analysis method and device and computing equipment
CN112685142A (en) * 2020-12-30 2021-04-20 北京明朝万达科技股份有限公司 Distributed data processing system
CN113093681A (en) * 2021-04-08 2021-07-09 四川远星橡胶有限责任公司 Control system and method based on super-fusion and server virtualization
CN113918996A (en) * 2021-11-24 2022-01-11 企查查科技有限公司 Distributed data processing method, device, computer equipment and storage medium
CN113918996B (en) * 2021-11-24 2024-03-26 企查查科技股份有限公司 Distributed data processing method, device, computer equipment and storage medium
WO2023109725A1 (en) * 2021-12-15 2023-06-22 华为技术有限公司 Data access method and apparatus for database, and device
CN114756577A (en) * 2022-03-25 2022-07-15 北京友友天宇系统技术有限公司 Processing method of multi-source heterogeneous data, computer equipment and storage medium
CN115033595A (en) * 2022-08-10 2022-09-09 杭州悦数科技有限公司 Query statement processing method, system, device and medium based on super node
CN115033595B (en) * 2022-08-10 2022-11-22 杭州悦数科技有限公司 Query statement processing method, system, device and medium based on super node

Also Published As

Publication number Publication date
WO2021031407A1 (en) 2021-02-25

Similar Documents

Publication Publication Date Title
CN110659327A (en) Method and related device for realizing interactive query of data between heterogeneous databases
US11615087B2 (en) Search time estimate in a data intake and query system
US11580107B2 (en) Bucket data distribution for exporting data to worker nodes
US11586627B2 (en) Partitioning and reducing records at ingest of a worker node
US11593377B2 (en) Assigning processing tasks in a data intake and query system
US11599541B2 (en) Determining records generated by a processing task of a query
US11921672B2 (en) Query execution at a remote heterogeneous data store of a data fabric service
US11442935B2 (en) Determining a record generation estimate of a processing task
US11341131B2 (en) Query scheduling based on a query-resource allocation and resource availability
US11321321B2 (en) Record expansion and reduction based on a processing task in a data intake and query system
US11494380B2 (en) Management of distributed computing framework components in a data fabric service system
US11023463B2 (en) Converting and modifying a subquery for an external data system
US20200050612A1 (en) Supporting additional query languages through distributed execution of query engines
US20200065303A1 (en) Addressing memory limits for partition tracking among worker nodes
US10628415B1 (en) Data sharing and materialized views in multiple tenant database systems
US8903841B2 (en) System and method of massively parallel data processing
US7130838B2 (en) Query optimization via a partitioned environment
US8712994B2 (en) Techniques for accessing a parallel database system via external programs using vertical and/or horizontal partitioning
US20200379994A1 (en) Sharing Materialized Views In Multiple Tenant Database Systems
CN113297057A (en) Memory analysis method, device and system
US11500874B2 (en) Systems and methods for linking metric data to resources

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination