CN111190924A - Cross-domain data query method and device - Google Patents
- Publication number
- CN111190924A (application CN201911309743.6A)
- Authority
- CN
- China
- Prior art keywords
- query
- data
- plan
- sub
- sql statement
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
- G06F16/244—Grouping and aggregation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2471—Distributed queries
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The disclosure relates to the field of electric digital data processing and provides a cross-domain data query method and device. The method comprises the following steps: analyzing an input Structured Query Language (SQL) statement to generate a query execution plan, wherein the query execution plan records the operations required for executing the SQL statement; generating a plurality of sub-query tasks according to the query execution plan; sending each sub-query task to the data system corresponding to the data directory recorded by that sub-query task; receiving the data results obtained by each data system after executing the sub-query task it received; and merging all the received data results to obtain a query result.
Description
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a method and an apparatus for cross-domain data query.
Background
In the related art, cross-domain data exchange refers to integrating a plurality of separately built application information systems and transmitting and sharing the information/data of the application subsystems through an information exchange platform built on a computer network, thereby improving the utilization of information resources. Cross-domain data exchange has become a basic goal of informatization: it ensures interconnection and intercommunication among distributed heterogeneous systems, establishes a central database, completes the extraction, concentration, loading and display of data, and builds unified data processing and exchange.
In the related art, an API-based cross-domain data exchange technology is adopted, in which different systems encapsulate their own data behind API interfaces and expose API services externally, and users access the data of those systems by calling the API services. JSON-based cross-domain data exchange is a common implementation of API-based cross-domain data exchange. In the JSON-based technology, as shown in fig. 1, before performing cross-domain data exchange a user (system A) needs to encapsulate the request data, serialize it to JSON, configure the request address and other parameters, and send the serialized JSON data to a target service (system B); the target service responds and returns data, and the user deserializes the returned data to obtain the required data.
In the related art described above, data exchange is realized through network calls using techniques such as APIs. A system stores data in a database or data warehouse, and a program is then written in a programming language; the program mainly receives some parameters, generates a Structured Query Language (SQL) query statement, sends the query statement to the database, and returns the data retrieved from the database. Therefore, in data exchange in the related art, the database is not accessed directly to obtain data; instead, a wrapper program accesses the database, so that a layer of program is added between the data user and the database. On the one hand this requires extra development and integration work, and on the other hand it increases the complexity of the system.
Disclosure of Invention
To overcome the problems in the related art, a cross-domain data query scheme is provided.
According to a first aspect herein, there is provided a cross-domain data query method, comprising: analyzing an input Structured Query Language (SQL) statement to generate a query execution plan, wherein the query execution plan records operations required for executing the SQL statement; generating a plurality of sub-query tasks according to the query execution plan, wherein each sub-query task records data source information required to be processed by the sub-query task and a field processed by the sub-query task, the data source information comprises a data directory of a data source, the data directory records the link addresses of the database-table metadata describing the locally stored databases and data tables, and each piece of database-table metadata corresponds to the database and the data table that it describes; according to the data directory recorded by each sub-query task, sending each sub-query task to a data system corresponding to the data directory recorded by the sub-query task; receiving data results obtained by each data system after executing the sub-query tasks received by the data system; and merging all the received data results to obtain a query result.
Optionally, analyzing the input structured query language SQL statement to generate a query execution plan includes: performing lexical analysis and syntax analysis on the input SQL statement to acquire key information corresponding to the SQL statement, wherein the key information comprises: the field list queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement; generating an abstract syntax tree according to the results of lexical analysis and syntax analysis on the SQL statement, wherein the abstract syntax tree comprises a plurality of TOKEN objects; traversing the abstract syntax tree, and recording the attributes of different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks; and generating the query execution plan corresponding to the query block according to the query block.
Optionally, generating the query execution plan corresponding to the query block according to the query block includes: converting the query block into a logical query plan to determine a logical operation to execute the SQL statement; analyzing the logic query plan, and converting the logic query plan into a physical query plan to obtain a path for acquiring the query result; and optimizing the physical query plan to select an optimal path for acquiring the query result so as to obtain the query execution plan.
Optionally, after converting the query block into a logical query plan, before analyzing the logical query plan, the method further comprises: and rewriting the logic query plan according to the incidence relation between the data sources queried by the SQL statement.
Optionally, after merging all the received data results to obtain a query result, the method further includes: returning the query result to the querier.
According to another aspect of the present disclosure, there is provided a cross-domain data query apparatus, including: the query plan analysis module is used for analyzing the input SQL statement and generating a query execution plan, wherein the query execution plan records the operation required by executing the SQL statement; the query plan dividing module is used for generating a plurality of sub-query tasks according to the query execution plan, wherein each sub-query task records data source information required to be processed by the sub-query task and a field processed by the sub-query task, the data source information comprises a data directory of a data source, the data directory records the link addresses of the database-table metadata describing the locally stored databases and data tables, and each piece of database-table metadata corresponds to the database and the data table that it describes; the query task distribution module is used for respectively sending each sub-query task to a data system corresponding to the data directory recorded by the sub-query task according to the data directory recorded by each sub-query task; the query result receiving module is used for receiving data results obtained by the data systems after the data systems execute the sub-query tasks received by the data systems; and the query result merging module is used for merging all the received data results to obtain query results.
Optionally, the query plan analysis module includes: the syntax parsing unit is used for performing lexical analysis and syntax parsing on the input SQL statement, acquiring key information corresponding to the SQL statement, and generating an abstract syntax tree according to the result of the lexical analysis and syntax parsing on the SQL statement, wherein the key information comprises: the field list queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement, and wherein the abstract syntax tree comprises a plurality of TOKEN objects; the semantic analysis unit is used for traversing the abstract syntax tree and recording the attributes of different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks; and the query plan generating unit is used for generating the query execution plan corresponding to the query block according to the query block.
Optionally, the query plan generating unit includes: a logic plan generating subunit, configured to convert the query block into a logic query plan to determine to execute a logic operation of the SQL statement; a physical plan generating subunit, configured to analyze the logical query plan, and convert the logical query plan into a physical query plan to obtain a path for obtaining the query result; and the plan optimization subunit is used for optimizing the physical query plan so as to select an optimal path for acquiring the query result, and obtain the query execution plan.
Optionally, the physical plan generating subunit is further configured to, after converting the query block into a logical query plan, rewrite the logical query plan according to an association relationship between data sources queried by the SQL statement.
Optionally, the method further comprises: and the query result returning module is used for returning the query result to the querier after the query result merging module obtains the query result.
In another aspect of this document, a computer readable storage medium is provided, which stores a computer program that, when executed, performs the steps of the above cross-domain data query method.
In another aspect of this document, a computer device is provided, comprising a processor, a memory and a computer program stored on the memory, the processor implementing the steps of the above cross-domain data query method when executing the computer program.
With the SQL-driven cross-domain data query scheme described herein, extra program development work and integration work can be avoided, and the complexity of the system is reduced. Moreover, because databases natively support SQL, driving data queries with SQL stays closer to the database and allows the data exchange process to follow a unified pattern.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the disclosure. In the drawings:
FIG. 1 is a schematic diagram of a cross-domain data exchange system according to the related art;
FIG. 2 is a flow diagram illustrating a cross-domain data query method in accordance with an exemplary embodiment;
FIG. 3 is a diagram illustrating a database storage structure in accordance with an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating the performance of a cross-domain data query method in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating a cross-domain data querying device in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a computer device according to an example embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some but not all of the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments herein without making any creative effort, shall fall within the scope of protection. It should be noted that the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict.
The SQL language is simple and easy to use, has strong expressive force, and is the best mode for operating the database to realize data exchange. Therefore, in order to avoid extra program development work and docking work and reduce the complexity of the system, the embodiment of the invention provides a cross-domain data query method driven by SQL.
Fig. 2 is a flowchart illustrating a cross-domain data query method according to an exemplary embodiment, and as shown in fig. 2, the cross-domain data query method mainly includes the following steps S201 to S205, which are described below.
As shown in fig. 2, in step S201, the input SQL statement is analyzed, and a query execution plan is generated, in which operations required for executing the SQL statement are recorded in the query execution plan.
In this embodiment, in order to make it convenient to retrieve and locate data, before step S201 is executed, the databases are connected to and their metadata is collected to generate the data directory.
In a specific application, a user generates service data after using a service system, the generated service data is stored in a database of the service system, and different databases can be established to store the data of different services. In this embodiment, in order to drive data exchange through SQL, the database-table metadata (i.e., the description information of the databases and tables) can be exposed, so that a user can determine whether the data he or she needs is available. In this embodiment, the database-table metadata is not the real data stored in the tables, but data describing the data in the tables, so the security of the database is not affected. For example, if the data in a table is personal data such as a user's name, gender, age, and native place, the database-table metadata of that table may be "personal information".
In a specific application, the database-table metadata can be extracted automatically: it is only necessary to configure a database connection and access the database to obtain its database-table metadata. To make it convenient to search for and locate data, as shown in fig. 3, the database-table metadata of the databases may be collected to form a data directory, in which each piece of database-table metadata of the locally stored databases is recorded.
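As an illustration of the metadata collection described above, the following is a minimal sketch that builds a data directory from SQLite databases. It is not the patented implementation: the function names `collect_table_metadata` and `build_data_directory`, the record layout, and the link address `sqlite://hr-system/hr` are all hypothetical, and SQLite is used only because it ships with Python. Note that only table and column descriptions are read, never row data.

```python
import sqlite3

def collect_table_metadata(conn, link_address):
    """Return one metadata record per table: names and column types only, no row data."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        records.append({
            "table": table_name,
            "columns": [(col[1], col[2]) for col in columns],
            "link": link_address,  # hypothetical link address of the owning database
        })
    return records

def build_data_directory(sources):
    """Map 'database.table' to its metadata record for every configured source."""
    directory = {}
    for db_name, (conn, link_address) in sources.items():
        for record in collect_table_metadata(conn, link_address):
            directory[f"{db_name}.{record['table']}"] = record
    return directory

# Usage: one in-memory database standing in for a service system's database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, gender TEXT, age INTEGER)")
directory = build_data_directory({"hr": (conn, "sqlite://hr-system/hr")})
```

Keying the directory by a `database.table` string is one possible design choice; it lets a later step look up the link address of a data source directly from the name that appears in the SQL statement.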
In specific application, when a data user needs to search and position data across domains, the corresponding SQL statement can be written and input into a currently used system, and the system analyzes the input SQL statement and acquires a query execution plan corresponding to the SQL statement. In the query execution plan, operations required to execute the SQL statement may be recorded.
In an optional implementation manner of this embodiment, step S201 may include the following steps, starting with step 2011.
Step 2011, performing lexical analysis and syntax analysis on the input SQL statement to obtain key information corresponding to the SQL statement, where the key information includes but is not limited to: the field list queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement.
Step 2012, generating an abstract syntax tree according to the result of the lexical analysis and the syntactic analysis of the SQL statement, where the abstract syntax tree includes a plurality of TOKEN Objects (TOKEN).
In a specific application, an existing SQL syntax parser can be used to perform lexical analysis and syntax parsing on the SQL statement according to the SQL syntax, identify each part of the SQL statement, and then output the result in the form of an abstract syntax tree (AST).
For example, lexical and syntactic parsing of SQL can be implemented using Antlr. With Antlr, it is only necessary to write a grammar file that defines the lexical rules and the grammar production rules; Antlr then completes the processes of lexical analysis, syntax analysis, semantic analysis and intermediate code generation.
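To make steps 2011 and 2012 concrete, the following is a minimal hand-written sketch of lexical analysis and syntax parsing for a very small SELECT subset. It is a stand-in for a generated parser, not Antlr output: the functions `tokenize` and `parse_select`, the nested-tuple AST layout, and the supported grammar (SELECT, FROM, optional WHERE) are assumptions made for illustration. Only the TOK_QUERY, TOK_FROM, TOK_SELECT and TOK_WHERE node names come from the text.

```python
import re

# One token per match: a number, a (possibly dotted) word, or an operator/punctuation symbol.
TOKEN_PATTERN = re.compile(r"\s*(?:(\d+)|([A-Za-z_][\w.]*)|(<=|>=|<>|[=<>,*()]))")
KEYWORDS = {"SELECT", "FROM", "WHERE", "AND"}

def tokenize(sql):
    """Lexical analysis: turn the SQL text into (kind, value) pairs."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_PATTERN.match(sql, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        number, word, symbol = match.groups()
        if number is not None:
            tokens.append(("NUMBER", number))
        elif word is not None:
            if word.upper() in KEYWORDS:
                tokens.append(("KEYWORD", word.upper()))
            else:
                tokens.append(("IDENT", word))
        else:
            tokens.append(("SYMBOL", symbol))
        pos = match.end()
    return tokens

def parse_select(tokens):
    """Syntax parsing: build a nested-tuple AST for SELECT ... FROM ... [WHERE ...]."""
    if not tokens or tokens[0] != ("KEYWORD", "SELECT"):
        raise SyntaxError("only SELECT statements are handled in this sketch")
    from_index = tokens.index(("KEYWORD", "FROM"))
    where_token = ("KEYWORD", "WHERE")
    where_index = tokens.index(where_token) if where_token in tokens else len(tokens)
    fields = [v for k, v in tokens[1:from_index] if k == "IDENT" or v == "*"]
    tables = [v for k, v in tokens[from_index + 1:where_index] if k == "IDENT"]
    condition = [v for _, v in tokens[where_index + 1:]]
    children = [("TOK_FROM", tables), ("TOK_SELECT", fields)]
    if condition:
        children.append(("TOK_WHERE", condition))
    return ("TOK_QUERY", children)

ast = parse_select(tokenize("SELECT name, age FROM hr.person WHERE age >= 30"))
```

The three pieces of key information named in step 2011 appear as the three child nodes: the queried field list (TOK_SELECT), the query condition (TOK_WHERE), and the data source name that can be looked up in the data directory (TOK_FROM).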
Step 2013, traversing the abstract syntax tree, and recording the attributes of different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks.
In a specific application, in step 2013, the AST is traversed to abstract out the basic query unit QueryBlock (query block, QB). This is a recursive process in which each parsed Token is mapped to the corresponding QueryBlock. Specifically, mapping Tokens to the corresponding QueryBlock may include the following processes:
TOK_QUERY => create a QB object and recurse through the child nodes
TOK_FROM => save the database-name and table-name syntax parts to the corresponding attributes of the QB object
TOK_SELECT => save the query field list syntax part to the corresponding attribute of the QB object
TOK_WHERE => save the query condition syntax part to the corresponding attribute of the QB object
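The mapping above can be sketched as a recursive traversal. This is a minimal illustration under stated assumptions: the AST is represented as nested `(token, payload)` tuples, and the `QueryBlock` dataclass, its attribute names, and the function `to_query_block` are hypothetical rather than taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class QueryBlock:
    tables: list = field(default_factory=list)         # filled from TOK_FROM
    select_fields: list = field(default_factory=list)  # filled from TOK_SELECT
    where: list = field(default_factory=list)          # filled from TOK_WHERE
    children: list = field(default_factory=list)       # nested sub-query blocks

def to_query_block(node, current=None):
    """Recursively map AST tokens onto QueryBlock attributes."""
    token, payload = node
    if token == "TOK_QUERY":
        # TOK_QUERY: create a QB object and recurse through the child nodes.
        block = QueryBlock()
        if current is not None:
            current.children.append(block)
        for child in payload:
            to_query_block(child, block)
        return block
    if token == "TOK_FROM":
        for source in payload:
            if isinstance(source, tuple):  # a nested sub-query in the FROM clause
                to_query_block(source, current)
            else:
                current.tables.append(source)
    elif token == "TOK_SELECT":
        current.select_fields.extend(payload)
    elif token == "TOK_WHERE":
        current.where.extend(payload)
    return current

# Usage: an outer query over hr.person with one nested sub-query over sales.orders.
ast = ("TOK_QUERY", [
    ("TOK_FROM", ["hr.person", ("TOK_QUERY", [
        ("TOK_FROM", ["sales.orders"]),
        ("TOK_SELECT", ["buyer"]),
    ])]),
    ("TOK_SELECT", ["name"]),
    ("TOK_WHERE", ["age", ">", "30"]),
])
qb = to_query_block(ast)
```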
Step 2014 is to generate a query execution plan corresponding to the query block according to the query block (QueryBlock).
In step 2014, a query execution plan is generated. That is, the QB objects generated in step 2013 and the syntax attributes saved in them are traversed to generate an execution operation tree (OperatorTree), and the execution operation tree is then traversed and translated into the query execution plan. Because the hierarchy of operations is already determined, a depth-first traversal downward from the root node of the OperatorTree converts it into the execution plan of the query.
Traversing the QueryBlock and translating it into the execution operation tree OperatorTree may include the following processes:
TableScan => read data
Select Operator => selection operation
Group By Operator => grouping and aggregation
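A minimal sketch of the operator tree and its depth-first translation into a plan follows. The `Operator` dataclass, the function names, and the textual plan format are hypothetical; the sketch also assumes that TableScan is the root and that later operators hang below it, which the text does not state explicitly.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    detail: str
    children: list = field(default_factory=list)

def build_operator_tree(tables, select_fields, group_by):
    """Translate query-block attributes into TableScan -> Select -> Group By."""
    root = Operator("TableScan", ",".join(tables))               # read data
    select = Operator("Select Operator", ",".join(select_fields))  # selection operation
    root.children.append(select)
    if group_by:
        # Grouping and aggregation is only added when the query asks for it.
        select.children.append(Operator("Group By Operator", ",".join(group_by)))
    return root

def to_execution_plan(operator, plan=None):
    """Depth-first traversal from the root: emit the operator, then its children."""
    plan = [] if plan is None else plan
    plan.append(f"{operator.name}({operator.detail})")
    for child in operator.children:
        to_execution_plan(child, plan)
    return plan

tree = build_operator_tree(["hr.person"], ["gender", "age"], ["gender"])
plan = to_execution_plan(tree)
```

Because each operator is emitted before its children, the resulting list reads in execution order, from reading the data to the final aggregation.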
In a specific application, if the input SQL statement relates to a plurality of heterogeneous data tables, in an optional implementation manner of this embodiment, step 2014 may include the following steps 1 to 3.
Step 1, converting the query block into a logical query plan to determine the logical operations for executing the SQL statement;
Step 2, analyzing the logical query plan, and converting the logical query plan into a physical query plan to obtain paths for acquiring the query result;
Step 3, optimizing the physical query plan to select an optimal path for acquiring the query result, so as to obtain the query execution plan.
By adopting the optional embodiment, when a plurality of heterogeneous data tables are involved, the query block is converted into the logic query plan, the logic operation required by the execution of the input SQL statement is determined, then the logic operation is converted into the physical query plan, and the optimization is carried out, so that the time required by the query can be reduced, and the efficiency can be improved.
In the above optional embodiment, in step 1, after the query block is converted into the logical query plan, the logical query plan is further rewritten according to the association relationship between the data sources queried by the SQL statement, so as to optimize the logical query plan.
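Steps 1 to 3, together with the rewrite by data-source association, can be illustrated with the following sketch. Everything in it is an assumption made for illustration: the tuple-based plan representation, the rewrite rule (scans of tables that live in the same data source are grouped so each source is visited once), the enumeration of scan orders as candidate physical plans, and the toy cost model based on hypothetical row counts. The patent does not specify any of these.

```python
from itertools import permutations

def to_logical_plan(tables, fields):
    """Step 1: one logical scan per table, followed by a projection."""
    return [("scan", table) for table in tables] + [("project", tuple(fields))]

def rewrite_by_source(plan, source_of):
    """Rewrite: group scans whose tables belong to the same data source."""
    grouped, rest = {}, []
    for op, arg in plan:
        if op == "scan":
            grouped.setdefault(source_of[arg], []).append(arg)
        else:
            rest.append((op, arg))
    scans = [("scan_source", (source, tuple(names))) for source, names in grouped.items()]
    return scans + rest

def physical_candidates(plan):
    """Step 2: each ordering of the source scans is one candidate path."""
    scans = [step for step in plan if step[0] == "scan_source"]
    rest = [step for step in plan if step[0] != "scan_source"]
    return [list(order) + rest for order in permutations(scans)]

def estimated_cost(candidate, row_counts):
    """Toy cost model: rows carried forward accumulate, so small sources first is cheaper."""
    cost = carried = 0
    for op, arg in candidate:
        if op == "scan_source":
            carried += sum(row_counts[name] for name in arg[1])
            cost += carried
    return cost

def optimize(plan, row_counts):
    """Step 3: pick the candidate with the lowest estimated cost."""
    return min(physical_candidates(plan), key=lambda c: estimated_cost(c, row_counts))

source_of = {"hr.person": "hr", "hr.dept": "hr", "sales.orders": "sales"}
row_counts = {"hr.person": 100, "hr.dept": 10, "sales.orders": 10000}
logical = to_logical_plan(["hr.person", "sales.orders", "hr.dept"], ["name", "amount"])
best = optimize(rewrite_by_source(logical, source_of), row_counts)
```

In this example the two tables in the hypothetical `hr` source are merged into a single scan, and that scan is ordered before the much larger `sales` source.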
As shown in fig. 2, in step S202, a plurality of sub-query tasks are generated according to the query execution plan, wherein each sub-query task records the data source information required to be processed by the sub-query task and the fields processed by the sub-query task. The data source information includes a data directory of a data source; the data directory records the link addresses of the database-table metadata describing the locally stored databases and data tables, and each piece of database-table metadata corresponds to the database and the data table that it describes.
In step S203, each sub-query task is sent to the data system corresponding to the data directory recorded by the sub-query task according to the data directory recorded by each sub-query task.
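Steps S202 and S203 can be sketched as follows. The `SubQueryTask` dataclass, the plan-step dictionaries, the directory layout, and the `send` callback are hypothetical stand-ins; a real system would send tasks over the network, whereas here `send` just records what would be sent.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SubQueryTask:
    catalog: str     # data directory entry of the data source, e.g. "hr.person"
    fields: tuple    # fields processed by this sub-query task
    condition: str   # optional filter, empty when absent

def split_into_tasks(plan_steps):
    """S202: one sub-query task per data source referenced by the plan."""
    return [
        SubQueryTask(step["catalog"], tuple(step["fields"]), step.get("condition", ""))
        for step in plan_steps
    ]

def dispatch(tasks, directory, send):
    """S203: look up each task's data directory entry and send the task to that system."""
    for task in tasks:
        link = directory[task.catalog]["link"]
        send(link, task)

# Usage with an in-memory outbox in place of a network call.
directory = {
    "hr.person": {"link": "system-a"},
    "sales.orders": {"link": "system-b"},
}
plan_steps = [
    {"catalog": "hr.person", "fields": ["name", "age"], "condition": "age > 30"},
    {"catalog": "sales.orders", "fields": ["buyer", "amount"]},
]
outbox = {}
dispatch(split_into_tasks(plan_steps), directory,
         lambda link, task: outbox.setdefault(link, []).append(task))
```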
In step S204, data results obtained by each data system after executing the sub-query task received by the data system are received.
In a specific application, after a data system receives a sub-query task, it can locate the database-table metadata to be retrieved according to its own data directory, and follow the link recorded for that metadata to the database to perform the query.
In step S205, all the received data results are merged to obtain a query result.
In an optional implementation manner of this embodiment, after all the received data results are combined to obtain a query result, the query result is returned to the querier.
Fig. 4 is a schematic diagram illustrating an implementation of the above cross-domain data query method according to an exemplary embodiment. As shown in fig. 4, in this embodiment each service data system includes a query plan analyzer (Query Planner), a coordination task distributor (Query Coordinator), a query executor (Query Worker) and a database (DB). A user inputs an SQL query statement, which is sent to the query plan analyzer of the current service data system (step 401). The query plan analyzer analyzes the SQL query statement to obtain a query execution plan and splits the plan into a plurality of subtasks (step 402). The coordination task distributor distributes the subtasks to the query executors of the corresponding service data systems (step 403). Each query executor executes the subtask it receives (step 404) and queries the database (step 405) to acquire data, continuously reporting its current running state to the coordination task distributor during execution. After a query executor finishes its task, it returns the result to the coordination task distributor (step 406). The coordination task distributor collects and summarizes the execution results of the subtasks, converts them into the overall query result, and returns it to the querier (step 407).
By adopting the cross-domain data query method provided by this embodiment, in which SQL drives the cross-domain data query, extra program development work and integration work can be avoided, and the complexity of the system is reduced. Moreover, because databases natively support SQL, driving data queries with SQL stays closer to the database and allows the data exchange process to follow a unified pattern.
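The flow of steps 401 to 407 can be illustrated end to end with the sketch below. It makes several simplifying assumptions that are not in the patent: two in-memory SQLite databases stand in for two service data systems, the planner hands the same SQL statement to every system as its subtask, workers run in a thread pool, merging is plain concatenation followed by sorting, and the status reporting from workers to the coordinator is omitted. All function and system names are hypothetical.

```python
import sqlite3
from concurrent.futures import ThreadPoolExecutor

def make_system(rows):
    """Create one in-memory database standing in for a service data system."""
    # check_same_thread=False lets a worker thread use the connection created here.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", rows)
    return conn

def query_planner(sql, catalogs):
    """Steps 401-402: split the query into one subtask per data system."""
    return [(catalog, sql) for catalog in catalogs]

def query_worker(systems, subtask):
    """Steps 404-405: execute the subtask against the local database."""
    catalog, sql = subtask
    return systems[catalog].execute(sql).fetchall()

def query_coordinator(systems, subtasks):
    """Steps 403, 406, 407: distribute subtasks, collect results, merge them."""
    with ThreadPoolExecutor() as pool:
        partial_results = list(pool.map(lambda task: query_worker(systems, task), subtasks))
    merged = [row for rows in partial_results for row in rows]
    return sorted(merged)

systems = {
    "system_a": make_system([("Li", 41), ("Wang", 25)]),
    "system_b": make_system([("Zhao", 37)]),
}
subtasks = query_planner("SELECT name, age FROM person WHERE age > 30", systems)
result = query_coordinator(systems, subtasks)
```

Each connection is used by exactly one worker at a time, which keeps the shared-connection setup safe in this small example.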
The embodiment of the present disclosure also provides a cross-domain data query apparatus, which can be used to implement the above-mentioned cross-domain data query method.
Fig. 5 is a schematic structural diagram illustrating a cross-domain data query apparatus 500 according to an exemplary embodiment, and as shown in fig. 5, the apparatus may include: a query plan analysis module 501, a query plan dividing module 502, a query task distribution module 503, a query result receiving module 504 and a query result merging module 505.
In order to avoid repeated descriptions, only the functional modules of the device are described below, and for other relevant matters, reference may be made to the above description on a cross-domain data query method, which is not described herein again.
A query plan analysis module 501, configured to analyze an input structured query language SQL statement and generate a query execution plan, where the query execution plan records operations required for executing the SQL statement; a query plan dividing module 502, configured to generate a plurality of sub-query tasks according to the query execution plan, where each sub-query task records data source information that needs to be processed by the sub-query task and a field that is processed by the sub-query task, where the data source information includes a data directory of a data source, the data directory records the link addresses of the database-table metadata describing the locally stored databases and data tables, and each piece of database-table metadata corresponds to the database and the data table that it describes; the query task distribution module 503 is configured to send each sub-query task to a data system corresponding to the data directory recorded by the sub-query task according to the data directory recorded by each sub-query task; a query result receiving module 504, configured to receive data results obtained by each data system after executing the sub-query task received by the data system; and a query result merging module 505, configured to merge all the received data results to obtain a query result.
By adopting the device, in which SQL drives the cross-domain data query, extra program development work and integration work can be avoided, and the complexity of the system is reduced. Moreover, because databases natively support SQL, driving data queries with SQL stays closer to the database and allows the data exchange process to follow a unified pattern.
In an optional implementation manner of this embodiment, the query plan analyzing module 501 may include: the syntax parsing unit is used for performing lexical analysis and syntax parsing on the input SQL statement, acquiring key information corresponding to the SQL statement, and generating an abstract syntax tree according to the result of the lexical analysis and syntax parsing on the SQL statement, wherein the key information comprises: the field list queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement, and wherein the abstract syntax tree comprises a plurality of TOKEN objects; the semantic analysis unit is used for traversing the abstract syntax tree and recording the attributes of different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks; and the query plan generating unit is used for generating the query execution plan corresponding to the query block according to the query block.
In the foregoing optional embodiment, optionally, the query plan generating unit may include: a logic plan generating subunit, configured to convert the query block into a logic query plan to determine to execute a logic operation of the SQL statement; a physical plan generating subunit, configured to analyze the logical query plan, and convert the logical query plan into a physical query plan to obtain a path for obtaining the query result; and the plan optimization subunit is used for optimizing the physical query plan so as to select an optimal path for acquiring the query result, and obtain the query execution plan.
Optionally, in an optional implementation manner described above, the physical plan generating subunit is further configured to, after the query block is converted into a logical query plan, rewrite the logical query plan according to an association relationship between data sources queried by the SQL statement.
In an optional embodiment of this embodiment, the apparatus may further include: and the query result returning module is used for returning the query result to the querier after the query result merging module obtains the query result.
It should be noted that each functional module described in the foregoing embodiments is a functional module of the apparatus. In a specific application, the functions of multiple functional modules may be integrated into one module, or the functions of one functional module may be divided among multiple modules, which is not limited in this embodiment. For example, in a specific application, the query plan analyzing module 501 and the query plan dividing module 502 may be implemented by the query plan analyzer in fig. 4, and the query task distributing module 503, the query result receiving module 504, and the query result merging module 505 may be implemented by the coordination task distributor in fig. 4.
FIG. 6 is a block diagram illustrating a computer device 600 for a cross-domain data query method in accordance with an exemplary embodiment. For example, the computer device 600 may be provided as a server. Referring to fig. 6, the computer device 600 includes a processor 601, and the number of processors may be set to one or more as necessary. The computer device 600 further comprises a memory 602 for storing instructions, such as application programs, executable by the processor 601. The number of memories can be set to one or more as needed, and the memory may store one or more application programs. The processor 601 is configured to execute the instructions to perform the above-described method.
As will be appreciated by one skilled in the art, the embodiments herein may be provided as a method, apparatus (device), or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, including, but not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer, and the like. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments herein. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the article or apparatus comprising the element.
While the preferred embodiments herein have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of this disclosure.
It will be apparent to those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope thereof. Thus, it is intended that such changes and modifications be included herein, provided they come within the scope of the appended claims and their equivalents.
Claims (12)
1. A cross-domain data query method, characterized by comprising the following steps:
analyzing an input Structured Query Language (SQL) statement to generate a query execution plan, wherein the query execution plan records the operations required for executing the SQL statement;
generating a plurality of sub-query tasks according to the query execution plan, wherein each sub-query task records the data source information to be processed by the sub-query task and the fields processed by the sub-query task, the data source information comprises a data directory of a data source, the data directory records link addresses of the metadata of each base table describing the locally stored databases and data tables, and each piece of metadata corresponds to the database and data table that it describes;
sending each sub-query task, according to the data directory recorded by the sub-query task, to the data system corresponding to that data directory;
receiving the data results obtained by each data system after it executes the sub-query task it received;
and merging all the received data results to obtain a query result.
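As an illustration only, the dispatch-and-merge flow of claim 1 can be sketched in Python. Everything named here is hypothetical: the `SubQueryTask` class, the `systems` registry keyed by data directory, the `make_system` stand-in for a remote data system, and the choice to merge by simple concatenation are assumptions, since the claim fixes neither a data format nor a merge strategy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass(frozen=True)
class SubQueryTask:
    data_directory: str   # identifies the data source (format is an assumption)
    fields: List[str]     # fields processed by this sub-query task


# Hypothetical: a data system is modelled as a callable that runs one task.
DataSystem = Callable[[SubQueryTask], List[Row]]


def make_system(rows: List[Row]) -> DataSystem:
    """Build an in-memory stand-in for a remote data system."""
    def run(task: SubQueryTask) -> List[Row]:
        # Return only the fields the sub-query task asks for.
        return [{f: row[f] for f in task.fields} for row in rows]
    return run


def dispatch_and_merge(tasks: List[SubQueryTask],
                       systems: Dict[str, DataSystem]) -> List[Row]:
    """Send each task to the system matching its data directory, then merge."""
    merged: List[Row] = []
    for task in tasks:
        system = systems[task.data_directory]  # route by recorded data directory
        merged.extend(system(task))            # receive and merge the data result
    return merged


systems = {
    "domain_a/catalog": make_system([{"id": 1, "name": "x", "secret": "s"}]),
    "domain_b/catalog": make_system([{"id": 2, "name": "y", "secret": "t"}]),
}
tasks = [
    SubQueryTask("domain_a/catalog", ["id", "name"]),
    SubQueryTask("domain_b/catalog", ["id", "name"]),
]
result = dispatch_and_merge(tasks, systems)
```

In a real deployment the callables would be network calls to the data systems of each domain, and the merge step might join or aggregate rather than concatenate.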
2. The method of claim 1, wherein analyzing the input Structured Query Language (SQL) statement to generate a query execution plan comprises:
performing lexical analysis and syntax analysis on the input SQL statement to acquire key information corresponding to the SQL statement, wherein the key information comprises: a list of fields queried by the SQL statement, a query condition, and a data directory of a data source queried by the SQL statement;
generating an abstract syntax tree according to the results of lexical analysis and syntax analysis on the SQL statement, wherein the abstract syntax tree comprises a plurality of TOKEN objects;
traversing the abstract syntax tree, and recording the attributes of different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks;
and generating the query execution plan corresponding to the query block according to the query block.
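The steps of claim 2 can be pictured with a deliberately small Python sketch. It is not the patented parser: the `tokenize`, `parse`, and `to_query_block` functions, the `Node` and `QueryBlock` classes, the restriction to `SELECT ... FROM ... [WHERE ...]`, and the convention that the prefix of a dotted table name stands for the data directory are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Operators, punctuation, quoted strings, dotted identifiers, integers.
TOKEN_RE = re.compile(r"\s*(>=|<=|<>|!=|[(),=<>*]|'[^']*'|[A-Za-z_][\w.]*|\d+)")


def tokenize(sql: str) -> List[str]:
    """Lexical analysis: split the statement into tokens."""
    sql, pos, tokens = sql.strip(), 0, []
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if not match:
            raise ValueError(f"unexpected character at {pos}: {sql[pos]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


@dataclass
class Node:
    kind: str                # "QUERY", "SELECT", "FROM" or "WHERE"
    tokens: List[str]
    children: List["Node"]


def parse(tokens: List[str]) -> Node:
    """Syntax analysis: build a very small abstract syntax tree."""
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise ValueError("only SELECT ... FROM ... [WHERE ...] is supported")
    f = upper.index("FROM")
    w = upper.index("WHERE") if "WHERE" in upper else len(tokens)
    children = [
        Node("SELECT", [t for t in tokens[1:f] if t != ","], []),
        Node("FROM", tokens[f + 1:w], []),
    ]
    if w < len(tokens):
        children.append(Node("WHERE", tokens[w + 1:], []))
    return Node("QUERY", [], children)


@dataclass
class QueryBlock:
    fields: List[str]
    data_directory: str      # assumed to be the prefix of "directory.table"
    table: str
    condition: Optional[str]


def to_query_block(ast: Node) -> QueryBlock:
    """Traverse the tree and record node attributes as a query block."""
    parts = {child.kind: child.tokens for child in ast.children}
    directory, _, table = parts["FROM"][0].rpartition(".")
    condition = " ".join(parts["WHERE"]) if "WHERE" in parts else None
    return QueryBlock(parts["SELECT"], directory, table, condition)


block = to_query_block(parse(tokenize(
    "SELECT id, amount FROM domain_a.orders WHERE amount >= 100")))
```

A production parser would handle joins, nested queries, and expressions; the point here is only the sequence of tokens, tree, and query block.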
3. The method of claim 2, wherein generating the query execution plan corresponding to the query block from the query block comprises:
converting the query block into a logical query plan to determine the logical operations for executing the SQL statement;
analyzing the logical query plan, and converting the logical query plan into a physical query plan to obtain a path for acquiring the query result;
and optimizing the physical query plan to select an optimal path for acquiring the query result so as to obtain the query execution plan.
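One way to picture the three stages of claim 3 is the Python sketch below. The `LogicalPlan` and `PhysicalPlan` classes, the two candidate paths (filtering at the coordinator versus filtering at the remote data system), and the cost metric (estimated rows shipped between domains) are illustrative assumptions; the claim does not specify a cost model.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class LogicalPlan:
    table: str
    fields: Tuple[str, ...]
    condition: Optional[str]      # None when the query has no filter


@dataclass(frozen=True)
class PhysicalPlan:
    steps: Tuple[str, ...]        # ordered physical operators
    cost: float                   # toy metric: estimated rows shipped


def candidate_physical_plans(plan: LogicalPlan, table_rows: int,
                             selectivity: float) -> List[PhysicalPlan]:
    """Enumerate alternative execution paths for one logical plan."""
    scan = f"remote:scan({plan.table})"
    project = f"local:project({', '.join(plan.fields)})"
    if plan.condition is None:
        return [PhysicalPlan((scan, project), float(table_rows))]
    return [
        # Path 1: ship every row, then filter at the coordinator.
        PhysicalPlan((scan, f"local:filter({plan.condition})", project),
                     float(table_rows)),
        # Path 2: filter at the data system, ship only matching rows.
        PhysicalPlan((scan, f"remote:filter({plan.condition})", project),
                     table_rows * selectivity),
    ]


def optimize(candidates: List[PhysicalPlan]) -> PhysicalPlan:
    """Select the path with the lowest estimated cost."""
    return min(candidates, key=lambda p: p.cost)


logical = LogicalPlan("orders", ("id", "amount"), "amount >= 100")
best = optimize(candidate_physical_plans(logical, table_rows=1000,
                                         selectivity=0.25))
```

With these assumed numbers the optimizer prefers the path that filters remotely, because it ships 250 rows instead of 1000.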
4. The method of claim 3, wherein after converting the query block into a logical query plan, prior to analyzing the logical query plan, the method further comprises:
and rewriting the logical query plan according to the association relationship between the data sources queried by the SQL statement.
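The claim does not say what form this rewrite takes. Under one possible reading, offered here purely as an assumption, tables that live in the same data source are grouped so that they can be handled by a single sub-query. The function `group_scans_by_source` and the `(data_directory, table)` pair format below are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_scans_by_source(scans: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (data_directory, table) scans by the data source they belong to."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for directory, table in scans:
        grouped[directory].append(table)
    return dict(grouped)


groups = group_scans_by_source([
    ("domain_a", "orders"),
    ("domain_b", "users"),
    ("domain_a", "items"),
])
```

Here `orders` and `items` end up together under `domain_a`, so a join between them could be pushed to that data source rather than performed after the results are collected.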
5. The method according to any one of claims 1 to 4, wherein after merging all the received data results to obtain a query result, the method further comprises:
returning the query result to the querier.
6. A cross-domain data query apparatus, comprising:
the query plan analysis module is used for analyzing an input Structured Query Language (SQL) statement and generating a query execution plan, wherein the query execution plan records the operation required by executing the SQL statement;
the query plan dividing module is used for generating a plurality of sub-query tasks according to the query execution plan, wherein each sub-query task records the data source information to be processed by the sub-query task and the fields processed by the sub-query task, the data source information comprises a data directory of a data source, the data directory records link addresses of the metadata of each base table describing the locally stored databases and data tables, and the metadata of each base table corresponds to the database and data table that it describes;
the query task distribution module is used for respectively sending each sub-query task to a data system corresponding to the data directory recorded by the sub-query task according to the data directory recorded by each sub-query task;
the query result receiving module is used for receiving data results obtained by the data systems after the data systems execute the sub-query tasks received by the data systems;
and the query result merging module is used for merging all the received data results to obtain query results.
7. The apparatus of claim 6, wherein the query plan analysis module comprises:
the syntax parsing unit is used for performing lexical analysis and syntax analysis on the input SQL statement, acquiring key information corresponding to the SQL statement, and generating an abstract syntax tree according to the results of the lexical analysis and syntax analysis, wherein the key information comprises: a list of fields queried by the SQL statement, a query condition, and a data directory of a data source queried by the SQL statement, and wherein the abstract syntax tree comprises a plurality of TOKEN objects;
the semantic analysis unit is used for traversing the abstract syntax tree and recording the attributes of different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks;
and the query plan generating unit is used for generating the query execution plan corresponding to the query block according to the query block.
8. The apparatus of claim 7, wherein the query plan generating unit comprises:
a logical plan generating subunit, configured to convert the query block into a logical query plan to determine the logical operations for executing the SQL statement;
a physical plan generating subunit, configured to analyze the logical query plan, and convert the logical query plan into a physical query plan to obtain a path for obtaining the query result;
and the plan optimization subunit is used for optimizing the physical query plan so as to select an optimal path for acquiring the query result, and obtain the query execution plan.
9. The apparatus of claim 8, wherein the physical plan generating subunit is further configured to, after converting the query block into a logical query plan, rewrite the logical query plan according to an association relationship between data sources queried by the SQL statement.
10. The apparatus of any one of claims 6 to 9, further comprising:
and the query result returning module is used for returning the query result to the querier after the query result merging module obtains the query result.
11. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the method according to any one of claims 1 to 5.
12. A computer device comprising a processor, a memory and a computer program stored on the memory, characterized in that the steps of the method according to any of claims 1 to 5 are implemented when the computer program is executed by the processor.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911309743.6A CN111190924A (en) | 2019-12-18 | 2019-12-18 | Cross-domain data query method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911309743.6A CN111190924A (en) | 2019-12-18 | 2019-12-18 | Cross-domain data query method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111190924A (en) | 2020-05-22 |
Family
ID=70707354
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911309743.6A Pending CN111190924A (en) | 2019-12-18 | 2019-12-18 | Cross-domain data query method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111190924A (en) |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111782682A (en) * | 2020-06-30 | 2020-10-16 | 北京金山云网络技术有限公司 | Data query method, device, equipment and storage medium |
CN111913986A (en) * | 2020-08-03 | 2020-11-10 | 支付宝(杭州)信息技术有限公司 | Query optimization method and device |
CN112527848A (en) * | 2020-12-22 | 2021-03-19 | 苏州科达科技股份有限公司 | Multi-data-source-based report data query method, device, system and storage medium |
CN112579610A (en) * | 2020-12-23 | 2021-03-30 | 安徽航天信息有限公司 | Multi-data source structure analysis method, system, terminal device and storage medium |
CN112699141A (en) * | 2020-12-29 | 2021-04-23 | 医渡云(北京)技术有限公司 | Data query method and device for multi-source heterogeneous data, storage medium and equipment |
CN112860730A (en) * | 2021-03-29 | 2021-05-28 | 中信银行股份有限公司 | SQL statement processing method and device, electronic equipment and readable storage medium |
CN112925801A (en) * | 2021-02-26 | 2021-06-08 | 第四范式(北京)技术有限公司 | Method and system for realizing real-time query service based on SQL query statement |
CN112988801A (en) * | 2021-04-07 | 2021-06-18 | 拉卡拉支付股份有限公司 | Data processing method, data processing apparatus, electronic device, storage medium, and program product |
CN113032465A (en) * | 2021-05-31 | 2021-06-25 | 北京谷数科技股份有限公司 | Data query method and device, electronic equipment and storage medium |
CN113177062A (en) * | 2021-05-25 | 2021-07-27 | 深圳前海微众银行股份有限公司 | Data query method and device |
CN113254547A (en) * | 2021-05-27 | 2021-08-13 | 北京达佳互联信息技术有限公司 | Data query method, device, server and storage medium |
CN113568930A (en) * | 2021-08-12 | 2021-10-29 | 威讯柏睿数据科技(北京)有限公司 | Method and equipment for optimizing distributed memory data query |
CN113672651A (en) * | 2021-08-24 | 2021-11-19 | 杭州海康威视数字技术股份有限公司 | Task execution method and device and electronic equipment |
CN113919877A (en) * | 2021-10-15 | 2022-01-11 | 深圳市酷开网络科技股份有限公司 | Method and device for processing human-circled task progress based on DMP platform and readable storage medium |
CN114443699A (en) * | 2022-01-27 | 2022-05-06 | 腾讯科技(深圳)有限公司 | Information query method and device, computer equipment and computer readable storage medium |
CN115994152A (en) * | 2023-03-24 | 2023-04-21 | 云账户技术(天津)有限公司 | Verification method, device, equipment and storage medium of MySQL query statement |
CN116595232A (en) * | 2023-05-24 | 2023-08-15 | 杭州金智塔科技有限公司 | Cross-data-source data processing system, method and device |
CN117251472A (en) * | 2023-11-16 | 2023-12-19 | 中邮消费金融有限公司 | Cross-source data processing method, device, equipment and storage medium |
WO2024046015A1 (en) * | 2022-08-29 | 2024-03-07 | 支付宝(杭州)信息技术有限公司 | Data query method and apparatus, storage medium, and electronic device |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108052635A (en) * | 2017-12-20 | 2018-05-18 | 江苏瑞中数据股份有限公司 | A kind of heterogeneous data source unifies conjunctive query method |
CN110059103A (en) * | 2019-04-28 | 2019-07-26 | 南京大学 | A kind of cross-platform unified big data SQL query method |
CN110263105A (en) * | 2019-05-21 | 2019-09-20 | 北京百度网讯科技有限公司 | Inquiry processing method, query processing system, server and computer-readable medium |
CN110399388A (en) * | 2019-07-29 | 2019-11-01 | 中国工商银行股份有限公司 | Data query method, system and equipment |
Cited By (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111782682A (en) * | 2020-06-30 | 2020-10-16 | 北京金山云网络技术有限公司 | Data query method, device, equipment and storage medium |
CN111782682B (en) * | 2020-06-30 | 2024-01-02 | 北京金山云网络技术有限公司 | Data query method, device, equipment and storage medium |
CN111913986A (en) * | 2020-08-03 | 2020-11-10 | 支付宝(杭州)信息技术有限公司 | Query optimization method and device |
CN111913986B (en) * | 2020-08-03 | 2024-04-16 | 支付宝(杭州)信息技术有限公司 | Query optimization method and device |
CN112527848B (en) * | 2020-12-22 | 2023-05-12 | 苏州科达科技股份有限公司 | Report data query method, device and system based on multiple data sources and storage medium |
CN112527848A (en) * | 2020-12-22 | 2021-03-19 | 苏州科达科技股份有限公司 | Multi-data-source-based report data query method, device, system and storage medium |
CN112579610A (en) * | 2020-12-23 | 2021-03-30 | 安徽航天信息有限公司 | Multi-data source structure analysis method, system, terminal device and storage medium |
CN112699141A (en) * | 2020-12-29 | 2021-04-23 | 医渡云(北京)技术有限公司 | Data query method and device for multi-source heterogeneous data, storage medium and equipment |
CN112925801A (en) * | 2021-02-26 | 2021-06-08 | 第四范式(北京)技术有限公司 | Method and system for realizing real-time query service based on SQL query statement |
CN112860730A (en) * | 2021-03-29 | 2021-05-28 | 中信银行股份有限公司 | SQL statement processing method and device, electronic equipment and readable storage medium |
CN112988801A (en) * | 2021-04-07 | 2021-06-18 | 拉卡拉支付股份有限公司 | Data processing method, data processing apparatus, electronic device, storage medium, and program product |
CN113177062A (en) * | 2021-05-25 | 2021-07-27 | 深圳前海微众银行股份有限公司 | Data query method and device |
WO2022247201A1 (en) * | 2021-05-25 | 2022-12-01 | 深圳前海微众银行股份有限公司 | Data query method and apparatus |
CN113254547A (en) * | 2021-05-27 | 2021-08-13 | 北京达佳互联信息技术有限公司 | Data query method, device, server and storage medium |
CN113254547B (en) * | 2021-05-27 | 2024-04-16 | 北京达佳互联信息技术有限公司 | Data query method, device, server and storage medium |
CN113032465A (en) * | 2021-05-31 | 2021-06-25 | 北京谷数科技股份有限公司 | Data query method and device, electronic equipment and storage medium |
CN113032465B (en) * | 2021-05-31 | 2021-09-10 | 北京谷数科技股份有限公司 | Data query method and device, electronic equipment and storage medium |
CN113568930A (en) * | 2021-08-12 | 2021-10-29 | 威讯柏睿数据科技(北京)有限公司 | Method and equipment for optimizing distributed memory data query |
CN113672651A (en) * | 2021-08-24 | 2021-11-19 | 杭州海康威视数字技术股份有限公司 | Task execution method and device and electronic equipment |
CN113672651B (en) * | 2021-08-24 | 2024-06-04 | 杭州海康威视数字技术股份有限公司 | Task execution method and device and electronic equipment |
CN113919877A (en) * | 2021-10-15 | 2022-01-11 | 深圳市酷开网络科技股份有限公司 | Method and device for processing human-circled task progress based on DMP platform and readable storage medium |
CN114443699A (en) * | 2022-01-27 | 2022-05-06 | 腾讯科技(深圳)有限公司 | Information query method and device, computer equipment and computer readable storage medium |
WO2024046015A1 (en) * | 2022-08-29 | 2024-03-07 | 支付宝(杭州)信息技术有限公司 | Data query method and apparatus, storage medium, and electronic device |
CN115994152A (en) * | 2023-03-24 | 2023-04-21 | 云账户技术(天津)有限公司 | Verification method, device, equipment and storage medium of MySQL query statement |
CN116595232A (en) * | 2023-05-24 | 2023-08-15 | 杭州金智塔科技有限公司 | Cross-data-source data processing system, method and device |
CN117251472A (en) * | 2023-11-16 | 2023-12-19 | 中邮消费金融有限公司 | Cross-source data processing method, device, equipment and storage medium |
CN117251472B (en) * | 2023-11-16 | 2024-02-27 | 中邮消费金融有限公司 | Cross-source data processing method, device, equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111190924A (en) | Cross-domain data query method and device | |
KR101169091B1 (en) | Prescribed navigation using topology metadata and navigation path | |
US9778967B2 (en) | Sophisticated run-time system for graph processing | |
US9304835B1 (en) | Optimized system for analytics (graphs and sparse matrices) operations | |
US11593357B2 (en) | Databases and methods of storing, retrieving, and processing data | |
Sellami et al. | Complex queries optimization and evaluation over relational and NoSQL data stores in cloud environments | |
US10783193B2 (en) | Program, method, and system for execution of software services | |
CN110019314B (en) | Dynamic data packaging method based on data item analysis, client and server | |
CN112015754A (en) | Data query method, device and system | |
CN112182045A (en) | Metadata management method and device, computer equipment and storage medium | |
US20110055373A1 (en) | Service identification for resources in a computing environment | |
US20230185639A1 (en) | Mapping application programming interface schemas with semantic representations | |
CN114356964A (en) | Data blood margin construction method and device, storage medium and electronic equipment | |
US10169725B2 (en) | Change-request analysis | |
CN105447040B (en) | Binary file management and updating method, device and system | |
US11262986B2 (en) | Automatic software generation for computer systems | |
CN110245184B (en) | Data processing method, system and device based on tagSQL | |
US11016830B2 (en) | Entity-based service operation for object-based persistence | |
CN112541001A (en) | Data query method, device, storage medium and equipment | |
CN111221860A (en) | Mixed query optimization method and device based on big data | |
US11354312B2 (en) | Access-plan-based querying for federated database-management systems | |
US20210311942A1 (en) | Dynamically altering a query access plan | |
CN111368146A (en) | Path information query method and device, storage medium and processor | |
US11991254B1 (en) | Ontology-based approach for modeling service dependencies in a provider network | |
US11809919B2 (en) | Central event catalog |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||