CN111190924A - Cross-domain data query method and device - Google Patents

Publication number
CN111190924A
Authority
CN
China
Prior art keywords
query
data
plan
sub
sql statement
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911309743.6A
Other languages
Chinese (zh)
Inventor
王贺冬
周雷皓
龚廖安
闫发腾
杨乾磊
龚本威
林智峰
毕伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongsi Boan Technology Beijing Co ltd
Original Assignee
Zhongsi Boan Technology Beijing Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongsi Boan Technology Beijing Co ltd filed Critical Zhongsi Boan Technology Beijing Co ltd
Priority to CN201911309743.6A priority Critical patent/CN111190924A/en
Publication of CN111190924A publication Critical patent/CN111190924A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24534Query rewriting; Transformation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/242Query formulation
    • G06F16/2433Query languages
    • G06F16/244Grouping and aggregation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present disclosure belongs to the field of electric digital data processing and provides a cross-domain data query method and device. The method comprises the following steps: analyzing an input Structured Query Language (SQL) statement to generate a query execution plan, wherein the query execution plan records the operations required for executing the SQL statement; generating a plurality of sub-query tasks according to the query execution plan; sending each sub-query task, according to the data directory recorded by that sub-query task, to the data system corresponding to the data directory; receiving the data results obtained by each data system after executing the sub-query tasks it received; and merging all the received data results to obtain a query result.

Description

Cross-domain data query method and device
Technical Field
The present disclosure relates to the field of data processing, and in particular, to a method and an apparatus for cross-domain data query.
Background
In the related art, cross-domain data exchange refers to integrating a plurality of separately constructed application information systems and transmitting and sharing the information/data of the application subsystems through an information exchange platform built on a computer network, thereby improving the utilization rate of information resources. Cross-domain data exchange has become a basic goal of information construction: it ensures interconnection and intercommunication among distributed heterogeneous systems, establishes a central database, completes the extraction, concentration, loading and display of data, and provides unified data processing and exchange.
The related art adopts an API-based cross-domain data exchange technology, in which different systems encapsulate their own data behind API interfaces and expose API services to the outside, and users access the data of those systems by calling the API services. JSON-based cross-domain data exchange is a common implementation of API-based cross-domain data exchange. In JSON-based cross-domain data exchange, as shown in fig. 1, a user (system A) needs to encapsulate the request data, serialize the data as JSON, configure a request address and other parameters, and send the serialized JSON data to a target service (system B); the target service responds and returns data, and the user deserializes the returned data to obtain the required data.
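The JSON-based exchange described above can be illustrated with a minimal sketch. It is an assumption-laden illustration only: the function names, the request layout and the sample data are hypothetical, and the network call between system A and system B is replaced by a direct function call.

```python
import json

# Hypothetical data held by system B (illustrative only).
SYSTEM_B_DATA = {
    "person": [
        {"name": "Li", "age": 30},
        {"name": "Wang", "age": 41},
    ]
}

def build_request(table: str, fields: list, filters: dict) -> str:
    """System A: encapsulate the request data and serialize it as JSON."""
    return json.dumps({"table": table, "fields": fields, "filters": filters})

def handle_request(payload: str) -> str:
    """System B: deserialize the request, look up the data, serialize the response."""
    request = json.loads(payload)
    rows = SYSTEM_B_DATA.get(request["table"], [])
    matched = [
        {name: row[name] for name in request["fields"]}
        for row in rows
        if all(row.get(key) == value for key, value in request["filters"].items())
    ]
    return json.dumps({"rows": matched})

# In a real deployment the payload would travel over the network to a configured address.
response = handle_request(build_request("person", ["name"], {"age": 30}))
rows = json.loads(response)["rows"]  # system A deserializes the returned data
```

Even in this small form, both sides need purpose-written wrapper code around the data, which is the overhead the disclosure sets out to avoid.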
In the above-described related art, data exchange is realized through network calls using techniques such as APIs. A system stores its data in a database or a data warehouse, and a program is then written in a programming language; the program mainly receives some parameters, generates an SQL query statement, sends the SQL query statement to the database, and returns the data obtained from the database. In the related art, therefore, a data user does not access the database directly to obtain data but goes through a wrapper program, so that an extra layer of program sits between the data user and the database. On the one hand this requires additional development and integration work, and on the other hand it increases the complexity of the system.
Disclosure of Invention
To overcome the problems in the related art, a cross-domain data query scheme is provided.
According to a first aspect herein, there is provided a cross-domain data query method, comprising: analyzing an input Structured Query Language (SQL) statement to generate a query execution plan, wherein the query execution plan records the operations required for executing the SQL statement; generating a plurality of sub-query tasks according to the query execution plan, wherein each sub-query task records the data source information to be processed by the sub-query task and the fields processed by the sub-query task, the data source information comprises a data directory of a data source, the data directory records the link addresses of the database table metadata describing the locally stored databases and data tables, and each piece of database table metadata corresponds to the database and data table it describes; sending each sub-query task, according to the data directory recorded by that sub-query task, to the data system corresponding to the data directory; receiving the data results obtained by each data system after executing the sub-query tasks it received; and merging all the received data results to obtain a query result.
Optionally, analyzing the input structured query language SQL statement to generate a query execution plan includes: performing lexical analysis and syntax analysis on the input SQL statement to acquire key information corresponding to the SQL statement, wherein the key information comprises: the list of fields queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement; generating an abstract syntax tree according to the results of the lexical analysis and syntax analysis of the SQL statement, wherein the abstract syntax tree comprises a plurality of TOKEN objects; traversing the abstract syntax tree and recording the attributes of the different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into a query block; and generating the query execution plan corresponding to the query block according to the query block.
Optionally, generating the query execution plan corresponding to the query block according to the query block includes: converting the query block into a logical query plan to determine the logical operations for executing the SQL statement; analyzing the logical query plan and converting it into a physical query plan to obtain the paths for acquiring the query result; and optimizing the physical query plan to select an optimal path for acquiring the query result, so as to obtain the query execution plan.
Optionally, after converting the query block into a logical query plan and before analyzing the logical query plan, the method further comprises: rewriting the logical query plan according to the association relationship between the data sources queried by the SQL statement.
Optionally, after merging all the received data results to obtain a query result, the method further includes: returning the query result to the querier.
According to another aspect of the present disclosure, there is provided a cross-domain data query apparatus, including: a query plan analysis module, used for analyzing the input SQL statement and generating a query execution plan, wherein the query execution plan records the operations required for executing the SQL statement; a query plan dividing module, used for generating a plurality of sub-query tasks according to the query execution plan, wherein each sub-query task records the data source information to be processed by the sub-query task and the fields processed by the sub-query task, the data source information comprises a data directory of a data source, the data directory records the link addresses of the database table metadata describing the locally stored databases and data tables, and each piece of database table metadata corresponds to the database and data table it describes; a query task distribution module, used for sending each sub-query task, according to the data directory recorded by that sub-query task, to the data system corresponding to the data directory; a query result receiving module, used for receiving the data results obtained by each data system after executing the sub-query tasks it received; and a query result merging module, used for merging all the received data results to obtain a query result.
Optionally, the query plan analysis module includes: a syntax parsing unit, used for performing lexical analysis and syntax analysis on the input SQL statement, acquiring key information corresponding to the SQL statement, and generating an abstract syntax tree according to the results of the lexical analysis and syntax analysis of the SQL statement, wherein the key information comprises: the list of fields queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement, and wherein the abstract syntax tree comprises a plurality of TOKEN objects; a semantic analysis unit, used for traversing the abstract syntax tree and recording the attributes of the different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into a query block; and a query plan generating unit, used for generating the query execution plan corresponding to the query block according to the query block.
Optionally, the query plan generating unit includes: a logical plan generating subunit, configured to convert the query block into a logical query plan to determine the logical operations for executing the SQL statement; a physical plan generating subunit, configured to analyze the logical query plan and convert it into a physical query plan to obtain the paths for acquiring the query result; and a plan optimization subunit, configured to optimize the physical query plan so as to select an optimal path for acquiring the query result and obtain the query execution plan.
Optionally, the physical plan generating subunit is further configured to, after converting the query block into a logical query plan, rewrite the logical query plan according to an association relationship between data sources queried by the SQL statement.
Optionally, the apparatus further comprises: a query result returning module, used for returning the query result to the querier after the query result merging module obtains the query result.
In another aspect of this document, a computer readable storage medium is provided, on which a computer program is stored, and the computer program, when executed, performs the steps of the above cross-domain data query method.
In another aspect of this document, a computer device is provided, comprising a processor, a memory and a computer program stored on the memory, the processor implementing the steps of the above cross-domain data query method when executing the computer program.
With the SQL-driven cross-domain data query scheme described herein, extra program development work and integration work can be avoided, and the complexity of the system is reduced. Moreover, because databases natively support SQL, driving data queries with SQL stays closer to the database, and the pattern of the data exchange process can be unified.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, are included to provide a further understanding of the disclosure. In the drawings:
FIG. 1 is a schematic diagram of a cross-domain data exchange system according to the related art;
FIG. 2 is a flow diagram illustrating a cross-domain data query method in accordance with an exemplary embodiment;
FIG. 3 is a diagram illustrating a database storage structure in accordance with an exemplary embodiment;
FIG. 4 is a schematic diagram illustrating the performance of a cross-domain data query method in accordance with an exemplary embodiment;
FIG. 5 is a block diagram illustrating a cross-domain data querying device in accordance with an exemplary embodiment;
FIG. 6 is a block diagram illustrating a computer device according to an example embodiment.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the drawings of the embodiments of the present invention, and it is obvious that the described embodiments are some but not all of the embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments herein without making any creative effort, shall fall within the scope of protection. It should be noted that the embodiments and features of the embodiments may be arbitrarily combined with each other without conflict.
The SQL language is simple and easy to use, has strong expressive force, and is the best mode for operating the database to realize data exchange. Therefore, in order to avoid extra program development work and docking work and reduce the complexity of the system, the embodiment of the invention provides a cross-domain data query method driven by SQL.
Fig. 2 is a flowchart illustrating a cross-domain data query method according to an exemplary embodiment, and as shown in fig. 2, the cross-domain data query method mainly includes the following steps S201 to S205, which are described below.
As shown in fig. 2, in step S201, the input SQL statement is analyzed and a query execution plan is generated, wherein the query execution plan records the operations required for executing the SQL statement.
In this embodiment, in order to make it convenient to search for and locate data, before step S201 is executed, the databases are connected and their metadata is collected to generate the data directory.
In a specific application, using a service system generates service data, which is stored in the database of that service system; different databases may be established to store the data of different services. In this embodiment, in order to drive data exchange through SQL, the database table metadata (i.e., the description information of the databases and tables) can be exposed, so that a user can determine whether the data he or she needs is available. The database table metadata is not the real data stored in a database table but data describing that data, so exposing it does not affect the security of the database. For example, if a database table holds personal data such as users' names, genders, ages and native places, the database table metadata of that table may be "personal information".
In a specific application, the database table metadata of a database can be extracted automatically: it is only necessary to configure the connection to the database, after which the database table metadata can be read from it. To facilitate searching for and locating data, as shown in fig. 3, the database table metadata of the databases may be collected to form a data directory, in which each piece of database table metadata of the locally stored databases is recorded.
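As a minimal sketch of the automatic extraction described above, the following example reads table and column names from an in-memory SQLite database and collects them into a data directory. SQLite, the directory layout, the `meta://` link format and all function names are assumptions chosen for illustration; the disclosure does not specify them.

```python
import sqlite3

def extract_table_metadata(conn: sqlite3.Connection, database: str) -> list:
    """Read the table and column names (not the stored rows) of one database."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    metadata = []
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        metadata.append({"database": database, "table": table, "columns": columns})
    return metadata

def build_data_directory(metadata: list, link_prefix: str) -> dict:
    """Collect the metadata into a directory keyed by "database.table"."""
    return {
        f"{m['database']}.{m['table']}": {
            "link": f"{link_prefix}/{m['database']}/{m['table']}",  # hypothetical link format
            "columns": m["columns"],
        }
        for m in metadata
    }

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, gender TEXT, age INTEGER)")
directory = build_data_directory(extract_table_metadata(conn, "hr"), "meta://system-a")
```

Only descriptive information enters the directory; the rows of `person` are never read.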
In a specific application, when a data user needs to search for and locate data across domains, the user can write the corresponding SQL statement and input it into the system currently in use; the system analyzes the input SQL statement and obtains the query execution plan corresponding to the SQL statement. The operations required to execute the SQL statement may be recorded in the query execution plan.
In an optional implementation manner of this embodiment, step S201 may include the following steps 2011 to 2014.
Step 2011, performing lexical analysis and syntax analysis on the input SQL statement to obtain key information corresponding to the SQL statement, where the key information includes but is not limited to: the list of fields queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement.
Step 2012, generating an abstract syntax tree according to the results of the lexical analysis and syntax analysis of the SQL statement, where the abstract syntax tree includes a plurality of TOKEN objects.
In a specific application, an existing SQL syntax parser can be used to perform lexical analysis and syntax analysis on the SQL statement according to the SQL grammar, identify each part of the SQL statement, and output the result in the form of an abstract syntax tree (AST).
For example, the lexical and syntax analysis of SQL can be implemented using ANTLR. With ANTLR, it is only necessary to write a grammar file that defines the lexical rules and the grammar production rules; ANTLR then completes the processes of lexical analysis, syntax analysis, semantic analysis and intermediate code generation.
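To make steps 2011 and 2012 concrete, the following is a minimal hand-written sketch, not an ANTLR grammar. It assumes a tiny SQL subset of the form SELECT ... FROM ... [WHERE ...]; the `Node` class, the regular expression and the child token names `TOK_TABNAME` and `TOK_SELEXPR` are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field

@dataclass
class Node:
    token: str                      # node type, e.g. "TOK_QUERY"
    text: str = ""                  # source text carried by the node
    children: list = field(default_factory=list)

# Operators, identifiers (optionally dotted), integers, quoted strings, punctuation.
TOKEN_RE = re.compile(r"\s*(>=|<=|<>|[A-Za-z_][\w.]*|\d+|'[^']*'|[,=<>*])")

def tokenize(sql: str) -> list:
    """Lexical analysis: split the statement into tokens."""
    sql = sql.strip().rstrip(";").strip()
    tokens, pos = [], 0
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if not match:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens

def parse(sql: str) -> Node:
    """Syntax analysis: build a small AST for SELECT ... FROM ... [WHERE ...]."""
    tokens = tokenize(sql)
    upper = [t.upper() for t in tokens]
    if not upper or upper[0] != "SELECT" or "FROM" not in upper:
        raise ValueError("only SELECT ... FROM ... [WHERE ...] is supported")
    i_from = upper.index("FROM")
    i_where = upper.index("WHERE") if "WHERE" in upper else len(tokens)
    fields = [t for t in tokens[1:i_from] if t != ","]
    tables = [t for t in tokens[i_from + 1:i_where] if t != ","]
    condition = " ".join(tokens[i_where + 1:])
    query = Node("TOK_QUERY")
    query.children.append(
        Node("TOK_FROM", children=[Node("TOK_TABNAME", t) for t in tables]))
    query.children.append(
        Node("TOK_SELECT", children=[Node("TOK_SELEXPR", f) for f in fields]))
    if condition:
        query.children.append(Node("TOK_WHERE", condition))
    return query

ast = parse("SELECT name, age FROM hr.person WHERE age >= 30")
```

The key information named in step 2011 (field list, query condition, and the table reference that identifies the data source) ends up in the `TOK_SELECT`, `TOK_WHERE` and `TOK_FROM` subtrees respectively.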
Step 2013, traversing the abstract syntax tree, and recording the attributes of different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks.
In a specific application, in step 2013, the AST is traversed to abstract out the basic query unit QueryBlock (query block). This is a recursive process in which each parsed Token is mapped to the corresponding QueryBlock (QB). Specifically, mapping a Token to the corresponding QueryBlock may include the following processes:
TOK_QUERY => create a QB object and recursively process the child nodes
TOK_FROM => save the database name and table name syntax parts in the corresponding attributes of the QB object
TOK_SELECT => save the query field list syntax part in the corresponding attribute of the QB object
TOK_WHERE => save the query condition syntax part in the corresponding attribute of the QB object
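The mapping listed above can be sketched as a small recursive function. The `Node` and `QueryBlock` classes and their attribute names are assumptions made for this illustration; only the four token names come from the description.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Node:
    token: str
    text: str = ""
    children: list = field(default_factory=list)

@dataclass
class QueryBlock:
    tables: list = field(default_factory=list)   # database and table names
    fields: list = field(default_factory=list)   # query field list
    condition: str = ""                          # query condition

def to_query_block(node: Node, qb: Optional[QueryBlock] = None) -> QueryBlock:
    """Recursively map TOKEN nodes onto the attributes of a QueryBlock."""
    if node.token == "TOK_QUERY":
        qb = QueryBlock()                         # create the QB object
        for child in node.children:               # recurse into the child nodes
            to_query_block(child, qb)
        return qb
    if qb is None:
        raise ValueError("the root node must be TOK_QUERY")
    if node.token == "TOK_FROM":
        qb.tables.extend(child.text for child in node.children)
    elif node.token == "TOK_SELECT":
        qb.fields.extend(child.text for child in node.children)
    elif node.token == "TOK_WHERE":
        qb.condition = node.text
    return qb

# Hand-built AST for: SELECT name, age FROM hr.person WHERE age >= 30
ast = Node("TOK_QUERY", children=[
    Node("TOK_FROM", children=[Node("TOK_TABNAME", "hr.person")]),
    Node("TOK_SELECT", children=[Node("TOK_SELEXPR", "name"), Node("TOK_SELEXPR", "age")]),
    Node("TOK_WHERE", "age >= 30"),
])
qb = to_query_block(ast)
```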
Step 2014, generating the query execution plan corresponding to the query block according to the query block (QueryBlock).
In step 2014, generating the query execution plan means traversing the QB objects generated in step 2013, together with the syntax attributes stored in them, to generate an operator tree (OperatorTree), and then traversing the operator tree and translating it into the query execution plan. Because the operators are organized in levels, a depth-first traversal is performed downward from the root node of the OperatorTree, and the operators are converted into the execution plan of the query.
Traversing the QueryBlock and translating it into the operator tree (OperatorTree) may include the following processes:
TableScan => read data
Select Operator => selection operation
Group By Operator => grouping and aggregation
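A minimal sketch of this translation is given below, assuming a single table and an optional grouping. The `Operator` class, the function names and the textual plan format are hypothetical; only the three operator names are taken from the description.

```python
from dataclasses import dataclass, field

@dataclass
class Operator:
    name: str
    detail: str
    children: list = field(default_factory=list)

def build_operator_tree(table: str, fields: list, group_by: list) -> Operator:
    """Translate query-block attributes into a chain of operators."""
    root = Operator("TableScan", table)                      # read data
    select = Operator("Select Operator", ", ".join(fields))  # selection operation
    root.children.append(select)
    if group_by:                                             # grouping and aggregation
        select.children.append(Operator("Group By Operator", ", ".join(group_by)))
    return root

def to_execution_plan(root: Operator) -> list:
    """Depth-first traversal downward from the root of the operator tree."""
    plan = []

    def visit(op: Operator) -> None:
        plan.append(f"{op.name}: {op.detail}")
        for child in op.children:
            visit(child)

    visit(root)
    return plan

tree = build_operator_tree("hr.person", ["gender", "count(*)"], ["gender"])
plan = to_execution_plan(tree)
```

The resulting list preserves the operator levels, so each entry runs on the output of the entry before it.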
In a specific application, if the input SQL statement relates to a plurality of heterogeneous data tables, in an optional implementation manner of this embodiment, step 2014 may include the following steps 1 to 3.
Step 1, converting the query block into a logical query plan to determine the logical operations for executing the SQL statement;
Step 2, analyzing the logical query plan and converting it into a physical query plan to obtain the paths for acquiring the query result;
Step 3, optimizing the physical query plan to select an optimal path for acquiring the query result, so as to obtain the query execution plan.
With this optional embodiment, when a plurality of heterogeneous data tables are involved, the query block is converted into a logical query plan to determine the logical operations required for executing the input SQL statement, and the logical query plan is then converted into a physical query plan and optimized, so that the time required for the query can be reduced and efficiency improved.
In the above optional embodiment, in step 1, after the query block is converted into the logical query plan, the logical query plan is further rewritten according to the association relationship between the data sources queried by the SQL statement, so as to optimize the logical query plan.
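The disclosure does not specify how the optimal path is chosen. As one possible illustration only, the sketch below assumes a logical join between two tables held by different systems, enumerates two candidate physical plans, and picks the one that moves fewer rows. The join scenario, the row-count statistics, the cost measure and all names are assumptions, not part of the described method.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LogicalJoin:
    left: str
    right: str
    key: str

@dataclass(frozen=True)
class PhysicalPlan:
    description: str
    rows_transferred: int   # assumed cost measure: rows moved between systems

# Hypothetical table statistics.
ROW_COUNTS = {"hr.person": 1_000, "sales.orders": 50_000}

def candidate_plans(join: LogicalJoin, row_counts: dict) -> list:
    """Enumerate the physical paths that realize one logical join."""
    return [
        PhysicalPlan(
            f"ship {join.left} to the system holding {join.right}, join on {join.key}",
            row_counts[join.left],
        ),
        PhysicalPlan(
            f"ship {join.right} to the system holding {join.left}, join on {join.key}",
            row_counts[join.right],
        ),
    ]

def optimize(join: LogicalJoin, row_counts: dict) -> PhysicalPlan:
    """Select the candidate with the lowest assumed cost."""
    return min(candidate_plans(join, row_counts), key=lambda p: p.rows_transferred)

best = optimize(LogicalJoin("hr.person", "sales.orders", "person_id"), ROW_COUNTS)
```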
As shown in fig. 2, in step S202, a plurality of sub-query tasks are generated according to the query execution plan, wherein each sub-query task records the data source information to be processed by the sub-query task and the fields processed by the sub-query task, the data source information includes a data directory of a data source, the data directory records the link addresses of the database table metadata describing the locally stored databases and data tables, and each piece of database table metadata corresponds to the database and data table it describes.
In step S203, each sub-query task is sent to the data system corresponding to the data directory recorded by the sub-query task according to the data directory recorded by each sub-query task.
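Steps S202 and S203 can be sketched as follows, under the assumption that a sub-query task is a small record and that the data directory maps each "database.table" entry to the name of the data system holding it. The record fields, the directory contents and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SubQueryTask:
    directory_entry: str   # data directory entry of the data source
    fields: list           # fields processed by this sub-query task
    condition: str = ""

# Hypothetical data directory: entry -> data system that holds the table.
DATA_DIRECTORY = {"hr.person": "system-a", "sales.orders": "system-b"}

def split_plan(table_fields: dict, conditions: dict) -> list:
    """Step S202: create one sub-query task per data source in the plan."""
    return [
        SubQueryTask(table, fields, conditions.get(table, ""))
        for table, fields in table_fields.items()
    ]

def dispatch(tasks: list, directory: dict) -> dict:
    """Step S203: route each task to the system named by its directory entry."""
    outbox = {}
    for task in tasks:
        system = directory[task.directory_entry]
        outbox.setdefault(system, []).append(task)
    return outbox

tasks = split_plan(
    {"hr.person": ["name", "age"], "sales.orders": ["order_id"]},
    {"hr.person": "age >= 30"},
)
outbox = dispatch(tasks, DATA_DIRECTORY)
```

Here `outbox` stands in for the actual transmission to each data system, which the sketch does not model.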
In step S204, data results obtained by each data system after executing the sub-query task received by the data system are received.
In a specific application, after a data system receives a sub-query task, it can locate the database table metadata to be retrieved according to its own data directory and follow the link of that database table metadata to the database to perform the query.
In step S205, all the received data results are merged to obtain a query result.
In an optional implementation manner of this embodiment, after all the received data results are combined to obtain a query result, the query result is returned to the querier.
Fig. 4 is a schematic diagram illustrating an implementation of the above cross-domain data query method according to an exemplary embodiment. As shown in fig. 4, in this embodiment each service data system includes a query plan analyzer (Query Planner), a coordination task distributor (Query Coordinator), a query executor (Query Worker) and a database (DB). A user inputs an SQL query statement, which is sent to the query plan analyzer of the current service data system (step 401). The query plan analyzer analyzes the SQL query statement to obtain a query execution plan and divides the query execution plan into a plurality of subtasks (step 402). The coordination task distributor distributes the subtasks to the query executors of the corresponding service data systems (step 403). Each query executor executes the received subtask (step 404) and queries the database (step 405) to acquire data, continuously reporting its current running state to the coordination task distributor during execution. After a query executor finishes its task, it returns the result to the coordination task distributor (step 406). The coordination task distributor collects the results of the subtasks, combines them into the overall query result, and returns the overall query result to the querier (step 407).
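The flow of steps 402 to 407 can be sketched end to end with two in-memory SQLite databases standing in for two service data systems. This is a simplified assumption: the query executors run sequentially in one process, status reporting is omitted, merging is plain concatenation followed by sorting, and all names and sample rows are hypothetical.

```python
import sqlite3

def make_system(rows: list) -> sqlite3.Connection:
    """Create one service data system with a local 'person' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE person (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO person VALUES (?, ?)", rows)
    return conn

SYSTEMS = {
    "system-a": make_system([("Li", 30), ("Zhao", 25)]),
    "system-b": make_system([("Wang", 41)]),
}

def plan_subtasks(min_age: int) -> list:
    """Query Planner (step 402): one subtask per service data system."""
    sql = "SELECT name, age FROM person WHERE age >= ?"
    return [{"system": name, "sql": sql, "params": (min_age,)} for name in SYSTEMS]

def run_worker(task: dict) -> list:
    """Query Worker (steps 404 and 405): execute the subtask on the local database."""
    conn = SYSTEMS[task["system"]]
    return conn.execute(task["sql"], task["params"]).fetchall()

def coordinate(tasks: list) -> list:
    """Query Coordinator (steps 403, 406 and 407): distribute, collect and merge."""
    merged = []
    for task in tasks:
        merged.extend(run_worker(task))
    return sorted(merged)

result = coordinate(plan_subtasks(30))
```

Each database only ever sees plain SQL, which reflects the point that no system-specific wrapper API is needed between the querier and the data.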
With the cross-domain data query method provided by this embodiment, using SQL to drive the cross-domain data query avoids extra program development work and integration work and reduces the complexity of the system. Moreover, because databases natively support SQL, driving data queries with SQL stays closer to the database, and the pattern of the data exchange process can be unified.
The embodiment of the present disclosure also provides a cross-domain data query apparatus, which can be used to implement the above-mentioned cross-domain data query method.
Fig. 5 is a schematic structural diagram illustrating a cross-domain data query apparatus 500 according to an exemplary embodiment, and as shown in fig. 5, the apparatus may include: a query plan analysis module 501, a query plan dividing module 502, a query task distribution module 503, a query result receiving module 504 and a query result merging module 505.
In order to avoid repeated descriptions, only the functional modules of the device are described below, and for other relevant matters, reference may be made to the above description on a cross-domain data query method, which is not described herein again.
The query plan analysis module 501 is configured to analyze an input structured query language SQL statement and generate a query execution plan, where the query execution plan records the operations required for executing the SQL statement; the query plan dividing module 502 is configured to generate a plurality of sub-query tasks according to the query execution plan, where each sub-query task records the data source information to be processed by the sub-query task and the fields processed by the sub-query task, the data source information includes a data directory of a data source, the data directory records the link addresses of the database table metadata describing the locally stored databases and data tables, and each piece of database table metadata corresponds to the database and data table it describes; the query task distribution module 503 is configured to send each sub-query task, according to the data directory recorded by that sub-query task, to the data system corresponding to the data directory; the query result receiving module 504 is configured to receive the data results obtained by each data system after executing the sub-query tasks it received; and the query result merging module 505 is configured to merge all the received data results to obtain a query result.
With this apparatus, using SQL to drive the cross-domain data query avoids extra program development work and integration work and reduces the complexity of the system. Moreover, because databases natively support SQL, driving data queries with SQL stays closer to the database, and the pattern of the data exchange process can be unified.
In an optional implementation manner of this embodiment, the query plan analysis module 501 may include: a syntax parsing unit, used for performing lexical analysis and syntax analysis on the input SQL statement, acquiring key information corresponding to the SQL statement, and generating an abstract syntax tree according to the results of the lexical analysis and syntax analysis of the SQL statement, wherein the key information comprises: the list of fields queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement, and wherein the abstract syntax tree comprises a plurality of TOKEN objects; a semantic analysis unit, used for traversing the abstract syntax tree and recording the attributes of the different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into a query block; and a query plan generating unit, used for generating the query execution plan corresponding to the query block according to the query block.
In the foregoing optional embodiment, optionally, the query plan generating unit may include: a logical plan generating subunit, configured to convert the query block into a logical query plan to determine the logical operations for executing the SQL statement; a physical plan generating subunit, configured to analyze the logical query plan and convert it into a physical query plan to obtain the paths for acquiring the query result; and a plan optimization subunit, configured to optimize the physical query plan so as to select an optimal path for acquiring the query result and obtain the query execution plan.
Optionally, in an optional implementation manner described above, the physical plan generating subunit is further configured to, after the query block is converted into a logical query plan, rewrite the logical query plan according to an association relationship between data sources queried by the SQL statement.
In an optional embodiment of this embodiment, the apparatus may further include: and the query result returning module is used for returning the query result to the querier after the query result merging module obtains the query result.
It should be noted that each functional module described in the foregoing embodiments is a functional module of the apparatus. In a specific application, the functions of multiple functional modules may be integrated into one module, or the functions of one functional module may be divided among multiple modules, which is not limited in this embodiment. For example, in a specific application, the query plan analysis module 501 and the query plan dividing module 502 may be implemented by the query plan analyzer in fig. 4, and the query task distribution module 503, the query result receiving module 504, and the query result merging module 505 may be implemented by the coordination task distributor in fig. 4.
FIG. 6 is a block diagram of a computer device 600 for the cross-domain data query method according to an exemplary embodiment. For example, the computer device 600 may be provided as a server. Referring to FIG. 6, the computer device 600 includes a processor 601; the number of processors may be set to one or more as needed. The computer device 600 further includes a memory 602 for storing instructions executable by the processor 601, such as application programs. The number of memories may likewise be set to one or more as needed, and the memory may store one or more application programs. The processor 601 is configured to execute the instructions to perform the above-described method.
As will be appreciated by one skilled in the art, the embodiments herein may be provided as a method, apparatus (device), or computer program product. Accordingly, the present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied in the medium. Computer storage media includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data, including, but not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer, and the like. In addition, communication media typically embodies computer readable instructions, data structures, program modules or other data in a modulated data signal such as a carrier wave or other transport mechanism and includes any information delivery media as known to those skilled in the art.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (devices) and computer program products according to embodiments herein. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that an article or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such article or apparatus. Without further limitation, an element defined by the phrase "comprising … …" does not exclude the presence of additional like elements in the article or device comprising the element.
While the preferred embodiments herein have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following appended claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of this disclosure.
It will be apparent to those skilled in the art that various changes and modifications may be made herein without departing from the spirit and scope thereof. Thus, it is intended that such changes and modifications be included herein, provided they come within the scope of the appended claims and their equivalents.

Claims (12)

1. A cross-domain data query method is characterized by comprising the following steps:
analyzing an input Structured Query Language (SQL) statement to generate a query execution plan, wherein the query execution plan records operations required for executing the SQL statement;
generating a plurality of sub-query tasks according to the query execution plan, wherein each sub-query task records data source information required to be processed by the sub-query task and a field processed by the sub-query task, the data source information comprises a data directory of a data source, the data directory records the link addresses of the metadata of each base table, the metadata describing the locally stored databases and data tables, and each item of metadata corresponds to the database and data table that it describes;
according to the data directory recorded by each sub-query task, sending each sub-query task to a data system corresponding to the data directory recorded by the sub-query task;
receiving data results obtained by each data system after executing the sub-query tasks received by the data system;
and merging all the received data results to obtain a query result.
2. The method of claim 1, wherein analyzing the input Structured Query Language (SQL) statement to generate a query execution plan comprises:
performing lexical analysis and syntax analysis on the input SQL statement to acquire key information corresponding to the SQL statement, wherein the key information comprises: the field list queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement;
generating an abstract syntax tree according to the results of lexical analysis and syntax analysis on the SQL statement, wherein the abstract syntax tree comprises a plurality of TOKEN objects;
traversing the abstract syntax tree, and recording the attributes of different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks;
and generating the query execution plan corresponding to the query block according to the query block.
3. The method of claim 2, wherein generating the query execution plan corresponding to the query block from the query block comprises:
converting the query block into a logical query plan to determine a logical operation to execute the SQL statement;
analyzing the logical query plan, and converting the logical query plan into a physical query plan to obtain a path for acquiring the query result;
and optimizing the physical query plan to select an optimal path for acquiring the query result so as to obtain the query execution plan.
4. The method of claim 3, wherein after converting the query block into a logical query plan, prior to analyzing the logical query plan, the method further comprises:
and rewriting the logical query plan according to the association relationship between the data sources queried by the SQL statement.
5. The method according to any one of claims 1 to 4, wherein after merging all the received data results to obtain a query result, the method further comprises:
and returning the query result to the querier.
6. A cross-domain data query apparatus, comprising:
a query plan analysis module, configured to analyze an input Structured Query Language (SQL) statement and generate a query execution plan, wherein the query execution plan records the operations required for executing the SQL statement;
a query plan dividing module, configured to generate a plurality of sub-query tasks according to the query execution plan, wherein each sub-query task records data source information required to be processed by the sub-query task and a field processed by the sub-query task, the data source information comprises a data directory of a data source, the data directory records the link addresses of the metadata of each base table, the metadata describing the locally stored databases and data tables, and each item of metadata corresponds to the database and data table that it describes;
a query task distribution module, configured to send each sub-query task, according to the data directory recorded by that sub-query task, to the data system corresponding to the data directory;
a query result receiving module, configured to receive the data results obtained by each data system after it executes the sub-query tasks it received;
and a query result merging module, configured to merge all the received data results to obtain a query result.
7. The apparatus of claim 6, wherein the query plan analysis module comprises:
a syntax parsing unit, configured to perform lexical analysis and syntax parsing on the input SQL statement, acquire key information corresponding to the SQL statement, and generate an abstract syntax tree according to the results of the lexical analysis and syntax parsing, wherein the key information comprises: the field list queried by the SQL statement, the query conditions, and the data directory of the data source queried by the SQL statement, and wherein the abstract syntax tree comprises a plurality of TOKEN objects;
a semantic analysis unit, configured to traverse the abstract syntax tree and record the attributes of the different TOKEN nodes on the abstract syntax tree so as to convert the TOKEN nodes into query blocks;
and a query plan generating unit, configured to generate, according to the query block, the query execution plan corresponding to the query block.
8. The apparatus of claim 7, wherein the query plan generating unit comprises:
a logical plan generating subunit, configured to convert the query block into a logical query plan to determine a logical operation to execute the SQL statement;
a physical plan generating subunit, configured to analyze the logical query plan, and convert the logical query plan into a physical query plan to obtain a path for acquiring the query result;
and a plan optimization subunit, configured to optimize the physical query plan to select an optimal path for acquiring the query result so as to obtain the query execution plan.
9. The apparatus of claim 8, wherein the physical plan generating subunit is further configured to, after converting the query block into a logical query plan, rewrite the logical query plan according to an association relationship between data sources queried by the SQL statement.
10. The apparatus of any one of claims 6 to 9, further comprising:
a query result returning module, configured to return the query result to the querier after the query result merging module obtains the query result.
11. A computer-readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed, implements the steps of the method according to any one of claims 1 to 5.
12. A computer device comprising a processor, a memory and a computer program stored on the memory, characterized in that the steps of the method according to any of claims 1 to 5 are implemented when the computer program is executed by the processor.
CN201911309743.6A 2019-12-18 2019-12-18 Cross-domain data query method and device Pending CN111190924A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911309743.6A CN111190924A (en) 2019-12-18 2019-12-18 Cross-domain data query method and device

Publications (1)

Publication Number Publication Date
CN111190924A true CN111190924A (en) 2020-05-22

Family

ID=70707354

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911309743.6A Pending CN111190924A (en) 2019-12-18 2019-12-18 Cross-domain data query method and device

Country Status (1)

Country Link
CN (1) CN111190924A (en)

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782682A (en) * 2020-06-30 2020-10-16 北京金山云网络技术有限公司 Data query method, device, equipment and storage medium
CN111913986A (en) * 2020-08-03 2020-11-10 支付宝(杭州)信息技术有限公司 Query optimization method and device
CN112527848A (en) * 2020-12-22 2021-03-19 苏州科达科技股份有限公司 Multi-data-source-based report data query method, device, system and storage medium
CN112579610A (en) * 2020-12-23 2021-03-30 安徽航天信息有限公司 Multi-data source structure analysis method, system, terminal device and storage medium
CN112699141A (en) * 2020-12-29 2021-04-23 医渡云(北京)技术有限公司 Data query method and device for multi-source heterogeneous data, storage medium and equipment
CN112860730A (en) * 2021-03-29 2021-05-28 中信银行股份有限公司 SQL statement processing method and device, electronic equipment and readable storage medium
CN112925801A (en) * 2021-02-26 2021-06-08 第四范式(北京)技术有限公司 Method and system for realizing real-time query service based on SQL query statement
CN112988801A (en) * 2021-04-07 2021-06-18 拉卡拉支付股份有限公司 Data processing method, data processing apparatus, electronic device, storage medium, and program product
CN113032465A (en) * 2021-05-31 2021-06-25 北京谷数科技股份有限公司 Data query method and device, electronic equipment and storage medium
CN113177062A (en) * 2021-05-25 2021-07-27 深圳前海微众银行股份有限公司 Data query method and device
CN113254547A (en) * 2021-05-27 2021-08-13 北京达佳互联信息技术有限公司 Data query method, device, server and storage medium
CN113568930A (en) * 2021-08-12 2021-10-29 威讯柏睿数据科技(北京)有限公司 Method and equipment for optimizing distributed memory data query
CN113672651A (en) * 2021-08-24 2021-11-19 杭州海康威视数字技术股份有限公司 Task execution method and device and electronic equipment
CN113919877A (en) * 2021-10-15 2022-01-11 深圳市酷开网络科技股份有限公司 Method and device for processing human-circled task progress based on DMP platform and readable storage medium
CN114443699A (en) * 2022-01-27 2022-05-06 腾讯科技(深圳)有限公司 Information query method and device, computer equipment and computer readable storage medium
CN115994152A (en) * 2023-03-24 2023-04-21 云账户技术(天津)有限公司 Verification method, device, equipment and storage medium of MySQL query statement
CN116595232A (en) * 2023-05-24 2023-08-15 杭州金智塔科技有限公司 Cross-data-source data processing system, method and device
CN117251472A (en) * 2023-11-16 2023-12-19 中邮消费金融有限公司 Cross-source data processing method, device, equipment and storage medium
WO2024046015A1 (en) * 2022-08-29 2024-03-07 支付宝(杭州)信息技术有限公司 Data query method and apparatus, storage medium, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052635A (en) * 2017-12-20 2018-05-18 江苏瑞中数据股份有限公司 A kind of heterogeneous data source unifies conjunctive query method
CN110059103A (en) * 2019-04-28 2019-07-26 南京大学 A kind of cross-platform unified big data SQL query method
CN110263105A (en) * 2019-05-21 2019-09-20 北京百度网讯科技有限公司 Inquiry processing method, query processing system, server and computer-readable medium
CN110399388A (en) * 2019-07-29 2019-11-01 中国工商银行股份有限公司 Data query method, system and equipment

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111782682A (en) * 2020-06-30 2020-10-16 北京金山云网络技术有限公司 Data query method, device, equipment and storage medium
CN111782682B (en) * 2020-06-30 2024-01-02 北京金山云网络技术有限公司 Data query method, device, equipment and storage medium
CN111913986A (en) * 2020-08-03 2020-11-10 支付宝(杭州)信息技术有限公司 Query optimization method and device
CN111913986B (en) * 2020-08-03 2024-04-16 支付宝(杭州)信息技术有限公司 Query optimization method and device
CN112527848B (en) * 2020-12-22 2023-05-12 苏州科达科技股份有限公司 Report data query method, device and system based on multiple data sources and storage medium
CN112527848A (en) * 2020-12-22 2021-03-19 苏州科达科技股份有限公司 Multi-data-source-based report data query method, device, system and storage medium
CN112579610A (en) * 2020-12-23 2021-03-30 安徽航天信息有限公司 Multi-data source structure analysis method, system, terminal device and storage medium
CN112699141A (en) * 2020-12-29 2021-04-23 医渡云(北京)技术有限公司 Data query method and device for multi-source heterogeneous data, storage medium and equipment
CN112925801A (en) * 2021-02-26 2021-06-08 第四范式(北京)技术有限公司 Method and system for realizing real-time query service based on SQL query statement
CN112860730A (en) * 2021-03-29 2021-05-28 中信银行股份有限公司 SQL statement processing method and device, electronic equipment and readable storage medium
CN112988801A (en) * 2021-04-07 2021-06-18 拉卡拉支付股份有限公司 Data processing method, data processing apparatus, electronic device, storage medium, and program product
CN113177062A (en) * 2021-05-25 2021-07-27 深圳前海微众银行股份有限公司 Data query method and device
WO2022247201A1 (en) * 2021-05-25 2022-12-01 深圳前海微众银行股份有限公司 Data query method and apparatus
CN113254547A (en) * 2021-05-27 2021-08-13 北京达佳互联信息技术有限公司 Data query method, device, server and storage medium
CN113254547B (en) * 2021-05-27 2024-04-16 北京达佳互联信息技术有限公司 Data query method, device, server and storage medium
CN113032465A (en) * 2021-05-31 2021-06-25 北京谷数科技股份有限公司 Data query method and device, electronic equipment and storage medium
CN113032465B (en) * 2021-05-31 2021-09-10 北京谷数科技股份有限公司 Data query method and device, electronic equipment and storage medium
CN113568930A (en) * 2021-08-12 2021-10-29 威讯柏睿数据科技(北京)有限公司 Method and equipment for optimizing distributed memory data query
CN113672651A (en) * 2021-08-24 2021-11-19 杭州海康威视数字技术股份有限公司 Task execution method and device and electronic equipment
CN113672651B (en) * 2021-08-24 2024-06-04 杭州海康威视数字技术股份有限公司 Task execution method and device and electronic equipment
CN113919877A (en) * 2021-10-15 2022-01-11 深圳市酷开网络科技股份有限公司 Method and device for processing human-circled task progress based on DMP platform and readable storage medium
CN114443699A (en) * 2022-01-27 2022-05-06 腾讯科技(深圳)有限公司 Information query method and device, computer equipment and computer readable storage medium
WO2024046015A1 (en) * 2022-08-29 2024-03-07 支付宝(杭州)信息技术有限公司 Data query method and apparatus, storage medium, and electronic device
CN115994152A (en) * 2023-03-24 2023-04-21 云账户技术(天津)有限公司 Verification method, device, equipment and storage medium of MySQL query statement
CN116595232A (en) * 2023-05-24 2023-08-15 杭州金智塔科技有限公司 Cross-data-source data processing system, method and device
CN117251472A (en) * 2023-11-16 2023-12-19 中邮消费金融有限公司 Cross-source data processing method, device, equipment and storage medium
CN117251472B (en) * 2023-11-16 2024-02-27 中邮消费金融有限公司 Cross-source data processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111190924A (en) Cross-domain data query method and device
KR101169091B1 (en) Prescribed navigation using topology metadata and navigation path
US9778967B2 (en) Sophisticated run-time system for graph processing
US9304835B1 (en) Optimized system for analytics (graphs and sparse matrices) operations
US11593357B2 (en) Databases and methods of storing, retrieving, and processing data
Sellami et al. Complex queries optimization and evaluation over relational and NoSQL data stores in cloud environments
US10783193B2 (en) Program, method, and system for execution of software services
CN110019314B (en) Dynamic data packaging method based on data item analysis, client and server
CN112015754A (en) Data query method, device and system
CN112182045A (en) Metadata management method and device, computer equipment and storage medium
US20110055373A1 (en) Service identification for resources in a computing environment
US20230185639A1 (en) Mapping application programming interface schemas with semantic representations
CN114356964A (en) Data blood margin construction method and device, storage medium and electronic equipment
US10169725B2 (en) Change-request analysis
CN105447040B (en) Binary file management and updating method, device and system
US11262986B2 (en) Automatic software generation for computer systems
CN110245184B (en) Data processing method, system and device based on tagSQL
US11016830B2 (en) Entity-based service operation for object-based persistence
CN112541001A (en) Data query method, device, storage medium and equipment
CN111221860A (en) Mixed query optimization method and device based on big data
US11354312B2 (en) Access-plan-based querying for federated database-management systems
US20210311942A1 (en) Dynamically altering a query access plan
CN111368146A (en) Path information query method and device, storage medium and processor
US11991254B1 (en) Ontology-based approach for modeling service dependencies in a provider network
US11809919B2 (en) Central event catalog

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination