CN112699141A - Data query method and device for multi-source heterogeneous data, storage medium and equipment - Google Patents


Info

Publication number
CN112699141A
Authority
CN
China
Legal status
Pending
Application number
CN202011588689.6A
Other languages
Chinese (zh)
Inventor
娄志强
Current Assignee
NANJING YIJI CLOUD MEDICAL DATA RESEARCH INSTITUTE Co.,Ltd.
Original Assignee
Yidu Cloud Beijing Technology Co Ltd
Application filed by Yidu Cloud Beijing Technology Co Ltd
Priority to CN202011588689.6A
Publication of CN112699141A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data query method, apparatus, computer-readable storage medium, and device for multi-source heterogeneous data. The method parses a received structured query statement to determine the sub-data sources to be queried and the query semantics. It then generates a query statement suited to the type of each sub-data source, queries each sub-data source separately to obtain the corresponding sub-target information, and integrates the sub-target information into target information according to the query semantics. Multi-source heterogeneous data can therefore be queried through a single query interface. This simplifies the data query interface, avoids maintaining query semantics for each data source type, makes the method broadly applicable to multi-source heterogeneous data, and markedly increases the reusability of data query code. Query efficiency is improved, and computing and storage resources are saved.

Description

Data query method and device for multi-source heterogeneous data, storage medium and equipment
Technical Field
The invention relates to the technical field of data processing, and in particular to a data query method and apparatus for multi-source heterogeneous data, a computer-readable storage medium, and a device.
Background
With the rapid development of big data technology, databases of many different structures are widely used, and many enterprise data systems are built on them. A data system with strict real-time requirements may be built on database architectures such as MySQL, Oracle, SQL Server, and MongoDB. A data system with ordinary real-time requirements may be built on big-data architectures such as Hive, Presto, Spark, and Hadoop. A data system can thus be configured with data sources of various data structures, which largely meets enterprises' needs for data sources.
However, in current multi-data-source systems built on several different database structures, the business logic is very complex. Existing approaches to querying multiple data sources mainly call the Application Programming Interfaces (APIs) exposed by existing service systems, or the SDKs (Software Development Kits) provided by each data system. If several data sources are logically related, the data fetched from the different sources must afterwards be assembled into the required form. Moreover, different data sources support different query statements: a MySQL database supports SQL queries, whereas a MongoDB database supports JSON-style queries and not SQL. With existing data systems, a data request must therefore be translated into a different query statement for each data source. This seriously reduces query efficiency, and writing, debugging, and running query statements for each database structure consumes considerable memory and storage resources.
Disclosure of Invention
To solve the above problems in the data query process, embodiments of the present invention provide a data query method, an apparatus, a computer-readable storage medium, and a device for multi-source heterogeneous data.
In a first aspect, the present invention provides a data query method for multi-source heterogeneous data. The method includes the following steps:
receiving a first structured query statement;
parsing the first structured query statement to obtain a parsing result that indicates the sub-data source information and query semantics corresponding to the first structured query statement;
generating a plurality of second structured query statements according to the sub-data source information;
performing data queries according to the second structured query statements to obtain sub-target information;
integrating the sub-target information into target information according to the query semantics;
outputting the target information.
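The steps of this first aspect can be read as one pipeline. The Python sketch below is illustrative only and is not the patent's implementation. The classes ParseResult and SubQuery and the four injected callables (parse, generate, execute, integrate) are hypothetical stand-ins for the parsing, statement-generation, querying, and integration stages. Returning the result stands in for the output step.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class ParseResult:
    """Hypothetical parsing result: sub-data sources plus query semantics."""
    sources: list[str]
    semantics: str


@dataclass
class SubQuery:
    """Hypothetical second structured query statement for one sub-data source."""
    source: str
    statement: str


def run_query(
    first_statement: str,
    parse: Callable[[str], ParseResult],
    generate: Callable[[ParseResult], list[SubQuery]],
    execute: Callable[[SubQuery], list[dict]],
    integrate: Callable[[ParseResult, list[list[dict]]], list[dict]],
) -> list[dict]:
    parsed = parse(first_statement)                  # parse the first statement
    sub_queries = generate(parsed)                   # one statement per sub-data source
    sub_results = [execute(q) for q in sub_queries]  # sub-target information
    return integrate(parsed, sub_results)            # target information to output
```

Keeping each stage injectable mirrors the module split of the apparatus described in the second aspect.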
Preferably, parsing the first structured query statement to obtain a parsing result includes: decomposing the first structured query statement in the form of an abstract syntax tree to determine the sub-data source information and query semantics corresponding to the first structured query statement.
Preferably, generating a plurality of second structured query statements according to the sub-data source information includes: determining the data source type of each sub-data source according to the sub-data source information; and generating, according to the data source type, a second structured query statement supported by a sub-data source of that type.
Preferably, generating a plurality of second structured query statements according to the sub-data source information includes: determining, according to the sub-data source information, whether the sub-data source supports data partitioning; and when it does, generating second structured query statements for that sub-data source according to its data partitioning rule.
Preferably, performing data queries according to the second structured query statements to obtain sub-target information includes: executing a multi-threaded query according to the plurality of second structured query statements; and receiving the sub-target information returned for each second structured query statement.
Preferably, receiving a first structured query statement includes: receiving an HTTP request whose parameters include an SQL statement.
Preferably, the method further includes: caching the sub-target information so that the next data query operation using the corresponding second structured query statement can reuse it.
In a second aspect, the present invention provides a data query apparatus for multi-source heterogeneous data. The apparatus includes the following modules:
a receiving module, configured to receive a first structured query statement;
a parsing module, configured to parse the first structured query statement to obtain a parsing result that indicates the sub-data sources and query semantics corresponding to the first structured query statement;
a query statement generation module, configured to generate a plurality of second structured query statements according to the sub-data sources;
a query module, configured to perform data queries according to the second structured query statements to obtain sub-target information;
an information integration module, configured to integrate the sub-target information into target information according to the query semantics;
an output module, configured to output the target information.
In a third aspect, the present invention provides a device that includes one or more processors and a storage apparatus configured to store one or more programs. When the one or more programs are executed by the one or more processors, the one or more processors implement the data query method for multi-source heterogeneous data according to any implementation of the first aspect.
In a fourth aspect, the present invention provides a computer-readable storage medium storing a computer program. When executed by a processor, the program implements the data query method for multi-source heterogeneous data according to any implementation of the first aspect.
For multi-source heterogeneous data, the embodiments of the invention parse the received structured query statement and determine the sub-data sources to be queried and the query semantics. They then generate a query statement suited to the type of each sub-data source, query each sub-data source separately to obtain the corresponding sub-target information, and integrate the sub-target information into target information according to the query semantics. Multi-source heterogeneous data can therefore be queried through a single interface. This simplifies the data query interface, avoids maintaining query semantics for each data source type, makes the method broadly applicable to multi-source heterogeneous data, and markedly increases the reusability of data query code. Query efficiency is improved, and computing and storage resources are saved.
It should be understood that an implementation of the present invention need not achieve all of the benefits above. Specific embodiments may achieve specific technical effects, and other embodiments may achieve benefits not mentioned above.
Drawings
The above and other objects, features and advantages of exemplary embodiments of the present invention will become readily apparent from the following detailed description read in conjunction with the accompanying drawings. Several embodiments of the invention are illustrated by way of example, and not by way of limitation, in the figures of the accompanying drawings and in which:
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Fig. 1 is a schematic flow chart illustrating an implementation of a data query method for multi-source heterogeneous data according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart illustrating an implementation of a data query method for multi-source heterogeneous data according to another embodiment of the present invention;
Fig. 3 is a schematic structural diagram illustrating a data query apparatus for multi-source heterogeneous data according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a device according to an embodiment of the present invention.
Detailed Description
The principles and spirit of the present invention will be described with reference to a number of exemplary embodiments. It is understood that these embodiments are given only to enable those skilled in the art to better understand and to implement the present invention, and do not limit the scope of the present invention in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
The technical solution of the present invention is further elaborated below with reference to the drawings and the specific embodiments.
Fig. 1 is a schematic flow chart illustrating an implementation process of a data query method for multi-source heterogeneous data according to an embodiment of the present invention.
Referring to Fig. 1, a data query method for multi-source heterogeneous data according to an embodiment of the present invention includes at least the following operations.
At operation 101, a first structured query statement is received.
In this embodiment of the present invention, the first structured query statement may be an SQL statement submitted through an HTTP interface.
For example, a data system composed of several sub-data sources with different database structure types may be configured in advance. Specifically, information about each sub-data source may be configured, such as its user name, password, database type, service address, and port.
Furthermore, metadata can be configured according to the needs of the data service, for example:
1. Common query conditions. Suppose every query operation targets data within a fixed time period. That period can be configured as a fixed query condition, which is then added automatically each time a data query operation is performed.
2. Field mappings. A field mapping can be defined in advance for a sub-data source. When an interactive query is executed, the required field mapping is then applied automatically according to the predefined relationship.
3. Custom views. Custom views can be configured for the data system. A view may encode the logical relationships among sub-data sources, so queries can be made against the preconfigured view without working out those relationships again each time.
4. Partition queries. Partition queries can be configured for a sub-data source according to the query requirements. For a sub-data source that needs queries partitioned by time or by key, the partition configuration is predefined in the metadata. This configuration includes, for example, the key field or time-partition field and the enumerated values corresponding to it. Partition queries against that sub-data source are then performed according to the predefined enumerated values.
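Items 1 and 2 above, fixed query conditions and field mappings, can be sketched as simple lookups in a metadata table. The sketch assumes a hypothetical metadata layout: the keys fixed_conditions and field_mapping do not come from the source. The mapping from primary_id to pid is borrowed from the MongoDB example later in this description, and the create_time window is an invented example value.

```python
from typing import Optional

# Hypothetical metadata; key names and values are illustrative assumptions.
METADATA = {
    "a": {"type": "mysql", "fixed_conditions": [], "field_mapping": {}},
    "b": {
        "type": "mongo",
        "fixed_conditions": ["create_time >= '2020-01-01'"],  # fixed time window
        "field_mapping": {"primary_id": "pid"},               # logical -> physical
    },
}


def add_fixed_conditions(source: str, where: Optional[str]) -> Optional[str]:
    """Append the source's preconfigured conditions to a user WHERE clause."""
    clauses = [f"({where})"] if where else []
    clauses += METADATA[source]["fixed_conditions"]
    return " and ".join(clauses) if clauses else None


def physical_field(source: str, logical: str) -> str:
    """Resolve a logical field name through the configured field mapping."""
    return METADATA[source]["field_mapping"].get(logical, logical)
```

The user's condition is parenthesized so that an OR inside it cannot change meaning once the fixed conditions are ANDed on.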
Through the above operations, a data system is predefined and an HTTP interface for data queries is configured. A first structured query statement sent by a data requester is received through this HTTP interface. The first structured query statement may be an SQL statement, for example: "select a.field_a, b.field_b from a left join b on a.primary_id = b.primary_id where a.primary_id = '123'".
At operations 102 and 103, the first structured query statement is parsed to obtain a parsing result that indicates the sub-data source information and query semantics corresponding to the first structured query statement. A plurality of second structured query statements are then generated according to the sub-data source information.
In this embodiment of the present invention, the sub-data source information may include the name, user name, password, data source type, and so on of each sub-data source. The data source type is, for example, mysql, mongo, hive, presto, spark, or hadoop.
Parsing the first structured query statement determines the names, data structures, and other properties of the sub-data sources it involves. Query semantics may span multiple data sources, as with inner join, left join, right join, and full join. When two tables are joined:
- an inner join keeps only the rows that match in both tables;
- a left join returns every row of the left table, even when the right table has no matching record;
- a right join returns every row of the right table, even when the left table has no matching record;
- a full join returns every row of both tables, matched where possible, including the rows of either table that have no match in the other.
Query semantics may also cover processing within a single sub-data source, such as the input table (from), a joined table (join), the multi-table join condition (on), the filter condition (where), grouping (group by), and the group filter condition (having).
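The four join semantics above can be made concrete with a small in-memory join over the rows returned by two sub-data sources. This is a minimal nested-loop sketch and not the patent's integration code. It assumes that rows are Python dicts, that the two sides share only the join-key column, and that missing columns are padded with None, which stands in for SQL NULL.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def join(left: List[Row], right: List[Row], key: str, how: str = "inner") -> List[Row]:
    """Nested-loop join of two row lists with SQL inner/left/right/full semantics."""
    if how not in ("inner", "left", "right", "full"):
        raise ValueError(f"unknown join type: {how}")
    left_cols = {c for row in left for c in row} - {key}
    right_cols = {c for row in right for c in row} - {key}
    out: List[Row] = []
    matched_right = set()  # indices of right rows that found a partner
    for l_row in left:
        found = False
        for i, r_row in enumerate(right):
            if r_row[key] == l_row[key]:
                out.append({**l_row, **r_row})
                matched_right.add(i)
                found = True
        if not found and how in ("left", "full"):
            # keep the unmatched left row, padding right-side columns with None
            out.append({**l_row, **dict.fromkeys(right_cols)})
    if how in ("right", "full"):
        for i, r_row in enumerate(right):
            if i not in matched_right:
                # keep the unmatched right row, padding left-side columns with None
                out.append({**dict.fromkeys(left_cols), **r_row})
    return out
```

A nested loop keeps the semantics easy to read. For large sub-target tables, a hash join that indexes one side by key would be the usual choice.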
For example, parsing the SQL statement "select a.field_a, b.field_b from a left join b on a.primary_id = b.primary_id where a.primary_id = '123'" shows that the data it queries comes from sub-data source a and sub-data source b.
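The text states that the statement is decomposed as an abstract syntax tree, and ANTLR 4 is mentioned later. The sketch below skips the parser and builds the tree for the example statement by hand, using hypothetical node classes Table, Join, and Select. Walking the tree yields the sub-data sources from the leaves and the cross-source query semantics from the join nodes and the where condition.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Table:
    name: str


@dataclass
class Join:
    kind: str  # "inner", "left", "right" or "full"
    left: Node
    right: Node
    on: str


Node = Union[Table, Join]


@dataclass
class Select:
    columns: list[str]
    source: Node
    where: Optional[str] = None


def analyse(stmt: Select) -> tuple[list[str], list[str]]:
    """Walk the tree, collecting sub-data sources and cross-source semantics."""
    sources: list[str] = []
    semantics: list[str] = []

    def visit(node: Node) -> None:
        if isinstance(node, Table):
            sources.append(node.name)  # leaf: one sub-data source
        else:
            visit(node.left)
            visit(node.right)
            semantics.append(f"{node.kind} join on {node.on}")

    visit(stmt.source)
    if stmt.where:
        semantics.append(f"where {stmt.where}")
    return sources, semantics
```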
In this embodiment of the present invention, a second structured query statement is generated for each sub-data source according to its data source type. For example, parsing the SQL query statement above yields data queries involving sub-data sources a and b. Because the where filter applies to sub-data source a, the query generated for a is "select a.field_a, a.primary_id from a where a.primary_id = '123'". The query for sub-data source b becomes "select b.field_b, b.primary_id from b". If sub-data source a is of type mysql, which supports SQL queries, "select a.field_a, a.primary_id from a where a.primary_id = '123'" is used directly as its second structured query statement. If sub-data source b is of type mongo, "select b.field_b, b.primary_id from b" is translated into "db.t_b.find({}, {"pid": 1, "field_b": 1})".
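Here is a minimal sketch of type-dependent statement generation that covers only the two cases in the example. A mysql source receives the decomposed SQL unchanged. A mongo source receives a find call whose projection is built from the selected columns. The MONGO_META table (collection t_b, mapping primary_id to pid) is a hypothetical configuration inferred from the example output. Translating where filters into MongoDB query documents is deliberately left out.

```python
import json
from typing import List, Optional

# Hypothetical metadata for a mongo sub-data source: collection name and the
# logical-to-physical field mapping (primary_id -> pid is inferred from the example).
MONGO_META = {"b": {"collection": "t_b", "fields": {"primary_id": "pid"}}}


def second_statement(source_type: str, table: str, columns: List[str],
                     where: Optional[str] = None) -> str:
    """Render a single-source query in the dialect its source type supports."""
    if source_type == "mysql":
        # SQL-capable source: the decomposed SQL is used as-is.
        sql = f"select {', '.join(columns)} from {table}"
        return f"{sql} where {where}" if where else sql
    if source_type == "mongo":
        if where:
            raise NotImplementedError("filter translation is not sketched here")
        meta = MONGO_META[table]
        names = [c.split(".")[-1] for c in columns]  # drop the "b." table prefix
        projection = {meta["fields"].get(n, n): 1 for n in names}
        return f"db.{meta['collection']}.find({{}}, {json.dumps(projection)})"
    raise ValueError(f"unsupported data source type: {source_type}")
```

Key order in a MongoDB projection does not affect the result, so {"field_b": 1, "pid": 1} is equivalent to the projection in the text.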
In this embodiment of the present invention, the second structured query statements generated from the sub-data source information do not express the query semantics between sub-data sources. Each one queries a single sub-data source.
For example, the left join between sub-data sources a and b in the SQL query statement means that joining a and b returns every row of sub-data source a, even if sub-data source b has no matching record.
In other words, the cross-source semantics are not applied at the data sources. Instead, one second structured query statement is generated for sub-data source a and another for sub-data source b, and their data is integrated once both results have been obtained. The data system can then record in detail the semantic analysis performed for each data source during the query, so that a failing query can be diagnosed quickly. This markedly improves problem-handling efficiency in data queries and saves the computer's computation and analysis resources.
At operation 104, data queries are performed according to the second structured query statements to obtain sub-target information.
In this embodiment of the present invention, each sub-data source queried with a second structured query statement supports that statement. The data system can therefore obtain sub-target information from each sub-data source using the second structured query statements that operations 102 and 103 produced from the received first structured query statement.
For example, operation 103 yields two second structured query statements: 1. "select a.field_a, a.primary_id from a where a.primary_id = '123'"; 2. "db.t_b.find({}, {"pid": 1, "field_b": 1})". At operation 104, the required data is obtained from sub-data source a (type mysql) with statement 1, and from sub-data source b (type mongo) with statement 2.
Operation 105 integrates the sub-target information into target information according to the query semantics.
In this embodiment of the present invention, after the required data has been obtained from each sub-data source, it is integrated into the final target information according to the query semantics between the sub-data sources.
For example, the first structured query statement "select a.field_a, b.field_b from a left join b on a.primary_id = b.primary_id where a.primary_id = '123'" defines a left join between sub-data source a and sub-data source b. At operation 104, the corresponding sub-target information is obtained from each of the two sources: the result from sub-data source a is recorded as table A1, and the result from sub-data source b as table B1. At operation 105, the data of tables A1 and B1 is integrated. A left join returns every row of the left table even when the right table has no matching record. The target information therefore contains every row of table A1, with the matching field_b values from table B1 where a match exists and empty (null) values otherwise.
In operation 106, target information is output.
In this embodiment of the present invention, the final target information is determined and then returned to the data requester.
This completes the data query over multi-source heterogeneous data. The embodiment above parses a received structured query statement to determine the sub-data sources to be queried and the query semantics. It generates a query statement suited to the type of each sub-data source, queries each sub-data source separately to obtain sub-target information, and integrates the sub-target information into target information according to the query semantics. Multi-source heterogeneous data can therefore be queried through a single query interface. This simplifies the data query interface, avoids maintaining query semantics for each data source type, makes the method broadly applicable to multi-source heterogeneous data, and markedly increases the reusability of data query code. Query efficiency is improved, and computing and storage resources are saved.
Fig. 1 shows only a basic embodiment of the method of the present invention. Other preferred embodiments can be obtained by optimizing and extending it.
Fig. 2 is a schematic flow chart illustrating an implementation process of a data query method for multi-source heterogeneous data according to another embodiment of the present invention. In this embodiment of the present invention, the data query method for multi-source heterogeneous data includes the following operation steps:
At operations 201 and 202, an HTTP request is received whose parameters include an SQL statement. This first structured query statement is decomposed in the form of an abstract syntax tree to determine its corresponding sub-data source information and query semantics.
In this embodiment of the present invention, the source configuration and metadata configuration of the sub-data sources are set up in advance for the data system, together with a number of optimization rules and engine rules for sub-data queries.
When an SQL query statement is received through the HTTP interface, the first structured query statement is decomposed in the form of an Abstract Syntax Tree (AST). The decomposition follows the preconfigured metadata rules, engine rules, data optimization rules, and so on, and determines the sub-data sources and query semantics corresponding to the statement.
First, consider the configuration of engine rules in the data system. An SQL query statement has only one metadata definition, yet it may be served by two data sources. Take OLAP (Online Analytical Processing), a common big-data scenario. Table a is stored in the Hive database, whose usual query entry is an ad hoc query engine such as Presto or Spark. Some statistical queries, however, are very time-consuming on an ad hoc engine, for example "select count(distinct component_id) as pidnumber from a group by ...". A dedicated OLAP analysis system can therefore be configured alongside the ad hoc query entry. Detail queries, such as the SQL query statement that selects probability_id from a filtered on age 60, are still executed on the ad hoc query engine. An aggregation query engine is also configured for the data system. Any statistical query whose computation functions and dimensions are supported by the OLAP system's statistical engine, such as Druid, Kylin, or ClickHouse, can then be executed on that engine.
In an embodiment of the present invention, general optimization rules are configured in advance. When an SQL query statement is received, it is analyzed using these general rules and an AST built with ANTLR 4.
For example, the following general rules are preconfigured:
1. If a field in the where condition appears in many equality comparisons, merge them. For example, "select sum(amount) from a where type = '1' or type = '2' or type = '3' or type = '4'" is optimized by this rule into "select sum(amount) from a where type in ('1', '2', '3', '4')".
2. If a complex calculation function has no as alias, add one. For example, "select sum(amount) from a where type = '1' or type = '2' or type = '3' or type = '4'" is optimized by this rule into "select sum(amount) as sum_a from a where type = '1' or type = '2' or type = '3' or type = '4'".
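Both general rules can be sketched in Python. The system described here applies them on the ANTLR 4 syntax tree. This sketch instead uses regular expressions on an already isolated where clause or select item, which is only safe for the narrow shapes it handles. The alias naming scheme func_arg, which produces sum_amount, is an assumption: the example above uses sum_a, and the source does not say how aliases are chosen.

```python
import re

# Rule 1: a WHERE clause made only of `col = 'v' or col = 'v' ...` on one column.
_OR_CHAIN = re.compile(r"\b(\w+)\s*=\s*'[^']*'(?:\s+or\s+\1\s*=\s*'[^']*')+", re.IGNORECASE)
# Rule 2: a bare single-argument aggregate such as sum(amount) with no alias.
_BARE_CALL = re.compile(r"(\w+)\(\s*(\w+)\s*\)")


def merge_or_equalities(where: str) -> str:
    """Rewrite a pure OR-chain of equalities on one field as an IN list."""
    m = _OR_CHAIN.fullmatch(where.strip())
    if m is None:
        return where  # mixed AND/OR or several fields: leave unchanged
    values = re.findall(r"=\s*('[^']*')", where)
    return f"{m.group(1)} in ({', '.join(values)})"


def complete_alias(select_item: str) -> str:
    """Add an explicit alias to a bare aggregate (naming scheme is assumed)."""
    item = select_item.strip()
    m = _BARE_CALL.fullmatch(item)
    if m is None:
        return item  # already aliased, or not a simple call
    func, arg = m.groups()
    return f"{item} as {func.lower()}_{arg}"
```

Requiring a full match keeps rule 1 from rewriting clauses such as "type = '1' or type = '2' and status = 'NEW'", where operator precedence would change the meaning.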
Parsing "select sum(amount) as sum_a from a where type = '1' or type = '2' or type = '3' or type = '4'" then yields the following:
[Image: parsing result for the statement above (not reproduced)]
In one embodiment of the invention, the query engine is further determined according to the preconfigured rules and metadata. For example, for the SQL query statement "select sum(amount) as sum_a from a where type = '1' and status = 'NEW'", the structure of sub-data source a is defined in the metadata as follows:
[Image: metadata definition of sub-data source a (not reproduced)]
The metadata defines the fields of sub-data source a and identifies which fields can be queried as dimensions and which as metrics.
Parsing the SQL query statement confirms that it is an aggregate query containing filter fields. By matching it against the metadata, the data query can be executed as an OLAP query on the olapEngine defined there, for example Druid in the preconfigured engine rules.
As another example, the SQL query statement "select status, type from a limit 10" is parsed as a query for detail data. It can therefore be executed directly on the ad hoc query engine, here Presto as defined in the metadata.
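The routing decision in these two examples can be sketched as a rule. An aggregate over metric fields, filtered only on dimension fields, goes to the OLAP engine, and anything else goes to the ad hoc engine. The ENGINE_RULES and FIELD_ROLES tables, the key adhocEngine, and the assumption that the parser supplies the filter field names are all hypothetical. Only olapEngine, Druid, and Presto come from the text.

```python
import re
from typing import List

# Hypothetical engine rules and field roles; "olapEngine" follows the text above,
# the other names are illustrative assumptions.
ENGINE_RULES = {"olapEngine": "druid", "adhocEngine": "presto"}
FIELD_ROLES = {"type": "dimension", "status": "dimension", "amount": "metric"}
_AGG_CALL = re.compile(r"\b(sum|count|avg|min|max)\s*\(\s*(\w+)\s*\)", re.IGNORECASE)


def choose_engine(select_list: str, filter_fields: List[str]) -> str:
    """Route aggregates over metrics filtered by dimensions to OLAP, else ad hoc."""
    aggregates = _AGG_CALL.findall(select_list)  # list of (function, column) pairs
    metrics_ok = all(FIELD_ROLES.get(col) == "metric" for _, col in aggregates)
    dims_ok = all(FIELD_ROLES.get(f) == "dimension" for f in filter_fields)
    if aggregates and metrics_ok and dims_ok:
        return ENGINE_RULES["olapEngine"]
    return ENGINE_RULES["adhocEngine"]
```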
At operation 203, a plurality of second structured query statements are generated according to the sub data source information.
In an embodiment of the present invention, a data source type of a sub data source is determined according to sub data source information; and generating a second structured query statement supported by the sub data source of the data source type according to the data source type.
The specific implementation flow of this embodiment is similar to the specific implementation processes of operations 102 to 103 in the embodiment shown in fig. 1, and is not described here again.
In an embodiment of the present invention, whether a sub-data source supports data partitioning is determined according to the sub-data source information. When it does, second structured query statements for that sub-data source are generated according to its data partitioning rules.
For example, partition rules for partition queries can be configured in the data system as a data optimization. Suppose the metadata of sub-data source inpatient defines the partition information partitionInfo: {fieldName: createTime, partitionList: [2019-01-01,2019-07-01 | 2019-07-01,2020-01-01 | 2020-01-01,2020-07-01]}. When an SQL statement that queries data from sub-data source inpatient is received, a partition query can then be used. Suppose, for example, that the received SQL query statement is "select sum(amount) as total_amount from inpatient". It can be optimized using the partitions of inpatient and decomposed into three second structured query statements, one per partition, of the form "select sum(amount) as total_amount from inpatient where createTime >= [start time] and createTime < [end time]".
Each of the three second structured query statements executes its data query in a thread pool and returns its own data. The three threads' results are then accumulated with the accumulator sumACCUMULATOR, which corresponds to the computation function sum, to obtain the final query result.
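Here is a sketch of partition-based decomposition. It generates one second structured query statement per entry of partitionList, using half-open ranges [start, end). The function name and signature are hypothetical. The sketch assumes the parser has already separated the select/from part from the where condition. It parenthesizes any existing condition so that an OR inside it cannot change meaning when the partition range is ANDed on.

```python
from typing import List, Optional


def partition_statements(select_from: str, where: Optional[str],
                         field: str, partition_list: List[str]) -> List[str]:
    """Emit one statement per "start,end" partition, as a half-open range."""
    statements = []
    for part in partition_list:
        start, end = (s.strip() for s in part.split(","))
        conditions = [f"({where})"] if where else []
        conditions.append(f"{field} >= '{start}' and {field} < '{end}'")
        statements.append(f"{select_from} where {' and '.join(conditions)}")
    return statements
```

The half-open ranges ensure that a row whose createTime falls exactly on a boundary is counted in exactly one partition.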
At operations 204 and 205, a multi-threaded query is executed according to the plurality of second structured query statements, and the sub-target information returned for each second structured query statement is received.
For example, suppose the received SQL query statement is "select sum(amount) as sum_a from a where type = '1' and status = 'NEW'". Suppose also that the metadata defines a data partition such as partitionInfo: {"partitionType": "range", "fieldName": "createTime", "partitionList": ["2020-01-01,2020-02-01", "2020-02-01,2020-03-01"]}. The SQL query is then parsed into two second structured query statements:
1. select sum(amount) as sum_a from a where type = '1' and status = 'NEW' and createTime >= '2020-01-01' and createTime < '2020-02-01';
2. select sum(amount) as sum_a from a where type = '1' and status = 'NEW' and createTime >= '2020-02-01' and createTime < '2020-03-01'.
The SQL query statement is thus decomposed into two second structured query statements. They execute their data queries concurrently and return the results {sum_a: 100} and {sum_a: 200}.
In another embodiment of the present invention, the metadata is defined as follows:
[Figure: metadata definition, not reproduced]
The metadata above defines a sub-query.
Suppose the received SQL query statement is select sum(a.amount) as sum_a from a left join b on a.type = b.type where b.type = '1' and a.status = 'NEW'. Because a is defined as a sub-query, this is equivalent to select sum(a.amount) as sum_a from (select status, type, sum(amount) as amount from a where year > 2019 group by status, type) as a left join b on a.type = b.type where b.type = '1' and a.status = 'NEW'. The query as written is therefore unambiguous and concise.
This SQL query statement is decomposed into two second structured query statements: one queries the sub-data source a, whose data source type is presto, and the other queries the sub-data source b, whose data source type is mysql. Both statements are submitted to the thread pool at the same time, so the two queries run concurrently and each returns its data.
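One way to route each second structured query statement to the engine named by its data source type is a registry keyed by that type. In this sketch, SubQuery, EXECUTORS and the run_on_* functions are hypothetical placeholders; they echo the SQL back instead of calling a real Presto or MySQL client.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class SubQuery:
    source: str       # sub-data source name, e.g. "a"
    source_type: str  # data source type from the metadata, e.g. "presto" or "mysql"
    sql: str          # second structured query statement in that engine's dialect


# Hypothetical executors: a real system would call the engine's client library here.
def run_on_presto(sql: str) -> List[dict]:
    return [{"engine": "presto", "sql": sql}]


def run_on_mysql(sql: str) -> List[dict]:
    return [{"engine": "mysql", "sql": sql}]


EXECUTORS: Dict[str, Callable[[str], List[dict]]] = {
    "presto": run_on_presto,
    "mysql": run_on_mysql,
}


def dispatch(sub_queries: List[SubQuery]) -> Dict[str, List[dict]]:
    """Send every sub-query to the executor for its source type, all at once."""
    for sq in sub_queries:
        if sq.source_type not in EXECUTORS:
            raise ValueError(f"unsupported data source type: {sq.source_type}")
    with ThreadPoolExecutor() as pool:
        futures = {sq.source: pool.submit(EXECUTORS[sq.source_type], sq.sql)
                   for sq in sub_queries}
        # Collect results keyed by sub-data source name for the integration step.
        return {source: future.result() for source, future in futures.items()}
```

Validating every type before submitting anything avoids starting some queries only to fail on an unsupported source.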
Preferably, in another embodiment of the present invention, the data returned by the second structured query statements is also cached for subsequent queries. Specifically, the sub-target information may be cached so that the next data query operation issued with the corresponding second structured query statement can reuse it. Various cache backends are supported, such as Redis, Memcache, MongoDB and Guava Cache.
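A minimal in-process stand-in for this cache, keyed by the second structured query statement, is sketched below. The TTL, the whitespace normalization of the key and the class name are assumptions; a deployment would typically delegate storage to one of the backends listed above. A stored result of None is treated as a cache miss.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class SubResultCache:
    """Hypothetical cache of sub-target information keyed by second structured query statement."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store: Dict[str, Tuple[float, object]] = {}

    @staticmethod
    def _key(sql: str) -> str:
        # Collapse runs of whitespace so trivially different spellings share one entry.
        return " ".join(sql.split())

    def get(self, sql: str) -> Optional[object]:
        entry = self._store.get(self._key(sql))
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[self._key(sql)]  # expired: drop it and report a miss
            return None
        return value

    def put(self, sql: str, value: object) -> None:
        self._store[self._key(sql)] = (self._clock() + self._ttl, value)

    def get_or_query(self, sql: str, execute: Callable[[str], object]) -> object:
        """Return the cached sub-target information, querying the source only on a miss."""
        cached = self.get(sql)
        if cached is not None:
            return cached
        value = execute(sql)
        self.put(sql, value)
        return value
```

The injectable clock makes expiry testable without sleeping.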
Operation 206: integrate the sub-target information into the target information according to the query semantics.
For example, for the SQL query statement select sum(amount) as sum_a from a where type='1' and status='NEW', operation 205 above splits the statement into two second structured query statements, which return the two results {sum_a: 100} and {sum_a: 200}. These results trigger the accumulator for the sum function in the SQL query statement, which integrates the queried data according to the query semantics of the originally received SQL query statement to obtain the final target information.
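The accumulator step can be sketched as a table mapping each aggregate function to a merge function. This is an illustrative assumption rather than the patent's accumulator API: only aggregates whose partial results merge directly are listed, and avg, for instance, would need its sum and count carried separately.

```python
from typing import Callable, Dict, List

# Hypothetical accumulators: combine per-partition partial values into one value.
ACCUMULATORS: Dict[str, Callable[[List[float]], float]] = {
    "sum": sum,
    "count": sum,  # partial counts are added together
    "max": max,
    "min": min,
}


def integrate(partials: List[Dict[str, float]],
              agg_of_column: Dict[str, str]) -> Dict[str, float]:
    """Merge partial rows column by column using the aggregate named in the original query."""
    merged = {}
    for column, func in agg_of_column.items():
        merged[column] = ACCUMULATORS[func]([row[column] for row in partials])
    return merged


print(integrate([{"sum_a": 100}, {"sum_a": 200}], {"sum_a": "sum"}))  # {'sum_a': 300}
```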
If the SQL query statement contains a left join, then after the two second structured query statements have each retrieved their data, the data is integrated according to the left join defined in the SQL query statement.
For example, the SQL query statement select sum(a.amount) as sum_a, b.type_name as name from a left join b on a.type = b.type where b.type = '1' and a.status = 'NEW' is split into two second structured query statements, which trigger two requests:
1. select amount, type from (select status, type, sum(amount) as amount from a where year > 2019 group by status, type) as a where a.status='NEW' and a.type='1';
2. select type, type_name from b where type='1';
The two second structured query statements return {amount: 100, type: "1"} and {type_name: "example", type: "1"}. Because the statement specifies a left join, the left join is executed, giving {amount: 100, type: "1", type_name: "example"}. The sum operation required by the select list is then applied to the amount field, completing the data integration and producing the target information.
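The in-memory left join described above can be sketched as a hash join on the join key. The function name left_join is hypothetical; unmatched left rows receive None for the right-hand columns, mirroring SQL NULL.

```python
from collections import defaultdict
from typing import Dict, List


def left_join(left: List[dict], right: List[dict], key: str) -> List[dict]:
    """Hash left join: every left row is kept; matching right rows contribute their columns."""
    right_columns = {col for row in right for col in row if col != key}
    index: Dict[object, List[dict]] = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    joined = []
    for row in left:
        matches = index.get(row[key], [])
        if matches:
            # On a shared column name the left value wins, since it is unpacked last.
            joined.extend({**match, **row} for match in matches)
        else:
            joined.append({**{col: None for col in right_columns}, **row})
    return joined


rows = left_join([{"amount": 100, "type": "1"}],
                 [{"type_name": "example", "type": "1"}], key="type")
sum_a = sum(row["amount"] for row in rows)  # 100
```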
Operation 207: output the target information.
The implementation of operation 207 is similar to that of operation 106 in the embodiment shown in Fig. 1 and is not repeated here.
Further, based on the above data query method for multi-source heterogeneous data, an embodiment of the present invention further provides a data query apparatus for multi-source heterogeneous data. As shown in Fig. 3, the apparatus 30 includes: a receiving module 301, configured to receive a first structured query statement; a parsing module 302, configured to parse the first structured query statement to obtain a parsing result, where the parsing result indicates the sub-data sources and query semantics corresponding to the first structured query statement; a query statement generation module 303, configured to generate a plurality of second structured query statements according to the sub-data sources; a query module 304, configured to perform data queries according to the second structured query statements to obtain sub-target information; an information integration module 305, configured to integrate the sub-target information into target information according to the query semantics; and an output module 306, configured to output the target information.
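The module structure of apparatus 30 can be sketched as a pipeline whose stages are injected callables. The class and field names are hypothetical, and the toy stages in the usage only replay the sum_a example; they do not parse SQL.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ParseResult:
    sub_sources: List[str]  # sub-data sources referenced by the first statement
    semantics: Dict         # e.g. aggregate per output column, join specification


@dataclass
class QueryPipeline:
    """Hypothetical wiring of modules 302 to 305; each stage is supplied by the caller."""
    parse: Callable[[str], ParseResult]              # parsing module 302
    generate: Callable[[ParseResult], List[str]]     # query statement generation module 303
    query: Callable[[List[str]], List[dict]]         # query module 304
    integrate: Callable[[List[dict], Dict], dict]    # information integration module 305

    def handle(self, first_sql: str) -> dict:
        # The caller plays the receiving module 301; the return value is what module 306 outputs.
        parsed = self.parse(first_sql)
        second_sqls = self.generate(parsed)
        partials = self.query(second_sqls)
        return self.integrate(partials, parsed.semantics)


pipeline = QueryPipeline(
    parse=lambda sql: ParseResult(["a"], {"sum_a": "sum"}),
    generate=lambda parsed: ["partition-1 statement", "partition-2 statement"],
    query=lambda sqls: [{"sum_a": 100}, {"sum_a": 200}],
    integrate=lambda rows, semantics: {"sum_a": sum(r["sum_a"] for r in rows)},
)
print(pipeline.handle("select sum(amount) as sum_a from a"))  # {'sum_a': 300}
```

Injecting the stages keeps each module replaceable, which matches the description's split into independent modules.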
Further, on the basis of the embodiment shown in Fig. 3, the parsing module 302 preferably includes a decomposition submodule, configured to decompose the first structured query statement into an abstract syntax tree to determine the sub-data sources and query semantics corresponding to the first structured query statement.
Preferably, the query statement generation module 303 includes: a type determination submodule, configured to determine the data source type of each sub-data source; and a first generation submodule, configured to generate, according to the data source type, a second structured query statement supported by that data source type.
Preferably, the query statement generation module 303 includes: a partition judgment submodule, configured to determine whether a sub-data source supports data partitioning; and a second generation submodule, configured to generate, when the sub-data source supports data partitioning, the second structured query statements of that sub-data source according to the data partitioning rule.
Preferably, the query module 304 includes: a query submodule, configured to execute a multi-threaded query according to the plurality of second structured query statements; and an information receiving submodule, configured to receive the corresponding sub-target information returned for each second structured query statement.
Preferably, the receiving module 301 includes a statement receiving submodule, configured to receive an HTTP request whose parameters include an SQL statement.
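A sketch of extracting the first structured query statement from the HTTP request, using only urllib.parse from the Python standard library, is shown below. The parameter name sql and the fallback to a form-encoded body are assumptions; the description only says that the request parameters include an SQL statement.

```python
from urllib.parse import parse_qs, urlsplit


def sql_from_request(url: str, body: str = "", param: str = "sql") -> str:
    """Return the SQL statement carried by an HTTP request (hypothetical parameter name)."""
    params = parse_qs(urlsplit(url).query)  # query-string parameters, percent-decoded
    if param not in params and body:
        params = parse_qs(body)             # fall back to an application/x-www-form-urlencoded body
    values = params.get(param)
    if not values:
        raise ValueError(f"missing '{param}' parameter")
    return values[0]


print(sql_from_request("http://host/query?sql=select+*+from+a+where+type%3D%271%27"))
```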
Preferably, the apparatus further includes a cache module, configured to cache the sub-target information for the next data query operation performed according to the corresponding second structured query statement.
It should be noted that the above description of the data query apparatus embodiment for multi-source heterogeneous data is similar to the description of the method embodiments shown in Figs. 1 and 2 and has similar beneficial effects, so it is not repeated. For technical details not disclosed in the apparatus embodiment, refer to the description of the method embodiments shown in Figs. 1 and 2 of the present invention.
Fig. 4 is a schematic structural diagram of a device according to an embodiment of the present invention. At the hardware level, the device includes a processor and, optionally, an internal bus, a network interface and a memory. The memory may include internal memory, such as random-access memory (RAM), and may further include non-volatile memory, such as at least one disk storage. The device may also include hardware required by other services.
The processor, the network interface and the memory may be connected to each other via the internal bus, which may be an ISA (Industry Standard Architecture) bus, a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in Fig. 4, but this does not mean that there is only one bus or one type of bus.
The memory is configured to store execution instructions, specifically an executable computer program. The memory may include both internal memory and non-volatile storage, and provides execution instructions and data to the processor.
In a possible implementation, the processor reads the corresponding execution instructions from the non-volatile memory into the internal memory and then runs them (the execution instructions may also be obtained from other devices), forming the data query apparatus for multi-source heterogeneous data at the logical level. By executing the execution instructions stored in the memory, the processor implements the data query method for multi-source heterogeneous data provided by any embodiment of the present invention.
The method executed by the data query apparatus for multi-source heterogeneous data shown in Fig. 3 may be applied to, or implemented by, a processor. The processor may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method may be completed by integrated hardware logic circuits in the processor or by instructions in the form of software. The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP) and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps and logic block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor or any conventional processor.
The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software modules may reside in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM or registers. The storage medium is located in the memory; the processor reads the information in the memory and completes the steps of the method in combination with its hardware.
An embodiment of the present invention further provides a readable storage medium, where the readable storage medium stores an execution instruction, and when the stored execution instruction is executed by a processor of an electronic device, the electronic device can execute the data query method for multi-source heterogeneous data provided in any embodiment of the present invention, and is specifically configured to execute the method shown in fig. 1 or fig. 2.
The device in the foregoing embodiments may be a computer.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
The embodiments of the present invention are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above are merely examples of the present invention, and are not intended to limit the present invention. Various modifications and alterations to this invention will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the scope of the claims of the present invention.

Claims (10)

1. A data query method of multi-source heterogeneous data is characterized by comprising the following steps:
receiving a first structured query statement;
analyzing the first structured query statement to obtain an analysis result, wherein the analysis result is used for showing the subdata source information and the query semantics corresponding to the first structured query statement;
generating a plurality of second structured query statements according to the subdata source information;
performing data query according to the second structured query statement to obtain sub-target information;
integrating the sub-target information into target information according to the query semantics;
and outputting the target information.
2. The method of claim 1, wherein parsing the first structured query statement to obtain a parsed result comprises:
decomposing the first structured query statement into an abstract syntax tree to determine the subdata source information and query semantics corresponding to the first structured query statement.
3. The method of claim 1, wherein generating a plurality of second structured query statements according to the child data source information comprises:
determining the data source type of the sub data source according to the sub data source information;
and generating a second structured query statement supported by the data source type according to the data source type.
4. The method of claim 1, wherein generating a plurality of second structured query statements according to the child data source information comprises:
judging whether the subdata source supports data partitioning or not according to the subdata source information;
and when the sub data source supports data partitioning, generating a second structured query statement of the corresponding sub data source according to a data partitioning rule.
5. The method of claim 1, wherein the performing a data query according to the second structured query statement to obtain sub-target information comprises:
executing a multi-threaded query according to a plurality of the second structured query statements;
and receiving the corresponding sub-target information returned for each second structured query statement.
6. The method of any of claims 1-5, wherein receiving the first structured query statement comprises:
receiving an HTTP request, wherein the parameters of the HTTP request include an SQL statement.
7. The method according to any one of claims 1-5, further comprising:
caching the sub-target information for use in the next data query operation performed according to the corresponding second structured query statement.
8. An apparatus for querying data of multi-source heterogeneous data, the apparatus comprising:
a receiving module, configured to receive a first structured query statement;
the analysis module is used for analyzing the first structured query statement to obtain an analysis result, and the analysis result is used for showing the subdata source information and the query semantics corresponding to the first structured query statement;
the query statement generation module is used for generating a plurality of second structured query statements according to the subdata source information;
the query module is used for carrying out data query according to the second structured query statement to obtain sub-target information;
the information integration module is used for integrating the sub-target information into target information according to the query semantics;
and the output module is used for outputting the target information.
9. An apparatus, characterized in that the apparatus comprises:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the data query method for multi-source heterogeneous data of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements a data query method for multi-source heterogeneous data according to any one of claims 1 to 7.
CN202011588689.6A 2020-12-29 2020-12-29 Data query method and device for multi-source heterogeneous data, storage medium and equipment Pending CN112699141A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011588689.6A CN112699141A (en) 2020-12-29 2020-12-29 Data query method and device for multi-source heterogeneous data, storage medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011588689.6A CN112699141A (en) 2020-12-29 2020-12-29 Data query method and device for multi-source heterogeneous data, storage medium and equipment

Publications (1)

Publication Number Publication Date
CN112699141A true CN112699141A (en) 2021-04-23

Family

ID=75513045

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011588689.6A Pending CN112699141A (en) 2020-12-29 2020-12-29 Data query method and device for multi-source heterogeneous data, storage medium and equipment

Country Status (1)

Country Link
CN (1) CN112699141A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9189515B1 (en) * 2013-03-08 2015-11-17 Amazon Technologies, Inc. Data retrieval from heterogeneous storage systems
CN106970943A (en) * 2017-02-21 2017-07-21 南京中新赛克科技有限责任公司 Fusion query method based on heterogeneous data source and distributed file system
CN108446289A (en) * 2017-09-26 2018-08-24 北京中安智达科技有限公司 A kind of data retrieval method for supporting heterogeneous database
CN110399388A (en) * 2019-07-29 2019-11-01 中国工商银行股份有限公司 Data query method, system and equipment
CN110633292A (en) * 2019-09-19 2019-12-31 上海依图网络科技有限公司 Query method, device, medium, equipment and system for heterogeneous database
CN111190924A (en) * 2019-12-18 2020-05-22 中思博安科技(北京)有限公司 Cross-domain data query method and device

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113177062A (en) * 2021-05-25 2021-07-27 深圳前海微众银行股份有限公司 Data query method and device
WO2022247201A1 (en) * 2021-05-25 2022-12-01 深圳前海微众银行股份有限公司 Data query method and apparatus
CN113468204A (en) * 2021-06-28 2021-10-01 深信服科技股份有限公司 Data query method, device, equipment and medium
WO2023029854A1 (en) * 2021-09-03 2023-03-09 北京火山引擎科技有限公司 Data query method and apparatus, storage medium, and electronic device
CN113901083A (en) * 2021-09-14 2022-01-07 威讯柏睿数据科技(北京)有限公司 Heterogeneous data source operation resource analysis positioning method and equipment based on multiple analyzers
CN113901083B (en) * 2021-09-14 2023-05-12 北京柏睿数据技术股份有限公司 Heterogeneous data source operation resource analysis positioning method and equipment based on multiple resolvers
WO2023115252A1 (en) * 2021-12-20 2023-06-29 Boe Technology Group Co., Ltd. Data query method, data query apparatus, and computer-program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210817

Address after: 211199 room 1502, 15 / F, building a, Yangzi science and innovation center, 211 pubin Road, Jiangbei new area, Nanjing, Jiangsu

Applicant after: NANJING YIJI CLOUD MEDICAL DATA RESEARCH INSTITUTE Co.,Ltd.

Address before: 100089 801, 8th floor, building 9, No.35 Huayuan North Road, Haidian District, Beijing

Applicant before: YIDU CLOUD Ltd.
