CN117493429A - Processing system and method for heterogeneous data joint query - Google Patents

Processing system and method for heterogeneous data joint query

Info

Publication number
CN117493429A
Authority
CN
China
Prior art keywords
data
query
module
sub
submodule
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311320529.7A
Other languages
Chinese (zh)
Inventor
裴衡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
QIMING INFORMATION TECHNOLOGY CO LTD
Original Assignee
QIMING INFORMATION TECHNOLOGY CO LTD
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by QIMING INFORMATION TECHNOLOGY CO LTD
Priority to CN202311320529.7A
Publication of CN117493429A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/256 Integrating or interfacing systems involving database management systems in federated or virtual databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G06F16/2433 Query languages
    • G06F16/2448 Query languages for particular applications; for extensibility, e.g. user defined types
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2452 Query translation
    • G06F16/24526 Internal representations for queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24542 Plan optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Operations Research (AREA)
  • Automation & Control Theory (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a processing system and a processing method for heterogeneous data joint query. The system comprises a data source connection and access module, a data model and grammar difference module, a query optimization and performance module, a data transmission and integration module, a security and authority control module, and an exception handling and fault tolerance mechanism module. Through data integration, greater query flexibility, resource optimization, a unified query interface, real-time query and analysis, and extensibility and compatibility, the system and method improve the efficiency and accuracy of data processing and give users a better data query and analysis experience.

Description

Processing system and method for heterogeneous data joint query
Technical Field
The invention relates to the field of database management and data query, in particular to a processing system and a processing method for heterogeneous data joint query.
Background
In modern enterprises and organizations, it has become commonplace to use several different database engines and clusters to manage and store data. However, running federated queries directly on heterogeneous data is challenging, and the prior art has the following shortcomings:
Data replication and data integration: current methods typically copy heterogeneous data into a central data warehouse or data lake and then perform joint queries in that central store. This requires extensive data copying and synchronization, consumes storage space and network bandwidth, and limits how current the queried data can be.
ETL (extract, transform, load) processes: many organizations use ETL tools to extract heterogeneous data into a unified format for joint querying. However, ETL processes are complex and time-consuming, and require data transformation rules and workflows to be defined and maintained. In addition, ETL runs in batches and cannot meet real-time query requirements.
Database linking and cross-engine query: some database engines can link to other engines, making it possible to access multiple engines in one query. However, such links are often limited: they apply only to specific engines or to a restricted set of data operations, and cannot meet complex federated query requirements. In addition, query grammars and functions differ between engines, which complicates writing and debugging query statements.
Data model differences: heterogeneous data typically has different data structures, data models, and query grammars. For example, relational databases and NoSQL databases differ in how they handle structured and unstructured data, which makes direct joint queries difficult. Current methods typically require data model conversion and mapping, which adds development and maintenance costs.
Query performance and optimization: heterogeneous data joint queries often involve multiple data sources and complex query plans, and query performance may be affected by data transmission, network delays, and data processing. In the prior art, query optimization is usually performed for a single data source, so the strengths of the different data sources cannot be fully exploited and a global optimization strategy is lacking.
Thus, the prior art has several drawbacks in handling heterogeneous data federated queries, including the complexity of data replication and integration, the limitations of ETL flows, the limitations of database linking, the handling of data model differences, and the optimization of query performance.
In summary, although existing federated query techniques and tools provide powerful support, a system is still lacking that helps users query and integrate data more efficiently in heterogeneous database environments and that increases the flexibility and performance of data processing.
Disclosure of Invention
The invention aims to solve the technical problems of data source connection and access, data model and grammar differences, query optimization and performance, data transmission and integration, security and authority control, and exception handling and fault tolerance, and provides a processing system and method for heterogeneous data joint query.
A processing system for heterogeneous data joint query comprises a data source connection and access module, a data model and grammar difference module, a query optimization and performance module, a data transmission and integration module, a security and authority control module, and an exception handling and fault tolerance mechanism module:
the data source connection and access module connects to the databases by using database connection drivers and configures connection parameters for them;
the data model and grammar difference module handles the different data models and query grammars used by different database engines and clusters;
the query optimization and performance module parses query statements and generates a query plan by using query parsing and optimization techniques;
the data transmission and integration module transmits, merges, and integrates data among different data sources;
the security and authority control module ensures data security and applies the authority control mechanism of each data source during cross-data-source queries;
the exception handling and fault tolerance mechanism module implements exception handling and fault tolerance mechanisms to ensure that queries execute correctly and return correct results.
Further, in the processing system for heterogeneous data joint query, the data source connection and access module comprises a data source connection sub-module and a data source access sub-module;
the data source connection sub-module supports connections to different database engines and clusters by using database connection drivers or APIs;
the data source access sub-module configures connection parameters and authority verification information for each data source to provide secure access to the data.
Further, in the processing system for heterogeneous data joint query, the data model and grammar difference module comprises a data model sub-module and a grammar difference sub-module;
the data model sub-module establishes a unified data model and maps data from different data sources into it, eliminating data model differences;
the grammar difference sub-module provides a cross-data-source query grammar converter that converts query statements from one grammar to another to meet the requirements of different database engines.
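The mapping into a unified data model can be pictured with a minimal sketch. The class name `UnifiedRecordMapper`, the field names, and the choice to drop unmapped fields are all assumptions made for illustration; the patent does not specify how the unified model is represented.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical mapper: renames source-specific fields to unified field names. */
final class UnifiedRecordMapper {
    // Source field name -> unified field name (illustrative mapping table).
    private final Map<String, String> fieldMapping;

    UnifiedRecordMapper(Map<String, String> fieldMapping) {
        this.fieldMapping = fieldMapping;
    }

    /** Copies mapped fields into a new record; fields without a mapping are dropped. */
    Map<String, Object> toUnified(Map<String, Object> sourceRecord) {
        Map<String, Object> unified = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : sourceRecord.entrySet()) {
            String target = fieldMapping.get(entry.getKey());
            if (target != null) {
                unified.put(target, entry.getValue());
            }
        }
        return unified;
    }
}
```

With one mapper per data source, a document-store record such as `{_id, fullName}` and a relational row such as `{user_id, user_name}` could both be presented as `{id, name}` before being joined.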
Further, in the processing system for heterogeneous data joint query, the query optimization and performance module comprises a query performance improvement sub-module and a query acceleration operation sub-module;
the query performance improvement sub-module selects the best data source and query plan by using the characteristics and index information of each data source;
the query acceleration operation sub-module distributes query tasks to the data sources and executes them in parallel.
Further, in the processing system for heterogeneous data joint query, the data transmission and integration module comprises a data transmission sub-module and a data integration sub-module;
the data transmission sub-module selects a data transmission mode according to the query requirements and the data volume;
the data transmission modes include batch transmission and streaming transmission;
the data integration sub-module handles large-scale data integration by using a distributed computing framework or a data processing engine.
Further, in the processing system for heterogeneous data joint query, the security and authority control module comprises a data source security sub-module and an authority control sub-module;
the data source security sub-module performs authority verification and identity verification on cross-data-source queries to ensure the security and compliance of the data;
the authority control sub-module ensures that, during query processing, a user can access only the data the user is authorized to access, in accordance with the security and authority control mechanism of each data source.
Further, in the processing system for heterogeneous data joint query, the exception handling and fault tolerance mechanism module comprises an exception handling mechanism sub-module and a fault tolerance mechanism sub-module;
the exception handling mechanism sub-module captures and handles abnormal conditions such as connection failures and inconsistent data;
the fault tolerance mechanism sub-module provides connection retries, data recovery, and rollback operations.
A processing method for heterogeneous data joint query comprises the following steps:
S1: determining the requirements of the joint query, namely combining the data in several databases into one result set;
S2: storing the URLs and parameter information of the databases in a configuration file, from which the parameters are passed when the joint query is executed;
S3: selecting one of the databases, executing the SQL query on it, and extracting the required result set;
S4: exporting the result set to a Markdown file and parsing it with an online Markdown parser;
S5: selecting another database, executing the same SQL query on it, and reading the required data;
S6: merging the data read with the exported result set and updating the data in the Markdown file;
S7: uploading the updated Markdown file to a server, so that a user can read the Markdown file locally.
The invention has the following beneficial effects: through data integration, greater query flexibility, resource optimization, a unified query interface, real-time query and analysis, and extensibility and compatibility, the processing system and method for heterogeneous data joint query improve the efficiency and accuracy of data processing and give users a better data query and analysis experience.
Drawings
Fig. 1 is a system configuration diagram of the present invention.
Fig. 2 is a diagram of the system service architecture layers of the present invention.
Fig. 3 is a flow chart of the method of the present invention.
Detailed Description
For a clearer understanding of technical features, objects, and effects of the present invention, a specific embodiment of the present invention will be described with reference to the accompanying drawings.
As shown in fig. 1, a processing system for heterogeneous data joint query comprises a data source connection and access module, a data model and grammar difference module, a query optimization and performance module, a data transmission and integration module, a security and authority control module, and an exception handling and fault tolerance mechanism module:
the data source connection and access module connects to the databases by using database connection drivers and configures connection parameters for them;
the data model and grammar difference module handles the different data models and query grammars used by different database engines and clusters;
the query optimization and performance module parses query statements and generates a query plan by using query parsing and optimization techniques;
the data transmission and integration module transmits, merges, and integrates data among different data sources;
the security and authority control module ensures data security and applies the authority control mechanism of each data source during cross-data-source queries;
the exception handling and fault tolerance mechanism module implements exception handling and fault tolerance mechanisms to ensure that queries execute correctly and return correct results.
Because heterogeneous data is usually stored in different database engines and clusters with different connection and access modes, the technical problem is how to establish connections to the heterogeneous data sources so that data can be acquired and queries executed effectively. To this end, the data source connection and access module comprises a data source connection sub-module and a data source access sub-module;
the data source connection sub-module supports connections to different database engines and clusters by using database connection drivers or APIs;
the data source access sub-module configures connection parameters and authority verification information for each data source to provide secure access to the data.
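One way to picture the two sub-modules is a small registry that stores connection parameters per data source and opens connections through the standard JDBC `DriverManager`. This is only a sketch under assumptions: the class `DataSourceRegistry`, the record `SourceConfig`, and its fields are hypothetical names, and a real deployment would also need the matching JDBC driver on the classpath and a safer way to hold credentials.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry: one set of connection parameters per data source. */
final class DataSourceRegistry {
    /** Connection parameters and credentials for one source (illustrative fields). */
    record SourceConfig(String jdbcUrl, String user, String password) {}

    private final Map<String, SourceConfig> configs = new ConcurrentHashMap<>();

    /** Access sub-module role: record the parameters for a named source. */
    void register(String name, SourceConfig config) {
        configs.put(name, config);
    }

    SourceConfig configOf(String name) {
        SourceConfig config = configs.get(name);
        if (config == null) {
            throw new IllegalArgumentException("Unknown data source: " + name);
        }
        return config;
    }

    /** Connection sub-module role: open a JDBC connection for a named source. */
    Connection connect(String name) throws SQLException {
        SourceConfig config = configOf(name);
        Properties props = new Properties();
        props.setProperty("user", config.user());
        props.setProperty("password", config.password());
        return DriverManager.getConnection(config.jdbcUrl(), props);
    }
}
```

Sources that are not reachable over JDBC would need their own client API behind the same `connect`-style entry point, which the sketch does not cover.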
Because different database engines and clusters have different data models and query grammars, the technical problem is how to handle and unify these differences across data sources so that joint queries can be performed. To this end, the data model and grammar difference module comprises a data model sub-module and a grammar difference sub-module;
the data model sub-module establishes a unified data model and maps data from different data sources into it, eliminating data model differences;
the grammar difference sub-module provides a cross-data-source query grammar converter that converts query statements from one grammar to another to meet the requirements of different database engines.
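As a narrow illustration of grammar conversion, the sketch below rewrites a trailing `LIMIT n` clause into the row-limiting form of another dialect. The class `LimitClauseConverter` and its `Dialect` enum are hypothetical; the patent does not describe the converter's design, and a production converter would work on a parsed query rather than on a regular expression.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical converter for one grammar difference: the row-limit clause. */
final class LimitClauseConverter {
    enum Dialect { MYSQL, ORACLE, SQLSERVER }

    // Matches "SELECT <body> LIMIT <n>" with an optional trailing semicolon.
    private static final Pattern TRAILING_LIMIT =
            Pattern.compile("(?is)^\\s*select\\s+(.*?)\\s+limit\\s+(\\d+)\\s*;?\\s*$");

    /** Returns the query rewritten for the target dialect, or unchanged if no LIMIT is found. */
    static String convert(String sql, Dialect target) {
        Matcher m = TRAILING_LIMIT.matcher(sql);
        if (!m.matches()) {
            return sql; // nothing this sketch knows how to rewrite
        }
        String body = m.group(1);
        String n = m.group(2);
        switch (target) {
            case ORACLE:
                // Row-limiting clause available in Oracle Database 12c and later.
                return "SELECT " + body + " FETCH FIRST " + n + " ROWS ONLY";
            case SQLSERVER:
                return "SELECT TOP (" + n + ") " + body;
            default:
                return "SELECT " + body + " LIMIT " + n;
        }
    }
}
```

The sketch ignores `OFFSET`, `DISTINCT` placement for `TOP`, and string literals that happen to contain the word "limit", so it should be read as an example of the idea rather than a usable translator.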
Because a heterogeneous data joint query can involve multiple data sources and complex query operations, the technical problem is how to optimize the query, select a suitable query plan, and exploit the strengths of each data source to improve query performance and execution efficiency. To this end, the query optimization and performance module comprises a query performance improvement sub-module and a query acceleration operation sub-module;
the query performance improvement sub-module selects the best data source and query plan by using the characteristics and index information of each data source;
the query acceleration operation sub-module distributes query tasks to the data sources and executes them in parallel.
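Parallel execution of per-source query tasks can be sketched with the standard `CompletableFuture` API. Each sub-query is modelled here as a `Supplier` that returns rows; the class name `ParallelQueryDispatcher`, the one-thread-per-source pool, and the simple concatenation of results are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

/** Hypothetical dispatcher: runs one sub-query per data source concurrently. */
final class ParallelQueryDispatcher {
    /** Runs every sub-query in parallel and concatenates the rows in task order. */
    static List<List<String>> runAll(List<Supplier<List<List<String>>>> subQueries) {
        if (subQueries.isEmpty()) {
            return new ArrayList<>();
        }
        // One worker per data source; a shared, bounded pool may suit real workloads better.
        ExecutorService pool = Executors.newFixedThreadPool(subQueries.size());
        try {
            List<CompletableFuture<List<List<String>>>> futures = new ArrayList<>();
            for (Supplier<List<List<String>>> query : subQueries) {
                futures.add(CompletableFuture.supplyAsync(query, pool));
            }
            List<List<String>> merged = new ArrayList<>();
            for (CompletableFuture<List<List<String>>> future : futures) {
                merged.addAll(future.join()); // waits for this source's rows
            }
            return merged;
        } finally {
            pool.shutdown();
        }
    }
}
```

Because results are collected in submission order, the merged rows are deterministic even though the sub-queries finish in any order. A failure in any sub-query surfaces from `join()` as a `CompletionException`.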
Because a heterogeneous data joint query may need to transmit, merge, and integrate data between different data sources, the technical problem is how to handle data transmission and integration efficiently while reducing network latency and data processing overhead. To this end, the data transmission and integration module comprises a data transmission sub-module and a data integration sub-module;
the data transmission sub-module selects a data transmission mode according to the query requirements and the data volume;
the data transmission modes include batch transmission and streaming transmission;
the data integration sub-module handles large-scale data integration by using a distributed computing framework or a data processing engine.
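The choice between batch and streaming transmission could be reduced to a small decision rule, sketched below. The inputs (an estimated row count, a flag for incremental results, and a row threshold) and the class name `TransferModeSelector` are assumptions; the patent states only that the mode depends on the query requirements and the data volume.

```java
/** Hypothetical selector for the data transmission mode. */
final class TransferModeSelector {
    enum TransferMode { BATCH, STREAMING }

    /**
     * Picks STREAMING when the caller needs rows as they arrive or when the
     * estimated result is too large to move in one batch; otherwise BATCH.
     */
    static TransferMode choose(long estimatedRows, boolean needsIncrementalResults, long batchRowLimit) {
        if (needsIncrementalResults || estimatedRows > batchRowLimit) {
            return TransferMode.STREAMING;
        }
        return TransferMode.BATCH;
    }
}
```

For example, `choose(500, false, 10_000)` yields `BATCH`, while a five-million-row estimate with the same limit yields `STREAMING`.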
Because a heterogeneous data joint query may involve the security and authority control of multiple data sources, the technical problem is how to ensure data security and compliance during cross-data-source queries and to apply the authority control mechanism of each data source correctly. To this end, the security and authority control module comprises a data source security sub-module and an authority control sub-module;
the data source security sub-module performs authority verification and identity verification on cross-data-source queries to ensure the security and compliance of the data;
the authority control sub-module ensures that, during query processing, a user can access only the data the user is authorized to access, in accordance with the security and authority control mechanism of each data source.
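A pre-dispatch authority check might look like the following sketch, which refuses a joint query if the user lacks a grant for any table it touches. The class `AccessPolicy`, the `source.table` naming convention, and the in-memory grant map are hypothetical; a real system would still rely on each data source's own access control as the final authority.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical policy: which "source.table" names each user may read. */
final class AccessPolicy {
    private final Map<String, Set<String>> grants; // user -> granted tables

    AccessPolicy(Map<String, Set<String>> grants) {
        this.grants = grants;
    }

    /** Returns the requested tables the user is NOT allowed to read. */
    Set<String> deniedTables(String user, Set<String> requested) {
        Set<String> allowed = grants.getOrDefault(user, Set.of());
        Set<String> denied = new TreeSet<>(requested);
        denied.removeAll(allowed);
        return denied;
    }

    /** Throws before any sub-query is dispatched if a grant is missing. */
    void requireAccess(String user, Set<String> requested) {
        Set<String> denied = deniedTables(user, requested);
        if (!denied.isEmpty()) {
            throw new SecurityException("User " + user + " may not read " + denied);
        }
    }
}
```

Checking all tables up front avoids running part of a joint query against one source and then failing on another.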
Because heterogeneous data is complex, various abnormal conditions such as connection failures and inconsistent data may occur during a query. The technical problem is how to design and implement exception handling and fault tolerance mechanisms that ensure correct query execution and accurate results. To this end, the exception handling and fault tolerance mechanism module comprises an exception handling mechanism sub-module and a fault tolerance mechanism sub-module;
the exception handling mechanism sub-module captures and handles abnormal conditions such as connection failures and inconsistent data;
the fault tolerance mechanism sub-module provides connection retries, data recovery, and rollback operations.
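The retry part of the fault tolerance mechanism can be sketched as a bounded retry loop with a growing delay. The class `RetryingExecutor`, the linear backoff, and the decision to retry on every exception are assumptions; the patent names retrying as a capability without describing a policy, and data recovery and rollback are not shown here.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper: retries an action a bounded number of times. */
final class RetryingExecutor {
    /**
     * Calls the action until it succeeds or maxAttempts is reached.
     * Waits backoffMillis * attempt between attempts and rethrows the last failure.
     */
    static <T> T callWithRetry(Callable<T> action, int maxAttempts, long backoffMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                last = e; // remember the failure and try again if attempts remain
                if (attempt < maxAttempts) {
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last;
    }
}
```

A caller could wrap a connection attempt as `callWithRetry(() -> openConnection(), 3, 200)`, where `openConnection` stands for whatever connection routine the system uses. In practice only transient failures are worth retrying, so a production version would filter on exception type.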
As shown in fig. 2, the heterogeneous database query processing layer is responsible for sending query requests, acquiring data from multiple clusters once a query request is received, and merging the query results into a final result;
the query result storage layer is responsible for storing query results in different data stores and for ensuring the consistency and availability of the query results by using caching;
the query result query engine layer is responsible for receiving the query results through middleware, applying business logic to them, and outputting the result;
the final output result layer is responsible for outputting the results by using a Markdown parsing engine.
As shown in fig. 3, a processing method for heterogeneous data joint query comprises the following steps:
S1: determining the requirements of the joint query, namely combining the data in several databases into one result set;
S2: storing the URLs and parameter information of the databases in a configuration file, from which the parameters are passed when the joint query is executed;
S3: selecting one of the databases, executing the SQL query on it, and extracting the required result set;
S4: exporting the result set to a Markdown file and parsing it with an online Markdown parser;
S5: selecting another database, executing the same SQL query on it, and reading the required data;
S6: merging the data read with the exported result set and updating the data in the Markdown file;
S7: uploading the updated Markdown file to a server, so that a user can read the Markdown file locally.
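The merge-and-export steps (S4 and S6) can be illustrated with a minimal sketch that concatenates the rows read from two databases and renders them as a Markdown table. The class `MarkdownResultWriter` and its method names are hypothetical, and the sketch assumes both result sets share the same columns; writing the file and uploading it to a server are left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: merges result sets and renders them as a Markdown table. */
final class MarkdownResultWriter {
    /** Appends the rows of the second result set after the rows of the first. */
    static List<List<String>> merge(List<List<String>> first, List<List<String>> second) {
        List<List<String>> all = new ArrayList<>(first);
        all.addAll(second);
        return all;
    }

    /** Renders a header row, a separator row, and one line per data row. */
    static String render(List<String> columns, List<List<String>> rows) {
        StringBuilder sb = new StringBuilder();
        appendRow(sb, columns);
        List<String> rule = new ArrayList<>();
        for (int i = 0; i < columns.size(); i++) {
            rule.add("---");
        }
        appendRow(sb, rule);
        for (List<String> row : rows) {
            appendRow(sb, row);
        }
        return sb.toString();
    }

    private static void appendRow(StringBuilder sb, List<String> cells) {
        sb.append('|');
        for (String cell : cells) {
            // Escape pipes so cell content cannot break the table layout.
            String text = cell == null ? "" : cell.replace("|", "\\|");
            sb.append(' ').append(text).append(" |");
        }
        sb.append('\n');
    }
}
```

The column list and row lists have the same shape as the `Pair<List<String>, List<List<String>>>` returned by the `execute` method in the core code, so its output could feed `render` directly. The resulting string could then be written with `java.nio.file.Files.writeString` before the upload in S7.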
The core code is as follows:
    public Pair<List<String>, List<List<String>>> execute(ExecuteSqlDTO dto) {
        JdbcBase jdbcBase = dto.getJdbcBase();
        JdbcConfig jdbcConfig = jdbcBase.getJdbcConfig();
        String sqlContent = dto.getSql();
        String key = dto.getKey();
        String type = dto.getType();
        Integer count = dto.getMaxCount();
        FunctionTypeEnum functionType = dto.getFunctionType();
        Pair<List<String>, List<List<String>>> columnAndResultPair = null;
        Connection connection = null;
        Statement statement = null;
        ResultSet resultSet = null;
        boolean hasResultSet = false;
        try {
            // Split the submitted text into individual SQL statements
            String[] subSqlArr = sqlContent.split(";");
            log.info("JdbcBaseBiz.execute.subSqlArr:{},",
                    JsonUtils.toJson(subSqlArr));
            sqlRunRedisManager.addLog(key, "start connecting to database ……",
                    SqlLogLevelEnum.WARNING.getCode(), false);
            // Acquire the data source connection
            connection = getConnection(jdbcBase, type);
            sqlRunRedisManager.addLog(key, "database connection obtained ……",
                    SqlLogLevelEnum.WARNING.getCode(), false);
            statement = connection.createStatement();
            if (StringUtils.isNotEmpty(jdbcConfig.getSchema())) {
                // Set the Oracle schema
                statement.execute(SQL_SENTENCE_JOIN + jdbcConfig.getSchema());
            }
            statement.setMaxRows(count);
            if (statement instanceof HiveStatement) {
                sqlStatementMonitor.monitoringStatement(statement, key);
            }
            for (String sql : subSqlArr) {
                if (StringUtils.isNotEmpty(sql.trim())) {
                    // Strip comments; skip the statement if nothing is left
                    sql = SqlCheckUtil.deleteNote(sql);
                    if (StringUtils.isEmpty(sql.trim())) {
                        continue;
                    }
                    sqlRunRedisManager.addLog(key, "start execution of sql:" + sql,
                            SqlLogLevelEnum.INFO.getCode(), false);
                    hasResultSet =
                            statement.execute(StringUtils.trimWhitespace(sql));
                }
            }
            if (hasResultSet && Objects.nonNull(resultSetHandler)) {
                resultSet = statement.getResultSet();
                // Process the results
                columnAndResultPair =
                        resultSetHandler.handleResultSet(functionType, resultSet, key);
            } else {
                int updateCount = statement.getUpdateCount();
                log.error("JdbcBaseBiz.execute.updateCount affected rows: {}",
                        updateCount);
            }
        } catch (Exception e) {
            log.error("JdbcBaseBiz.execute.e", e);
            sqlRunRedisManager.addLog(key, "execution error:" + e.getMessage(),
                    SqlLogLevelEnum.ERROR.getCode(), true);
            buildUpDateHistoryQuery(key,
                    SqlQueryStatusEnum.EXCEPTION.getCode());
        } finally {
            sqlRunRedisManager.addLog(key, "end of execution",
                    SqlLogLevelEnum.WARNING.getCode(), true, null, System.currentTimeMillis());
            close(connection, statement, resultSet, jdbcBase);
        }
        return columnAndResultPair;
    }
The processing system and method for heterogeneous data joint query have the following advantages:
(1) Data integration and consistency: jointly querying the data of different data sources achieves data integration and consistency. An organization can obtain the information it needs from multiple data sources without copying data or performing complicated data conversion, which reduces data redundancy and inconsistency;
(2) Improved query flexibility: the method lets users run queries across different database engines and clusters, which gives greater flexibility and freedom. Users can span different data sources, obtain a more comprehensive view of the data, and perform more complex queries and analyses;
(3) Resource optimization and performance improvement: through query optimization and parallel query execution, the method makes the most of each data source's strengths and improves query performance and execution efficiency. It also avoids the overhead of data copying and integration, saving storage space and network bandwidth;
(4) Unified query interface and grammar: the method provides a unified query interface and grammar, so users do not need to learn and adapt to multiple database engines and query grammars. They access and query heterogeneous data sources through one interface using familiar query statements, which simplifies development and querying;
(5) Real-time data query and analysis: the method supports real-time query and analysis without batch data extraction and conversion. Users can obtain the latest data immediately and perform real-time analysis and decision-making, which improves service response speed and accuracy;
(6) Extensibility and compatibility: the method is extensible and compatible, and can be integrated with different database engines and clusters. Whether the target is a relational database, a NoSQL database, or another type of data storage system, it can be integrated through suitable interfaces and adapters.
The foregoing has shown and described the basic principles, main features, and advantages of the present invention. Those skilled in the art will understand that the present invention is not limited to the embodiments described above; the embodiments and descriptions merely illustrate the principles of the invention, and various changes and modifications may be made without departing from its spirit and scope. The scope of the invention is defined by the appended claims and their equivalents.

Claims (8)

1. A processing system for heterogeneous data joint query, characterized by comprising a data source connection and access module, a data model and grammar difference module, a query optimization and performance module, a data transmission and integration module, a security and authority control module, and an exception handling and fault tolerance mechanism module, wherein:
the data source connection and access module is used for connecting to databases by means of database connection programs and configuring parameters for the databases;
the data model and grammar difference module is used for accommodating the different data models and query grammars used by different database engines and clusters;
the query optimization and performance module is used for analyzing query statements and generating a query plan by using query analysis and optimization techniques;
the data transmission and integration module is used for transmitting, merging and integrating data among different data sources;
the security and authority control module is used for ensuring data security and improving the authority control mechanism of each data source during cross-data-source queries;
the exception handling and fault tolerance mechanism module is used for implementing exception handling and fault tolerance mechanisms to ensure correct execution of queries and correct results.
2. The processing system for heterogeneous data joint query according to claim 1, wherein the data source connection and access module comprises a data source connection sub-module and a data source access sub-module;
the data source connection sub-module is used for supporting connections to different database engines and clusters by using a database connection driver or an API;
the data source access sub-module is used for configuring connection parameters and authority verification information for each data source to achieve secure access to the data.
3. The processing system for heterogeneous data joint query according to claim 1, wherein the data model and grammar difference module comprises a data model sub-module and a grammar difference sub-module;
the data model sub-module is used for establishing a unified data model and mapping the data of different data sources into the unified model to eliminate data model differences;
the grammar difference sub-module is used for providing a cross-data-source query grammar converter that converts query statements from one grammar to another to meet the requirements of different database engines.
4. The processing system for heterogeneous data joint query according to claim 1, wherein the query optimization and performance module comprises a query performance improvement sub-module and an accelerated query operation sub-module;
the query performance improvement sub-module is used for selecting the best data source and query plan by using the characteristics and index information of each data source;
the accelerated query operation sub-module is used for distributing query tasks to the data sources and executing them in parallel.
5. The processing system for heterogeneous data joint query according to claim 1, wherein the data transmission and integration module comprises a data transmission sub-module and a data integration sub-module;
the data transmission sub-module is used for selecting a data transmission mode according to the query requirements and the data volume;
the data transmission modes comprise batch transmission and streaming transmission;
the data integration sub-module is used for handling large-scale data integration by using a distributed computing framework or a data processing engine.
6. The processing system for heterogeneous data joint query according to claim 1, wherein the security and authority control module comprises a data source security sub-module and an authority control sub-module;
the data source security sub-module is used for performing authority verification and identity verification on cross-data-source queries to ensure the security and compliance of the data;
the authority control sub-module is used for ensuring that, during query processing, a user can access only the data the user is authorized to access, in compliance with the security and authority control mechanisms of each data source.
7. The processing system for heterogeneous data joint query according to claim 1, wherein the exception handling and fault tolerance mechanism module comprises an exception handling mechanism sub-module and a fault tolerance mechanism sub-module;
the exception handling mechanism sub-module is used for capturing and handling the exceptions of connection failure and data inconsistency;
the functions of the fault tolerance mechanism sub-module include retrying connections, data recovery, or rollback operations.
8. A processing method for heterogeneous data joint query, implemented on the basis of the processing system for heterogeneous data joint query according to any one of claims 1-7, characterized by comprising the following steps:
S1: determining the requirements of the joint query, which combines data from a plurality of databases into one result set;
S2: storing the URLs and parameter information of the plurality of databases in a configuration file for passing parameters when the joint query is executed;
S3: selecting one of the databases, executing an SQL query on it, and extracting the required result set;
S4: exporting the result set data to a Markdown file and parsing it with an online Markdown parser;
S5: selecting another database, executing the same SQL query on it, and reading the required data;
S6: merging the read data with the exported result set and updating the data in the Markdown file;
S7: uploading the updated Markdown file to a server, and the user reads the Markdown file locally.
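As a non-limiting illustration, the steps of the claimed method other than the online parsing and the upload to a server can be sketched in Python as follows. Everything in the sketch is an assumption made for illustration: the CONFIG dictionary stands in for the configuration file, two in-memory SQLite databases stand in for the heterogeneous databases, and the function names connect_all, run_query, to_markdown and joint_query are hypothetical.

```python
import sqlite3

# Stand-in for the configuration file of step 2; the keys and URLs are hypothetical.
CONFIG = {
    "db_a": {"url": ":memory:"},
    "db_b": {"url": ":memory:"},
}


def connect_all(config: dict) -> list[sqlite3.Connection]:
    """Open one connection per configured database, in configuration order."""
    return [sqlite3.connect(entry["url"]) for entry in config.values()]


def run_query(conn: sqlite3.Connection, sql: str) -> tuple[list[str], list[tuple]]:
    """Steps 3 and 5: execute the SQL query and return (column names, rows)."""
    cur = conn.execute(sql)
    columns = [description[0] for description in cur.description]
    return columns, cur.fetchall()


def to_markdown(columns: list[str], rows: list[tuple]) -> str:
    """Render a result set as a Markdown table."""
    lines = [
        "| " + " | ".join(columns) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    lines += ["| " + " | ".join(str(value) for value in row) + " |" for row in rows]
    return "\n".join(lines) + "\n"


def joint_query(conns: list[sqlite3.Connection], sql: str, path: str) -> list[tuple]:
    """Run the same SQL on each database in turn and rewrite the Markdown file
    after each one (steps 4 and 6), so the file always holds the merged result."""
    merged: list[tuple] = []
    for conn in conns:
        columns, rows = run_query(conn, sql)
        merged.extend(rows)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(to_markdown(columns, merged))
    return merged
```

The sketch merges by simple concatenation of rows; deduplication, ordering across sources and the upload of the resulting file are left out because the claim does not specify them.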
CN202311320529.7A 2023-10-12 2023-10-12 Processing system and method for heterogeneous data joint query Pending CN117493429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311320529.7A CN117493429A (en) 2023-10-12 2023-10-12 Processing system and method for heterogeneous data joint query

Publications (1)

Publication Number Publication Date
CN117493429A true CN117493429A (en) 2024-02-02

Family

ID=89683802

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311320529.7A Pending CN117493429A (en) 2023-10-12 2023-10-12 Processing system and method for heterogeneous data joint query

Country Status (1)

Country Link
CN (1) CN117493429A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination