CN115292348A - Database processing method and system, electronic equipment and storage medium

Info

Publication number: CN115292348A
Application number: CN202210887052.XA
Authority: CN
Inventors: 田地; 赵化臣
Original assignee: Gree Electric Appliances Inc of Zhuhai; Zhuhai Lianyun Technology Co Ltd
Current assignee: Gree Electric Appliances Inc of Zhuhai; Zhuhai Lianyun Technology Co Ltd
Priority date: 2022-07-26
Filing date: 2022-07-26
Publication date: 2022-11-04

Abstract

The application provides a database processing method, a database processing system, an electronic device and a storage medium, belonging to the field of big data query. The method comprises the following steps: receiving a Structured Query Language (SQL) statement and acquiring a database list and data information of the SQL; determining whether the SQL involves a cross-database operation; if the SQL involves a cross-database operation, selecting an ETL tool based on the database list of the SQL and selecting an execution engine based on the data information of the SQL; translating the SQL into an execution plan corresponding to the execution engine using the selected ETL tool; and executing the task based on the execution plan and feeding back the execution status. The system comprises an SQL receiving module, a cross-library judging module, a cross-library selecting module, an SQL translation module and an execution feedback module. With the method and the system, various data processing tasks are submitted using only standard SQL query statements, the tasks are analyzed and executed in the background, and the differences between underlying execution engines are shielded.

Description

Database processing method and system, electronic equipment and storage medium

Technical Field

The application belongs to the field of big data query, and particularly relates to a database processing method, a database processing system, electronic equipment and a storage medium.

Background

In the current age of explosive data growth, making reasonable use of big data has become crucial. Accordingly, more and more data storage schemes and data query engines have emerged to meet data usage demands in various scenarios.

Data warehouse work is work that uniformly organizes and uses data from a variety of sources. Most data warehouse work is ETL work, namely data extraction, cleaning and transformation, and storage. SQL is usually used directly for data warehouse work within a single database, but for data warehouse work across databases SQL cannot be used, and tools such as Kettle and Sqoop are used instead. These tools do not accept SQL, each has its own usage that must be learned separately, and the cost of use is high.

In the prior art, an SQL query statement is submitted together with the database information, checked against a whitelist, and a result set is returned. This scheme does not use a bastion host; SQL query statements are submitted after uniform whitelist verification. However, the scheme does not involve automatic selection among multiple execution schemes, lacks data link monitoring, and does not involve data lineage tracking, so it cannot adapt to data warehouse work in big data scenarios.

The problems in the prior art are as follows: in cross-database ETL work, SQL cannot be used for data processing; in data processing, different tools can often achieve the same effect, and developers generally choose the one they are most familiar with, which may cause unbalanced resource usage; different databases and data tools have different entry points, which makes metadata management, job statistics, and resource management and control difficult.

Disclosure of Invention

Based on the above technical problem, the present application provides a database processing method, a database processing system, an electronic device, and a storage medium.

In a first aspect, the present application provides a database processing method, including the following steps:

receiving a Structured Query Language (SQL) statement and acquiring a database list and data information of the SQL;

determining whether the SQL involves a cross-database operation;

if the SQL involves a cross-database operation, selecting an ETL tool based on the database list of the SQL, and selecting an execution engine based on the data information of the SQL;

translating the SQL into an execution plan corresponding to the execution engine using the selected ETL tool;

and executing the task based on the execution plan and feeding back the execution status.

The data information includes at least one of a type of data source and a data destination.

Determining whether the SQL involves a cross-database operation comprises:

acquiring the type of the SQL and, if the type of the SQL is insert, determining whether the insertion is cross-database.
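A minimal sketch of this judgment, assuming that tables are referenced as `database.table` and that the statement has the shape `INSERT INTO ... SELECT ... FROM ...`. The regular expression and the function names are illustrative assumptions, not part of the application, and a production system would use a real SQL parser.

```python
import re

# Hypothetical pattern: captures "database.table" after INTO, FROM or JOIN.
_TABLE_REF = re.compile(
    r"\b(INTO|FROM|JOIN)\s+([A-Za-z_]\w*)\.([A-Za-z_]\w*)", re.IGNORECASE
)


def sql_type(sql: str) -> str:
    """Return the leading keyword of the statement in lower case."""
    return sql.strip().split(None, 1)[0].lower()


def is_cross_database_insert(sql: str) -> bool:
    """True if an INSERT writes to one database and reads from another."""
    if sql_type(sql) != "insert":
        return False
    targets, sources = set(), set()
    for keyword, database, _table in _TABLE_REF.findall(sql):
        bucket = targets if keyword.upper() == "INTO" else sources
        bucket.add(database.lower())
    # Cross-database: some source database differs from the target database.
    return bool(targets) and bool(sources - targets)
```

For example, `is_cross_database_insert("INSERT INTO dw.orders SELECT * FROM src.orders")` is true, while the same statement reading from `dw.other` is not.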

The step of selecting an execution engine based on the SQL data information comprises the following steps: judging whether the current resources are sufficient; if the current resources are sufficient, selecting an execution engine according to a first rule; otherwise, the execution engine is selected according to the second rule.

The first rule is that the historical success rate is highest or the historical execution speed is fastest; the second rule is that the corresponding resource occupation is smallest.

If the SQL does not involve a cross-database query, the SQL is translated into SQL that conforms to the syntax of the corresponding database.

Before the step of translating the SQL into the execution plan corresponding to the execution engine by using the selected ETL tool, the method further includes: performing permission verification on the database corresponding to the SQL, wherein if the permission verification is not passed, the SQL is not allowed to be executed; if the permission check is passed, the SQL is allowed to be executed.
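The permission check can be sketched as a lookup in a table of grants. In this sketch a select needs read permission, an insert needs write permission and a create needs table creation permission; the in-memory `GRANTS` table, the user names and the function names are hypothetical stand-ins for whatever store holds the permissions.

```python
from typing import Dict, Set, Tuple

# Statement type -> permission it requires (assumed naming).
REQUIRED_PERMISSION = {
    "select": "read",
    "insert": "write",
    "create": "create_table",
}

# Hypothetical grant table: (user, database) -> granted permissions.
GRANTS: Dict[Tuple[str, str], Set[str]] = {
    ("alice", "sales"): {"read"},
    ("alice", "dw"): {"read", "write"},
}


def check_permission(user: str, statement_type: str, database: str,
                     grants: Dict[Tuple[str, str], Set[str]] = GRANTS) -> bool:
    """Return True if `user` may run this statement type on `database`."""
    needed = REQUIRED_PERMISSION.get(statement_type.lower())
    if needed is None:
        return False  # unknown statement types are rejected in this sketch
    return needed in grants.get((user, database), set())


def may_run_insert_select(user: str, source_db: str, target_db: str) -> bool:
    """A cross-database insert needs read on the source and write on the target."""
    return (check_permission(user, "select", source_db)
            and check_permission(user, "insert", target_db))
```

With the sample grants, `may_run_insert_select("alice", "sales", "dw")` is allowed, while the reverse direction is rejected because `alice` cannot write to `sales`.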

After the task is executed based on the execution plan and the execution status is fed back, the method further comprises: recording the SQL execution status and the lineage (blood relationship) of the database tables involved in the SQL.

Recording the lineage of the database tables involved in the SQL comprises: retrieving the metadata of each database, and recording the table-to-table relationships in the metadata.

The database processing method further comprises the following steps: if the execution succeeds, finishing the SQL query and waiting for the next SQL query; if the execution fails, removing the execution engine selection scheme that failed, sorting the remaining execution engine selection schemes according to the first rule or the second rule, and selecting the best execution engine to execute the query again; and if the number of failures reaches the failure count threshold, exiting the SQL query.

In a second aspect, the present application provides a database processing system, comprising: an SQL receiving module, a cross-library judging module, a cross-library selecting module, an SQL translation module and an execution feedback module;

the SQL receiving module, the cross-library judging module, the cross-library selecting module, the SQL translation module and the execution feedback module are sequentially connected;

the SQL receiving module is used for receiving a Structured Query Language (SQL) and acquiring a database list and data information of the SQL;

the cross-database judging module is used for judging whether the SQL has a cross-database operation condition;

the cross-database selection module is used for selecting an ETL tool based on the database list of the SQL and selecting an execution engine based on the data information of the SQL if the SQL has cross-database operation;

the SQL translation module is used for translating the SQL into an execution plan corresponding to the execution engine by adopting the selected ETL tool;

the execution feedback module is used for executing tasks and feeding back execution conditions based on the execution plan.

A permission check module is further included before the execution feedback module; it is connected to the SQL translation module and to the execution feedback module respectively, and is used for performing a permission check on the database corresponding to the SQL: if the permission check is not passed, the SQL is not allowed to be executed; if the permission check is passed, the SQL is allowed to be executed.

A recording module is further included after the execution feedback module; it is connected to the execution feedback module and is used for recording the SQL execution status and the lineage (blood relationship) of the database tables involved in the SQL.

The cross-library selection module comprises: an ETL selection unit and an execution engine selection unit;

the ETL selection unit is used for selecting an ETL tool based on the SQL database list;

the execution engine selection unit is used for selecting an execution engine based on the SQL data information, and specifically comprises the following steps: judging whether the current resources are sufficient; if the current resources are sufficient, selecting an execution engine according to a first rule; otherwise, the execution engine is selected according to the second rule.

In a third aspect, the present application provides an electronic device, comprising: one or more processors, and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform a database processing method as described above.

In a fourth aspect, the present application proposes a storage medium storing executable instructions that, when executed, cause a machine to perform a database processing method as described above.

The beneficial technical effects are as follows:

the method and the device reduce the learning cost of the ETL tool, and enable developers to finish most data processing work through standard SQL query statements.

All SQL operations can be recorded through the unified entry and, combined with the metadata of each database, a complete metadata management system for the data warehouse can be established, and data lineage and data links can be monitored.

Through the unified entry and the self-adaptive analysis, the platform can automatically select the data processing scheme most suitable for the current situation by combining the current resource use situation.

According to the method and the system, various data processing works are submitted only through standard SQL query statements, the background completes analysis on task execution, and differences of bottom-layer execution engines are shielded.

Drawings

Fig. 1 is a flowchart of a database processing method according to an embodiment of the present application;

Fig. 2 is a flowchart of Example 1 of the present application;

Fig. 3 is a flowchart of Example 2 of the present application;

Fig. 4 is a schematic block diagram of a database processing system according to an embodiment of the present application.

Detailed Description

The disclosure will be further described with reference to the embodiments shown in the drawings.

The application provides a database processing method, a database processing system, electronic equipment and a storage medium.

The principle of the database processing method provided by the application is as follows. Data warehouse work is often called ETL (Extract-Transform-Load, i.e. the process of extracting, transforming and loading data), and its basic flow is: read the data, process the data, and finally write the data. Reading corresponds to the select operation in SQL (Structured Query Language), writing corresponds to the insert operation, and most processing can be expressed with SQL functions such as substr (which extracts a substring from a string or string expression) and trim (which removes leading or trailing characters from a string). In other words, most data operations can be expressed in SQL, which is the theoretical basis of the unified SQL database processing method. Once developers have expressed the data operations in SQL, two situations arise. In the first, the read source and the write destination are in the same database; most databases support the same SQL standard with slight differences, so the SQL submitted by the user only needs to be adjusted slightly, according to the type of the database, into SQL that fits that database's syntax. In the second, the operation spans database types; the source, destination, required fields and so on are obtained from the submitted SQL, and the SQL is translated into the execution scheme of an ETL tool capable of performing the operation, which is then submitted.
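To illustrate the claim that extraction, transformation and loading can all be expressed in a single SQL statement, the following sketch runs an `INSERT ... SELECT` with `trim` and `substr` against an in-memory SQLite database. SQLite, the table names and the sample row are chosen only for illustration; the application does not name them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_users (name TEXT, phone TEXT)")
conn.execute("CREATE TABLE clean_users (name TEXT, area_code TEXT)")
conn.execute("INSERT INTO raw_users VALUES ('  Ada  ', '0756-1234567')")

# Extract (SELECT), transform (trim / substr) and load (INSERT) in one statement.
conn.execute(
    "INSERT INTO clean_users (name, area_code) "
    "SELECT trim(name), substr(phone, 1, 4) FROM raw_users"
)

rows = conn.execute("SELECT name, area_code FROM clean_users").fetchall()
print(rows)
```

Here the source and destination tables live in the same database, which corresponds to the first situation described above; the cross-database situation is where an ETL tool has to take over.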

Metadata records which tables each database has, which fields each table has, and what types the fields are. Each database records its own metadata, but a data warehouse involves a plurality of databases, which requires the metadata system to actively pull the metadata of each database and aggregate it. Data lineage (also called data blood relationship or bloodline) records the relationships between tables and, more precisely, how each field of a table is obtained, that is, from which fields of which tables it is derived. Because all SQL is submitted through a unified entry, a complete data lineage can be obtained.
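A table-level lineage record can be sketched as a directed graph from each target table to the tables it was derived from. The class below is a hypothetical illustration at table granularity only; the field-level lineage mentioned above would need a finer-grained key.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


class LineageStore:
    """Hypothetical in-memory store of table-to-table lineage."""

    def __init__(self) -> None:
        self._upstream: Dict[str, Set[str]] = defaultdict(set)

    def record(self, target: str, sources: Iterable[str]) -> None:
        """Record that `target` was written from the given source tables."""
        self._upstream[target].update(sources)

    def upstream(self, table: str) -> Set[str]:
        """All tables that `table` is directly or transitively derived from."""
        seen: Set[str] = set()
        stack = [table]
        while stack:
            for source in self._upstream.get(stack.pop(), ()):
                if source not in seen:
                    seen.add(source)
                    stack.append(source)
        return seen
```

Calling `record` once per executed insert is enough to answer questions such as "which source tables feed this report table", including indirect ones.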

As the above principle shows, the application solves the problem that SQL cannot be used for data processing in cross-database ETL work.

In a first aspect, the present application provides a database processing method, as shown in fig. 1, including the following steps:

step S1: receiving a Structured Query Language (SQL) and acquiring a database list and data information of the SQL;

Optionally, the SQL is a query statement in a unified standard query format, and its data information includes at least one of a database definition operation, a database data operation, a data source, and a data destination.

Optionally, the database definition operation includes: creating a table, deleting the table and modifying the table; the database data operations include: at least one of insert data, delete data, update data, query data.

The data information includes at least one of the type of the data source and the type of the data destination. The SQL contains a database list in which each entry has the format database name + table name; optionally, in an SQL query statement in the unified standard query format, each entry has the format database type name + database name + table name.
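The two entry formats can be parsed as follows. The dot separator and all names in this sketch are assumptions; the application specifies only the order of the parts, not how they are joined.

```python
from typing import NamedTuple, Optional


class TableRef(NamedTuple):
    db_type: Optional[str]  # e.g. "mysql"; None when the type is not given
    database: str
    table: str


def parse_table_ref(name: str) -> TableRef:
    """Parse 'database.table' or 'dbtype.database.table' (assumed separator)."""
    parts = name.split(".")
    if len(parts) == 2:
        return TableRef(None, parts[0], parts[1])
    if len(parts) == 3:
        return TableRef(parts[0], parts[1], parts[2])
    raise ValueError(f"expected 'db.table' or 'type.db.table', got {name!r}")
```

Carrying the database type in the name lets later steps decide from the statement alone whether source and destination are of different types.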

Step S2: judging whether the SQL has cross-database operation condition;

Step S3: if the SQL involves a cross-database operation, selecting an ETL tool based on the database list of the SQL, and selecting an execution engine based on the data information of the SQL;

the step of selecting an execution engine based on the SQL data information, that is, selecting an execution engine according to the types of the data source and the data destination, includes: judging whether the current resources are sufficient; if the current resources are sufficient, selecting an execution engine according to a first rule; otherwise, the execution engine is selected according to the second rule.

The ETL tools include: Oracle, MySQL, Hive, HBase, Impala, ES, Redis, Kettle, Sqoop, etc.

Step S4: translating the SQL into an execution plan corresponding to the execution engine using the selected ETL tool;

step S5: and executing the tasks based on the execution plan and feeding back the execution condition.

If the SQL does not involve a cross-database query, the SQL is translated into SQL that conforms to the syntax of the corresponding database, the task is executed based on that SQL, and the execution status is fed back.

The execution status includes: whether the execution was successful, the resource overhead, the execution duration, etc.
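The fed-back execution status can be modelled as a small record. The sketch below wraps an arbitrary task and reports success, duration and resource overhead; the field names, the unit of the overhead and the convention that the task returns its own overhead are all assumptions made for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ExecutionResult:
    success: bool
    duration_seconds: float
    resource_overhead: float  # unit left open in this sketch


def run_and_report(task: Callable[[], float]) -> ExecutionResult:
    """Run `task`, which is assumed to return its measured resource overhead."""
    start = time.perf_counter()
    try:
        overhead = task()
    except Exception:
        # Any exception is treated as a failed execution.
        return ExecutionResult(False, time.perf_counter() - start, 0.0)
    return ExecutionResult(True, time.perf_counter() - start, overhead)
```

Records of this kind are also what the historical success rate and execution speed used for engine selection would be computed from.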

Before the step of translating the SQL into the execution plan corresponding to the execution engine by using the selected ETL tool, the method further includes: performing permission check on the database corresponding to the SQL, wherein if the permission check is not passed, the SQL is not allowed to be executed; if the permission check is passed, the SQL is allowed to be executed.

The database processing method further comprises the following steps: if the execution succeeds, finishing the SQL query and waiting for the next SQL query; if the execution fails, removing the execution engine selection scheme that failed, sorting the remaining execution engine selection schemes according to the first rule, and selecting the best execution engine to execute the query again; and if the number of failures reaches the failure count threshold, exiting the SQL query.
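The retry behaviour can be sketched as a loop over the candidate schemes. The sketch assumes the candidates are already sorted best-first by the active rule; the function name, the `execute` callback and the default threshold of 3 are hypothetical.

```python
from typing import Callable, List, Optional


def run_with_fallback(engines: List[str],
                      execute: Callable[[str], bool],
                      max_failures: int = 3) -> Optional[str]:
    """Try engines best-first; return the one that succeeded, or None."""
    remaining = list(engines)  # do not mutate the caller's list
    failures = 0
    while remaining and failures < max_failures:
        engine = remaining.pop(0)  # best remaining scheme; popping removes it
        if execute(engine):
            return engine
        failures += 1  # the failed scheme is no longer in `remaining`
    return None  # threshold reached or no schemes left: exit the query
```

Because a failed scheme is removed before the next attempt, the same engine is never retried within one query.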

Example 1:

This embodiment covers the case in which a choice is made among multiple execution engines when operating on the databases. As shown in Fig. 2, the specific flow is as follows:

step S101: receiving SQL query statements in a unified standard query format, and acquiring database lists and data information of the SQL;

step S102: judging whether the SQL has cross-database operation condition;

step S103: if the SQL has cross-database operation, selecting an ETL tool based on the database list of the SQL, selecting an execution engine based on the data information of the SQL, and translating the SQL into an execution plan corresponding to the execution engine by adopting the selected ETL tool;

The selection of an ETL tool based on the database list of the SQL is described in detail in this embodiment as follows: assuming there are N databases, there may be N-1 cross-database operations. For each kind of cross-database operation, the ETL tools capable of implementing it are defined in advance and assigned default priorities through testing before going online; when a cross-database operation is detected, the ETL tool is selected according to the cross-database operation that corresponds to the database list of the SQL.

Step S104: if the SQL does not have cross-database query, translating the SQL into the SQL which accords with the corresponding database syntax;

step S105: performing permission check on the database corresponding to the SQL, wherein if the permission check is not passed, the SQL is not allowed to be executed; if the SQL passes the permission check, the SQL is allowed to be executed;

The permission check is described in detail as follows: select is a read operation and requires that the user has read permission on the data to be read; insert is a write operation and requires the corresponding write permission; create is a table creation operation and requires table creation permission on the corresponding database. The specific permissions of each user are stored in a permission-settings database, and checking permissions means looking them up in that database.

Step S106: executing the task based on the execution plan or on the SQL conforming to the corresponding database syntax, and feeding back the execution status;

step S107: recording the SQL execution status and the lineage (blood relationship) of the database tables involved in the SQL;

step S108: handling the execution status, specifically: if the execution succeeds, finishing the SQL query and waiting for the next SQL query; if the execution fails, removing the execution engine selection scheme that failed, sorting the remaining execution engine selection schemes according to the first rule or the second rule, and selecting the best execution engine to execute the query again; and if the number of failures reaches the failure count threshold, exiting the SQL query.
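The tool selection described for step S103 can be sketched as a lookup in a predefined capability table, keyed by the source and destination database types and ordered by default priority. The table entries below are placeholders, not tested priorities, and the function names are hypothetical.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

# Hypothetical capability table: (source type, destination type) -> tools in
# default priority order, first entry preferred. Entries are placeholders.
CAPABILITIES: Dict[Tuple[str, str], List[str]] = {
    ("mysql", "hive"): ["sqoop", "kettle"],
    ("hive", "mysql"): ["sqoop", "kettle"],
    ("oracle", "mysql"): ["kettle"],
}


def candidate_tools(source_type: str, dest_type: str) -> List[str]:
    """Tools able to move data from source_type to dest_type, best first."""
    if source_type == dest_type:
        return []  # not a cross-database operation
    return list(CAPABILITIES.get((source_type, dest_type), []))


def select_tool(source_type: str, dest_type: str,
                excluded: FrozenSet[str] = frozenset()) -> Optional[str]:
    """Pick the highest-priority tool that has not been excluded."""
    for tool in candidate_tools(source_type, dest_type):
        if tool not in excluded:
            return tool
    return None
```

The `excluded` argument is one way to express removing a scheme that has already failed before selecting again.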

The current resource usage is obtained by monitoring the CPU (Central Processing Unit) and memory usage of the server. If the current remaining resources exceed a certain threshold (typically set to 30%) and no tasks are waiting for resources to execute, then the current resources are deemed sufficient.

The first rule is that the historical success rate is highest or the historical execution speed is fastest; the second rule is a rule with minimum occupied resources.
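The resource judgment and the two rules can be sketched as below. Taking the smaller of the free CPU and free memory fractions as "remaining resources", and breaking ties in success rate by speed, are assumptions of this sketch; the text leaves both open. Engine names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class EngineStats:
    name: str
    success_rate: float    # historical fraction of successful runs, 0..1
    avg_seconds: float     # historical mean execution time
    estimated_cost: float  # estimated resource occupation for this SQL


def resources_sufficient(cpu_free: float, mem_free: float,
                         waiting_tasks: int, threshold: float = 0.30) -> bool:
    """Sufficient if free resources exceed the threshold and nothing is queued."""
    return min(cpu_free, mem_free) > threshold and waiting_tasks == 0


def select_engine(candidates: Sequence[EngineStats], sufficient: bool) -> EngineStats:
    if not candidates:
        raise ValueError("no candidate execution engines")
    if sufficient:
        # First rule (one reading): highest success rate, then fastest.
        return max(candidates, key=lambda e: (e.success_rate, -e.avg_seconds))
    # Second rule: smallest estimated resource occupation.
    return min(candidates, key=lambda e: e.estimated_cost)
```

With plenty of headroom the sketch favours the most reliable engine; under pressure it switches to the cheapest one, even if that engine is slower.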

The corresponding resource occupation is estimated from the volume of data to be operated on, the complexity of the SQL, and the historical resource usage of the various SQL execution schemes. The estimation method is: the data volume is used as the base and multiplied by the SQL complexity as a coefficient, and the result is then calibrated using the deviation observed in the historical resource usage of the various SQL execution schemes.
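One possible reading of this estimate is sketched below: the raw estimate is data volume times a complexity coefficient, and the calibration is a multiplicative factor equal to the mean ratio of actual to estimated usage in past runs. The multiplicative form of the calibration and all names are assumptions; the text does not fix how the deviation is applied.

```python
from statistics import mean
from typing import Sequence, Tuple


def calibration_factor(history: Sequence[Tuple[float, float]]) -> float:
    """history holds (raw_estimate, actual_usage) pairs for one execution scheme."""
    ratios = [actual / raw for raw, actual in history if raw > 0]
    return mean(ratios) if ratios else 1.0  # no history: leave estimate as is


def estimate_resources(data_volume: float, complexity: float,
                       history: Sequence[Tuple[float, float]] = ()) -> float:
    """Base (data volume) times coefficient (SQL complexity), then calibrated."""
    raw = data_volume * complexity
    return raw * calibration_factor(history)
```

If past runs of a scheme consistently used about 20% more than estimated, the factor is about 1.2 and new estimates for that scheme are scaled up accordingly.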

Example 2:

For some special databases, a choice among multiple execution engines is also needed within the same database, as shown in Fig. 3, which is described in detail as follows:

step S201: receiving SQL query statements in a unified standard query format, and acquiring database lists and data information of the SQL;

step S202: judging whether the SQL has a cross-database operation condition;

step S203: if the SQL has cross-database operation, selecting an ETL tool based on the database list of the SQL, selecting an execution engine based on the data information of the SQL, and translating the SQL into an execution plan corresponding to the execution engine by adopting the selected ETL tool;

step S204: if the SQL does not have cross-database query, translating the SQL into the SQL which accords with the corresponding database grammar, and selecting an SQL execution engine for executing the corresponding database grammar;

step S205: performing permission check on the database corresponding to the SQL, wherein if the permission check is not passed, the SQL is not allowed to be executed; if the SQL passes the permission check, the SQL is allowed to be executed;

step S206: executing the task based on the execution plan or on the SQL conforming to the corresponding database syntax, and feeding back the execution status;

step S207: recording the SQL execution status and the lineage (blood relationship) of the database tables involved in the SQL;

step S208: handling the execution status, specifically: if the execution succeeds, finishing the SQL query and waiting for the next SQL query; if the execution fails, removing the execution engine selection scheme that failed, sorting the remaining execution engine selection schemes according to the first rule or the second rule, and selecting the best execution engine to execute the query again; and if the number of failures reaches the failure count threshold, exiting the SQL query.

The step of selecting the SQL execution engine for executing the corresponding database syntax, that is, selecting the execution engine according to the types of the data source and the data destination, includes: judging whether the current resources are sufficient; if the current resources are sufficient, selecting an execution engine according to a first rule; otherwise, the execution engine is selected according to the second rule.

In a second aspect, the present application provides a database processing system, as shown in Fig. 4, including: an SQL receiving module, a cross-library judging module, a cross-library selecting module, an SQL translation module and an execution feedback module.

Before the execution feedback module, a permission check module is also included, which is respectively connected with the SQL translation module and the execution feedback module, and is used for performing permission check on the database corresponding to the SQL, and if the permission check is not passed, the SQL is not allowed to be executed; if the permission check is passed, the SQL is allowed to be executed.

The electronic device may be a mobile phone, a computer, a tablet computer, or the like, and includes a memory and a processor, where the memory stores thereon a computer program, and the computer program implements the database processing method as described in the embodiment when executed by the processor. It is to be appreciated that the electronic device can also include input/output (I/O) interfaces, as well as communication components.

Wherein, the processor is used for executing all or part of the steps in the database processing method in the embodiment. The memory is used to store various types of data, which may include, for example, instructions for any application or method in the electronic device, as well as application-related data.

The Processor may be an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor, or other electronic components, and is configured to execute the database processing method in the foregoing embodiments.

The memory may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.

Each functional unit in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium.

Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention.

The aforementioned storage medium includes various media that can store program code, such as flash memory, hard disks, multimedia cards, card-type memories (e.g., SD (Secure Digital) or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), magnetic memories, magnetic disks, optical disks, servers, and APP (application) stores. A computer program is stored on the medium and, when executed by a processor, implements the steps of the SQL database processing method described above.

The embodiments in the disclosure are described in a progressive manner, and the same and similar parts among the embodiments can be referred to each other, and each embodiment focuses on the differences from the other embodiments.

The scope of the present disclosure is not limited to the above-described embodiments, and it is apparent that various modifications and variations can be made to the present disclosure by those skilled in the art without departing from the scope and spirit of the present disclosure. It is intended that the present disclosure also cover the modifications and variations of this invention provided they come within the scope of the appended claims and their equivalents.

Claims

1. A database processing method is characterized by comprising the following steps:

judging whether the SQL has cross-database operation condition;

2. The database processing method of claim 1, wherein the data information comprises at least one of a type of data source and data destination.

3. The database processing method according to claim 1, wherein said determining whether there is a cross-database operation condition in the SQL comprises:

acquiring the type of the SQL and, if the type of the SQL is insert, determining whether the insertion is cross-database.

4. The method of claim 1, wherein the step of selecting an execution engine based on the SQL data information comprises: judging whether the current resources are sufficient; if the current resources are sufficient, selecting an execution engine according to a first rule; otherwise, the execution engine is selected according to the second rule.

5. The database processing method according to claim 4, wherein the first rule is that the historical success rate is highest or the historical execution speed is fastest; the second rule is a rule with minimum occupied corresponding resources.

6. The database processing method according to claim 1, further comprising: and if the SQL does not have cross-database query, translating the SQL into the SQL which accords with the corresponding database syntax.

7. The database processing method according to claim 1, further comprising, after the executing of the task based on the execution plan and the feedback of the execution situation, the steps of: and recording the blood relationship between the SQL execution condition and the database table related to the SQL.

8. A database processing system, comprising: an SQL receiving module, a cross-library judging module, a cross-library selecting module, an SQL translation module and an execution feedback module;

9. An electronic device, comprising: one or more processors, and a memory storing instructions that, when executed by the one or more processors, cause the one or more processors to perform the database processing method of any of claims 1-7 above.

10. A storage medium storing executable instructions which, when executed, cause a machine to perform the database processing method of any one of claims 1 to 7.