CN113760966A - Data processing method and device based on heterogeneous database system - Google Patents


Info

Publication number
CN113760966A
Authority
CN
China
Prior art keywords
data
data table
query
database
processed
Legal status
Pending
Application number
CN202010769291.6A
Other languages
Chinese (zh)
Inventor
屠志强
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation

Abstract

The invention discloses a data processing method and device based on a heterogeneous database system, and relates to the field of computer technology. One embodiment of the method comprises: acquiring a data table to be processed according to a data query log stored in a heterogeneous database system, and determining a target database corresponding to the data table to be processed; generating a data processing task according to the data table to be processed, the storage database corresponding to the data table to be processed, and the target database; and executing the data processing task by using the data table to be processed, the storage database and the target database based on a preset read-write scheduling rule. This embodiment can optimize the data query function of the heterogeneous database system, solves the prior-art problem that query optimization requires manual intervention, and provides a good user experience.

Description

Data processing method and device based on heterogeneous database system
Technical Field
The invention relates to the field of computer technology, and in particular to a data processing method and device based on a heterogeneous database system.
Background
A database is a warehouse that organizes, stores and manages data according to a data structure. Databases are widely used, ranging from the simplest tables holding assorted data to large database systems that store massive amounts of data. Database technology is a core part of information systems such as management information systems, office automation systems and decision support systems, and is an important technical means for scientific research and decision management. Retrieving the required data from a database quickly is therefore of great significance.
In existing approaches, a query is issued against a specified database without taking query performance or storage cost into account. Each database query is independent and does not change the data storage structure, so when slow queries occur they must be optimized through manual intervention, which results in a poor user experience.
Disclosure of Invention
In view of this, embodiments of the present invention provide a data processing method and apparatus based on a heterogeneous database system, which can optimize the data query function of the heterogeneous database system, solve the prior-art problem that query optimization requires manual intervention, and provide a good user experience.
To achieve the above object, according to a first aspect of the embodiments of the present invention, a data processing method based on a heterogeneous database system is provided.
The data processing method based on the heterogeneous database system comprises the following steps: acquiring a data table to be processed according to a data query log stored in a heterogeneous database system, and determining a target database corresponding to the data table to be processed; generating a data processing task according to the data table to be processed, the storage database corresponding to the data table to be processed and the target database; and executing the data processing task by utilizing the to-be-processed data table, the storage database and the target database based on a preset read-write scheduling rule.
Optionally, the acquiring a data table to be processed according to a data query log stored in the heterogeneous database system includes: querying the data query log to obtain query information corresponding to a data table stored in the heterogeneous database system, wherein the query information comprises: a query average response time, a query response monitoring index, a query failure rate and a query frequency; if the query information corresponding to the data table satisfies a data synchronization condition, determining the data table as a first data table requiring data synchronization; and if the query information corresponding to the data table satisfies a data migration condition, determining the data table as a second data table requiring data migration; wherein the first data table and the second data table are the data tables to be processed.
Optionally, the query information corresponding to the data table satisfying the data synchronization condition includes at least one of the following: the query average response time corresponding to the data table is greater than a preset query average response time; the query response monitoring index corresponding to the data table is greater than a preset query response monitoring index; the query failure rate corresponding to the data table is greater than a preset query failure rate; and the query frequency corresponding to the data table is greater than a first preset query frequency.
Optionally, the query information corresponding to the data table satisfying the data migration condition includes: the query frequency corresponding to the data table is less than a second preset query frequency.
Optionally, the determining a target database corresponding to the data table to be processed includes: determining, according to a correspondence between query information and database type and according to the query information corresponding to the first data table, a first target database to which the first data table is to be synchronized; and determining, based on a correspondence between database type and storage cost and according to storage information corresponding to the second data table, a second target database to which the second data table is to be migrated and the migration data corresponding to the second data table, where the storage information includes: a storage database, a storage time range, and a storage cost.
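The two correspondences above (query information to database type, and database type to storage cost) are not given concrete values in this document. A minimal Python sketch, assuming a hypothetical mapping `QUERY_ISSUE_TO_DB_TYPE`, hypothetical per-GB costs in `STORAGE_COST_PER_GB`, and an assumed rule that the migration data is the part of the stored time range older than a cutoff date, might look like this:

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical correspondence between an observed query problem and a database type.
QUERY_ISSUE_TO_DB_TYPE = {
    "slow_response": "Redis",
    "high_frequency": "Redis",
    "high_failure_rate": "HBase",
}

# Hypothetical correspondence between database type and storage cost (per GB).
STORAGE_COST_PER_GB = {"Redis": 10.0, "MySQL": 3.0, "HBase": 1.0, "HDFS": 0.5}


@dataclass
class StorageInfo:
    storage_database: str  # database type currently holding the table
    start: date            # earliest date covered by the stored data
    end: date              # latest date covered by the stored data


def sync_target(issue: str) -> str:
    """Pick the first target database from the observed query problem."""
    return QUERY_ISSUE_TO_DB_TYPE[issue]


def migration_plan(info: StorageInfo, cold_before: date):
    """Return (second target database, (start, end) of migration data), or None."""
    current_cost = STORAGE_COST_PER_GB[info.storage_database]
    cheaper = {db: c for db, c in STORAGE_COST_PER_GB.items() if c < current_cost}
    if not cheaper:
        return None  # already on the cheapest storage
    target = min(cheaper, key=cheaper.get)
    # Assumption: only data older than `cold_before` is migrated.
    end = min(info.end, cold_before)
    if end <= info.start:
        return None  # nothing in the stored range is old enough
    return target, (info.start, end)
```

Choosing the cheapest cheaper store is only one possible policy; a real system would likely also weigh the query requirements of the target.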
Optionally, the executing the data processing task by using the to-be-processed data table, the storage database, and the target database includes: synchronizing the first data table from a first storage database corresponding to the first data table to the first target database according to a table structure corresponding to the first target database; and migrating migration data corresponding to the second data table from a second storage database corresponding to the second data table to the second target database according to the table structure corresponding to the second target database.
Optionally, after the data processing task is executed, the method further includes: performing data verification on the data table to be processed and the target data table corresponding to the data processing task; if the verification passes, updating the storage information in a data dictionary according to the data table to be processed and the target data table, and updating the query information in the data dictionary by using the data query log; and if the verification fails, re-executing the data processing task.
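The document does not specify how the verification is performed. One common approach, used here purely as an assumption, is to compare the row count and a content checksum of the source and target tables. The retry cap `max_attempts` is also an assumption added so the sketch cannot loop forever; the text itself only says the task is re-executed. The function names are hypothetical.

```python
import hashlib
from typing import Callable, Iterable, Tuple


def table_fingerprint(rows: Iterable[tuple]) -> Tuple[int, str]:
    """Row count plus an order-independent SHA-256 digest of the rows."""
    ordered = sorted(rows)
    digest = hashlib.sha256()
    for row in ordered:
        digest.update(repr(row).encode("utf-8"))
    return len(ordered), digest.hexdigest()


def run_with_verification(
    execute: Callable[[], None],
    verify: Callable[[], bool],
    update_dictionary: Callable[[], None],
    max_attempts: int = 3,
) -> int:
    """Run a data processing task, verify it, and re-execute on failure.

    Returns the number of attempts used; raises if every attempt fails.
    """
    for attempt in range(1, max_attempts + 1):
        execute()
        if verify():
            update_dictionary()  # refresh storage and query information
            return attempt
    raise RuntimeError(f"verification still failing after {max_attempts} attempts")
```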
Optionally, the method further comprises: receiving a data query request, and acquiring the input parameters of the data query request, wherein the input parameters comprise: a data table to be queried, query dimensions and query conditions; determining, by using the storage information and the query information in the data dictionary, a database to be queried corresponding to the data table to be queried; generating a data query task for the database to be queried according to the data table to be queried, the query dimensions and the query conditions; and executing the data query task based on a preset read-write scheduling rule.
Optionally, the preset read-write scheduling rule includes at least one of the following: when a data query task exists, the data query task is executed first and the data processing task is executed afterwards, and when no data query task exists, the data processing task is executed directly; and when the data table to be processed corresponding to the data processing task is the same as the data table to be queried corresponding to the data query task, the data processing task is executed first and the data query task is executed afterwards.
Optionally, the method further comprises: generating a data query log corresponding to the data query request; and updating the data dictionary by using the generated data query log.
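A minimal sketch of the query path, under stated assumptions: the data dictionary is modeled as a hypothetical `DataDictionary` class that records which databases hold each table (storage information) and the observed response times per table and database (query information). The query is routed to the copy with the lowest average response time, and each executed query appends to the log so the dictionary stays current. Scheduling against data processing tasks is left out, and `execute` stands in for whatever database-specific client would actually run the task.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Tuple


@dataclass
class QueryRequest:
    table: str                  # data table to be queried
    dimensions: List[str]       # query dimensions (columns to return)
    conditions: Dict[str, Any]  # query conditions (column -> value)


@dataclass
class DataDictionary:
    locations: Dict[str, List[str]]  # storage information: table -> databases
    # query information: (table, database) -> observed response times in ms
    response_log: Dict[Tuple[str, str], List[float]] = field(default_factory=dict)

    def avg_response(self, table: str, db: str) -> float:
        times = self.response_log.get((table, db), [])
        # Assumption: an unseen copy scores 0.0 so a newly synchronized copy gets tried.
        return sum(times) / len(times) if times else 0.0

    def choose_database(self, table: str) -> str:
        return min(self.locations[table], key=lambda db: self.avg_response(table, db))


def handle_query(request: QueryRequest, dictionary: DataDictionary,
                 execute: Callable[[Dict[str, Any]], Any]) -> Any:
    db = dictionary.choose_database(request.table)
    task = {"database": db, "table": request.table,
            "dimensions": request.dimensions, "conditions": request.conditions}
    start = time.perf_counter()
    result = execute(task)
    elapsed_ms = (time.perf_counter() - start) * 1000
    # Generate the data query log entry and update the dictionary with it.
    dictionary.response_log.setdefault((request.table, db), []).append(elapsed_ms)
    return result
```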
To achieve the above object, according to a second aspect of the embodiments of the present invention, a data processing apparatus based on a heterogeneous database system is provided.
The data processing apparatus based on a heterogeneous database system according to the embodiment of the invention comprises: a determining module, configured to acquire a data table to be processed according to a data query log stored in the heterogeneous database system and determine a target database corresponding to the data table to be processed; a generating module, configured to generate a data processing task according to the data table to be processed, the storage database corresponding to the data table to be processed and the target database; and an execution module, configured to execute the data processing task by using the data table to be processed, the storage database and the target database based on a preset read-write scheduling rule.
Optionally, the determining module is further configured to: query the data query log to obtain query information corresponding to a data table stored in the heterogeneous database system, wherein the query information comprises: a query average response time, a query response monitoring index, a query failure rate and a query frequency; if the query information corresponding to the data table satisfies the data synchronization condition, determine the data table as a first data table requiring data synchronization; and if the query information corresponding to the data table satisfies the data migration condition, determine the data table as a second data table requiring data migration; wherein the first data table and the second data table are the data tables to be processed.
Optionally, the query information corresponding to the data table satisfying the data synchronization condition includes at least one of the following: the query average response time corresponding to the data table is greater than a preset query average response time; the query response monitoring index corresponding to the data table is greater than a preset query response monitoring index; the query failure rate corresponding to the data table is greater than a preset query failure rate; and the query frequency corresponding to the data table is greater than a first preset query frequency.
Optionally, the query information corresponding to the data table meeting the data migration condition includes at least one of the following options: the query failure rate corresponding to the data table is greater than a second preset query failure rate, and the query frequency corresponding to the data table is less than a second preset query frequency.
Optionally, the determining module is further configured to: determine, according to the correspondence between query information and database type and according to the query information corresponding to the first data table, a first target database to which the first data table is to be synchronized; and determine, based on the correspondence between database type and storage cost and according to the storage information corresponding to the second data table, a second target database to which the second data table is to be migrated and the migration data corresponding to the second data table, where the storage information includes: a storage database, a storage time range, and a storage cost.
Optionally, the execution module is further configured to: synchronize the first data table from a first storage database corresponding to the first data table to the first target database according to the table structure corresponding to the first target database; and migrate the migration data corresponding to the second data table from a second storage database corresponding to the second data table to the second target database according to the table structure corresponding to the second target database.
Optionally, the execution module is further configured to: perform data verification on the data table to be processed and the target data table corresponding to the data processing task; if the verification passes, update the storage information in a data dictionary according to the data table to be processed and the target data table, and update the query information in the data dictionary by using the data query log; and if the verification fails, re-execute the data processing task.
Optionally, the apparatus further comprises a query module configured to: receive a data query request and acquire the input parameters of the data query request, wherein the input parameters comprise: a data table to be queried, query dimensions and query conditions; determine, by using the storage information and the query information in the data dictionary, a database to be queried corresponding to the data table to be queried; generate a data query task for the database to be queried according to the data table to be queried, the query dimensions and the query conditions; and execute the data query task based on a preset read-write scheduling rule.
Optionally, the preset read-write scheduling rule includes at least one of the following: when a data query task exists, the data query task is executed first and the data processing task is executed afterwards, and when no data query task exists, the data processing task is executed directly; and when the data table to be processed corresponding to the data processing task is the same as the data table to be queried corresponding to the data query task, the data processing task is executed first and the data query task is executed afterwards.
Optionally, the query module is further configured to: generate a data query log corresponding to the data query request; and update the data dictionary by using the generated data query log.
To achieve the above object, according to a third aspect of embodiments of the present invention, there is provided an electronic apparatus.
An electronic device of an embodiment of the present invention includes: one or more processors; and a storage device for storing one or more programs, which when executed by one or more processors, cause the one or more processors to implement the data processing method based on the heterogeneous database system according to the embodiment of the present invention.
To achieve the above object, according to a fourth aspect of embodiments of the present invention, there is provided a computer-readable medium.
A computer readable medium of an embodiment of the present invention stores thereon a computer program, and when the computer program is executed by a processor, the computer program implements a data processing method based on a heterogeneous database system of an embodiment of the present invention.
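One embodiment of the above invention has the following advantages or benefits: the data table to be processed is obtained by querying the stored data query log, a data processing task is then generated by combining the storage database and the target database corresponding to the data table to be processed, and finally the data processing task is executed based on a preset read-write scheduling rule. Data processing is thus performed automatically on the data table to be processed on the basis of the data query log, which optimizes the data query function of the heterogeneous database system, removes the prior-art need for manual query optimization, and provides a good user experience.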
Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of the main steps of a data processing method based on a heterogeneous database system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a main flow of a data synchronization method based on a heterogeneous database system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a main flow of a data migration method based on a heterogeneous database system according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a main flow of a data query method based on a heterogeneous database system according to an embodiment of the present invention;
FIG. 5 is a block diagram of a heterogeneous database system according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the main modules of a data processing apparatus based on a heterogeneous database system according to an embodiment of the present invention;
FIG. 7 is a diagram of an exemplary system architecture to which embodiments of the present invention may be applied;
FIG. 8 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a schematic diagram of main steps of a data processing method based on a heterogeneous database system according to an embodiment of the present invention. As shown in fig. 1, the main steps of the data processing method based on the heterogeneous database system may include step S101 to step S103.
Step S101: acquiring a data table to be processed according to a data query log stored in a heterogeneous database system, and determining a target database corresponding to the data table to be processed;
A heterogeneous database system is a collection of multiple related database systems that enables data sharing and transparent access. Each database system already exists before it joins the heterogeneous database system and has its own database management software. The heterogeneous database system of embodiments of the present invention may include different types of databases, such as MySQL (a relational database management system), MongoDB (a distributed document-oriented database), HDFS (a distributed file system suited to large-scale datasets), HBase (a distributed, column-oriented, open-source database) and Redis (a key-value storage system). Among these, MySQL is a relational database, MongoDB is a document-oriented store, HDFS is a file-oriented store, HBase is a column-oriented store, and Redis serves as a cache.
The heterogeneous database system of the embodiment of the invention may include various types of databases, each of which stores data tables. A user can access the heterogeneous database system and query the data tables stored in it to obtain the required data; each such query generates a query record for the data table, namely a data query log. In the embodiment of the invention, the query records of each data table in the heterogeneous database system can be analyzed to determine whether the data table needs data synchronization or data migration. Specifically, if the query performance of a data table is poor, for example, the query response time of data table a1 is too long, data table a1 needs data synchronization; synchronizing it reduces the query response time of data table a1 and improves the performance of the heterogeneous database system. If the storage cost of a data table is high, as with data table a2, data table a2 can be migrated so that its data is stored in a database with a lower storage cost. In the embodiment of the invention, the data tables to be processed may include data tables requiring data synchronization and data tables requiring data migration, and they can be obtained from the data query log.
Step S102: generating a data processing task according to the data table to be processed, the storage database corresponding to the data table to be processed, and the target database.
After the data table to be processed is obtained, the target database corresponding to it also needs to be determined. The target database is the database at which the data processing is directed: for example, if data table a1 needs to be synchronized to database M10, then M10 is the target database corresponding to data table a1; if data table a2 needs to be migrated to database M8, then M8 is the target database corresponding to data table a2.
After the data table to be processed and its target database are obtained, the data processing task can be generated in combination with the storage database corresponding to the data table to be processed. The data processing task may be a data synchronization task or a data migration task: if the data table to be processed requires data synchronization, the generated task is a data synchronization task, and if it requires data migration, the generated task is a data migration task. The storage database corresponding to the data table to be processed is the database that currently stores it; for example, if the storage database corresponding to data table a1 is M1, data table a1 is currently stored in database M1.
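As a hedged illustration of Step S102, the sketch below pairs each data table to be processed with its storage database and target database, and tags the resulting task as synchronization or migration. The class and function names are hypothetical; the table and database names reuse the a1/M1/M10 and a2/M2/M8 examples from the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class TaskKind(Enum):
    SYNC = "data synchronization"
    MIGRATION = "data migration"


@dataclass(frozen=True)
class DataProcessingTask:
    kind: TaskKind
    table: str       # data table to be processed
    storage_db: str  # database currently storing the table
    target_db: str   # database the processing is directed at


def generate_tasks(first_tables: List[str], second_tables: List[str],
                   storage_of: Dict[str, str],
                   target_of: Dict[str, str]) -> List[DataProcessingTask]:
    """First data tables become sync tasks; second data tables become migration tasks."""
    tasks = [DataProcessingTask(TaskKind.SYNC, t, storage_of[t], target_of[t])
             for t in first_tables]
    tasks += [DataProcessingTask(TaskKind.MIGRATION, t, storage_of[t], target_of[t])
              for t in second_tables]
    return tasks
```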
Step S103: executing the data processing task by using the data table to be processed, the storage database and the target database based on a preset read-write scheduling rule.
After the data processing task is generated, the data processing task can be executed by using the data table to be processed, the storage database and the target database based on the preset read-write scheduling rule. For example, if a data synchronization task is generated according to the data table a1, the corresponding storage database M1, and the corresponding target database M10 that need to perform data synchronization, the data table a1 may be synchronized from the storage database M1 to the target database M10; if a data migration task is generated according to the data table a2, the corresponding storage database M2, and the corresponding target database M8 that need to be migrated, the data table a2 may be migrated from the storage database M2 to the target database M8.
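The difference between the two task types in Step S103 can be sketched with in-memory stand-ins for the databases: synchronization leaves the table in its storage database, while migration removes it after copying. Reshaping each row to `target_columns` is a simplified assumption standing in for "the table structure corresponding to the target database"; a real implementation would go through each database's own client, and the function name is hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]
Databases = Dict[str, Dict[str, List[Row]]]  # database -> table -> rows


def execute_task(kind: str, table: str, storage: str, target: str,
                 databases: Databases, target_columns: List[str]) -> None:
    """Copy `table` from `storage` to `target`; drop the source copy for migration."""
    if kind not in ("sync", "migration"):
        raise ValueError(f"unknown task kind: {kind}")
    rows = databases[storage][table]
    # Reshape rows to the target table structure (missing columns become None).
    databases[target][table] = [{c: row.get(c) for c in target_columns} for row in rows]
    if kind == "migration":
        del databases[storage][table]  # migration frees the source storage
```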
In a heterogeneous database system, data synchronization copies a data table from one database to another, and data migration moves a data table from one database to another; both are therefore write operations. The heterogeneous database system may also provide a data query function, and a data query reads a data table from a database, which is a read operation. Presetting the read-write scheduling rule, that is, presetting the execution order of data synchronization, data migration and data query, avoids read-write conflicts and ensures that the heterogeneous database system continues to provide good service.
In the embodiment of the present invention, the preset read-write scheduling rule may include at least one of the following: when a data query task exists, the data query task is executed first and the data processing task is executed afterwards, and when no data query task exists, the data processing task is executed directly; and when the data table to be processed corresponding to the data processing task is the same as the data table to be queried corresponding to the data query task, the data processing task is executed first and the data query task is executed afterwards.
Specifically, while a read operation is pending, no write operation is executed: when a data query task exists, it is executed preferentially, and the data processing task is executed after the data query task finishes. For example, given data query task D1, data synchronization task D2 and data migration task D3, D1 is executed first, followed by D2 and D3; D2 and D3 may be executed simultaneously, or D2 may be executed before D3, which is not limited by the present invention.
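The first rule (reads before writes) amounts to a two-level priority ordering. A minimal sketch, using a hypothetical `Task` type whose `is_query` flag marks data query tasks; the relative order within each group is preserved, since the text leaves the order of D2 and D3 open:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    is_query: bool  # True for a data query task (read); False for sync/migration (write)


def execution_order(pending: List[Task]) -> List[str]:
    """Data query tasks first, then data processing tasks; sorted() is stable."""
    return [t.name for t in sorted(pending, key=lambda t: 0 if t.is_query else 1)]
```

For the D1/D2/D3 example, `execution_order([Task("D2", False), Task("D1", True), Task("D3", False)])` places D1 before D2 and D3.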
While a write operation is being executed, a newly arriving read operation is handled according to the data table involved: if the write and read operations correspond to the same data table, the read operation is deferred; if they correspond to different data tables, the read operation can be executed. That is, if a data query task arrives while a data processing task is being executed, the data query task is deferred when both tasks correspond to the same data table, and can be executed when they correspond to different data tables. For example, if data query task D1 arrives while data synchronization task D2 is being executed, it is determined whether D2 and D1 correspond to the same data table; if so, D1 is executed after D2 finishes, and if not, D1 can be executed immediately. Alternatively, following the rule that no write operation is executed while a read operation is pending, execution of D2 may be suspended and resumed after D1 finishes.
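The second rule can be sketched as per-table bookkeeping: while a data processing task writes a table, queries on that table are deferred and released when the write finishes, and queries on other tables may run immediately. This single-threaded `TableScheduler` is a hypothetical illustration; a concurrent system would need real synchronization, for example a per-table read-write lock.

```python
from typing import List, Set, Tuple


class TableScheduler:
    def __init__(self) -> None:
        self.writing: Set[str] = set()            # tables with a running processing task
        self.waiting: List[Tuple[str, str]] = []  # deferred (query name, table)

    def start_write(self, table: str) -> None:
        self.writing.add(table)

    def submit_query(self, name: str, table: str) -> bool:
        """Return True if the query may run now, False if it was deferred."""
        if table in self.writing:
            self.waiting.append((name, table))
            return False
        return True

    def finish_write(self, table: str) -> List[str]:
        """End the write on `table` and return the queries it releases."""
        self.writing.discard(table)
        released = [n for n, t in self.waiting if t == table]
        self.waiting = [(n, t) for n, t in self.waiting if t != table]
        return released
```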
According to the data processing solution based on the heterogeneous database system, the data table to be processed is obtained by querying the stored data query log, a data processing task is then generated by combining the storage database and the target database corresponding to the data table to be processed, and finally the data processing task is executed based on the preset read-write scheduling rule. Because data processing is performed automatically on the data table to be processed on the basis of the data query log, the data query function of the heterogeneous database system is optimized, the prior-art problem that query optimization requires manual intervention is solved, and a good user experience is provided.
Because the data processing method based on the heterogeneous database system performs data processing on the data table to be processed automatically, acquiring the data table to be processed is a key part of the method. In a reference embodiment of the present invention, acquiring the data table to be processed according to the data query log stored in the heterogeneous database system may include:
step S1011, querying the data query log to obtain query information corresponding to the data tables stored in the heterogeneous database system;
step S1012, if the query information corresponding to the data table satisfies the data synchronization condition, determining the data table as a first data table requiring data synchronization;
step S1013, if the query information corresponding to the data table satisfies the data migration condition, determining that the data table is a second data table requiring data migration.
In step S1011, the data query log is queried to obtain query information corresponding to each data table in the heterogeneous database system. The query information may include: the query average response time, the query response monitoring index, the query failure rate and the query frequency. The query average response time is the average response time of queries on a data table, for example, the average response time of queries on data table a1 over one day. The query response monitoring index may be TP99, TP90, etc., where TP stands for Top Percentile, a statistical measure in the same family as the mean and the median. TP99 is the minimum time within which 99% of data query requests are responded to, and TP90 is the minimum time within which 90% of data query requests are responded to. For example, if the response times of 100 query requests on data table a1 are sorted in ascending order, the 99th response time is TP99 and the 90th is TP90. The query failure rate is the failure rate of data query requests; for example, if 85 of 100 query requests on data table a1 return the required data, the query failure rate is 15%. The query frequency is the number of queries on a data table per unit time, such as the number of queries on data table a1 in one day.
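The four kinds of query information can be computed from log entries as sketched below. The log record layout (`QueryLogEntry`) is an assumption, as is counting failed requests in the response-time statistics. TP values follow the rank rule in the example above: sort ascending and take the entry at rank ceil(p% × n).

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class QueryLogEntry:
    table: str
    response_ms: float
    succeeded: bool  # whether the request returned the required data


def top_percentile(sorted_times: List[float], percent: float) -> float:
    """TP value by rank: the ceil(percent% * n)-th smallest response time."""
    rank = max(1, math.ceil(len(sorted_times) * percent / 100))
    return sorted_times[rank - 1]


def query_info(log: List[QueryLogEntry], table: str,
               window_days: float = 1.0) -> Optional[Dict[str, float]]:
    """Aggregate one table's log entries into its query information."""
    rows = [e for e in log if e.table == table]
    if not rows:
        return None
    times = sorted(e.response_ms for e in rows)
    failures = sum(1 for e in rows if not e.succeeded)
    return {
        "avg_response_ms": sum(times) / len(times),
        "tp99_ms": top_percentile(times, 99),
        "tp90_ms": top_percentile(times, 90),
        "failure_rate": failures / len(rows),
        "queries_per_day": len(rows) / window_days,
    }
```

With 100 requests on a1 taking 1 ms to 100 ms, of which 15 fail, this yields TP99 = 99 ms, TP90 = 90 ms and a failure rate of 0.15, matching the worked example.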
After the query information of each data table is acquired, each data table can be analyzed to determine whether it is a data table to be processed. In this embodiment of the present invention, the data tables to be processed may include a first data table requiring data synchronization and a second data table requiring data migration; the specific method for determining them is as follows.
(I) If the query information corresponding to the data table satisfies the data synchronization condition, the data table is determined as a first data table requiring data synchronization.
In the embodiment of the present invention, the query information corresponding to the data table satisfies the data synchronization condition when at least one of the following holds: (1) the query average response time of the data table is longer than a preset query average response time; (2) the query response monitoring index of the data table is greater than a preset query response index; (3) the query failure rate of the data table is greater than a preset query failure rate; (4) the query frequency of the data table is greater than a first preset query frequency. Each data table in the heterogeneous database system is analyzed; if a data table satisfies at least one of conditions (1) to (4), its query performance needs improvement, and it can be determined as a first data table requiring data synchronization.
Each preset threshold may be set either as an absolute value or by ranking. For example, the preset query average response time may be set as an absolute value taken directly from historical experience or from the specific service; if it is set by ranking, the data tables are sorted by query average response time in descending order and the top N data tables are selected as the data tables requiring data synchronization. Likewise, the preset query response index, the preset query failure rate and the first preset query frequency may each be set as an absolute value or by ranking, which is not described in detail in the embodiment of the present invention.
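Both ways of setting the thresholds can be sketched as below. This is an illustrative sketch only: `SyncThresholds`, `needs_sync` and `top_n_tables` are hypothetical names, the per-table metrics are assumed to be in a dict keyed like the output of a log aggregation step, and TP99 is assumed as the query response monitoring index.

```python
from dataclasses import dataclass


@dataclass
class SyncThresholds:
    """Preset thresholds set as absolute values (all values hypothetical)."""
    avg_response_ms: float
    tp99_ms: float
    failure_rate: float
    first_query_frequency: float


def needs_sync(info, t):
    """True if at least one of conditions (1) to (4) holds for the table."""
    return (
        info["avg_response_ms"] > t.avg_response_ms            # condition (1)
        or info["tp99_ms"] > t.tp99_ms                         # condition (2)
        or info["failure_rate"] > t.failure_rate               # condition (3)
        or info["query_frequency"] > t.first_query_frequency   # condition (4)
    )


def top_n_tables(infos, metric, n):
    """Rank-based variant: the N tables with the largest value of `metric`."""
    return sorted(infos, key=lambda name: infos[name][metric], reverse=True)[:n]
```

The absolute form flags every table crossing a limit, whereas the rank-based form always selects exactly N tables, which bounds how much synchronization work is scheduled per cycle.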
(II) If the query information corresponding to the data table satisfies the data migration condition, the data table is determined as a second data table requiring data migration. In the embodiment of the present invention, the data migration condition may include: the query frequency of the data table is less than a second preset query frequency. Each data table in the heterogeneous database system is analyzed; if the query frequency of a data table is less than the second preset query frequency, the table is rarely queried (for example, a table recording data from more than one year ago), so it can be migrated to a database with a lower storage cost and is therefore determined as a second data table requiring data migration.
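When only the frequency conditions are considered, tables can be split into synchronization and migration candidates in one pass. A minimal sketch; the function name is hypothetical, and the requirement that the second preset query frequency be below the first is an assumption not stated in the text (it keeps the two frequency conditions from selecting the same table).

```python
def split_by_frequency(infos, first_query_frequency, second_query_frequency):
    """Return (hot, cold) table names using only the frequency conditions.

    hot:  query frequency > first preset frequency  -> synchronization candidate
    cold: query frequency < second preset frequency -> migration candidate
    """
    # Assumption: the second threshold must sit below the first.
    if second_query_frequency >= first_query_frequency:
        raise ValueError("second preset frequency should be below the first")
    hot, cold = [], []
    for name in sorted(infos):
        freq = infos[name]["query_frequency"]
        if freq > first_query_frequency:
            hot.append(name)
        elif freq < second_query_frequency:
            cold.append(name)
    return hot, cold
```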
With the data processing method based on the heterogeneous database system, the data tables requiring data synchronization and data migration are obtained directly by analyzing the query information of each data table, after which the data synchronization and data migration tasks can be executed. This automatically optimizes how data tables are stored in the heterogeneous database system and improves its query performance.
After the data tables requiring data synchronization and data migration are acquired, the corresponding target databases may be determined, as explained in step S102 above, which is not repeated here. In an optional embodiment of the present invention, determining the target database corresponding to the data table to be processed may include: determining a first target database for the first data table in step S1021, and determining a second target database for the second data table in step S1022.
Step S1021: determining, based on the correspondence between query information and database type, the first target database to which the first data table should be synchronized, according to the query information corresponding to the first data table.
The database type refers to the specific type of a database: for example, MySql is a relational database, MongoDB is a document-oriented database, HDFS is a distributed file system (grouped in this embodiment with the document-type stores), HBase is a column-oriented database, and Redis is an in-memory cache.
In the embodiment of the present invention, the correspondence between query information and database type defines the optimal database type for different query information, specifically: (1) a data table that is queried frequently and has few rows is best stored in the relational database MySql; (2) a data table that is queried infrequently and has many rows is best stored in HDFS in ORC format (a columnar file storage format); (3) a data table queried with many query conditions is best stored in HDFS in ORC format; (4) a data table that is queried frequently, has aggregation dimensions and can be generalized into a result table is best stored in the relational database MySql; (5) a data table with a long query average response time that can be generalized into a result table is best stored in HDFS in ORC format; (6) a data table whose generalized result table is small and which is queried frequently is best stored in the cache Redis. Because a data table may contain thousands of records, aggregation operations such as summing, averaging and taking extreme values can greatly reduce the volume of data a query has to handle; the aggregation dimension is the dimension over which such an aggregation is performed. The query condition is the specific condition set when querying the data table. The aggregation dimension and the query condition can also be regarded as query information of the data table, since a data table is queried according to specific aggregation dimensions and query conditions.
Data generalization is an analysis process that abstracts and summarizes a large amount of task-related data in a database from relatively low-level concepts to higher-level concepts. A data table that "can be generalized into a result table" is one for which there exists a result table obtained by generalizing part of its data.
After the query information corresponding to the first data table is determined, the optimal database for storing the first data table can be obtained from the correspondence between query information and database type, and this optimal database is directly taken as the first target database. Note that if this correspondence shows that the database currently storing the first data table is already the optimal database, no data synchronization task needs to be performed on the first data table.
Step S1022: determining, based on the correspondence between database type and storage cost, the second target database to which the second data table should be migrated and the migration data corresponding to the second data table, according to the storage information corresponding to the second data table. The storage information may include: the storage database, the storage time range and the storage cost.
The correspondence between database type and storage cost specifies the storage cost of each database type, where the storage cost roughly corresponds to the amount of storage space occupied. For example, storing data table A1 costs C1 in the relational database MySql, C2 in the document database MongoDB, C3 in the column-oriented database HBase, and C4 in the cache Redis. Because the second data table is queried infrequently, it can be migrated to a database with a lower storage cost.
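The six rules can be encoded as an ordered rule table, with the "already optimal" case mapped to "no task". This is a sketch under clearly labeled assumptions: the patent gives no numeric thresholds (`HIGH_FREQ`, `SMALL_ROWS`, etc. are hypothetical), it does not say which rule wins when several match (the order below, most specific first, is an assumption), and `TableProfile` is an illustrative shape for the relevant data dictionary fields.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical thresholds; the text does not specify values.
HIGH_FREQ = 1000            # queries per day
SMALL_ROWS = 1_000_000
MANY_CONDITIONS = 5
SLOW_MS = 2000
SMALL_RESULT_ROWS = 10_000


@dataclass
class TableProfile:
    query_frequency: float
    row_count: int
    query_condition_count: int
    aggregation_dimension_count: int
    avg_response_ms: float
    result_table_rows: Optional[int]  # None if no generalized result table exists


def optimal_database(p):
    """Apply rules (1) to (6); the evaluation order is an assumption."""
    generalizable = p.result_table_rows is not None
    high_freq = p.query_frequency >= HIGH_FREQ
    if generalizable and high_freq and p.result_table_rows <= SMALL_RESULT_ROWS:
        return "Redis"       # rule (6)
    if generalizable and high_freq and p.aggregation_dimension_count > 0:
        return "MySql"       # rule (4)
    if generalizable and p.avg_response_ms >= SLOW_MS:
        return "HDFS/ORC"    # rule (5)
    if p.query_condition_count >= MANY_CONDITIONS:
        return "HDFS/ORC"    # rule (3)
    if high_freq and p.row_count < SMALL_ROWS:
        return "MySql"       # rule (1)
    if not high_freq and p.row_count >= SMALL_ROWS:
        return "HDFS/ORC"    # rule (2)
    return None              # no rule matched


def first_target_database(p, current_db):
    """Return the first target database, or None if no sync task is needed."""
    best = optimal_database(p)
    if best is None or best == current_db:
        return None
    return best
```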
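Choosing the second target database and the migration data then reduces to a minimum over costs and a filter over the storage time range. A minimal sketch, assuming the per-database costs (the C1 to C4 figures) are already known for the table and that the time range to migrate is supplied by the caller; the patent does not say how that range is chosen, and both function names are hypothetical.

```python
from datetime import date


def second_target_database(storage_costs, current_db):
    """Pick the database type with the lowest storage cost for this table.

    storage_costs maps database type -> cost of storing the table there.
    Returns None when no strictly cheaper option than current_db exists.
    """
    cheapest = min(storage_costs, key=storage_costs.get)
    if storage_costs[cheapest] >= storage_costs[current_db]:
        return None
    return cheapest


def migration_data(rows, start, end):
    """Rows whose date falls in the time range [start, end] chosen for migration."""
    return [row for row in rows if start <= row["date"] <= end]
```

For example, with costs of 10 in MySql and 2 in HDFS/ORC, a cold table stored in MySql would be migrated to HDFS/ORC, while a table already in the cheapest store yields no task.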
According to the technical solution of the embodiment of the present invention, the first target database and the second target database are determined based on the correspondence between query information and database type and the correspondence between database type and storage cost. Data table storage in the heterogeneous database system is thus optimized automatically, which solves the prior-art problem of having to optimize data manually and further improves the query performance of the heterogeneous database system.
In an optional embodiment of the present invention, executing the data processing task by using the data table to be processed, the storage database and the target database may include: step S1031, synchronizing the first data table from the first storage database corresponding to the first data table to the first target database according to the table structure corresponding to the first target database; and step S1032, migrating the migration data corresponding to the second data table from the second storage database corresponding to the second data table to the second target database according to the table structure corresponding to the second target database. The migration data is the data in the second data table that needs to be migrated to the second target database.
Different databases use different table structures to store data tables, so before data synchronization or data migration into a target database, the table structure (schema) for the target database is generated first, and the data synchronization or data migration task is then executed according to that table structure. Data synchronization may be performed with the open-source tool Sqoop, which is mainly used to transfer data between Hadoop (a distributed system infrastructure of which HDFS is one of the core components) and conventional relational databases: it can import data from a relational database into HDFS and export data from HDFS into a relational database.
For the data synchronization task, the table structure corresponding to the first target database is generated, and the first data table is then synchronized into the first target data table of the first target database according to that table structure. For the data migration task, the table structure corresponding to the second target database is generated, and the migration data corresponding to the second data table is then migrated into the second target data table of the second target database according to that table structure. For the data migration task, the migration data may also first be synchronized into the second target data table and then deleted from the second data table.
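The "generate the target schema, copy, then delete for migration" sequence can be illustrated with Python's built-in `sqlite3` standing in for both source and target databases. This is only a sketch of the control flow: real heterogeneous targets (MySql, HDFS/ORC, HBase) would need type mapping and a transfer tool such as Sqoop, and table names are interpolated into SQL on the assumption that they come from the trusted data dictionary. All function names are hypothetical.

```python
import sqlite3


def copy_schema(src, dst, table, target_table):
    """Create target_table in dst with the column names/types of table in src."""
    cols = src.execute(f"PRAGMA table_info({table})").fetchall()
    # Each PRAGMA row is (cid, name, type, notnull, dflt_value, pk).
    col_defs = ", ".join(f"{c[1]} {c[2]}" for c in cols)
    dst.execute(f"CREATE TABLE IF NOT EXISTS {target_table} ({col_defs})")
    return [c[1] for c in cols]


def synchronize(src, dst, table, target_table, where="1=1", params=()):
    """Copy matching rows into the target table; returns the row count."""
    names = copy_schema(src, dst, table, target_table)
    cols = ", ".join(names)
    rows = src.execute(f"SELECT {cols} FROM {table} WHERE {where}", params).fetchall()
    placeholders = ", ".join("?" for _ in names)
    dst.executemany(f"INSERT INTO {target_table} ({cols}) VALUES ({placeholders})", rows)
    dst.commit()
    return len(rows)


def migrate(src, dst, table, target_table, where, params=()):
    """Migration as described: synchronize the rows first, then delete them."""
    moved = synchronize(src, dst, table, target_table, where, params)
    src.execute(f"DELETE FROM {table} WHERE {where}", params)
    src.commit()
    return moved
```

Because the copy and the delete run against two separate databases, they are not atomic; this is one reason the verification step described next matters.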
Furthermore, after the data processing task is executed, the task must be verified and the data table information stored in the heterogeneous database system updated. Therefore, in an optional embodiment of the present invention, after executing the data processing task, the data processing method based on the heterogeneous database system may further include: performing data verification on the data table to be processed and the target data table corresponding to the data processing task; if the verification passes, updating the storage information in the data dictionary according to the data table to be processed and the target data table, and updating the query information in the data dictionary by using the data query log; and if the verification fails, executing the data processing task again. The target data table is the data table produced by the data synchronization task or the data migration task.
Assume that data table A1 needs to be synchronized into database M10. A new data table A10 is added to M10, the data in A1 is synchronized into A10, and A10 is defined as the target data table. After the data synchronization task is executed, a consistency check verifies whether the data in A1 and A10 are consistent. If the verification passes, the data synchronization task is considered successful and the information of A10 in the data dictionary can be updated; in addition, the query records of A1 can be extracted from the data query log and replayed against A10 to obtain a query test result for A10, which is used to update the query information of A10 in the data dictionary. If the verification fails, the data synchronization task is executed again.
As another example, if the data in the time range t1 to t2 in data table A2 needs to be migrated to database M8, a new data table A8 is added to M8, the data in the time range t1 to t2 in A2 is migrated into A8, and A8 is defined as the target data table. After the data migration task is executed, it is verified whether the data in the time range t1 to t2 has been purged from A2 and whether that data now exists in A8. If the verification passes, the data migration task is considered successful and the information of A8 in the data dictionary can be updated; in addition, the query records for data in the time range t1 to t2 of A2 can be extracted from the data query log and replayed against A8 to obtain a query test result for A8, which is used to update the query information of A8 in the data dictionary. If the verification fails, the data migration task is executed again.
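The two checks in these examples can be expressed as multiset comparisons over rows. A minimal sketch, assuming rows are available as hashable tuples and that a snapshot of the in-range rows was taken before migrating; comparing full row sets is only one possible consistency check (counts or checksums are cheaper alternatives for large tables), and the function names are hypothetical.

```python
from collections import Counter


def verify_sync(source_rows, target_rows):
    """Synchronization check: both tables hold the same multiset of rows
    (row order is ignored)."""
    return Counter(source_rows) == Counter(target_rows)


def verify_migration(snapshot_in_range, remaining_source_rows, target_rows, in_range):
    """Migration check from the A2 -> A8 example:
    1) no row inside the migrated time range is left in the source table;
    2) every row that was in range before migration now exists in the target.
    """
    purged = not any(in_range(row) for row in remaining_source_rows)
    target_counts = Counter(target_rows)
    present = all(target_counts[row] >= n for row, n in Counter(snapshot_in_range).items())
    return purged and present
```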
Fig. 2 is a schematic diagram of a main flow of a data synchronization method based on a heterogeneous database system according to an embodiment of the present invention. As shown in fig. 2, the main flow of the data synchronization method based on the heterogeneous database system may include:
step S201, querying the data query log to obtain the query information corresponding to the data tables stored in the heterogeneous database system;
step S202, if the query information corresponding to the data table meets the data synchronization condition, determining the data table as a first data table needing data synchronization;
step S203, determining, based on the correspondence between query information and database type, the first target database to which the first data table should be synchronized, according to the query information corresponding to the first data table;
step S204, generating a data synchronization task according to the first data table, a storage database corresponding to the first data table and a first target database;
step S205, based on a preset read-write scheduling rule, synchronizing a first data table from a first storage database corresponding to the first data table to a first target data table of a first target database according to a table structure corresponding to the first target database;
step S206, carrying out data consistency verification on the first data table and the first target data table;
step S207, if the verification is passed, updating the storage information in the data dictionary according to the first data table and the first target data table, and updating the query information in the data dictionary by using the data query log;
step S208, if the verification fails, executing the data synchronization task again.
The query information corresponding to the data table satisfies the data synchronization condition when at least one of the following holds: the query average response time of the data table is longer than the preset query average response time, the query response index of the data table is greater than the preset query response index, the query failure rate of the data table is greater than the preset query failure rate, or the query frequency of the data table is greater than the first preset query frequency.
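Steps S205 to S208 form an execute, verify, update-or-retry loop, which can be sketched as follows. The callables are supplied by the caller, and the retry bound `max_attempts` is an assumption: the text only says the task is executed again. A real `execute` should also be idempotent (for example, clearing the target table first) so that a retry does not duplicate rows.

```python
def run_with_verification(execute, verify, on_success, max_attempts=3):
    """Execute a data processing task, verify it, and re-execute on failure.

    on_success would update the data dictionary (step S207).
    Returns the attempt number on which verification passed.
    """
    for attempt in range(1, max_attempts + 1):
        execute()
        if verify():
            on_success()
            return attempt
    raise RuntimeError(f"task still failing verification after {max_attempts} attempts")
```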
Fig. 3 is a schematic diagram of a main flow of a data migration method based on a heterogeneous database system according to an embodiment of the present invention. As shown in fig. 3, the main flow of the data migration method based on the heterogeneous database system may include:
step S301, querying a data query log to obtain query information corresponding to a data table stored in a heterogeneous database system;
step S302, if the query information corresponding to the data table meets the data migration condition, determining the data table as a second data table needing data migration;
step S303, determining, based on the correspondence between database type and storage cost, the second target database to which the second data table should be migrated and the migration data corresponding to the second data table, according to the storage information corresponding to the second data table;
step S304, generating a data migration task according to the second data table, a storage database corresponding to the second data table and a second target database;
step S305, migrating migration data from a second storage database corresponding to a second data table to the second target data table of a second target database according to a table structure corresponding to the second target database based on a preset read-write scheduling rule;
step S306, verifying the second data table and the second target data table;
step S307, if the verification is passed, updating the storage information in the data dictionary according to the second data table and the second target data table, and updating the query information in the data dictionary by using the data query log;
step S308, if the verification fails, executing the data migration task again.
The data migration condition satisfied by the query information corresponding to the data table may include: the query frequency of the data table is less than the second preset query frequency.
The heterogeneous database system may also provide a data query function. Therefore, in an optional embodiment of the present invention, the data processing method based on the heterogeneous database system may further include: receiving a data query request and acquiring the input parameters corresponding to the data query request; determining, by using the storage information and query information in the data dictionary, the database to be queried for the data table to be queried; generating a data query task for the database to be queried according to the data table to be queried, the query dimension and the query condition; and executing the data query task based on the preset read-write scheduling rule.
The data table to be queried, the query dimension and the query condition are the input parameters of the data query request. Specifically, the data table to be queried is given by its unique identifier, such as the table name table1; the query dimension may be, for example, the city dimension or the season dimension of the data; and the query condition is the specific condition of the query, such as querying the data dated January 1, 2020.
The data dictionary records the query information and storage information of each data table. For example, for data table A1 the data dictionary records the database storing A1, the query conditions and aggregation dimensions used with A1, the average response time, response monitoring index, failure rate and frequency of queries on A1, the number of rows of A1 and the update time of the record for A1; it also records the storage date range and the storage cost of A1.
Therefore, after the data table to be queried is determined, the databases storing it can be identified from the information in the data dictionary, and the database to be queried is chosen among them according to query performance. Query performance can be evaluated by assigning weights to the query average response time, the query response monitoring index, the query failure rate and the query frequency and computing a score for each candidate database; the database with the best query performance is selected as the database to be queried. A data query task for that database is then generated from the data table to be queried, the query dimension and the query condition, and finally executed based on the preset read-write scheduling rule, which has been explained in step S101 above and is not described again here.
The data query task may be a database query statement. Because query statement formats differ between database types, once the database to be queried is determined, a query statement in that database's format is generated from the data table to be queried, the query dimension and the query condition.
In addition, in an optional embodiment of the present invention, the data processing method based on the heterogeneous database system may further include: generating a data query log corresponding to the data query request, and updating the data dictionary by using the generated data query log. Because the embodiment of the present invention determines the data tables to be processed from the data query log, a data query log must be generated for each received data query request and used to update the query information and storage information stored in the data dictionary.
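A weighted score over the candidate databases might look like the sketch below. The text names the four metrics and says weights are set, but not how units are reconciled or in which direction query frequency counts; this sketch assumes each metric is scaled by its maximum across candidates, that a lower score is better, and that a positive frequency weight treats a busier database as a penalty. All of these, and the function name, are assumptions.

```python
def pick_database(candidates, weights):
    """candidates: {db: {metric: value}}, weights: {metric: weight}.
    Returns the database with the lowest (best) weighted score."""
    metrics = list(weights)
    # Scale each metric by its largest value across candidates so that
    # milliseconds, ratios and counts become comparable; avoid dividing by 0.
    scale = {m: max(info[m] for info in candidates.values()) or 1.0 for m in metrics}

    def score(info):
        return sum(weights[m] * info[m] / scale[m] for m in metrics)

    return min(candidates, key=lambda db: score(candidates[db]))
```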
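Generating a per-database statement from the same three input parameters can be sketched as follows. The `COUNT(*)` aggregate, the `%s` placeholder style (used by common MySQL drivers) and the MongoDB aggregation-pipeline shape (`$match` then `$group`) are illustrative choices, not the patent's; the function only builds the statement and does not connect to any database. At least one query dimension is assumed.

```python
def build_query(db_type, table, dimensions, conditions):
    """Build a query for one database type from the data table to be queried,
    the query dimensions and the query conditions (equality only, assumed)."""
    if db_type == "MySql":
        where = " AND ".join(f"{col} = %s" for col in conditions) or "1=1"
        dims = ", ".join(dimensions)
        sql = f"SELECT {dims}, COUNT(*) AS cnt FROM {table} WHERE {where} GROUP BY {dims}"
        return sql, list(conditions.values())
    if db_type == "MongoDB":
        pipeline = [
            {"$match": dict(conditions)},
            {"$group": {"_id": {d: f"${d}" for d in dimensions}, "cnt": {"$sum": 1}}},
        ]
        return table, pipeline  # collection name and aggregation pipeline
    raise ValueError(f"unsupported database type: {db_type}")
```

For the example in the text (table1, city dimension, data dated January 1, 2020), the MySql branch yields `SELECT city, COUNT(*) AS cnt FROM table1 WHERE dt = %s GROUP BY city` with the date passed as a bound parameter, assuming the date column is named `dt`.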
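Folding each new log entry into the data dictionary can be done incrementally, keeping running totals so averages and rates never require rescanning the whole log. A minimal sketch with a hypothetical `DictionaryEntry` covering only a few of the fields listed above and an assumed dict shape for log entries.

```python
from dataclasses import dataclass


@dataclass
class DictionaryEntry:
    """Part of one data-dictionary record (field names are hypothetical)."""
    storage_database: str
    query_count: int = 0
    failure_count: int = 0
    total_response_ms: float = 0.0
    last_updated: str = ""

    @property
    def avg_response_ms(self):
        return self.total_response_ms / self.query_count if self.query_count else 0.0

    @property
    def failure_rate(self):
        return self.failure_count / self.query_count if self.query_count else 0.0


def apply_query_log(dictionary, log_entry):
    """Fold one generated query-log entry into the table's dictionary record."""
    entry = dictionary[log_entry["table"]]
    entry.query_count += 1
    entry.total_response_ms += log_entry["response_ms"]
    if not log_entry["succeeded"]:
        entry.failure_count += 1
    entry.last_updated = log_entry["timestamp"]
```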
Fig. 4 is a schematic diagram of a main flow of a data query method based on a heterogeneous database system according to an embodiment of the present invention. As shown in fig. 4, the main flow of the data query method based on the heterogeneous database system may include:
step S401, receiving the data query request and obtaining its input parameters, which may include: the data table to be queried, the query dimension and the query condition;
step S402, determining the database to be queried for the data table to be queried by using the storage information and query information in the data dictionary;
step S403, generating a data query task corresponding to the database to be queried according to the data table to be queried, the query dimension and the query condition;
step S404, executing a data query task based on a preset read-write scheduling rule;
step S405, generating a data query log corresponding to the data query request, and updating the data dictionary using the generated data query log.
According to the embodiment of the present invention, a database with good query performance is selected for each data query based on the storage information and query information in the data dictionary, which improves data query efficiency and the user experience.
The structure of the heterogeneous database system is described in detail below. Fig. 5 is a schematic structural diagram of a heterogeneous database system according to an embodiment of the present invention. As shown in fig. 5, the heterogeneous database system 500 may include: a data query interface 501, a data dictionary 502, a data query log component 503, a data processing task generation component 504, a data query statement component 505, a read-write scheduling control component 506 and databases of different types 507.
The data query interface 501 is configured to receive a data query request and determine its input parameters, which may include: the data table to be queried, the query dimension and the query condition; it also interfaces with and configures the databases of different types. The data dictionary 502 is used to store the query information and storage information of each data table, and to determine, from that query information and storage information, the database to be queried for the data table to be queried.
The data query log component 503 is configured to collect query records of the data tables periodically or in real time and to update the query information and storage information in the data dictionary with those records; it is also used to obtain, periodically or in real time, the data tables requiring data synchronization and the data tables requiring data migration from the data query log.
The data processing task generation component 504 is configured to determine, based on the correspondence between query information and database type, the target database for a data table requiring data synchronization, and to add a new target data table according to the table structure of that target database; it is also used to determine, based on the correspondence between database type and storage cost, the target database for a data table requiring data migration, and to add a new target data table according to the table structure of that target database. The data processing task generation component 504 also verifies data processing tasks and updates the data table information stored in the heterogeneous database system.
The data query statement component 505 is configured to generate a query statement corresponding to the database to be queried by using the data table to be queried, the query dimension, and the query condition. The read-write scheduling control component 506 is configured to execute a data synchronization task, a data migration task, and a data query task based on a preset read-write scheduling rule. The different types of databases 507 may include databases such as MySql, MongoDB, HDFS, HBase, Redis, etc.
The heterogeneous database system provided by the embodiment of the present invention obtains the data tables to be processed by querying the stored data query log, generates data processing tasks by combining each data table to be processed with its storage database and target database, and finally executes those tasks based on the preset read-write scheduling rule. Because data processing is performed on the data tables automatically from the data query log, the data query function of the heterogeneous database system is optimized, the prior-art need for manual intervention to optimize queries is eliminated, and the user experience is improved.
Fig. 6 is a schematic diagram of main modules of a data processing apparatus based on a heterogeneous database system according to an embodiment of the present invention. As shown in fig. 6, the main modules of the heterogeneous database system-based data processing apparatus 600 may include: a determination module 601, a generation module 602, and an execution module 603.
The determining module 601 is configured to obtain a to-be-processed data table according to a data query log stored in the heterogeneous database system, and determine a target database corresponding to the to-be-processed data table; the generating module 602 may be configured to generate a data processing task according to the to-be-processed data table, the storage database corresponding to the to-be-processed data table, and the target database; the execution module 603 may be configured to execute a data processing task by using the to-be-processed data table, the storage database, and the target database based on a preset read-write scheduling rule.
In this embodiment of the present invention, the determining module 601 may further be configured to: query the data query log to obtain the query information corresponding to the data tables stored in the heterogeneous database system; if the query information corresponding to a data table satisfies the data synchronization condition, determine the data table as a first data table requiring data synchronization; and if the query information corresponding to a data table satisfies the data migration condition, determine the data table as a second data table requiring data migration. The query information may include: the query average response time, the query response monitoring index, the query failure rate and the query frequency; the first data table and the second data table are data tables to be processed.
In the embodiment of the present invention, the query information corresponding to the data table satisfies the data synchronization condition when at least one of the following holds: the query average response time of the data table is longer than the preset query average response time, the query response index of the data table is greater than the preset query response index, the query failure rate of the data table is greater than the preset query failure rate, or the query frequency of the data table is greater than the first preset query frequency.
In the embodiment of the present invention, the query information corresponding to the data table meeting the data migration condition may include at least one of the following options: the query failure rate corresponding to the data table is greater than a second preset query failure rate, and the query frequency corresponding to the data table is less than a second preset query frequency.
In this embodiment of the present invention, the determining module 601 may further be configured to: determining a first target database needing to perform data synchronization on the first data table according to the corresponding relation between the query information and the database type and the query information corresponding to the first data table; and determining a second target database needing to perform data migration on the second data table and migration data corresponding to the second data table according to the storage information corresponding to the second data table based on the corresponding relation between the database type and the storage cost. Wherein storing information may include: a storage database, a storage time range, and a storage cost.
In this embodiment of the present invention, the executing module 603 may further be configured to: synchronizing the first data table from a first storage database corresponding to the first data table to the first target database according to the table structure corresponding to the first target database; and migrating migration data corresponding to the second data table from a second storage database corresponding to the second data table to the second target database according to the table structure corresponding to the second target database.
In this embodiment of the present invention, the executing module 603 may further be configured to: perform data verification on the data table to be processed and the target data table corresponding to the data processing task; if the verification passes, update the storage information in the data dictionary according to the data table to be processed and the target data table, and update the query information in the data dictionary using the data query log; and if the verification fails, execute the data processing task again.
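A minimal sketch of the verify-then-update-or-retry flow is shown below. The verification used here (row count plus an order-independent SHA-256 digest) and the `max_attempts` bound are assumptions; the description itself only states that a failed verification leads to re-execution. The retry is only safe if the task is idempotent, which the demo models by rewriting the whole target each time.

```python
import hashlib
from typing import Callable, Iterable


def fingerprint(rows: Iterable[tuple]) -> tuple[int, str]:
    """Row count plus an order-independent SHA-256 digest of the rows."""
    row_digests = sorted(hashlib.sha256(repr(row).encode()).hexdigest() for row in rows)
    combined = hashlib.sha256("".join(row_digests).encode()).hexdigest()
    return len(row_digests), combined


def run_with_verification(task: Callable[[], None],
                          read_source: Callable[[], Iterable[tuple]],
                          read_target: Callable[[], Iterable[tuple]],
                          update_dictionary: Callable[[], None],
                          max_attempts: int = 3) -> bool:
    """Execute the task, verify source against target, and re-execute on mismatch."""
    for _ in range(max_attempts):
        task()
        if fingerprint(read_source()) == fingerprint(read_target()):
            update_dictionary()  # storage and query information refreshed only on success
            return True
    return False  # still inconsistent after max_attempts; the caller decides what to do


source = [(1, "a"), (2, "b")]
target: list[tuple] = []
attempts = {"count": 0}


def flaky_sync() -> None:
    attempts["count"] += 1
    target.clear()  # idempotent: the task rewrites the whole target table
    target.extend(source if attempts["count"] > 1 else source[:1])  # first run loses a row


data_dictionary: dict[str, str] = {}
ok = run_with_verification(flaky_sync, lambda: source, lambda: target,
                           lambda: data_dictionary.update(orders="synced"))
print(ok, attempts["count"], data_dictionary)  # True 2 {'orders': 'synced'}
```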
In this embodiment of the present invention, the data processing apparatus based on the heterogeneous database system may further include a query module (not shown). The query module may be configured to: receive a data query request and acquire the input parameters corresponding to the data query request; look up, using the storage information and the query information in the data dictionary, the database to be queried that corresponds to the data table to be queried; generate a data query task for the database to be queried according to the data table to be queried, the query dimension, and the query condition; and execute the data query task based on the preset read-write scheduling rule. The input parameters may include the data table to be queried, the query dimension, and the query condition.
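The query path can be sketched as follows, under clearly hypothetical assumptions: the data dictionary is a plain mapping from each table to the databases that store it and their observed average response times, the fastest such database is chosen, query dimensions become `GROUP BY` columns, and query conditions are equality filters passed as bound parameters.

```python
from dataclasses import dataclass, field


@dataclass
class QueryRequest:
    """Input parameters of a data query request."""
    table: str                                                    # data table to be queried
    dimensions: list[str]                                         # query dimensions (GROUP BY)
    conditions: dict[str, object] = field(default_factory=dict)   # equality filters only


@dataclass
class QueryTask:
    database: str
    sql: str
    params: tuple


# Hypothetical data dictionary: for each table, the databases that store it
# (storage information) and their observed average response time in ms (query information).
DATA_DICTIONARY = {"orders": {"mysql": 850.0, "clickhouse": 120.0}}


def build_query_task(req: QueryRequest, dictionary: dict = DATA_DICTIONARY) -> QueryTask:
    candidates = dictionary[req.table]
    database = min(candidates, key=candidates.get)  # fastest database holding the table
    dims = ", ".join(req.dimensions)
    where = " AND ".join(f"{col} = ?" for col in req.conditions) or "1 = 1"
    sql = f"SELECT {dims}, COUNT(*) FROM {req.table} WHERE {where} GROUP BY {dims}"
    return QueryTask(database, sql, tuple(req.conditions.values()))


task = build_query_task(QueryRequest("orders", ["city"], {"status": "paid"}))
print(task.database)  # clickhouse
print(task.sql)       # SELECT city, COUNT(*) FROM orders WHERE status = ? GROUP BY city
```

The generated task would then be handed to the scheduler rather than executed immediately, so that it can be ordered against pending data processing tasks.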
In the embodiment of the present invention, the preset read-write scheduling rule may include at least one of the following options: when a data query task exists, the data query task is executed first and the data processing task afterwards, and when no data query task exists, the data processing task is executed directly; and when the data table to be processed corresponding to the data processing task is the same as the data table to be queried corresponding to the data query task, the data processing task is executed first and the data query task afterwards.
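The two scheduling rules can be combined into a single ordering function. In this sketch, which uses a hypothetical `Task` type, queries on tables that are not being processed run first; the processing tasks follow; and queries on a table that is being processed are deferred until after processing, so that they read the updated data.

```python
from dataclasses import dataclass


@dataclass
class Task:
    kind: str   # "query" or "process"
    table: str  # data table the task reads or writes
    name: str


def schedule(tasks: list[Task]) -> list[Task]:
    """Order tasks by the two read-write scheduling rules."""
    processing = [t for t in tasks if t.kind == "process"]
    queries = [t for t in tasks if t.kind == "query"]
    tables_being_processed = {t.table for t in processing}
    # Rule 1: queries run before processing tasks (with no queries, processing runs directly).
    early = [q for q in queries if q.table not in tables_being_processed]
    # Rule 2: a query on a table that is being processed waits until processing finishes.
    deferred = [q for q in queries if q.table in tables_being_processed]
    return early + processing + deferred


tasks = [
    Task("process", "orders", "sync_orders"),
    Task("query", "users", "q_users"),
    Task("query", "orders", "q_orders"),
]
print([t.name for t in schedule(tasks)])  # ['q_users', 'sync_orders', 'q_orders']
```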
In this embodiment of the present invention, the query module may further be configured to: generate a data query log corresponding to the data query request; and update the data dictionary using the generated data query log.
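One possible way to fold the data query log back into the data dictionary is to recompute per-table query information over a sliding time window. The log record fields, the one-hour window, and the metric names below are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryLogEntry:
    table: str
    database: str
    response_ms: float
    succeeded: bool
    timestamp: float  # seconds since the epoch


def refresh_query_info(log: list[QueryLogEntry], now: float,
                       window_seconds: float = 3600.0) -> dict[str, dict[str, float]]:
    """Recompute per-table query information from log entries inside the time window."""
    recent: dict[str, list[QueryLogEntry]] = defaultdict(list)
    for entry in log:
        if now - entry.timestamp <= window_seconds:
            recent[entry.table].append(entry)
    info = {}
    for table, entries in recent.items():
        n = len(entries)
        info[table] = {
            "avg_response_ms": sum(e.response_ms for e in entries) / n,
            "failure_rate": sum(not e.succeeded for e in entries) / n,
            "queries_per_hour": n * 3600.0 / window_seconds,
        }
    return info


query_log: list[QueryLogEntry] = []
query_log.append(QueryLogEntry("orders", "mysql", 100.0, True, 9000.0))  # new request logged
query_log.append(QueryLogEntry("orders", "mysql", 300.0, False, 9500.0))
query_log.append(QueryLogEntry("orders", "mysql", 50.0, True, 1000.0))   # outside the window
data_dictionary = {"query_info": refresh_query_info(query_log, now=10000.0)}
print(data_dictionary["query_info"]["orders"])
# {'avg_response_ms': 200.0, 'failure_rate': 0.5, 'queries_per_hour': 2.0}
```

The refreshed figures are exactly the inputs that the synchronization and migration conditions consume, which closes the loop between querying and automatic data processing.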
As can be seen from the above description, the data processing apparatus based on the heterogeneous database system according to the embodiment of the present invention acquires the data table to be processed by querying the stored data query log, generates a data processing task by combining the storage database and the target database corresponding to the data table to be processed, and finally executes the data processing task based on the preset read-write scheduling rule. Because data processing on the data table to be processed is driven automatically by the data query log, the data query function of the heterogeneous database system can be optimized without manual intervention. This solves the prior-art technical problem that query optimization relies on manual intervention, and provides a good user experience.
Fig. 7 shows an exemplary system architecture 700 to which the data processing method or the data processing apparatus based on the heterogeneous database system according to an embodiment of the present invention may be applied.
As shown in fig. 7, the system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 serves to provide a medium for communication links between the terminal devices 701, 702, 703 and the server 705. Network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
A user may use the terminal devices 701, 702, 703 to interact with a server 705 over a network 704, to receive or send messages or the like. The terminal devices 701, 702, 703 may have installed thereon various communication client applications, such as a shopping-like application, a web browser application, a search-like application, an instant messaging tool, a mailbox client, social platform software, etc. (by way of example only).
The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 705 may be a server providing various services, for example (by way of example only) a background management server that supports shopping websites browsed by users with the terminal devices 701, 702, 703. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information, by way of example only) to the terminal device.
It should be noted that the data processing method based on the heterogeneous database system provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the data processing apparatus based on the heterogeneous database system is generally disposed in the server 705.
It should be understood that the number of terminal devices, networks, and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card or a modem. The communication section 809 performs communication processing via a network such as the Internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as necessary, so that a computer program read from it can be installed into the storage section 808 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When executed by the Central Processing Unit (CPU) 801, the computer program performs the above-described functions defined in the system of the present invention.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a determining module, a generating module, and an execution module. For example, the determining module may also be described as a module that acquires a data table to be processed according to a data query log stored in the heterogeneous database system and determines a target database corresponding to the data table to be processed.
As another aspect, the present invention also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to: acquire a data table to be processed according to a data query log stored in a heterogeneous database system, and determine a target database corresponding to the data table to be processed; generate a data processing task according to the data table to be processed, the storage database corresponding to the data table to be processed, and the target database; and execute the data processing task using the data table to be processed, the storage database, and the target database based on a preset read-write scheduling rule.
According to the technical solution of the embodiment of the present invention, the data table to be processed is obtained by querying the stored data query log, a data processing task is generated by combining the storage database and the target database corresponding to the data table to be processed, and the data processing task is finally executed based on the preset read-write scheduling rule. Because data processing on the data table to be processed is driven automatically by the data query log, the data query function of the heterogeneous database system can be optimized. This solves the prior-art technical problem that query optimization relies on manual intervention, and provides a good user experience.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (13)

1. A data processing method based on a heterogeneous database system is characterized by comprising the following steps:
acquiring a data table to be processed according to a data query log stored in a heterogeneous database system, and determining a target database corresponding to the data table to be processed;
generating a data processing task according to the data table to be processed, the storage database corresponding to the data table to be processed and the target database;
and executing the data processing task by utilizing the to-be-processed data table, the storage database and the target database based on a preset read-write scheduling rule.
2. The method of claim 1, wherein the obtaining the to-be-processed data table according to the data query log stored in the heterogeneous database system comprises:
querying the data query log to obtain query information corresponding to a data table stored in the heterogeneous database system, wherein the query information comprises: a query average response time, a query response monitoring index, a query failure rate, and a query frequency;
if the query information corresponding to the data table meets the data synchronization condition, determining the data table as a first data table needing data synchronization;
if the query information corresponding to the data table meets the data migration condition, determining the data table as a second data table needing data migration; wherein
the first data table and the second data table are the data tables to be processed.
3. The method according to claim 2, wherein the query information corresponding to the data table satisfying the data synchronization condition includes at least one of the following options: the query average response time length corresponding to the data table is greater than the preset query average response time length, the query response index corresponding to the data table is greater than the preset query response index, the query failure rate corresponding to the data table is greater than the preset query failure rate, and the query frequency corresponding to the data table is greater than the first preset query frequency.
4. The method according to claim 2, wherein the query information corresponding to the data table satisfying the data migration condition comprises: the query frequency corresponding to the data table is less than a second preset query frequency.
5. The method according to claim 2, wherein the determining the target database corresponding to the to-be-processed data table comprises:
determining a first target database needing to perform data synchronization on the first data table according to the corresponding relation between the query information and the database type and the query information corresponding to the first data table;
based on the corresponding relationship between the database type and the storage cost, according to the storage information corresponding to the second data table, determining a second target database which needs to perform data migration on the second data table and migration data corresponding to the second data table, where the storage information includes: a storage database, a storage time range, and a storage cost.
6. The method of claim 5, wherein executing the data processing task using the to-be-processed data table, the storage database, and the target database comprises:
synchronizing the first data table from a first storage database corresponding to the first data table to the first target database according to a table structure corresponding to the first target database;
and migrating migration data corresponding to the second data table from a second storage database corresponding to the second data table to the second target database according to the table structure corresponding to the second target database.
7. The method of claim 1, wherein after performing the data processing task, the method further comprises:
performing data verification on the data table to be processed and a target data table corresponding to the data processing task;
if the verification is passed, updating storage information in a data dictionary according to the data table to be processed and the target data table, and updating query information in the data dictionary by using the data query log;
and if the verification fails, re-executing the data processing task.
8. The method of claim 1, further comprising:
receiving a data query request, and acquiring input parameters corresponding to the data query request, wherein the input parameters comprise: a data table to be queried, a query dimension, and a query condition;
querying, by using the storage information and the query information in the data dictionary, a database to be queried corresponding to the data table to be queried;
generating a data query task corresponding to the database to be queried according to the data table to be queried, the query dimension and the query condition;
and executing the data query task based on a preset read-write scheduling rule.
9. The method according to claim 1 or claim 8, wherein the preset read-write scheduling rule comprises at least one of the following options:
under the condition that the data query task exists, the data query task is executed first, then the data processing task is executed, and under the condition that the data query task does not exist, the data processing task is executed directly;
and under the condition that the data table to be processed corresponding to the data processing task is the same as the data table to be queried corresponding to the data query task, executing the data processing task first and then executing the data query task.
10. The method of claim 8, further comprising: generating a data query log corresponding to the data query request; and
updating the data dictionary by using the generated data query log.
11. A data processing apparatus based on a heterogeneous database system, comprising:
the determining module is used for acquiring a data table to be processed according to a data query log stored in the heterogeneous database system and determining a target database corresponding to the data table to be processed;
the generating module is used for generating a data processing task according to the to-be-processed data table, the storage database corresponding to the to-be-processed data table and the target database;
and the execution module is used for executing the data processing task by utilizing the to-be-processed data table, the storage database and the target database based on a preset read-write scheduling rule.
12. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-10.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-10.
CN202010769291.6A 2020-08-03 2020-08-03 Data processing method and device based on heterogeneous database system Pending CN113760966A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010769291.6A CN113760966A (en) 2020-08-03 2020-08-03 Data processing method and device based on heterogeneous database system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010769291.6A CN113760966A (en) 2020-08-03 2020-08-03 Data processing method and device based on heterogeneous database system

Publications (1)

Publication Number Publication Date
CN113760966A true CN113760966A (en) 2021-12-07

Family

ID=78785572

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010769291.6A Pending CN113760966A (en) 2020-08-03 2020-08-03 Data processing method and device based on heterogeneous database system

Country Status (1)

Country Link
CN (1) CN113760966A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116521738A (en) * 2023-05-06 2023-08-01 零束科技有限公司 Data processing method, system, electronic device and storage medium

Similar Documents

Publication Publication Date Title
CN109189835B (en) Method and device for generating data wide table in real time
CN109947668B (en) Method and device for storing data
US20190228093A1 (en) Temporal optimization of data operations using distributed search and server management
CN109614402B (en) Multidimensional data query method and device
CN108629029B (en) Data processing method and device applied to data warehouse
Lai et al. Towards a framework for large-scale multimedia data storage and processing on Hadoop platform
CN111190888A (en) Method and device for managing graph database cluster
CN112307037A (en) Data synchronization method and device
CN109918425A (en) A kind of method and system realized data and import non-relational database
US20220121652A1 (en) Parallel Stream Processing of Change Data Capture
CN109960212B (en) Task sending method and device
CN114297173A (en) Knowledge graph construction method and system for large-scale mass data
CN113282611A (en) Method and device for synchronizing stream data, computer equipment and storage medium
CN113190517B (en) Data integration method and device, electronic equipment and computer readable medium
CN112182138A (en) Catalog making method and device
CN111753019A (en) Data partitioning method and device applied to data warehouse
CN113760966A (en) Data processing method and device based on heterogeneous database system
CN113760600B (en) Database backup method, database restoration method and related devices
CN112817930A (en) Data migration method and device
CN112711572B (en) Online capacity expansion method and device suitable for database and table division
CN112783914B (en) Method and device for optimizing sentences
CN113760861A (en) Data migration method and device
CN113448957A (en) Data query method and device
CN111984686A (en) Data processing method and device
CN113779048A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination