CN117786005A - Method for realizing cross-database type synchronous data based on configuration parameters

Info

Publication number: CN117786005A
Application number: CN202311765345.1A
Authority: CN
Inventors: 胡传伟; 杨新宇; 张强; 于大伟
Original assignee: Shandong Rongke Data Service Co ltd
Current assignee: Shandong Rongke Data Service Co ltd
Priority date: 2023-12-20
Filing date: 2023-12-20
Publication date: 2024-03-29

Abstract

The invention discloses a method for synchronizing data across database types based on configuration parameters, comprising the following steps: determining a plurality of databases, selecting the corresponding database connection driver and parameter configuration for each database type, creating the data synchronization logic, and designing a system base table; backing up the database asynchronously through a message queue and a backup library execution program; and setting a synchronous database prediction task and completing automatic backup recommendation for the databases based on a synchronization prediction algorithm. The invention enables real-time backup of the original business data and, when backup content must be adjusted but the original data cannot be modified, allows the content of the backup library to be modified without affecting the original business data. This greatly reduces the workload in enterprise software applications. The background development language is not restricted, and any mainstream development language can be used.

Description

Method for realizing cross-database type synchronous data based on configuration parameters

Technical Field

The invention relates to the field of database data synchronization and provides an automatic backup tool for enterprise management software applications, thereby greatly improving the office efficiency of enterprise personnel; in particular, it relates to a method for synchronizing data across database types based on configuration parameters.

Background

Synchronizing data across database types refers to synchronizing data of one database management system into another database management system, which may use different database types and architectures. The synchronization may be unidirectional, i.e. only one party of data is synchronized to the other, or bidirectional, i.e. both sides of data are updated in synchronization with each other. Synchronizing data across database types requires consideration of differences in data formats, data types, index and query languages, and the like, and requires implementation using specialized tools or writing applications. This technique is very common in cross-platform applications and enterprise-level applications because it allows data sharing and collaboration between different databases.

In enterprise software management, data backup is indispensable, and many enterprises require that business data belonging to different subsidiaries and departments be modifiable after backup. Conventional data backup copies the database in full and cannot make differential adjustments; moreover, it often occupies server resources during backup, which affects system use, and it is not real-time.

For example, Chinese patent 201310719511.4 discloses a method for implementing cross-database type synchronization data based on configuration parameters, which reads database data and writes it into memory, and then reads the in-memory data to synchronize it to a database, so as to synchronize database data of different data types. However, that method has the following disadvantages in practice. First, if the business data of different subsidiaries and departments needs to be modified after backup, the conventional full backup mode cannot make differential adjustments. Second, server resources are occupied during backup, which affects normal use of the system. In addition, conventional backup cannot achieve real-time backup. Finally, the existing method does not optimize automatic synchronization among different databases, so automatic backup is inefficient.

For the problems in the related art, no effective solution has been proposed at present.

Disclosure of Invention

Aiming at the problems in the related art, the invention provides a method for realizing cross-database type synchronous data based on configuration parameters, so as to overcome the technical problems in the prior art.

For this purpose, the invention adopts the following specific technical scheme:

A method for implementing cross database type synchronization data based on configuration parameters, the method comprising the steps of:

S1, determining a plurality of databases, selecting corresponding database connection drivers and parameter configurations according to different database types, simultaneously creating data synchronization logic, and designing a system basic table;

S2, backing up the database asynchronously based on the message queue and by the backup library execution program;

S3, setting a synchronous database prediction task, and completing automatic backup recommendation of the database based on a synchronous prediction algorithm.

Further, the creating data synchronization logic includes the steps of:

reading data in a backup source database, converting the data type and format, and determining the data written in a backup destination database;

acquiring characteristics and limitations of a backup destination database, and mapping data types of the backup source database and the backup destination database;

converting grammar rules among different databases, converting data in a backup source database into a data format in a backup destination database, and performing data verification and calibration;

setting error processing and exception processing mechanisms in the data synchronization task to ensure the reliability and stability of data synchronization;

testing and optimizing data synchronization tasks, and ensuring the accuracy and efficiency of data synchronization.
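The data type mapping step can be illustrated with a minimal sketch. The mapping table, the function name `map_column_type`, and the choice of MySQL as source and PostgreSQL as destination are assumptions made for illustration only; the method itself does not prescribe a particular mapping.

```python
# Hypothetical MySQL -> PostgreSQL column type map (illustrative only).
MYSQL_TO_POSTGRES = {
    "TINYINT": "SMALLINT",
    "INT": "INTEGER",
    "BIGINT": "BIGINT",
    "DOUBLE": "DOUBLE PRECISION",
    "DATETIME": "TIMESTAMP",
    "LONGTEXT": "TEXT",
    "BLOB": "BYTEA",
}


def map_column_type(source_type: str) -> str:
    """Return the destination column type for a source column type.

    A display width such as INT(11) is dropped before the lookup.
    Unmapped types raise an error so they can be handled explicitly.
    """
    base = source_type.strip().upper().split("(")[0]
    if base not in MYSQL_TO_POSTGRES:
        raise ValueError(f"no mapping configured for source type {source_type!r}")
    return MYSQL_TO_POSTGRES[base]


if __name__ == "__main__":
    for column_type in ("int(11)", "datetime", "blob"):
        print(column_type, "->", map_column_type(column_type))
```

Raising on an unmapped type, rather than passing it through, keeps unsupported columns from being written silently in the wrong format.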

Further, the designing of the system basic table includes the following steps:

setting a backup source database with a unique primary key field;

the backup destination database is set to have a unique primary key, last modification time and backup source database ID field.

Further, the step of backing up the database asynchronously by the backup library executing program based on the message queue includes the steps of:

when a database structured query statement is executed, the statement is sent to a message queue, and the backup library execution program asynchronously receives the statement from the message queue and executes it on the backup destination database;

before executing the statement, the backup library execution program first checks the last modification time, in the backup destination database, of the data to which the statement applies;

if that last modification time is the same as the last modification time in the backup source database, the backup library execution program executes the statement directly;

if that last modification time differs from the last modification time in the backup source database, the backup library execution program performs other operations;

wherein the other operations include ignoring the statement or overwriting the data in the backup destination database.
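The check performed before each statement is applied can be sketched as a small decision function. This is a sketch under the assumption that the "other operations" branch applies when the two last modification times differ; the names `Action` and `decide` and the configurable `on_conflict` policy are hypothetical.

```python
from datetime import datetime
from enum import Enum
from typing import Optional


class Action(Enum):
    EXECUTE = "execute"      # timestamps match: apply the statement as-is
    IGNORE = "ignore"        # timestamps differ: drop the statement
    OVERWRITE = "overwrite"  # timestamps differ: overwrite the destination data


def decide(source_time: Optional[datetime],
           dest_time: Optional[datetime],
           on_conflict: Action = Action.IGNORE) -> Action:
    """Choose what the backup library execution program does with a statement."""
    if on_conflict is Action.EXECUTE:
        raise ValueError("on_conflict must be IGNORE or OVERWRITE")
    if source_time == dest_time:
        return Action.EXECUTE
    # The backup copy was changed independently (or is missing): apply the policy.
    return on_conflict
```

Making the conflict policy a parameter reflects the idea that a backup library whose content has been adjusted on purpose may either keep its own version or accept the source version.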

Further, the step of setting a synchronous database prediction task and completing automatic backup recommendation of the database based on a synchronous prediction algorithm comprises the following steps:

setting a synchronous database prediction task, and determining the starting time of the prediction task;

completing automatic backup recommendation of the database through a synchronous prediction algorithm;

wherein, once the start time of the prediction task has been determined, a prediction task that is cancelled is no longer executed, and a prediction task that has expired is likewise no longer executed.
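One reading of the start, cancellation and expiry rule is sketched below. The class `PredictionTask`, the explicit `expires_at` field and the method names are assumptions introduced for illustration; the text does not state how expiry is determined.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class PredictionTask:
    start_time: datetime   # time at which the prediction task may start
    expires_at: datetime   # hypothetical: time after which the task is stale
    cancelled: bool = False

    def cancel(self) -> None:
        """Mark the task as cancelled; it will never run afterwards."""
        self.cancelled = True

    def should_run(self, now: datetime) -> bool:
        """A task runs only if it is due, not cancelled and not expired."""
        if self.cancelled or now >= self.expires_at:
            return False
        return now >= self.start_time


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 2, 0)
    task = PredictionTask(start, start + timedelta(hours=1))
    print(task.should_run(start))                        # due and valid
    print(task.should_run(start + timedelta(hours=2)))   # expired
```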

Further, the automatic backup recommendation of the database completed by the synchronous prediction algorithm comprises the following steps:

obtaining a scoring data matrix of the backup destination database to the backup source database, wherein the number of the backup source databases is n, and the number of the backup destination databases is m;

clustering the data in all backup source databases through a clustering algorithm to obtain a plurality of data blocks;

calculating a scoring matrix of the backup destination databases for the synchronization tasks of the backup source databases, and calculating, through cosine similarity, the scoring similarity of two backup destination databases for the synchronization tasks in one backup source database, wherein each synchronization task comprises a plurality of data blocks;

calculating to obtain the preference degree of two backup destination databases to the data blocks of the synchronous task in one backup source database;

multiplying the score similarity with the preference degree to obtain a similarity value of the synchronous task;

if the similarity value is greater than or equal to a preset threshold, the two backup destination databases are marked as recommended to each other, and when one of the synchronization tasks completes data synchronization to one backup destination database, the data of that synchronization task is automatically recommended to the other backup destination database and marked.
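The final steps of the recommendation (cosine similarity, multiplication by the preference degree, and the threshold test) can be sketched as follows. The function names, the example scores and the threshold value are hypothetical, and the preference degree is taken as an input because its formula is not sketched here.

```python
import math
from typing import Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two score vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def should_recommend(scores_u: Sequence[float],
                     scores_v: Sequence[float],
                     preference: float,
                     threshold: float) -> Tuple[bool, float]:
    """Return (recommend?, similarity value) for destination databases u and v."""
    similarity_value = cosine_similarity(scores_u, scores_v) * preference
    return similarity_value >= threshold, similarity_value


if __name__ == "__main__":
    # Scores of two destination databases for the same three synchronization tasks.
    print(should_recommend([5, 4, 1], [4, 5, 2], preference=0.9, threshold=0.7))
```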

Further, the calculating to obtain the score matrix of the synchronization task of the backup destination database to the backup source database includes the following steps:

obtaining the synchronization tasks that each backup source database uses for data synchronization and the scoring data given by the backup destination databases to each synchronization task, and calculating the scoring matrix of the synchronization tasks, wherein the number of synchronization tasks of each backup source database is k;

multiplying the scoring matrix of the synchronous task with the scoring data matrix to obtain the scoring matrix of the backup destination database for each synchronous task in the backup source database:

wherein m is the number of backup target databases;

k is the number of sync tasks of the backup source database.
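The formula itself does not appear in this text. One reading that is consistent with the stated dimensions is given below; the symbols R (scoring data matrix of the m backup destination databases for the n backup source databases), T (scoring matrix of the k synchronization tasks) and P (resulting scoring matrix) are assumptions introduced here and are not taken from the original formula.

```latex
P = R\,T, \qquad
R \in \mathbb{R}^{m \times n},\quad
T \in \mathbb{R}^{n \times k},\quad
P \in \mathbb{R}^{m \times k}
```

Under this reading, row u of P holds the scores of backup destination database u for the k synchronization tasks, which is the vector compared by cosine similarity in the recommendation step.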

Further, the calculating to obtain the preference of the two backup destination databases to the data blocks of the synchronous task in the backup source database includes the following steps:

acquiring the access times of a backup destination database to a certain data block in a backup source database and the synchronization times of the data block;

dividing the synchronization times by the access times to obtain the interest value of the backup destination database on the data block;

calculating preference degree of a certain data block in the backup source database between backup destination databases:

wherein L_ua is the interest value of backup destination database u in data block a;

L_va is the interest value of backup destination database v in data block a;

n is a non-zero natural number.
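The interest value defined above (synchronization times divided by access times) can be sketched as a small helper. The function name and the handling of a data block that has never been accessed are assumptions; the preference formula itself does not appear in this text and is therefore not sketched.

```python
def interest_value(sync_count: int, access_count: int) -> float:
    """Interest of a backup destination database in one data block.

    Defined as synchronization times divided by access times. Returning 0.0
    for a block that was never accessed is an assumption made here to avoid
    a division by zero.
    """
    if access_count == 0:
        return 0.0
    return sync_count / access_count


if __name__ == "__main__":
    # Hypothetical counts for destination databases u and v on data block a.
    l_ua = interest_value(sync_count=3, access_count=12)
    l_va = interest_value(sync_count=6, access_count=8)
    print(l_ua, l_va)
```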

Further, the clustering of the data in all the backup source databases by the clustering algorithm to obtain a plurality of data blocks includes the following steps:

randomizing and disturbing data in any backup source database to obtain a plurality of data sets with different sequences;

clustering the data sets in different sequences to obtain a plurality of clusters;

drawing a histogram of the clusters, wherein the horizontal axis is an attribute item of the data, and the vertical axis is the occurrence frequency of the attribute item;

the cluster with the largest profit value is selected as the optimal cluster of the round, and the profit value calculation formula is as follows:

wherein L is the number of clusters, and C_k is the k-th cluster;

s is the area of the histogram, W is the width of the histogram;

r is a rejection factor;

inputting the optimal clusters into the data set of the next iteration, randomizing the corresponding data set, and selecting the cluster with the largest profit value as the optimal cluster of the round until the profit value is unchanged and each cluster is a data block.
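The profit formula does not appear in this text. As a point of reference, the sketch below computes the profit of the standard CLOPE algorithm, in which each cluster contributes S / W^r weighted by its number of records and the sum is divided by the total number of records. Treating this standard definition as the one meant here is an assumption, since the improved variant may differ; the function name and the data layout (each record is a collection of attribute items) are likewise hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def clope_profit(clusters: Sequence[Sequence[Iterable[Hashable]]], r: float) -> float:
    """Standard CLOPE profit of a clustering.

    For each cluster, S is the histogram area (total item occurrences) and
    W is the histogram width (number of distinct items). The profit is
    sum(S / W**r * size) over clusters, divided by the total record count.
    """
    numerator = 0.0
    total_records = 0
    for cluster in clusters:
        histogram = Counter(item for record in cluster for item in record)
        width = len(histogram)
        if width == 0:
            continue  # empty cluster or empty records contribute nothing
        area = sum(histogram.values())
        numerator += area / (width ** r) * len(cluster)
        total_records += len(cluster)
    return numerator / total_records if total_records else 0.0


if __name__ == "__main__":
    similar_together = [[("a", "b"), ("a", "b")], [("c", "d"), ("c", "d")]]
    mixed = [[("a", "b"), ("c", "d")], [("a", "b"), ("c", "d")]]
    print(clope_profit(similar_together, r=2), clope_profit(mixed, r=2))
```

Grouping records that share attribute items yields narrow, tall histograms and therefore a higher profit than mixing unrelated records in one cluster.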

Further, the randomizing includes the steps of:

for a data set of X records, randomly selecting one record and exchanging it with the last record;

then randomly selecting one record from the part of the data set that has not yet been randomized and exchanging it with the last record of that part, and finishing randomization after all records have been exchanged.
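The exchange procedure above resembles the Fisher-Yates shuffle. The sketch below assumes that reading, namely that "the last record" refers to the last record of the portion not yet randomized; the function name and the optional `rng` parameter are hypothetical.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def randomize(records: Sequence[T], rng: Optional[random.Random] = None) -> List[T]:
    """Return a shuffled copy of the records using repeated exchanges."""
    if rng is None:
        rng = random.Random()
    shuffled = list(records)
    # `last` is the final position of the part that is not yet randomized.
    for last in range(len(shuffled) - 1, 0, -1):
        pick = rng.randint(0, last)  # inclusive on both ends
        shuffled[pick], shuffled[last] = shuffled[last], shuffled[pick]
    return shuffled


if __name__ == "__main__":
    print(randomize(["r1", "r2", "r3", "r4", "r5"], random.Random(42)))
```

Passing a seeded `random.Random` makes a run repeatable, which can help when comparing clustering results across differently ordered copies of the same data set.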

The beneficial effects of the invention are as follows:

(1) The method for synchronizing data across database types based on configuration parameters enables real-time backup of the original business data and, when backup content must be adjusted but the original data cannot be modified, allows the content of the backup library to be modified without affecting the original business data. This greatly reduces the workload in enterprise software applications.

(2) According to the invention, the automatic backup recommendation of the database is completed through the synchronous prediction algorithm, the data of one synchronous task can be automatically recommended to another backup destination database with similarity, the accurate data recommendation is realized, different departments can obtain the same valuable data, and the efficient and automatic data synchronization is completed. In the process of clustering the data in all backup source databases through the clustering algorithm, the improved CLOPE algorithm is adopted, the data are randomized and disturbed, and the quality of clustering the data blocks is improved.

(3) The invention is general-purpose: the background development language can be any mainstream development language, such as Java, Python or C; the message queue can use message handling tools such as Redis or RabbitMQ; and the database supports MySQL, PostgreSQL and other relational databases. Data synchronization is performed in real time, the backup is imperceptible to the backup source, and the impact on system execution efficiency is negligible. Differing statements can be screened, filtered and processed (ignored or overwritten).

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a flow chart of a method for implementing synchronizing data across database types based on configuration parameters in accordance with an embodiment of the present invention.

Detailed Description

For the purpose of further illustrating the various embodiments, the present invention provides the accompanying drawings, which are a part of the disclosure of the present invention, and which are mainly used to illustrate the embodiments and, together with the description, serve to explain the principles of the embodiments, and with reference to these descriptions, one skilled in the art will recognize other possible implementations and advantages of the present invention, wherein elements are not drawn to scale, and like reference numerals are generally used to designate like elements.

According to an embodiment of the invention, a method for synchronizing data across database types based on configuration parameters is provided. The method is widely needed in the field of enterprise management software applications, and its content comprises the table structure design for database business documents, the method of applying the message queue, and the design of the verification mechanism for data synchronization.

The invention is further described below with reference to the accompanying drawing and the detailed description. As shown in FIG. 1, a method for synchronizing data across database types based on configuration parameters according to an embodiment of the invention includes the following steps:

S1, determining a plurality of databases, selecting corresponding database connection drivers and parameter configurations according to different database types, simultaneously creating data synchronization logic, and designing a system basic table.

In one embodiment, the creating data synchronization logic includes the steps of:

setting error processing and exception processing mechanisms in the data synchronization task to ensure the reliability and stability of data synchronization; for example, retry mechanisms, exception capture and logging operations may be provided to address possible anomalies.

Testing and optimizing data synchronization tasks, and ensuring the accuracy and efficiency of data synchronization. For example, whether the data synchronization is successful or not can be checked by comparing the data in the source database and the target database, and meanwhile, performance optimization can be performed on the data synchronization logic so as to improve the speed and efficiency of the data synchronization.
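The retry, exception capture and logging operations mentioned above can be sketched as follows. The function name `run_with_retry`, the default attempt count and the fixed delay are hypothetical choices for illustration.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger("sync")


def run_with_retry(task: Callable[[], T], attempts: int = 3,
                   delay_seconds: float = 0.0) -> T:
    """Run a synchronization step, retrying on failure and logging each error."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return task()
        except Exception as exc:  # capture, log, then retry
            last_error = exc
            logger.warning("sync attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt < attempts:
                time.sleep(delay_seconds)
    raise RuntimeError(f"sync failed after {attempts} attempts") from last_error
```

A step that still fails after the last attempt raises an error with the original exception attached, so the failure is not silently dropped.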

In one embodiment, the designing the system base table includes the steps of:

setting a backup source database with a unique primary key field;

S2, backing up the database asynchronously based on the message queue and by the backup library execution program.

In one embodiment, the backing up the database asynchronously by the backup library executive based on the message queue comprises the steps of:

when a database structured query statement (SQL) is executed, the statement is sent to a message queue, and the backup library execution program asynchronously receives the statement from the message queue and executes it on the backup destination database;

wherein the other operations include ignoring the statement, overwriting the data in the backup destination database, or performing other processing.

Message queues are a communication mechanism for passing messages between applications. It consists of three parts, a message Producer (Producer), a message Queue (Queue) and a message Consumer (Consumer).

The message producer sends the message to a message queue, which stores the message temporarily and sends it to the message consumer. The message consumer reads the message from the message queue and processes it. Message queues provide a way of asynchronous communication that the producer and consumer need not be online at the same time, and they can send and receive messages at any time. Message queues typically have the following characteristics: reliability, asynchronism, decoupling, buffering.
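The producer, queue and consumer roles can be illustrated with Python's in-process `queue.Queue`, used here only as a stand-in for a message broker such as Redis or RabbitMQ. The function names and the `None` sentinel convention are assumptions of this sketch.

```python
import queue
import threading
from typing import List, Optional


def consume(messages: "queue.Queue[Optional[str]]", handled: List[str]) -> None:
    """Consumer: read messages until the sentinel (None) arrives."""
    while True:
        message = messages.get()
        if message is None:
            break
        handled.append(message)


def run_demo(statements: List[str]) -> List[str]:
    """Producer sends statements; the consumer handles them asynchronously."""
    messages: "queue.Queue[Optional[str]]" = queue.Queue()
    handled: List[str] = []
    worker = threading.Thread(target=consume, args=(messages, handled))
    worker.start()
    for statement in statements:   # producer side
        messages.put(statement)
    messages.put(None)             # tell the consumer there is nothing more
    worker.join()
    return handled


if __name__ == "__main__":
    print(run_demo(["INSERT ...", "UPDATE ..."]))
```

The producer never waits for a statement to be handled, which is the decoupling that keeps the backup from slowing down the original business system.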

The invention is mainly characterized in that:

The intended users of the invention are software implementation and application personnel.

The invention can support different development languages (Java/Python/C) and different databases (MySQL/PostgreSQL).

All database tables share the structure shown in Table 1:

Table 1 Structure of all database tables

Field name          Type        Remarks
ID                  Integer     Unique primary key
LAST_UPDATE_TIME    Datetime    Last modification time of the data
SYNC_ID             Integer     Backup source ID
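A table with the three fields of Table 1 could be created as sketched below, using SQLite only because it ships with Python. The table name `backup_document` and the function name are hypothetical, and the modification time is stored as text because SQLite has no dedicated datetime type; on MySQL or PostgreSQL the column would use the native type.

```python
import sqlite3

# Hypothetical table name; the three columns follow Table 1.
BACKUP_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS backup_document (
    ID               INTEGER PRIMARY KEY,
    LAST_UPDATE_TIME TEXT    NOT NULL,
    SYNC_ID          INTEGER NOT NULL
)
"""


def create_backup_table(conn: sqlite3.Connection) -> None:
    """Create the base table in the backup destination database."""
    conn.execute(BACKUP_TABLE_DDL)
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect(":memory:")
    create_backup_table(connection)
    for row in connection.execute("PRAGMA table_info(backup_document)"):
        print(row[1], row[2])
```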

Synchronization is initiated when the system interacts with the database: the SQL of the interaction is sent to the message queue.

After receiving the SQL through the message queue, the synchronous execution program executes it in the backup library in real time. The specific code and annotations are given in Python.
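The listing from the specification does not appear in this text. The following is an independent minimal sketch of the same idea, not the specification's code: queued SQL is taken from an in-process queue and executed against a backup library. The function name, the use of `queue.Queue` and SQLite, and the `(sql, params)` message shape are all assumptions.

```python
import queue
import sqlite3


def apply_queued_sql(sql_queue: "queue.Queue", backup_conn: sqlite3.Connection) -> int:
    """Execute every queued (sql, params) message on the backup library.

    Returns the number of statements applied.
    """
    applied = 0
    while True:
        try:
            sql, params = sql_queue.get_nowait()
        except queue.Empty:
            break  # nothing left to apply for now
        backup_conn.execute(sql, params)
        applied += 1
    backup_conn.commit()
    return applied


if __name__ == "__main__":
    backup = sqlite3.connect(":memory:")
    backup.execute(
        "CREATE TABLE doc (ID INTEGER PRIMARY KEY, LAST_UPDATE_TIME TEXT, SYNC_ID INTEGER)"
    )
    pending: "queue.Queue" = queue.Queue()
    pending.put(("INSERT INTO doc VALUES (?, ?, ?)", (1, "2024-03-29 12:00:00", 7)))
    print(apply_queued_sql(pending, backup))
```

Parameterized statements are used so that values travel separately from the SQL text; a real deployment would read from the configured message broker instead of an in-process queue.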

In one embodiment, the setting the synchronous database prediction task and completing the automatic backup recommendation of the database based on the synchronous prediction algorithm comprises the following steps:

In one embodiment, the automatic backup recommendation of the database accomplished by the synchronous predictive algorithm comprises the steps of:

obtaining a scoring data matrix of the backup destination databases for the backup source databases, wherein each score ranges from one to five points, the number of backup source databases is n, and the number of backup destination databases is m;

In one embodiment, the calculating the score matrix of the synchronization task of the backup destination database to the backup source database includes the following steps:

obtaining the synchronization tasks that each backup source database uses for data synchronization and the scoring data (scored by professionals) given by the backup destination databases to each synchronization task, and calculating the scoring matrix of the synchronization tasks, wherein the number of synchronization tasks of each backup source database is k;

wherein m is the number of backup target databases;

k is the number of sync tasks of the backup source database.

In one embodiment, the calculating the preference of the two backup destination databases to the data blocks of the synchronization task in one backup source database includes the following steps:

n is a non-zero natural number.

In one embodiment, the clustering the data in all the backup source databases by the clustering algorithm to obtain a plurality of data blocks includes the following steps:

wherein L is the number of clusters, and C_k is the k-th cluster;

s is the area of the histogram, W is the width of the histogram;

r is a rejection factor, and the larger r is, the more clusters are;

The attribute items include:

numerical attributes: including real numbers, integers, fractions, etc.;

category attributes: including discrete values, nominal values, etc.;

time attributes: including dates, timestamps, etc.;

text attributes: including natural language text, code, etc.;

geographic location attributes: including longitude, latitude, etc.;

images, video, etc.

In one embodiment, the randomization comprises the steps of:

In summary, the method for synchronizing data across database types based on configuration parameters enables real-time backup of the original business data and, when backup content must be adjusted but the original data cannot be modified, allows the content of the backup library to be modified without affecting the original business data. This greatly reduces the workload in enterprise software applications. The invention completes automatic backup recommendation for the databases through the synchronization prediction algorithm: the data of one synchronization task can be automatically recommended to another, similar backup destination database, which achieves accurate data recommendation, lets different departments obtain the same valuable data, and completes efficient, automatic data synchronization. When the data in all backup source databases is clustered by the clustering algorithm, an improved CLOPE algorithm is adopted and the data is randomly shuffled, which improves the quality of the clustered data blocks. The invention is general-purpose: the background development language can be any mainstream development language, such as Java, Python or C; the message queue can use message handling tools such as Redis or RabbitMQ; and the database supports MySQL, PostgreSQL and other relational databases. Data synchronization is performed in real time, the backup is imperceptible to the backup source, and the impact on system execution efficiency is negligible. Differing statements can be screened, filtered and processed (ignored or overwritten).

The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, alternatives, and improvements that fall within the spirit and scope of the invention.

Claims

1. A method for synchronizing data across database types based on configuration parameters, the method comprising the steps of:

2. The method for implementing cross database type synchronization data based on configuration parameters of claim 1, wherein the creating data synchronization logic comprises the steps of:

3. The method for implementing cross-database type synchronization of data based on configuration parameters according to claim 1, wherein said designing the system base table comprises the steps of:

setting a backup source database with a unique primary key field;

4. The method for implementing cross-database type synchronous data based on configuration parameters according to claim 1, wherein the step of backing up the database asynchronously by the backup library execution program based on the message queue comprises the steps of:

5. The method for implementing cross-database type synchronization data based on configuration parameters according to claim 1, wherein said setting a synchronization database prediction task and completing automatic backup recommendation of a database based on a synchronization prediction algorithm comprises the steps of:

6. The method for implementing synchronization of data across database types based on configuration parameters of claim 5, wherein said automatic backup recommendation of databases by means of a synchronization prediction algorithm comprises the steps of:

7. The method for implementing cross-database type synchronization data based on configuration parameters according to claim 6, wherein said calculating a score matrix of a backup destination database to a synchronization task of a backup source database comprises the steps of:

wherein m is the number of backup target databases;

k is the number of sync tasks of the backup source database.

8. The method for implementing cross-database type synchronization of data based on configuration parameters according to claim 7, wherein said calculating the preference of two backup destination databases for data blocks of a synchronization task in a backup source database comprises the steps of:

n is a non-zero natural number.

9. The method for implementing cross-database type synchronization of data based on configuration parameters according to claim 8, wherein the clustering of data in all backup source databases by a clustering algorithm to obtain a plurality of data blocks comprises the following steps:

wherein L is the number of clusters, and C_k is the k-th cluster;

s is the area of the histogram, W is the width of the histogram;

r is a rejection factor;

10. The method of implementing cross database type synchronization of data based on configuration parameters according to claim 9, wherein the randomization comprises the steps of: