CN113282599A - Data synchronization method and system - Google Patents

Info

Publication number
CN113282599A
Authority
CN
China
Prior art keywords
data
source table
synchronization
target
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110605474.9A
Other languages
Chinese (zh)
Inventor
李斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202110605474.9A priority Critical patent/CN113282599A/en
Publication of CN113282599A publication Critical patent/CN113282599A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to the technical field of computers, and in particular to a data synchronization method and system. The method comprises the following steps: searching a business database through a query statement to obtain target data; parsing the query statement to obtain a source table and a source table filtering condition; and copying data to be synchronized to a preset target table in a data warehouse according to the source table of the target data, the field definitions of the source table and the preset target table, wherein the data to be synchronized is obtained by screening the target data according to the source table filtering condition. In the embodiments of the invention, the relevant staff can synchronize data using only a query statement, without assistance from other technical engineers, which lowers the technical requirements on staff and improves the efficiency of the data preparation stage. Because synchronization is performed on demand, the volume of synchronized data is reduced and synchronization efficiency is improved, and the manpower and hardware investment in data warehouse construction can also be reduced, thereby saving cost.

Description

Data synchronization method and system
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data synchronization method and system.
Background
A relational database is a database that organizes data using the relational model. It stores data in rows and columns so that users can understand it easily; a set of rows and columns is called a table, and a group of tables constitutes a database. A user retrieves data from the database with a query, which is executable code that selects particular regions of the database.
A data warehouse may be abbreviated as DW or DWH. It is a subject-oriented, integrated, relatively stable collection of data that reflects historical changes and supports enterprise management and decision-making. That is, the data of all application systems, such as customer relationship management systems and financial systems, is integrated by subject, and its entire history of changes is recorded. As enterprises become increasingly digitized, they accumulate large amounts of business data, and the data warehouse processes these independent and scattered data in a unified way to meet the enterprise's high-level decision-making and analysis needs.
The data in a data warehouse is extracted from the original, scattered databases (relational databases such as MySQL). A data warehouse is generally divided into three layers: ODS (operational data layer), DW (data warehouse layer) and DM (data mart layer). The ODS layer stores the raw data by category, the DW layer lightly cleans and summarizes the data, and the DM layer is built one-to-one against specific analysis and statistics requirements, with deep cleaning and summarization.
Data synchronization for the ODS layer mainly consists of synchronizing the business data of each business system into the data warehouse. Clear analysis requirements are usually lacking when the ODS layer is built. If full synchronization is chosen, unused data occupies a large amount of hardware resources and is wasted; for cost reasons, therefore, only some core business tables are synchronized. As a result, data is inevitably found to be missing during the analysis stage and has to be synchronized into the data warehouse at that point.
The existing mainstream approach to synchronizing relational databases with data warehouses is to keep strengthening ETL (extract, transform, load) tools, providing as many operators as possible to speed up the response to new requirements. However, an ETL tool is essentially a graphical programming tool: it requires the user to think like a programmer and still poses a technical barrier for analysts. It is typically used by trained staff, and complex data synchronization even requires developers. This approach therefore places high demands on staff skills. Vendors provide their own ETL tools, such as Alibaba's DataWorks and the open-source Kettle.
Therefore, a data synchronization method for relational databases and data warehouses that is simple to operate and efficient is needed.
Disclosure of Invention
The embodiment of the invention provides a data synchronization method and system, which are used for solving the problems of complex synchronization operation and low efficiency of the existing data warehouse.
In a first aspect, an embodiment of the present invention provides a data synchronization method, including:
searching a business database through a query statement to obtain target data, wherein the business database is a relational database;
analyzing the query statement to obtain a source table and a source table filtering condition, wherein the source table comprises the target data, a source table of the target data and a field definition of the source table;
copying data to be synchronized to a preset target table in a data warehouse according to a source table of the target data, field definitions of the source table and the preset target table, wherein the data to be synchronized is obtained by screening the target data according to the source table filtering conditions.
Preferably, the parsing the query statement to obtain a source table and a source table filtering condition includes:
parsing the query statement into a syntax tree by a parsing tool;
extracting the source table and the source table filtering condition based on the syntax tree.
Preferably, the extracting the source table and the source table filtering condition based on the syntax tree includes:
extracting the source table based on a tableName subtree in the syntax tree;
and extracting the source table filtering condition based on a whereState subtree in the syntax tree.
Preferably, after the copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table, and the preset target table, the method further includes:
repeatedly executing the above steps at a preset synchronization frequency.
Preferably, after copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table, and the preset target table, the method further includes:
identifying sensitive data in the service database according to a preset synchronization increment identifier, wherein the sensitive data is data to be synchronized which changes in the service database, the preset synchronization increment identifier comprises a timestamp and a synchronization rule, and the synchronization rule comprises any one of newly-added synchronization, modified synchronization and deleted synchronization;
updating the sensitive data into the data warehouse.
Preferably, the searching a business database through a query statement to obtain target data includes:
acquiring a query statement input by a user;
determining the accessible databases in the business database according to the user's permissions;
and searching the accessible databases through the query statement to obtain the target data.
Preferably, the copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table, and the preset target table includes:
establishing the preset target table according to an operation instruction input by a user on a visual graphical interface, wherein the visual graphical interface is used for guiding the user to establish the preset target table in the data warehouse, and the preset target table and the source table have the same structure;
establishing a synchronous task according to a source table of the target data, a field definition of the source table and the preset target table;
and executing the synchronization task through ETL, and copying the data to be synchronized to the preset target table in the data warehouse.
In a second aspect, an embodiment of the present invention provides a data synchronization system, including:
an acquisition module, configured to search a business database through a query statement to obtain target data, wherein the business database is a relational database;
a parsing module, configured to parse the query statement to obtain a source table and a source table filtering condition, wherein the source table comprises the target data, a source table of the target data and a field definition of the source table;
and a synchronization module, configured to copy the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table and the preset target table, wherein the data to be synchronized is obtained by screening the target data according to the source table filtering condition.
In a third aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the data synchronization method when executing the computer program.
In a fourth aspect, an embodiment of the present invention provides a computer storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the data synchronization method.
The embodiment of the invention provides a data synchronization method and system. When data synchronization is needed, the target data to be synchronized is found in the business database through a query statement, and the location of the target data and a filtering condition are obtained by parsing the query statement. The target data is filtered by the filtering condition to obtain the data to be synchronized, whose location in the business database can be found from the location of the target data. The data to be synchronized is then copied to a preset target table of the data warehouse, which achieves data synchronization between the business database and the data warehouse.
In the embodiment of the invention, the relevant staff can synchronize data using only a query statement. This solves the problem of complex data synchronization operations in a data warehouse, requires no assistance from other technical engineers, lowers the technical requirements on staff, and improves the efficiency of the data preparation stage of data analysis work. Only the data retrieved by the query statement is synchronized, so synchronization is performed on demand: the volume of synchronized data is reduced, synchronization efficiency is improved, and the manpower and hardware investment in data warehouse construction can also be reduced, thereby saving cost.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application environment of a data synchronization method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a data synchronization method according to an embodiment of the present invention;
FIG. 3 is a flowchart of parsing a query statement according to an embodiment of the present invention;
FIG. 4 is a diagram illustrating a process of parsing a query statement according to an embodiment of the present invention;
FIG. 5 is a flowchart of extracting the source table and the source table filtering rule based on the syntax tree according to an embodiment of the present invention;
FIG. 6 is a flowchart of a data synchronization method according to an embodiment of the present invention;
FIG. 7 is a detailed flowchart of searching the business database according to an embodiment of the present invention;
FIG. 8 is a detailed flowchart of copying data to be synchronized to a preset target table according to an embodiment of the present invention;
FIG. 9 is a flowchart of a data synchronization method according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of a data synchronization system according to an embodiment of the present invention;
FIG. 11 is a diagram of a computing device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
As shown in FIG. 1, when the enterprise terminal carries out a business operation, it generates business data and transmits it to the server; business data of the same subject type is stored in the business database in a scattered manner. When data of a certain subject in the data warehouse needs to be synchronized, a user enters a query statement at a client, the query statement is transmitted to the server, and the server executes the data synchronization method: the scattered business data is extracted from the business database and then stored in the data warehouse, classified by subject.
It should be noted that the server may be implemented by an independent server or a server cluster composed of a plurality of servers. The terminal and the enterprise terminal may be, but are not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, and the like. The terminal and the enterprise terminal may be connected through bluetooth, USB (Universal Serial Bus), or other communication connection methods, which is not limited in this embodiment of the present invention.
In an embodiment, as shown in FIG. 2, a data synchronization method is provided. The method is described by taking its application to the server in FIG. 1 as an example, and includes the following steps.
To improve data synchronization efficiency, the embodiment of the invention does not synchronize all data in the business database; instead, it synchronizes on demand only the data in the business database that meets the requirements, which reduces the volume of synchronized data and thereby improves synchronization efficiency.
S210, searching a business database through a query statement to obtain target data, wherein the business database is a relational database;
When data of a certain subject type needs to be synchronized, the server first acquires a query statement entered against the business database. The query statement may specifically be a select query statement, which retrieves data from the business database according to conditions, so as to obtain the target data that meets the requirements.
Common uses of the select query statement include: querying all records, querying specified fields, filtering out duplicate records, specifying the query results, querying by a query path, and changing the displayed column names, any of which may be chosen according to actual needs. When all records are queried, all data in the business database is synchronized to the data warehouse; when the query follows a query path, the data in the database at the specified path is synchronized to the data warehouse. The embodiment of the present invention does not specifically limit this.
It should be noted that, in the embodiment of the present invention, the business database is limited to a relational database and cannot be another type of database. Relational databases include SQL databases, MariaDB databases, Oracle databases and the like; the embodiment of the present invention takes an SQL database as the example business database, but is not limited thereto.
S220, parsing the query statement to obtain a source table and a source table filtering condition, wherein the source table comprises the target data, a source table of the target data and a field definition of the source table;
After obtaining the select query statement, the server parses it to obtain a source table and a source table filtering condition. The source table comprises the target data, the source table of the target data and the field definition of the source table: the target data is what the select query statement retrieves from the business database, the source table identifies which table in the business database the target data comes from, and the field definition of the source table is used later to locate the target data in the business database.
Because of noise or interference from invalid data, the target data usually needs to be cleaned, and the source table filtering condition specifies the conditions under which the target data is filtered or cleaned.
In a specific implementation, this step is performed by a synchronization task creation module, which parses the SQL select query statement into a syntax tree and extracts two elements of the data synchronization task, the source table and the source table filtering condition, in preparation for subsequently creating the synchronization task.
S230, copying data to be synchronized to a preset target table in a data warehouse according to a source table of the target data, a field definition of the source table and the preset target table, wherein the data to be synchronized is obtained by screening the target data according to the source table filtering condition.
Because of noise or interference factors, once the source table and the source table filtering condition have been obtained, the target data in the source table is cleaned according to the filtering condition, and the cleaned data is the data to be synchronized.
For example, when synchronizing a contact phone field, if a piece of target data is "138jk10004567", the non-numeric characters "jk" in the target data are removed, and the cleaned data to be synchronized, "13810004567", is stored in the corresponding position in the ODS data table. Meanwhile, the removed characters "jk" can be stored in a preset ODS cleaning table for subsequent retrieval and analysis.
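By way of illustration only, the phone-field cleaning in this example might be sketched as follows. The function name `clean_phone` and the choice to return the removed characters alongside the cleaned value (so that a caller could write them to an ODS cleaning table) are assumptions made for this sketch, not details given in the description.

```python
import re
from typing import Tuple


def clean_phone(raw: str) -> Tuple[str, str]:
    """Split a raw phone field into (digits kept, characters removed).

    Hypothetical helper: the description only states that non-numeric
    characters are cleaned out and may be kept in a cleaning table.
    """
    kept = re.sub(r"\D", "", raw)      # keep only the digits
    removed = re.sub(r"\d", "", raw)   # everything that was stripped out
    return kept, removed


cleaned, rejected = clean_phone("138jk10004567")
```

Here `cleaned` would go to the ODS data table and `rejected` to the cleaning table.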
After the data to be synchronized is obtained, it is located in the business database according to the source table and the field definition of the source table, and is copied into the preset target table of the data warehouse.
The preset target table is located in the ODS layer of the data warehouse and stores the synchronized data; its name and structure can therefore be set to be the same as those of the source table.
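The copy into a same-named, same-structured target table can be sketched with Python's built-in `sqlite3` module. This is a simplified stand-in under stated assumptions: a single in-memory SQLite connection plays both the business database and the warehouse, the attached schema name `ods` and the helper `copy_filtered` are hypothetical, and `CREATE TABLE ... AS SELECT` copies only the column layout (not keys or constraints).

```python
import sqlite3

# Assumption: "main" stands in for the business database and the attached
# "ods" schema stands in for the ODS layer of the data warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS ods")
conn.execute("CREATE TABLE my_table (name TEXT, version INTEGER)")
conn.executemany(
    "INSERT INTO my_table VALUES (?, ?)",
    [("xiaoming", 1), ("xiaohong", 2), ("xiaoming", 3)],
)


def copy_filtered(conn, source, target, where, params=()):
    """Create `target` with the source's columns, then copy the matching rows."""
    # "WHERE 0" copies the column layout without copying any rows.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {target} AS SELECT * FROM {source} WHERE 0"
    )
    cur = conn.execute(
        f"INSERT INTO {target} SELECT * FROM {source} WHERE {where}", params
    )
    return cur.rowcount  # number of rows copied


copied = copy_filtered(conn, "my_table", "ods.my_table", "name = ?", ("xiaoming",))
rows = conn.execute(
    "SELECT name, version FROM ods.my_table ORDER BY version"
).fetchall()
```

Table names are interpolated directly into the SQL for brevity; a real tool would validate them first.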
In a specific implementation, the server may provide a configuration interface through which a user can configure the information required by the synchronization task, such as the preset target table.
In the embodiment of the invention, the synchronization task is created by the synchronization task creation module, which comprises a parser and a task generator. The parser parses the select query statement; the task generator takes the source table, the source table filtering rule and the preset target table as elements of the synchronization task and guides the user through page prompts to complete the setup of the synchronization task. The synchronization task creation module sends the generated synchronization task to the data integration module, which converts the synchronization task into an executable file and executes it, so that the data to be synchronized is copied to the preset target table in the data warehouse.
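One possible shape for the task that the task generator hands to the data integration module is sketched below. The class `SyncTask`, its field names, the function `generate_task` and the JSON hand-off format are all hypothetical; the description names the three elements but does not define a format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class SyncTask:
    """The three task elements named in the description (field names assumed)."""
    source_table: str
    source_filter: str
    target_table: str


def generate_task(source_table: str, source_filter: str,
                  target_table: Optional[str] = None) -> SyncTask:
    # Assumption: the target table defaults to the source table's name,
    # since the description allows both to share name and structure.
    return SyncTask(source_table, source_filter, target_table or source_table)


task = generate_task("my_table", "name = 'xiaoming'")
# Serialized form that could be passed on to a data integration module.
payload = json.dumps(asdict(task), sort_keys=True)
```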
The embodiment of the invention provides a data synchronization method. When data synchronization is needed, the target data to be synchronized is found in the business database through a select query statement, and the location of the target data and a filtering condition are obtained by parsing the select query statement. The target data is filtered by the filtering condition to obtain the data to be synchronized, whose location in the business database can be found from the location of the target data. The data to be synchronized is then copied to a preset target table of the data warehouse, which achieves data synchronization between the business database and the data warehouse.
In the embodiment of the invention, the relevant staff can synchronize data using only select query statements, without assistance from other technical engineers, which lowers the technical requirements on staff and improves the efficiency of the data preparation stage of data analysis work. Because synchronization is performed on demand, the volume of synchronized data is reduced and synchronization efficiency is improved, and the manpower and hardware investment in data warehouse construction can also be reduced, thereby saving cost.
In an embodiment, as shown in FIG. 3, step S220 includes step S310 and step S320, as follows:
S310, parsing the query statement into a syntax tree by a parsing tool;
As shown in FIG. 4, parsing a select query statement produces an abstract syntax tree (AST). The abstract syntax tree is a graphical representation of the structure of a statement: it represents the derivation of the statement and helps in understanding the hierarchy of its grammatical structure.
Specifically, commonly used parsing tools include ANTLR and Alibaba's Druid.
S320, extracting the source table and the source table filtering rule based on the syntax tree.
As shown in FIG. 5, step S320 includes step S510 and step S520, as follows:
S510, extracting the source table based on a tableName subtree in the syntax tree;
S520, extracting the source table filtering rule based on the whereState subtree in the syntax tree.
The tableName subtree and the whereState subtree are extracted from the syntax tree, and the other parts can be ignored. The table corresponding to the tableName subtree is the source table, and the whereState subtree corresponds to the source table filtering condition.
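The extraction of the two subtrees can be sketched as follows. Note the substitution: a regular expression stands in for a real parser such as ANTLR, so this handles only a simple single-table `SELECT ... FROM ... WHERE ...` statement, and the "tree" is a flat dictionary. The key names `tableName` and `whereState` follow the description; `selectElements`, `parse_select` and `extract_source` are hypothetical names introduced here.

```python
import re

# Simplified stand-in for a grammar: single table, optional WHERE clause.
_SELECT = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<table>\w+)"
    r"(?:\s+where\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_select(sql: str) -> dict:
    """Return a dictionary imitating the relevant nodes of a syntax tree."""
    match = _SELECT.match(sql)
    if match is None:
        raise ValueError("only simple single-table SELECT statements are supported")
    return {
        "selectElements": [c.strip() for c in match.group("cols").split(",")],
        "tableName": match.group("table"),
        "whereState": match.group("where"),  # None when there is no WHERE
    }


def extract_source(tree: dict):
    """Keep the tableName and whereState nodes; ignore everything else."""
    return tree["tableName"], tree["whereState"]


tree = parse_select('select name, version from my_table where name = "xiaoming"')
source_table, source_filter = extract_source(tree)
```

Joins, subqueries and aliases would need a genuine parse tree rather than this pattern.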
In a specific implementation, the embodiment of the present invention further groups the filtering conditions in the where clause by source table, and then derives a decomposed query statement for each source table from the resulting groups. In the embodiment of the present invention, the example statement is: select name, version from my_table where name = "xiaoming". The metadata of the source table, which includes its fields and primary key, is also obtained.
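The grouping of filter conditions by source table and the retrieval of source table metadata might look like the sketch below. Several things are assumed for illustration: conditions arrive as table-qualified strings joined by AND, SQLite (via the built-in `sqlite3` module and its `PRAGMA table_info`) stands in for the business database, and the names `group_conditions`, `per_table_queries`, `table_metadata` and `other_table` are hypothetical.

```python
import sqlite3
from collections import defaultdict


def group_conditions(conditions):
    """Group 'table.column <op> value' conditions by their table prefix."""
    grouped = defaultdict(list)
    for cond in conditions:
        table, rest = cond.split(".", 1)
        grouped[table.strip()].append(rest.strip())
    return dict(grouped)


def per_table_queries(grouped):
    """Build one decomposed query per source table (conditions ANDed together)."""
    return {
        table: f"SELECT * FROM {table} WHERE " + " AND ".join(conds)
        for table, conds in grouped.items()
    }


def table_metadata(conn, table):
    """Read the fields and primary key of a table from SQLite's catalog."""
    info = conn.execute(f"PRAGMA table_info({table})").fetchall()
    # Each row is (cid, name, type, notnull, dflt_value, pk).
    return {
        "fields": [row[1] for row in info],
        "primary_key": [row[1] for row in info if row[5] > 0],
    }


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER PRIMARY KEY, name TEXT, version INTEGER)")
grouped = group_conditions(
    ["my_table.name = 'xiaoming'", "other_table.status = 1", "my_table.version > 2"]
)
queries = per_table_queries(grouped)
meta = table_metadata(conn, "my_table")
```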
The prior art also provides wizard-style creation of data synchronization based on an online analytical processing (OLAP) model. Modeling through OLAP, however, requires the analysts involved to have OLAP modeling knowledge, so the operation is complex and inefficient.
Compared with the prior art, the embodiment of the invention extracts the tableName subtree and the whereState subtree from the parsed syntax tree and uses them as the source table and the source table filtering rule respectively. The key elements for creating the synchronization task are thereby extracted, and the synchronization task can be obtained from this relevant information alone. The parsing in the embodiment of the invention is completed by an existing parsing tool without human intervention, which avoids the OLAP modeling process of the prior art, lowers the technical requirements on the staff involved, simplifies the data synchronization operation and improves efficiency.
In an embodiment, after the copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table, and the preset target table, the method further includes:
repeatedly executing the above steps at a preset synchronization frequency.
The preset synchronization frequency is used to track changes to the data in the business database. An enterprise management system continuously generates new business data during operation, and this data is recorded in the business database, so the content of the business database changes over time.
To handle this, the embodiment of the present invention tracks changes to the data in the business database through the preset synchronization frequency, which represents the interval at which the scheme is repeated, so that the data synchronized into the data warehouse stays current. Common synchronization update strategies include performing steps S210 to S230 once every 5 minutes, once every week, once every month, and the like.
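Repeating steps S210 to S230 at a preset interval can be sketched as a simple loop. This is illustrative only: the names `run_periodically` and `PRESET_FREQUENCIES` are hypothetical, the number of rounds is bounded so the sketch terminates, and the sleep function is injectable so the loop can be exercised without waiting. A production system would more likely rely on a scheduler.

```python
import time

# Hypothetical presets, in seconds, mirroring two of the strategies above.
PRESET_FREQUENCIES = {
    "every_5_minutes": 5 * 60,
    "weekly": 7 * 24 * 60 * 60,
}


def run_periodically(sync_once, interval_seconds, rounds, sleep=time.sleep):
    """Call sync_once() `rounds` times, pausing interval_seconds between calls."""
    results = []
    for i in range(rounds):
        results.append(sync_once())   # one pass of steps S210 to S230
        if i < rounds - 1:
            sleep(interval_seconds)   # wait until the next scheduled pass
    return results


# Record the requested waits instead of actually sleeping.
waits = []
calls = run_periodically(
    lambda: "synced",
    PRESET_FREQUENCIES["every_5_minutes"],
    rounds=3,
    sleep=waits.append,
)
```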
In the embodiment of the invention, the preset synchronization frequency keeps the data in the business database and the data warehouse synchronized, so that the data in the data warehouse is up to date and complete, which lays a good foundation for subsequent data analysis work.
In an embodiment, as shown in FIG. 6, after the copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table, and the preset target table, that is, after step S230, the method further includes step S240 and step S250, as follows:
S240, identifying sensitive data in the business database according to a preset synchronization increment identifier, wherein the sensitive data is data to be synchronized that has changed in the business database, the preset synchronization increment identifier comprises a timestamp and a synchronization rule, and the synchronization rule comprises any one of newly-added synchronization, modified synchronization and deleted synchronization;
S250, updating the sensitive data to the data warehouse.
If the data in the source table changes after the data to be synchronized has been copied to the data warehouse, for example when data is added to, modified in or deleted from the source table, the corresponding data in the preset target table needs to be changed as well.
Generally, the preset synchronization increment identifier is composed of a timestamp and a synchronization rule, and the synchronization rule may be any one of newly-added synchronization, modified synchronization and deleted synchronization.
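A minimal sketch of incremental synchronization driven by a timestamp and a synchronization rule follows. The description does not say how changes are detected, so the `updated_at` column, the `deleted` flag used to signal deletions, the rule names and the in-memory dictionaries standing in for the source table and the warehouse table are all assumptions made for illustration.

```python
def changed_since(source_rows, last_sync):
    """Timestamp part of the identifier: rows modified after the last sync."""
    return [row for row in source_rows if row["updated_at"] > last_sync]


def apply_rule(target, rows, rule):
    """Rule part of the identifier: 'insert', 'update' or 'delete'.

    `target` is a dict keyed by primary key, standing in for the target table.
    """
    for row in rows:
        key = row["id"]
        if rule == "insert":
            if key not in target and not row.get("deleted"):
                target[key] = dict(row)      # newly-added synchronization
        elif rule == "update":
            if key in target and not row.get("deleted"):
                target[key] = dict(row)      # modified synchronization
        elif rule == "delete":
            if row.get("deleted"):
                target.pop(key, None)        # deleted synchronization
        else:
            raise ValueError(f"unknown synchronization rule: {rule}")
    return target


source = [
    {"id": 1, "name": "xiaoming", "updated_at": 100},
    {"id": 2, "name": "xiaohong", "updated_at": 250},
    {"id": 3, "name": "xiaogang", "updated_at": 300},
    {"id": 4, "name": "xiaoli", "updated_at": 310, "deleted": True},
]
warehouse = {
    1: {"id": 1, "name": "xiaoming", "updated_at": 100},
    2: {"id": 2, "name": "old name", "updated_at": 90},
    4: {"id": 4, "name": "xiaoli", "updated_at": 80},
}

delta = changed_since(source, last_sync=200)
for rule in ("insert", "update", "delete"):
    apply_rule(warehouse, delta, rule)
```

The sketch applies all three rules in turn to show each one; a task configured with a single rule would call `apply_rule` once.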
In a specific implementation, the task generator in the synchronization task creation module also takes the preset synchronization frequency and the preset synchronization increment identifier as elements of the synchronization task, and guides the user through page prompts to complete the task settings. Once all elements of the synchronization task are set, a complete synchronization task is generated and sent to the data integration module, which converts the task into an executable file and executes it so that it takes effect, thereby realizing data synchronization between the relational database and the data warehouse.
In the embodiment of the invention, presetting the synchronization increment identifier ensures that the synchronized data in the data warehouse is updated in real time and that changes in the service database are propagated to the data warehouse, which ensures the accuracy of the data in the data warehouse and improves the accuracy of subsequent data analysis.
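Steps S240 and S250 can be illustrated with a minimal sketch, assuming SQLite as a stand-in for both the service database and the data warehouse. The table names `my_table` and `my_table_target`, the columns `updated_at` and `deleted`, and the rule names `insert`, `update`, and `delete` are hypothetical and are not taken from the patent; the timestamp selects the changed rows, and the rule decides which kind of change is applied.

```python
import sqlite3

# Hypothetical rule names standing in for the newly-added, modified,
# and deleted synchronization rules.
RULES = ("insert", "update", "delete")


def apply_incremental_sync(source, warehouse, last_sync_ts, rule):
    """Apply one synchronization rule to rows changed after last_sync_ts.

    Assumes a source table my_table(id, name, updated_at, deleted) and a
    target table my_table_target(id, name, updated_at). Returns the number
    of target rows affected.
    """
    if rule not in RULES:
        raise ValueError(f"unknown synchronization rule: {rule!r}")

    # The timestamp part of the identifier selects the changed rows.
    changed = source.execute(
        "SELECT id, name, updated_at, deleted FROM my_table WHERE updated_at > ?",
        (last_sync_ts,),
    ).fetchall()
    existing = {row[0] for row in warehouse.execute("SELECT id FROM my_table_target")}

    applied = 0
    for row_id, name, updated_at, deleted in changed:
        if rule == "insert" and not deleted and row_id not in existing:
            warehouse.execute(
                "INSERT INTO my_table_target (id, name, updated_at) VALUES (?, ?, ?)",
                (row_id, name, updated_at),
            )
            applied += 1
        elif rule == "update" and not deleted and row_id in existing:
            warehouse.execute(
                "UPDATE my_table_target SET name = ?, updated_at = ? WHERE id = ?",
                (name, updated_at, row_id),
            )
            applied += 1
        elif rule == "delete" and deleted and row_id in existing:
            warehouse.execute("DELETE FROM my_table_target WHERE id = ?", (row_id,))
            applied += 1
    warehouse.commit()
    return applied
```

The soft-delete flag is one possible way to make deletions visible to a timestamp-based scan; the patent does not state how deleted rows are detected.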
In an embodiment, as shown in fig. 7, the searching of the service database through the query statement to obtain the target data may be carried out through steps S710, S720, and S730, as follows:
S710, acquiring a query statement input by a user;
First, when data synchronization is needed, the user inputs a select query statement against the business database, and the server acquires the select query statement.
In the embodiment of the present invention, the statement select name, version from my_table where name = 'xiaoming' is taken as an example for description.
S720, acquiring an accessible database in the service database according to the user authority;
Specifically, the service database may include a plurality of databases, such as database 1, database 2, ..., database N. To ensure the security of the application, users are graded by authority, and the data a user can access depends directly on that user's authority in the application: the higher the authority, the more databases are accessible, and the lower the authority, the fewer.
The accessible database is then searched with the select query statement to acquire the target data.
In a specific implementation, the embodiment of the invention records information such as user accounts, passwords, and the corresponding authorities in a data source management module, and obtains a user's authority by looking up the user's account in the data source management module.
S730, searching in the accessible database through the query statement to obtain corresponding target data.
In the embodiment of the invention, the data source management module manages the connection information of the service database and the data warehouse. The connection information comprises the IP, the port, the database account and password, and the correspondence between database accounts and real users; this correspondence constitutes the user authority. Before inputting a select query statement, the user must log in to the server with an account and password. The data source management module determines the user's authority from the account and password, then determines the databases in the business database that the user can access, and searches the accessible databases with the select query statement to obtain the target data that meets the conditions.
In the embodiment of the invention, the user permission is graded, and the number of accessible databases is determined according to different grades, so that the scheme can better meet the actual requirement, and the safety of the relational database in the execution process of the method is also ensured.
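The authority grading described above can be sketched as a simple lookup. This is only an illustration under stated assumptions: the numeric levels, the `REQUIRED_LEVEL` mapping, and the `Account` type are hypothetical, and the patent does not specify how the data source management module stores authorities.

```python
from dataclasses import dataclass

# Hypothetical mapping: the minimum authority level needed for each database.
REQUIRED_LEVEL = {"database_1": 1, "database_2": 2, "database_3": 3}


@dataclass(frozen=True)
class Account:
    username: str
    level: int  # higher level means broader access


def accessible_databases(account, required=REQUIRED_LEVEL):
    """Return the databases whose required level the account meets."""
    return sorted(name for name, level in required.items() if account.level >= level)
```

With this mapping, an account at a higher level always sees a superset of what a lower-level account sees, which matches the behavior described in the text.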
In an embodiment, as shown in fig. 8, step S230, that is, copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table, and the preset target table, specifically includes steps S810, S820, and S830, as follows:
S810, establishing the preset target table according to an operation instruction input by a user on a visual graphical interface, wherein the visual graphical interface is used for guiding the user to establish the preset target table in the data warehouse, and the preset target table and the source table have the same structure;
Specifically, in the embodiment of the present invention, the user is guided through an interface to create and name the preset target table in the source-aligned database, where preset target tables correspond to source tables one to one.
S820, establishing a synchronization task according to a source table of the target data, a field definition of the source table and the preset target table;
Specifically, a configuration file is constructed from the query statement of each source table, the metadata of each source table, and the preset target tables that correspond one to one to the source tables, and the configuration file is converted into a synchronization task of the data integration module.
In the embodiment of the present invention, the data integration module is specifically KETTLE.
S830, the synchronization task is executed through ETL, and the data to be synchronized is copied to the preset target table in the data warehouse.
The configuration file is converted into ktr (transformation) and kjb (job scheduling) configuration files, which take effect in the data integration module through an interface function provided by KETTLE, so that the data to be synchronized is copied to the preset target table.
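The assembly of a synchronization task in step S820 can be sketched as follows. JSON is used here purely as a stand-in; it is not the ktr or kjb format, and no KETTLE interface is called. The `SyncTask` type and the function names are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SyncTask:
    source_table: str
    source_query: str  # decomposed query statement for this source table
    fields: list       # field definitions taken from the source table metadata
    primary_key: str
    target_table: str  # preset target table, one per source table


def build_sync_tasks(source_queries, metadata, target_tables):
    """Combine per-table queries, metadata, and target tables into tasks."""
    tasks = []
    for table, query in source_queries.items():
        meta = metadata[table]
        tasks.append(
            SyncTask(
                source_table=table,
                source_query=query,
                fields=list(meta["fields"]),
                primary_key=meta["primary_key"],
                target_table=target_tables[table],
            )
        )
    return tasks


def to_config(tasks):
    """Serialize the tasks into a configuration document (JSON stand-in)."""
    return json.dumps({"tasks": [asdict(task) for task in tasks]}, indent=2)
```

A real deployment would hand an equivalent structure to the data integration module, which produces its own transformation and job files.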
In a preferred embodiment, as shown in fig. 9, the data synchronization method is implemented through the following specific steps:
(1) The synchronization task creation module acquires a select query statement input by a user, looks up the user's authority in the data source management module, and finds the databases in the service database that are accessible under that authority. The data source management module is used for managing information of the service database and information of the data warehouse, wherein the information of the service database comprises a database type, and the information of the data warehouse comprises the database type, an IP (Internet protocol), a port, an account and a password.
(2) The accessible database is searched with the select query statement to obtain the target data.
(3) The AST parser parses the select query statement into a syntax tree using the ANTLR parsing tool.
(4) A tableName sub-tree is extracted from the syntax tree as the source table, and a whereState sub-tree is extracted from the syntax tree as the source table filtering rule. The other parts may be ignored.
(5) The conditions in the whereState are grouped according to the source table, and the structure is exemplified as follows:
[Figures BDA0003093943740000101 and BDA0003093943740000111: example of the whereState conditions grouped by source table; images not reproduced]
(6) From the groups obtained in step 5, the decomposed query statement of each source table is obtained; in the embodiment of the invention, this is: select * from my_table where name = 'xiaoming'.
(7) The metadata, fields, and primary key of the source table are acquired.
(8) The user is guided through a visual graphical interface to establish the preset target table in the source-aligned database, wherein the preset target table and the source table have the same structure.
(9) The user is then guided through another visual graphical interface to define the preset synchronization frequency and the preset synchronization increment identifier.
(10) The task generator takes the source table, the source table filtering condition, the preset target table, the preset synchronization frequency and the preset synchronization increment identifier configured in steps 6 to 9 as synchronization task elements, and converts them into a synchronization task through an interface of the data integration module KETTLE, specifically into ktr (transformation) and kjb (job scheduling) configuration files, which then take effect in the integration module.
(11) When the data integration module executes a synchronization task, it can access the database tables only after acquiring information such as the database type, IP, port, account, and password, thereby realizing data synchronization between the relational database and the data warehouse.
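Steps (3) to (6) can be approximated with a small sketch. Note the substitution: the patent uses an ANTLR-generated parser and a syntax tree, whereas the code below uses regular expressions, which only handle flat statements of the form select ... from t1, t2 where c1 and c2 (no aliases, subqueries, OR, or keywords inside string literals). All function names are hypothetical.

```python
import re

SELECT_RE = re.compile(
    r"^\s*select\s+(?P<cols>.+?)\s+from\s+(?P<tables>.+?)"
    r"(?:\s+where\s+(?P<where>.+?))?\s*;?\s*$",
    re.IGNORECASE | re.DOTALL,
)


def parse_select(sql):
    """Return (source tables, filter conditions) for a flat select statement."""
    match = SELECT_RE.match(sql)
    if match is None:
        raise ValueError("unsupported statement")
    tables = [t.strip() for t in match.group("tables").split(",")]
    where = match.group("where")
    conditions = re.split(r"\s+and\s+", where, flags=re.IGNORECASE) if where else []
    return tables, [c.strip() for c in conditions]


def group_conditions(tables, conditions):
    """Group filter conditions by the single source table they refer to."""
    grouped = {table: [] for table in tables}
    for cond in conditions:
        qualifiers = {q for q in re.findall(r"\b(\w+)\.\w+", cond) if q in grouped}
        if len(qualifiers) == 1:
            table = qualifiers.pop()
            # Drop the "table." prefix so the condition fits a single-table query.
            grouped[table].append(re.sub(rf"\b{re.escape(table)}\.", "", cond))
        elif not qualifiers and len(tables) == 1:
            grouped[tables[0]].append(cond)
        # Conditions spanning several tables (joins) are not per-table filters.
    return grouped


def per_table_queries(sql):
    """Build one decomposed query statement per source table."""
    tables, conditions = parse_select(sql)
    queries = {}
    for table, conds in group_conditions(tables, conditions).items():
        query = f"select * from {table}"
        if conds:
            query += " where " + " and ".join(conds)
        queries[table] = query
    return queries
```

For the example statement used in this embodiment, `per_table_queries` yields a single entry for `my_table` whose filter is `name = 'xiaoming'`. A grammar-based parser remains the more robust choice for real SQL.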
To sum up, an embodiment of the present invention provides a data synchronization method. When data synchronization is needed, the target data to be synchronized is found in the business database through a select query statement. Parsing the select query statement yields the location of the target data and the filtering condition, and filtering the target data with the filtering condition yields the data to be synchronized. The location of the data to be synchronized in the business database is found from the location of the target data, and the data to be synchronized is copied to a preset target table in the data warehouse, thereby realizing data synchronization between the business database and the data warehouse.
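The copy step summarized above can be sketched end to end, again assuming SQLite for both sides. The function name and parameters are hypothetical; the sketch reads the field definitions of the source table, creates a target table with the same structure, and copies only the rows that pass the filtering condition. It assumes a single-column primary key and trusted table names, since identifiers are interpolated into the SQL text.

```python
import sqlite3


def copy_to_target(business, warehouse, source_table, where_clause, params, target_table):
    """Copy filtered rows from source_table into a same-structure target table."""
    # Each row of table_info is (cid, name, type, notnull, dflt_value, pk).
    columns = business.execute(f"PRAGMA table_info({source_table})").fetchall()
    if not columns:
        raise ValueError(f"source table not found: {source_table}")

    col_defs = ", ".join(
        f"{name} {ctype}" + (" PRIMARY KEY" if pk else "")
        for _, name, ctype, _, _, pk in columns
    )
    warehouse.execute(f"CREATE TABLE IF NOT EXISTS {target_table} ({col_defs})")

    names = ", ".join(col[1] for col in columns)
    rows = business.execute(
        f"SELECT {names} FROM {source_table} WHERE {where_clause}", params
    ).fetchall()

    placeholders = ", ".join("?" for _ in columns)
    # OR REPLACE keeps a repeated run from failing on existing primary keys.
    warehouse.executemany(
        f"INSERT OR REPLACE INTO {target_table} ({names}) VALUES ({placeholders})",
        rows,
    )
    warehouse.commit()
    return len(rows)
```

Because only the filtered rows are copied, the volume transferred depends on the query rather than on the size of the source table.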
In the embodiment of the invention, the relevant staff can carry out data synchronization with select query statements alone, without assistance from other technical engineers, so the technical requirements on the staff are low and the efficiency of the data preparation stage of data analysis is improved. Because synchronization is performed on demand, the volume of synchronized data is reduced and synchronization efficiency is improved; the manpower and hardware investment in data warehouse construction is also reduced, which saves cost.
Presetting the synchronization frequency keeps the data in the service database and the data warehouse synchronized, so that the data in the data warehouse is up to date and complete, which lays a good foundation for subsequent data analysis.
Presetting the synchronization increment identifier ensures that the synchronized data in the data warehouse is updated in real time and that changes in the service database are propagated to the data warehouse, which ensures the accuracy of the data in the data warehouse and improves the accuracy of subsequent data analysis.
User authority is graded, and the number of accessible databases is determined by the grade, so that the scheme better meets actual requirements and the security of the relational database is ensured while the method is executed.
In an embodiment, a data synchronization system is provided, and the data synchronization system corresponds to the data synchronization method in the above embodiments one to one. As shown in fig. 10, the data synchronization system includes an acquisition module 10, a parsing module 20, and a synchronization module 30. The functional modules are explained in detail as follows:
The acquisition module 10 is configured to search a service database through a query statement to obtain target data, where the service database is a relational database;
The parsing module 20 is configured to parse the query statement to obtain a source table and a source table filter condition, where the source table includes the target data, a source table of the target data, and a field definition of the source table;
The synchronization module 30 is configured to copy, according to a source table of the target data, a field definition of the source table, and a preset target table, data to be synchronized to the preset target table in the data warehouse, where the data to be synchronized is obtained by screening the target data according to the source table filtering condition.
For specific limitations of the data synchronization system, reference may be made to the above limitations of the data synchronization method, which are not described herein again. The modules in the data synchronization system described above may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 11. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a computer storage medium and an internal memory. The computer storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the computer storage media. The database of the computer device is used for storing data generated or obtained during execution of the data synchronization method, such as select query statements. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a data synchronization method.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor. When the processor executes the computer program, the steps of the data synchronization method in the above-described embodiments are implemented, for example, steps S210-S230 shown in fig. 2 or the steps shown in fig. 3 to 9. Alternatively, when the processor executes the computer program, the functions of the modules/units in the embodiment of the data synchronization system are implemented, for example, the functions of the modules/units shown in fig. 10; these are not described here again to avoid repetition.
In an embodiment, a computer storage medium is provided, where a computer program is stored on the computer storage medium, and when executed by a processor, the computer program implements the steps of the data synchronization method in the foregoing embodiments, such as steps S210 to S230 shown in fig. 2 or the steps shown in fig. 3 to fig. 9, which are not described here again to avoid repetition. Alternatively, the computer program, when executed by the processor, implements the functions of each module/unit in the embodiment of the data synchronization system, for example, the functions of each module/unit shown in fig. 10, which are likewise not described here again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. A method of data synchronization, comprising:
searching a business database through a query statement to obtain target data, wherein the business database is a relational database;
analyzing the query statement to obtain a source table and a source table filtering condition, wherein the source table comprises the target data, a source table of the target data and a field definition of the source table;
copying data to be synchronized to a preset target table in a data warehouse according to a source table of the target data, field definitions of the source table and the preset target table, wherein the data to be synchronized is obtained by screening the target data according to the source table filtering conditions.
2. The data synchronization method of claim 1, wherein the parsing the query statement to obtain a source table and a source table filter condition comprises:
parsing the query statement into a syntax tree by a parsing tool;
extracting the source table and the source table filtering rules based on the syntax tree.
3. The data synchronization method of claim 2, wherein said extracting the source table and the source table filtering rules based on the syntax tree comprises:
extracting the source table based on a tableName sub-tree in the syntax tree;
and extracting the source table filtering rule based on a whereState subtree in the syntax tree.
4. The data synchronization method of claim 1, wherein the copying the data to be synchronized to the data warehouse at the preset target table according to the source table of the target data, the field definition of the source table and the preset target table further comprises:
and repeatedly executing the steps according to the preset synchronous frequency.
5. The data synchronization method of claim 1, wherein after copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table and the preset target table, further comprising:
identifying sensitive data in the service database according to a preset synchronization increment identifier, wherein the sensitive data is data to be synchronized which changes in the service database, the preset synchronization increment identifier comprises a timestamp and a synchronization rule, and the synchronization rule comprises any one of newly-added synchronization, modified synchronization and deleted synchronization;
updating the sensitive data into the data warehouse.
6. The data synchronization method according to any one of claims 1 to 5, wherein the searching of the service database through the query statement to obtain the target data comprises:
acquiring a query statement input by a user;
acquiring an accessible database in the service database according to the user authority;
and searching in the accessible database through the query statement to acquire the target data.
7. The data synchronization method according to any one of claims 1 to 5, wherein the copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table and the preset target table comprises:
establishing the preset target table according to an operation instruction input by a user on a visual graphical interface, wherein the visual graphical interface is used for guiding the user to establish the preset target table in the data warehouse, and the preset target table and the source table have the same structure;
establishing a synchronous task according to a source table of the target data, a field definition of the source table and the preset target table;
and executing the synchronization task through ETL, and copying the data to be synchronized to the preset target table in the data warehouse.
8. A data synchronization system, comprising:
the acquisition module is used for searching a business database through a query statement to acquire target data, wherein the business database is a relational database;
the analysis module is used for analyzing the query statement to acquire a source table and a source table filtering condition, wherein the source table comprises the target data, a source table of the target data and a field definition of the source table;
and the synchronization module is used for copying the data to be synchronized to the preset target table in the data warehouse according to the source table of the target data, the field definition of the source table and the preset target table, wherein the data to be synchronized is obtained by screening the target data according to the source table filtering condition.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the data synchronization method according to any one of claims 1 to 7 when executing the computer program.
10. A computer storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the data synchronization method according to any one of claims 1 to 7.
CN202110605474.9A 2021-05-31 2021-05-31 Data synchronization method and system Pending CN113282599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110605474.9A CN113282599A (en) 2021-05-31 2021-05-31 Data synchronization method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110605474.9A CN113282599A (en) 2021-05-31 2021-05-31 Data synchronization method and system

Publications (1)

Publication Number Publication Date
CN113282599A true CN113282599A (en) 2021-08-20

Family

ID=77282884

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110605474.9A Pending CN113282599A (en) 2021-05-31 2021-05-31 Data synchronization method and system

Country Status (1)

Country Link
CN (1) CN113282599A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821565A (en) * 2021-09-10 2021-12-21 上海得帆信息技术有限公司 Method for synchronizing data of multiple data sources
CN114780641A (en) * 2022-05-07 2022-07-22 湖南长银五八消费金融股份有限公司 Multi-library multi-table synchronization method and device, computer equipment and storage medium
CN116521636A (en) * 2023-05-16 2023-08-01 三峡科技有限责任公司 Automatic synchronization method and system for operation data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160171051A1 (en) * 2014-12-15 2016-06-16 National Tsing Hua University Synchronization system for transforming database and method thereof
CN106682002A (en) * 2015-11-05 2017-05-17 中兴通讯股份有限公司 Database synchronization method and system, source data and target data synchronization device
CN109669983A (en) * 2018-12-27 2019-04-23 杭州火树科技有限公司 Visualize multi-data source ETL tool
CN111666326A (en) * 2020-05-29 2020-09-15 中国工商银行股份有限公司 ETL scheduling method and device
CN112579610A (en) * 2020-12-23 2021-03-30 安徽航天信息有限公司 Multi-data source structure analysis method, system, terminal device and storage medium

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113821565A (en) * 2021-09-10 2021-12-21 上海得帆信息技术有限公司 Method for synchronizing data of multiple data sources
CN113821565B (en) * 2021-09-10 2024-03-15 上海得帆信息技术有限公司 Method for synchronizing data by multiple data sources
CN114780641A (en) * 2022-05-07 2022-07-22 湖南长银五八消费金融股份有限公司 Multi-library multi-table synchronization method and device, computer equipment and storage medium
CN114780641B (en) * 2022-05-07 2023-07-14 湖南长银五八消费金融股份有限公司 Multi-library multi-table synchronization method, device, computer equipment and storage medium
CN116521636A (en) * 2023-05-16 2023-08-01 三峡科技有限责任公司 Automatic synchronization method and system for operation data
CN116521636B (en) * 2023-05-16 2023-11-28 三峡科技有限责任公司 Automatic synchronization method and system for operation data

Similar Documents

Publication Publication Date Title
CN107819824B (en) Urban data opening and information service system and service method
CN110908997B (en) Data blood relationship construction method and device, server and readable storage medium
CN113282599A (en) Data synchronization method and system
US11941034B2 (en) Conversational database analysis
CN105912594B (en) SQL statement processing method and system
CN109815254B (en) Cross-region task scheduling method and system based on big data
CN114116716A (en) Hierarchical data retrieval method, device and equipment
CN115374102A (en) Data processing method and system
CN109213826A (en) Data processing method and equipment
US20200334314A1 (en) Emergency disposal support system
CN107491463B (en) Optimization method and system for data query
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN115757689A (en) Information query system, method and equipment
CN115858513A (en) Data governance method, data governance device, computer equipment and storage medium
Taleghani Executive information systems development lifecycle
CN110737432A (en) script aided design method and device based on root list
CN108549714B (en) Data processing method and device
CN114443015A (en) Method for generating adding, deleting, modifying and checking service interface based on database metadata
CN110928963B (en) Column-level authority knowledge graph construction method for operation and maintenance service data table
US20230252022A1 (en) Secure And Efficient Database Command Execution Support
Suganya et al. Efficient fragmentation and allocation in distributed databases
US10003492B2 (en) Systems and methods for managing data related to network elements from multiple sources
CN115168474B (en) Internet of things central station system building method based on big data model
CN114861229B (en) Hive dynamic desensitization method and system
US11803543B2 (en) Lossless switching between search grammars

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination