CN116186165A - Data copying method, device, system and storage medium - Google Patents

Publication number
CN116186165A
Authority
CN
China
Prior art keywords
source
target
data
cluster end
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310216017.XA
Other languages
Chinese (zh)
Inventor
蔡雅琼
吕文栋
陈晓新
邓宇
陈冰涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp and CCB Finetech Co Ltd
Priority to CN202310216017.XA
Publication of CN116186165A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1095 Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a data copying method, device, system and storage medium, and belongs to the technical field of big data and data processing. The method comprises: generating replication configuration parameters and a node opening instruction; determining source table data of the source data and source table metadata; establishing a data transmission channel between each node of a source cluster end and each node of a target cluster end; determining a replication mapping relation between the nodes of the source cluster end and the nodes of the target cluster end; splitting the source table data; sending the source table metadata to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata and generates initial table data according to the target table metadata; and sending the split source table data from the nodes of the source cluster end to the nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into the initial table data according to the target table metadata and generates the target table data. The method and the device can improve data copying efficiency.

Description

Data copying method, device, system and storage medium
Technical Field
The embodiment of the application relates to the technical field of big data and data processing, in particular to a data copying method, a device, a system and a storage medium.
Background
As the volume of enterprise-level big data grows, MPP (Massively Parallel Processing) database clusters are widely used in cloud data warehouses. To support users' batch queries and to achieve primary/standby high availability between database clusters, data often needs to be replicated between clusters.
In the prior art, when data is copied between clusters, the source end parses a configuration file entered by an operator at the source cluster end to obtain a parsing result containing screening conditions. A database query instruction is then used to read the target data from the database at the source cluster end according to those screening conditions, and the target data is copied into a target database at the target cluster end.
However, the inventors found that the prior art has at least the following technical problem: replication efficiency remains low when full replication of the tables of a large-scale database, or replication or migration at the level of an entire database, is required.
Disclosure of Invention
The application provides a data replication method, device, system and storage medium that can improve data replication efficiency when full replication of the tables of a large-scale database, or replication or migration at the level of an entire database, is carried out between clusters.
In a first aspect, the present application provides a data replication method, including:
responding to the data replication request, and generating replication configuration parameters and node opening instructions;
determining source table data of source data according to the replication configuration parameter information; determining source table metadata according to a source database system table;
according to the node opening instruction, a data transmission channel between each node of the source cluster end and each node of the target cluster end is established;
receiving a target database system table sent by the target cluster end;
determining a copy mapping relation between the node of the source cluster end and the node of the target cluster end according to the target database system table and the source database system table;
splitting the source table data according to the copy mapping relation;
the source table metadata is sent to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata and generates initial table data according to the target table metadata;
and sending the split source table data from the nodes of the source cluster end to the nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into initial table data according to the target table metadata, and generates target table data.
In one possible implementation manner, the establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction includes: opening each node of the source cluster end according to the node opening instruction; the node opening instruction is sent to the target cluster end, so that the target cluster end opens each node corresponding to all nodes of the source cluster end according to the node opening instruction; determining the super user white list information of the database according to the node opening instruction; the database super user white list information is sent to the target cluster end, so that the target cluster end determines a network access corresponding relation according to the database super user white list information; and receiving the network access corresponding relation sent by the target cluster end, and establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the network access corresponding relation.
In one possible implementation, the source table metadata includes metadata database schema definition language DDL information; correspondingly, the sending the source table metadata to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata, and generates initial table data according to the target table metadata, including: and sending the metadata DDL information to the target cluster end so that the target cluster end executes a pre-stored synchronous execution tool according to the database super user white list information, synchronizes the source table metadata from the source cluster end to the target cluster end, generates target table metadata according to the metadata DDL information, and generates initial table data according to the target table metadata.
In one possible implementation manner, the determining, according to the target database system table and the source database system table, a replication mapping relationship between a node of the source cluster end and a node of the target cluster end includes: determining the internet protocol information of each node of the source cluster end according to the source database system table; determining the internet protocol information of each node of the target cluster end according to the target database system table; and determining a copy mapping relation between each node of the source cluster end and each node of the target cluster end according to the internet protocol information of each node of the source cluster end and the internet protocol information of each node of the target cluster end.
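The mapping step above can be sketched as follows. This is a minimal illustration, assuming each system table has already been reduced to an ordered list of node IP addresses. The function name `build_replication_mapping`, the sample addresses, and the pairing rule (source node i maps to target node i modulo the number of target nodes) are assumptions made for the example; the application does not state how nodes are paired.

```python
from typing import Dict, List


def build_replication_mapping(source_ips: List[str], target_ips: List[str]) -> Dict[str, str]:
    """Pair each source node IP with a target node IP.

    Assumption: nodes are paired in order, wrapping around the target list
    when the source cluster end has more nodes than the target cluster end.
    """
    if not source_ips or not target_ips:
        raise ValueError("both cluster ends need at least one node")
    return {
        src: target_ips[i % len(target_ips)]
        for i, src in enumerate(source_ips)
    }


# Hypothetical addresses standing in for rows read from the two system tables.
source_nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
target_nodes = ["10.0.1.1", "10.0.1.2"]
mapping = build_replication_mapping(source_nodes, target_nodes)
```

With three source nodes and two target nodes, the third source node wraps around to the first target node under this assumed rule.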
In one possible implementation manner, after the responding to the data replication request, the method further includes: and generating statistical information of the source data.
In one possible implementation, the statistical information of the source data includes a source table record number; correspondingly, after the sending the split source table data from the plurality of nodes of the source cluster end to the plurality of nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into initial table data according to the target table metadata and generates target table data, the method further includes: receiving a target record number sent by the target cluster end, wherein the target record number is generated when the target cluster end imports the source table data into the initial table data; comparing the target record number with the source table record number to generate a record number comparison result; if the record number comparison result meets a preset error reporting condition, returning to the step of responding to the data replication request to generate replication configuration parameters and node opening instructions; and if the record number comparison result meets a preset success condition, returning a data copying success result.
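The record number check can be pictured with the short sketch below. It assumes that the error reporting condition means the two counts differ, that the success condition means they are equal, and that retries are capped; the application only calls these conditions "preset". The names `replicate_with_verification`, `run_replication` and `max_attempts` are hypothetical.

```python
from typing import Callable


def replicate_with_verification(
    run_replication: Callable[[], int],
    source_record_count: int,
    max_attempts: int = 3,
) -> bool:
    """Re-run replication until the target record count matches the source.

    Assumptions: a mismatch is the error condition, equality is the success
    condition, and the number of retries is capped at max_attempts.
    """
    for _ in range(max_attempts):
        # run_replication returns the record count reported by the target end.
        target_record_count = run_replication()
        if target_record_count == source_record_count:
            return True  # success condition met
    return False  # counts still differ after all attempts


# Simulated target counts: the first attempt loses rows, the second succeeds.
reported_counts = iter([90, 100])
ok = replicate_with_verification(lambda: next(reported_counts), source_record_count=100)
```

Capping the retries is a design choice of this sketch; the text itself simply returns to the first step whenever the error condition is met.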
In one possible implementation, the statistical information of the source data includes a source table record number; correspondingly, the sending the source table metadata to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata, and after generating initial table data according to the target table metadata, the method further includes: and if the number of the source table records meets the preset threshold condition, sending the source table data from one of the nodes of the source cluster end to the target cluster end through the data transmission channel according to the replication mapping relation, so that the target cluster end imports the source table data into initial table data according to the target table metadata to generate target table data.
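A small sketch of this decision, under the assumption that the threshold condition means "the table has at most a given number of records", so that small tables skip the multi-node path. The function name `choose_sender_nodes` and the default threshold value are illustrative only.

```python
from typing import List


def choose_sender_nodes(
    record_count: int,
    source_ips: List[str],
    small_table_threshold: int = 10_000,
) -> List[str]:
    """Return the source nodes that should send the table.

    Assumption: a table at or below the threshold is sent from a single
    node; larger tables are sent from every source node in parallel.
    """
    if record_count <= small_table_threshold:
        return source_ips[:1]
    return list(source_ips)


# Hypothetical source node addresses.
nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
small = choose_sender_nodes(500, nodes)
large = choose_sender_nodes(5_000_000, nodes)
```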
In one possible implementation manner, the sending the source table metadata to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata, and after generating initial table data according to the target table metadata, further includes: and sending a same-table query instruction to the target cluster end, so that the target cluster end determines a target end data table with the same source table metadata according to the same-table query instruction, and deletes the target end data table.
In one possible implementation manner, before splitting the source table data according to the replication mapping relationship, the method further includes: and if the number of the nodes of the source cluster end is not equal to the number of the nodes of the target cluster end, carrying out redistribution processing on the source table data to generate redistributed source table data.
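One way to picture the redistribution step is a re-hash of every row into as many buckets as the target cluster end has nodes. The sketch below assumes hash distribution on the first column of each row and uses CRC32 as a stable hash; both choices, and the name `redistribute`, are assumptions for illustration, since the application does not describe the redistribution algorithm.

```python
import zlib
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, ...]


def redistribute(rows_by_source_node: Dict[str, List[Row]], target_node_count: int) -> List[List[Row]]:
    """Regroup rows into one bucket per target node.

    Assumption: rows are distributed by a stable hash of their first column,
    so rows with the same key always land in the same bucket.
    """
    if target_node_count < 1:
        raise ValueError("target_node_count must be at least 1")
    buckets: List[List[Row]] = [[] for _ in range(target_node_count)]
    for rows in rows_by_source_node.values():
        for row in rows:
            key = str(row[0]).encode("utf-8")
            buckets[zlib.crc32(key) % target_node_count].append(row)
    return buckets


# Three hypothetical source nodes redistributed for a two-node target cluster end.
rows_by_source = {
    "10.0.0.1": [(1, "a"), (2, "b")],
    "10.0.0.2": [(3, "c")],
    "10.0.0.3": [(1, "d")],
}
buckets = redistribute(rows_by_source, target_node_count=2)
```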
In one possible implementation manner, the sending the split source table data from the plurality of nodes of the source cluster end to the plurality of nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into the initial table data according to the target table metadata, and after generating the target table data, the method further includes: and sending a statistical information collection instruction to the target cluster end so that the target cluster end can collect the statistical information of the target table data according to the statistical information collection instruction.
In a second aspect, the present application further provides a data replication method, applied to a target cluster end, including:
receiving a node opening instruction sent by a source cluster end, wherein the source cluster end generates replication configuration parameters and the node opening instruction in response to a data replication request; the replication configuration parameters are used for instructing the source cluster end to determine source table data of source data according to the replication configuration parameter information and to determine source table metadata according to the source database system table;
according to the node opening instruction, establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end;
transmitting a target database system table to the source cluster end, so that the source cluster end determines a copy mapping relation between a node of the source cluster end and a node of the target cluster end according to the target database system table and the source database system table, and splits the source table data according to the copy mapping relation;
receiving source table metadata sent by the source cluster end, generating target table metadata according to the source table metadata, and generating initial table data according to the target table metadata;
receiving split source table data sent by the source cluster end through the data transmission channel, wherein the split source table data is sent from a plurality of nodes of the source cluster end to a plurality of nodes of the target cluster end;
and importing the source table data into the initial table data according to the target table metadata to generate target table data.
In one possible implementation manner, the establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction includes: receiving a node opening instruction sent by the source cluster end, wherein the node opening instruction is used by the source cluster end to open each node of the source cluster end and to determine the database super user white list information; opening each node corresponding to all nodes of the source cluster end according to the node opening instruction; receiving the database super user white list information sent by the source cluster end, and determining a network access corresponding relation according to the database super user white list information; and sending the network access corresponding relation to the source cluster end, so that the source cluster end establishes a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the network access corresponding relation.
In one possible implementation, the source table metadata includes metadata database schema definition language DDL information; correspondingly, the receiving the source table metadata sent by the source cluster end, generating target table metadata according to the source table metadata, and generating initial table data according to the target table metadata, including: receiving metadata DDL information sent by the source cluster end, executing a pre-stored synchronization execution tool according to the database super user white list information, and synchronizing the source table metadata from the source cluster end to the target cluster end; generating target table metadata according to the metadata DDL information, and generating initial table data according to the target table metadata.
In a possible implementation manner, the source cluster end, in response to the data replication request, also generates statistical information of the source data, wherein the statistical information of the source data comprises a source table record number; correspondingly, after the importing the source table data into the initial table data according to the target table metadata to generate target table data, the method further includes: generating a target record number when the source table data is imported into the initial table data; and sending the target record number to the source cluster end, so that the source cluster end compares the target record number with the source table record number to generate a record number comparison result, returns to the step of responding to the data replication request to generate replication configuration parameters and node opening instructions if the record number comparison result meets the preset error reporting condition, and returns a data replication success result if the record number comparison result meets the preset success condition.
In one possible implementation manner, after the source table data is imported into the initial table data according to the target table metadata and the target table data is generated, the method further includes: when the source cluster end judges that the record number of the source table meets a preset threshold condition, receiving source table data sent by one node of the source cluster end through the data transmission channel; and importing the source table data into the initial table data according to the target table metadata to generate target table data.
In one possible implementation manner, after generating target table metadata according to the source table metadata and generating initial table data according to the target table metadata, the method further includes: receiving a same-table query instruction sent by the source cluster end; and determining a target end data table with the same source table metadata according to the same table query instruction, and deleting the target end data table.
In one possible implementation manner, after the source table data is imported into the initial table data according to the target table metadata and the target table data is generated, the method further includes: and receiving a statistical information instruction sent by the source cluster end, and collecting statistical information of the target table data according to the statistical information instruction.
In a third aspect, the present application provides a data replication apparatus, applied to a source cluster end, including:
the request response module is used for responding to the data replication request and generating replication configuration parameters and node opening instructions;
the source data processing module is used for determining source table data of source data according to the replication configuration parameter information; determining source table metadata according to the source database system table;
the first transmission channel establishing module is used for establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction;
the first receiving module is used for receiving a target database system table sent by the target cluster end;
the replication mapping relation determining module is used for determining the replication mapping relation between the nodes of the source cluster end and the nodes of the target cluster end according to the target database system table and the source database system table;
the data splitting module is used for splitting the source table data according to the replication mapping relation;
the first sending module is used for sending the source table metadata to the target cluster end so that the target cluster end generates target table metadata according to the source table metadata and generates initial table data according to the target table metadata;
The first sending module is further configured to send the split source table data from the plurality of nodes of the source cluster end to the plurality of nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into initial table data according to the target table metadata, and generates target table data.
In a fourth aspect, the present application further provides a data replication device, applied to a target cluster end, including:
the second receiving module is used for receiving a node opening instruction sent by a source cluster end, wherein the source cluster end generates replication configuration parameters and the node opening instruction in response to a data replication request; the replication configuration parameters are used for instructing the source cluster end to determine source table data of source data according to the replication configuration parameter information and to determine source table metadata according to the source database system table;
the second transmission channel establishing module is used for establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction;
the second sending module is used for sending a target database system table to the source cluster end so that the source cluster end determines a copy mapping relation between a node of the source cluster end and a node of the target cluster end according to the target database system table and the source database system table, and splits the source table data according to the copy mapping relation;
The second receiving module is further configured to receive source table metadata sent by the source cluster end, generate target table metadata according to the source table metadata, and generate initial table data according to the target table metadata;
the second receiving module is further configured to receive split source table data sent by the source cluster end through the data transmission channel, where the split source table data is sent from a plurality of nodes of the source cluster end to a plurality of nodes of the target cluster end;
and the data copying module is used for importing the source table data into the initial table data according to the target table metadata to generate target table data.
In a fifth aspect, the present application provides a server system, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes computer-executable instructions stored by the memory such that the at least one processor performs the data replication method as described in the first or second aspect.
In a sixth aspect, the present application provides a computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement a data replication method as described in the first or second aspect.
In a seventh aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements a data replication method as described in the first or second aspect.
According to the data replication method, device, system and storage medium, replication configuration parameters and node opening instructions are generated by responding to data replication requests, source table data of source data are determined according to replication configuration parameter information, and source table metadata are determined according to a source database system table. Then, the source table metadata is sent to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata and generates initial table data according to the target table metadata, and the source table metadata is copied. After receiving the target database system table, determining a replication mapping relation between a node of a source cluster end and a node of the target cluster end according to the target database system table and the source database system table, and splitting source table data according to the replication mapping relation. And finally, the split source table data is sent to a plurality of nodes of the target cluster end from the plurality of nodes of the source cluster end through a data transmission channel, so that the target cluster end imports the source table data into the initial table data according to the target table metadata to generate target table data, and the replication efficiency of the source data between the source cluster end and the target cluster end is improved by utilizing the data processing capacity and the communication bandwidth capacity of the plurality of nodes between clusters when the nodes work simultaneously.
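The efficiency argument above rests on many node-to-node transfers running at once. A minimal sketch of that idea, assuming a replication mapping from source node to target node and one split of rows per source node: the function `send_split` is a stand-in that only counts rows, and all names and addresses are hypothetical rather than taken from the application.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def send_split(source_ip: str, target_ip: str, rows: List[tuple]) -> int:
    """Placeholder for one node-to-node transfer; returns the rows sent."""
    # A real implementation would stream rows over the established channel.
    return len(rows)


def send_all_splits(mapping: Dict[str, str], splits: Dict[str, List[tuple]]) -> int:
    """Send every split concurrently and return the total rows transferred."""
    with ThreadPoolExecutor(max_workers=max(1, len(mapping))) as pool:
        futures = [
            pool.submit(send_split, src, dst, splits.get(src, []))
            for src, dst in mapping.items()
        ]
        return sum(f.result() for f in futures)


# Hypothetical mapping and per-node splits of the source table data.
node_mapping = {"10.0.0.1": "10.0.1.1", "10.0.0.2": "10.0.1.2"}
node_splits = {"10.0.0.1": [(1,), (2,)], "10.0.0.2": [(3,)]}
total_sent = send_all_splits(node_mapping, node_splits)
```

Threads are used here only to show the concurrency pattern; in an actual cluster each source node would perform its own transfer.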
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below of the drawings that are needed in the embodiments or the prior art descriptions, it being obvious that the drawings in the following description are some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is an application scenario schematic diagram of a data replication method provided in an embodiment of the present application;
FIG. 2 is a flow chart of a data replication method according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a data replication method according to another embodiment of the present disclosure;
FIG. 4 is an interactive flow diagram of a data replication method according to an embodiment of the present application;
FIG. 5 is a schematic diagram of a data replication device according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of another data replication device according to an embodiment of the present disclosure;
FIG. 7 is a schematic hardware structure of a server system according to an embodiment of the present application.
Detailed Description
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
As the volume of enterprise-level big data grows, MPP (Massively Parallel Processing) database clusters are widely used in cloud data warehouses. To support users' batch queries and to achieve primary/standby high availability between database clusters, data often needs to be replicated between clusters. In the prior art, when data is copied between clusters, the source end parses a configuration file entered by an operator at the source cluster end to obtain a parsing result containing screening conditions; a database query instruction is then used to read the target data from the database at the source cluster end according to those screening conditions, and the target data is copied into a target database at the target cluster end. The inventors found that this approach, which selects only the source data that meets the screening conditions and unloads data only from the cluster master node, still suffers from low replication efficiency when full replication of the tables of a large-scale database, or replication or migration at the level of an entire database, is required.
To solve the above technical problem, the embodiments of the present application provide the following technical idea: the source data to be copied is divided into two parts, source table data and source table metadata, and the source table metadata is copied from the source cluster end to the target cluster end using the nodes of the source cluster end and the nodes of the target cluster end. The target cluster end then generates target table metadata according to the source table metadata and constructs initial table data that has the same basic information as the source table data, such as table structure and user authority. The source table data is transmitted using the data processing capacity and communication bandwidth of a plurality of nodes of the source cluster end and a plurality of nodes of the target cluster end, which improves the efficiency of transmitting the source table data from the source cluster end to the target cluster end and thereby improves the replication efficiency of the source table data.
Referring to fig. 1, fig. 1 is a schematic diagram of an application scenario of a data replication method provided in an embodiment of the present application. As shown in fig. 1, the scenario includes a source cluster end 101 and a target cluster end 102. The source cluster end 101 and the target cluster end 102 may each comprise a plurality of nodes, including a master node and a plurality of data nodes; the master node and the data nodes may be computers or servers. When there is a data replication requirement, a worker can configure parameters and install a data replication tool at any node in the source cluster end 101 or the target cluster end 102. For example, when a node in the source cluster end 101 acquires a data replication request triggered by a user, that node, in response to the request, instructs the source cluster end 101 to replicate the source data to be replicated to the target cluster end 102.
In the application scenario shown in fig. 1, the source cluster end 101 may be a GP (distributed database) cluster or an MPP (massively parallel processing) cluster, and the target cluster end 102 may also be a GP cluster or an MPP cluster.
Fig. 2 is a schematic flow chart of a data replication method according to an embodiment of the present application. The execution body of this embodiment may be any node in the source cluster end 101 in the embodiment shown in fig. 1; the node may be a server, a computer, or another related device, which is not specifically limited here. As shown in fig. 2, the data replication method includes:
S201: Generating replication configuration parameters and a node opening instruction in response to the data replication request.
In this embodiment, the data replication request may be triggered by any node in the source cluster end, and the replication configuration parameters may include the parameters required for replicating data, such as the list of tables to be replicated, related information of the target cluster end, and a table mapping relationship. The node opening instruction is an instruction that instructs the source cluster end and the target cluster end to open the corresponding nodes.
S202: Determining source table data of the source data according to the replication configuration parameters; determining source table metadata according to the source database system table.
In this embodiment, the source data may be an entire database, and replicating the database may include replicating both table data and table metadata. The source table data comprises the source database tables and the data in those tables that is to be exported and copied to the target cluster end. The source database system table is a table automatically generated by the system when the source cluster end stores data, and it stores system information of the database, the corresponding data storage locations, and other information. The source table metadata may include statistical information of the source table data, table structure definition information, table usage permissions, view dependency relationships, and the like. For example, the source data is queried according to the list of tables to be replicated in the replication configuration parameters to determine the source table data.
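As a rough illustration of step S202, the following minimal sketch shows one way the replication configuration parameters could be held and used to pick the source tables. The class `ReplicationConfig`, its field names, and the function `select_source_tables` are hypothetical names introduced here for illustration; the application does not define a concrete format.

```python
from dataclasses import dataclass, field


@dataclass
class ReplicationConfig:
    """Hypothetical container for the replication configuration parameters."""
    tables_to_copy: list[str]          # list of tables to be replicated
    target_master_host: str            # related information of the target cluster end
    table_mapping: dict[str, str] = field(default_factory=dict)  # source table -> target table


def select_source_tables(config, catalog):
    """Return {source table: target table} for every table in the copy list.

    `catalog` stands in for the table names found in the source database
    system table (an assumption of this sketch).
    """
    missing = [t for t in config.tables_to_copy if t not in catalog]
    if missing:
        raise KeyError(f"tables not found at source: {missing}")
    # A table without an explicit mapping keeps its own name at the target.
    return {t: config.table_mapping.get(t, t) for t in config.tables_to_copy}
```

A table that is listed for replication but absent from the catalog raises an error rather than being silently skipped, which is a design choice of this sketch, not a requirement stated in the application.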
S203: Establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction.
In this embodiment, the node opening instruction is used to open a plurality of nodes at the source cluster end and a plurality of nodes at the target cluster end, so that data can be transmitted between each node of the source cluster end and each node of the target cluster end. A data transmission channel refers to a path between two nodes over which signals can be transmitted.
In an optional embodiment of the present application, step S203 includes:
S203a: Opening each node of the source cluster end according to the node opening instruction; sending the node opening instruction to the target cluster end so that the target cluster end opens, according to the node opening instruction, the nodes corresponding to all nodes of the source cluster end; and determining database super-user whitelist information according to the node opening instruction.
S203b: Sending the database super-user whitelist information to the target cluster end so that the target cluster end determines a network access correspondence according to the database super-user whitelist information.
S203c: Receiving the network access correspondence sent by the target cluster end, and establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the network access correspondence.
In this embodiment, opening a node refers to opening the access rights of the node's server. The database super-user whitelist information is user permission information; once the database super-user whitelist is opened, users at the source cluster end are allowed to log in to the target cluster end without a password. The network access correspondence refers to information about the connection or access path between two corresponding nodes; for example, the network access correspondence between a web browser and a website is the address of the website.
S204: Receiving a target database system table sent by the target cluster end.
In this embodiment, the target database system table is a table automatically generated by the system when the target cluster end stores data, and it stores system information of the database, the corresponding data storage locations, and other information.
S205: Determining a replication mapping relationship between the nodes of the source cluster end and the nodes of the target cluster end according to the target database system table and the source database system table.
In this embodiment, the replication mapping relationship is a correspondence between two objects; here, the two objects are one node of the source cluster end and one node of the target cluster end.
In an alternative embodiment of the present application, step S205 includes:
S205a: Determining the internet protocol information of each node of the source cluster end according to the source database system table.
S205b: Determining the internet protocol information of each node of the target cluster end according to the target database system table.
S205c: Determining the replication mapping relationship between each node of the source cluster end and each node of the target cluster end according to the internet protocol information of each node of the source cluster end and the internet protocol information of each node of the target cluster end.
In this embodiment, the internet protocol information of a node is the internet protocol address of the node's physical network card, and the internet protocol addresses of all nodes belong to a physical network that actually exists. The internet protocol information of each node of the source cluster end indicates that when a node outside the source cluster end accesses a node or service in the source cluster end, communication must go through the physical network of the source cluster end. Similarly, the internet protocol information of each node of the target cluster end indicates that when a node outside the target cluster end accesses a node or service in the target cluster end, communication must go through the physical network of the target cluster end.
In this embodiment, the replication mapping relationship between each node of the source cluster end and each node of the target cluster end means that each node of the source cluster end has a corresponding node in the target cluster end, and the internet protocol addresses of the two nodes correspond to each other.
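Steps S205a to S205c can be sketched as follows. The application states only that the addresses of paired nodes correspond; pairing nodes by sorted address order, as done here, is an assumption made for illustration, and `build_copy_mapping` is a hypothetical name.

```python
import ipaddress


def build_copy_mapping(source_ips, target_ips):
    """Pair each source node address with one target node address.

    Both lists stand in for the node addresses read from the source and
    target database system tables. Pairing by sorted order is an
    assumption of this sketch.
    """
    if len(source_ips) != len(target_ips):
        raise ValueError("node counts differ; the data must be redistributed first")
    # Sort numerically, so that 10.0.0.2 comes before 10.0.0.10.
    src = sorted(source_ips, key=ipaddress.ip_address)
    dst = sorted(target_ips, key=ipaddress.ip_address)
    return dict(zip(src, dst))
```

Sorting with `ipaddress.ip_address` as the key avoids the string-ordering pitfall in which "10.0.0.10" would sort before "10.0.0.2".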
S206: Splitting the source table data according to the replication mapping relationship.
In this embodiment, the source table data may be split by dividing the data into multiple shares and exporting the shares from multiple nodes. For example, the source table data is exported at the source cluster end using a cluster helper tool.
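A minimal sketch of the splitting in step S206 follows, assuming the rows are held in memory and assigned round-robin to the source nodes named in the replication mapping. In a real MPP cluster each data node would more likely export the share it already stores; the round-robin assignment and the name `split_rows` are illustrative assumptions only.

```python
def split_rows(rows, mapping):
    """Divide rows into one share per source node of the replication mapping.

    `mapping` is {source node: target node}; only its keys are used here.
    Round-robin assignment is an assumption of this sketch.
    """
    nodes = list(mapping)
    shares = {node: [] for node in nodes}
    for i, row in enumerate(rows):
        shares[nodes[i % len(nodes)]].append(row)
    return shares
```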
S207: Sending the source table metadata to the target cluster end so that the target cluster end generates target table metadata according to the source table metadata and generates initial table data according to the target table metadata.
In this embodiment, the source table metadata is metadata generated when the source cluster end queries the source database system table, and the target table metadata is the source table metadata copied from the source cluster end to the target cluster end. The initial table data is a data table in which no data has yet been stored, and it has the same basic table information as the source data, such as table structure and user permissions.
In an alternative embodiment of the present application, the source table metadata includes metadata database schema definition language (DDL) information, and accordingly, step S207 includes:
Sending the metadata DDL information to the target cluster end so that the target cluster end executes a pre-stored synchronization execution tool according to the database super-user whitelist information, synchronizes the source table metadata from the source cluster end to the target cluster end, generates target table metadata according to the metadata DDL information, and generates initial table data according to the target table metadata.
In this embodiment, the metadata DDL information may be sent between the master node of the source cluster end and the master node of the target cluster end.
In this embodiment, the metadata DDL information refers to commands that define operations on metadata, such as create, modify, and delete commands. The pre-stored synchronization execution tool may be a copying tool, such as the hashcopy tool.
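To make the relationship between DDL information and initial table data concrete, the sketch below replays a table definition on a second database so that an empty table with the same structure exists there. It uses Python's built-in `sqlite3` module purely as a stand-in for the source and target clusters, with `sqlite_master` playing the role of the source database system table; the function name `create_initial_table` is hypothetical, and this is not how the hashcopy tool itself works.

```python
import sqlite3


def create_initial_table(target_conn, ddl):
    """Run the source DDL on the target so an empty table with the same structure exists."""
    target_conn.execute(ddl)
    target_conn.commit()


# Source side: a table with some data (stand-in for the source cluster end).
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
source.execute("INSERT INTO orders VALUES (1, 9.5)")

# Read the table definition from the system catalog of the source database.
ddl = source.execute(
    "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'orders'"
).fetchone()[0]

# Target side: replaying the DDL yields the empty "initial table data".
target = sqlite3.connect(":memory:")
create_initial_table(target, ddl)
```

After this runs, the target holds an `orders` table with the same columns and no rows, ready to receive the source table data in a later step.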
S208: Sending the split source table data from the plurality of nodes of the source cluster end to the plurality of nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into the initial table data according to the target table metadata and generates the target table data.
In this embodiment, step S208 may be performed by running multiple replication tools at the source cluster end at the same time, for example, by running the hashcopy tool on multiple nodes of the source cluster end simultaneously.
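The simultaneous node-to-node transfer of step S208 can be pictured with the following sketch, which uses a thread pool to send each share to its mapped target node concurrently. The function `send_share` is a hypothetical stand-in for one transfer (it only appends to an in-memory list); it does not model the hashcopy tool or any real network channel.

```python
from concurrent.futures import ThreadPoolExecutor


def send_share(target_node, rows, inbox):
    """Stand-in for one node-to-node transfer: deliver rows to the target's inbox."""
    # The mapping is one-to-one, so each target list is written by one thread only.
    inbox[target_node].extend(rows)
    return len(rows)


def replicate_in_parallel(shares, mapping):
    """Send every source node's share to its mapped target node at the same time.

    `shares` is {source node: rows}; `mapping` is {source node: target node}.
    Returns the per-target inboxes and the total number of rows sent.
    """
    inbox = {target: [] for target in mapping.values()}
    with ThreadPoolExecutor(max_workers=len(mapping)) as pool:
        futures = [
            pool.submit(send_share, mapping[src], rows, inbox)
            for src, rows in shares.items()
        ]
        sent = sum(f.result() for f in futures)
    return inbox, sent
```

Returning the number of rows sent gives the caller a source-side count that could later be compared with the record number reported by the target.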
In summary, in the data replication method provided in this embodiment, by responding to a data replication request, a replication configuration parameter and a node activation instruction are generated, source table data of source data is determined according to replication configuration parameter information, and source table metadata is determined according to a source database system table. Then, the source table metadata is sent to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata and generates initial table data according to the target table metadata, and the source table metadata is copied. After receiving the target database system table, determining a replication mapping relation between the nodes of the source cluster end and the nodes of the target cluster end according to the target database system table and the source database system table, and splitting the source table data according to the replication mapping relation. And finally, the split source table data is sent to a plurality of nodes of the target cluster end from a plurality of nodes of the source cluster end through a data transmission channel, so that the target cluster end imports the source table data into the initial table data according to the target table metadata to generate the target table data, and the replication efficiency of the source data between the source cluster end and the target cluster end is improved by utilizing the data processing capacity and the communication bandwidth capacity of the plurality of nodes between clusters when the nodes work simultaneously.
On the basis of the above embodiment, as an optional embodiment of the present application, after responding to the data replication request in step S201, the method further includes:
step A: statistical information of the source data is generated.
In this embodiment, the statistical information of the source data may be the number of records such as the number of the database to be copied and the number of the source data.
Based on the above embodiments, in an alternative embodiment of the present application, the statistical information of the source data includes a source table record number. Accordingly, after step S208, the method further includes:
Step B: Receiving the target record number sent by the target cluster end, wherein the target record number is generated when the target cluster end imports the source table data into the initial table data.
Step C: Comparing the target record number with the source table record number to generate a record number comparison result.
In this embodiment, the data comparison is a record number consistency comparison, and the record number comparison result is either that the record numbers are consistent or that the record numbers are inconsistent.
Step D: If the record number comparison result meets the preset error reporting condition, returning to the step of generating replication configuration parameters and a node opening instruction in response to the data replication request.
Step E: If the record number comparison result meets the preset success condition, returning a data replication success result.
In this embodiment, the preset error reporting condition is that the record numbers are inconsistent. Inconsistent record numbers indicate that the data replication has failed, and steps S201 to S208 need to be executed again until the comparison result shows that the record numbers are consistent.
In summary, according to the data replication method provided by the embodiment, the target record number and the source table record number are subjected to data comparison processing to generate record number comparison results, and corresponding processing measures are selected according to different record number comparison results until the data replication is successful, so that data replication failure or missing data is avoided.
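Steps B to E amount to a compare-and-retry loop, sketched below. The callable `run_copy` is a hypothetical stand-in for one full pass of steps S201 to S208 that returns the target record number, and the `max_attempts` limit is an assumption added so the sketch terminates; the application itself describes repeating until the record numbers are consistent.

```python
def replicate_until_consistent(run_copy, source_count, max_attempts=3):
    """Re-run the copy until the target record number equals the source record number.

    Returns the number of attempts that were needed. `max_attempts` is an
    assumption of this sketch, not a limit stated in the application.
    """
    for attempt in range(1, max_attempts + 1):
        target_count = run_copy()          # one pass of S201 to S208
        if target_count == source_count:   # preset success condition
            return attempt
        # Otherwise the preset error reporting condition is met: try again.
    raise RuntimeError("record numbers still inconsistent after retries")
```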
On the basis of the above embodiment, as an optional embodiment of the present application, the statistical information of the source data includes the number of source table records, and correspondingly, step S207 further includes:
Step F: If the source table record number meets the preset threshold condition, sending the source table data from one node of the source cluster end to the target cluster end through the data transmission channel according to the replication mapping relationship, so that the target cluster end imports the source table data into the initial table data according to the target table metadata to generate target table data.
In this embodiment, the preset threshold condition is a manually set numerical condition; for example, the value may be 1,000,000. When the source table record number meets the preset threshold condition, the data processing capacity and bandwidth communication capacity of one node of the source cluster end and one node of the target cluster end are sufficient to meet the data replication requirement.
In this embodiment, if the source table record number does not meet the preset threshold condition, the source table data is sent as in step S208.
In summary, the data replication method provided in this embodiment decides whether to send the source table data using one node or multiple nodes by determining whether the source table record number meets a preset threshold condition, so as to save data processing and bandwidth communication resources of the source cluster end and the target cluster end.
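The choice between single-node and multi-node transfer can be sketched as a simple threshold check. The application does not state the exact comparison; treating "meets the preset threshold condition" as "at most the threshold" is an assumption of this sketch, as are the names `SINGLE_NODE_THRESHOLD` and `choose_transfer_mode`.

```python
SINGLE_NODE_THRESHOLD = 1_000_000  # example value mentioned in the text


def choose_transfer_mode(source_record_count, threshold=SINGLE_NODE_THRESHOLD):
    """Pick single-node transfer for small tables, multi-node transfer otherwise.

    Assumption: the threshold condition is met when the record count is
    less than or equal to the threshold.
    """
    if source_record_count <= threshold:
        return "single-node"   # step F
    return "multi-node"        # step S208
```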
Based on the above embodiments, in an optional embodiment of the present application, after step S207, the method further includes:
Step G: Sending a same-table query instruction to the target cluster end, so that the target cluster end determines, according to the same-table query instruction, a target-end data table whose metadata is the same as the source table metadata, and deletes that target-end data table.
In this embodiment, the same-table query instruction is an instruction that instructs the target cluster end to query whether the same database or data table already exists. After the target cluster end deletes the data table, the method proceeds to step S208.
In summary, in the data replication method provided in this embodiment, by sending the same-table query instruction to the target cluster end, the target cluster end deletes the target end data table with the same metadata as the source table metadata, so as to avoid inserting repeated data, and further improve the data replication efficiency.
Based on the above embodiments, in an optional embodiment of the present application, before step S206, the method further includes:
Step H: If the number of nodes of the source cluster end is not equal to the number of nodes of the target cluster end, redistributing the source table data to generate redistributed source table data.
In this embodiment, an unequal number of nodes at the source cluster end and the target cluster end indicates that the cluster sizes of the two ends are inconsistent, in which case redistribution is required. Redistribution refers to redistributing the original data of the cluster onto new nodes, so as to appropriately reduce the amount of data on the old nodes.
In summary, according to the data replication method provided by the embodiment, the source table data is subjected to redistribution processing, so that the data can be more quickly transmitted from a plurality of nodes of the source cluster end to a plurality of nodes of the target cluster end, and further the data replication efficiency is improved.
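One possible form of the redistribution in step H is sketched below: each row is assigned to a target node by hashing a distribution key modulo the number of target nodes. The application does not specify a redistribution scheme, so the hash-based approach, the use of CRC32, and the name `redistribute` are all assumptions made for illustration.

```python
import zlib


def redistribute(rows, key, target_node_count):
    """Regroup rows into one bucket per target node by hashing a distribution key.

    `rows` is a list of dicts and `key` names the column used for
    distribution. CRC32 is used because it is deterministic across runs,
    unlike Python's built-in hash() for strings.
    """
    buckets = [[] for _ in range(target_node_count)]
    for row in rows:
        digest = zlib.crc32(str(row[key]).encode("utf-8"))
        buckets[digest % target_node_count].append(row)
    return buckets
```

Because the assignment depends only on the key value and the target node count, rows with the same key always land in the same bucket, and no row is lost or duplicated.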
On the basis of the above embodiment, as an optional embodiment of the present application, after step S208, the method further includes:
Step I: Sending a statistical information collection instruction to the target cluster end so that the target cluster end collects the statistical information of the target table data according to the statistical information collection instruction.
In this embodiment, the statistics information collection instruction is an instruction for instructing the target cluster end to collect statistics information of the target table data. The statistics of the target table data may include the data amount of the target table data.
In summary, the data replication method provided in this embodiment provides convenience for subsequent users to use and call the target table data or execute the program by collecting the statistical information of the target table data.
Referring to fig. 3, fig. 3 is a flowchart of a data replication method according to another embodiment of the present application. The execution body of this embodiment may be any type of server, or may be the target cluster end shown in fig. 1, which is not specifically limited here. As shown in fig. 3, the method includes:
S301: Receiving a node opening instruction sent by the source cluster end, wherein the node opening instruction is generated, together with replication configuration parameters, by the source cluster end in response to a data replication request; the replication configuration parameters are used to instruct the source cluster end to determine source table data of the source data, and the source table metadata is determined according to the source database system table.
S302: Establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction.
Specifically, in an alternative embodiment of the present application, step S302 includes:
S302a: Receiving the node opening instruction sent by the source cluster end, wherein the node opening instruction is used for opening each node of the source cluster end and determining the database super-user whitelist information.
S302b: Opening the nodes corresponding to all nodes of the source cluster end according to the node opening instruction.
S302c: Receiving the database super-user whitelist information sent by the source cluster end, and determining the network access correspondence according to the database super-user whitelist information.
S302d: Sending the network access correspondence to the source cluster end so that the source cluster end establishes a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the network access correspondence.
In this embodiment, the implementation principle and technical effect of step S302 are similar to those of step S203 in the embodiment shown in fig. 2 and are not described again here.
S303: Sending the target database system table to the source cluster end so that the source cluster end determines a replication mapping relationship between the nodes of the source cluster end and the nodes of the target cluster end according to the target database system table and the source database system table, and splits the source table data according to the replication mapping relationship.
In this embodiment, the target database system table is a table automatically generated by the system when the target cluster end stores data, and it stores system information of the database, the corresponding data storage locations, and other information.
S304: Receiving the source table metadata sent by the source cluster end, generating target table metadata according to the source table metadata, and generating initial table data according to the target table metadata.
Specifically, in an alternative embodiment of the present application, the source table metadata includes metadata database schema definition language DDL information, and accordingly, step S304 includes:
S304a: Receiving the metadata DDL information sent by the source cluster end, executing a pre-stored synchronization execution tool according to the database super-user whitelist information, and synchronizing the source table metadata from the source cluster end to the target cluster end.
S304b: Generating target table metadata according to the metadata DDL information, and generating initial table data according to the target table metadata.
In this embodiment, the target table metadata is the same as the source table metadata. Taking a data table as an example, the target table metadata may include basic table parameter information such as the number of rows, the number of columns, and the header names of the data table. The initial table data is a data table into which no data has yet been imported.
S305: Receiving the split source table data sent by the source cluster end through the data transmission channel, wherein the split source table data is sent from a plurality of nodes of the source cluster end to a plurality of nodes of the target cluster end.
S306: Importing the source table data into the initial table data according to the target table metadata to generate target table data.
In summary, in the data replication method provided in this embodiment, when source data of a source cluster end is replicated to a target cluster end, a data transmission channel is established between a plurality of nodes of the source cluster end and a plurality of nodes of the target cluster end, and a plurality of nodes are used to receive source table data sent from a plurality of nodes of the source cluster end at the same time, and according to source table metadata sent from one node of the source cluster end, target table metadata is generated first, and then initial table data is generated according to the target table metadata. And finally, importing the source table data transmitted from the nodes of the source cluster end into the initial table data according to the target table metadata to generate target table data, and improving the data replication efficiency when the data volume is large by utilizing the data processing capacity and the bandwidth communication capacity among the nodes.
On the basis of the above embodiment, as an optional embodiment of the present application, the source cluster side further generates statistical information of source data in response to the data replication request, where the statistical information of source data includes a record number of the source table. Accordingly, step S306 further includes:
Step a: when the source table data is imported into the initial table data, the target record number is generated.
Step b: the target record number is sent to the source cluster end, so that the source cluster end performs data comparison processing on the target record number and the record number of the source table to generate a record number comparison result, if the record number comparison result is judged to meet the preset error reporting condition, the step of responding to the data replication request to generate replication configuration parameters and node opening instructions is returned, and if the record number comparison result is judged to meet the preset success condition, the data replication success result is returned.
Based on the above embodiments, in an optional embodiment of the present application, after step S306, the method further includes:
Step c: When the source cluster end determines that the source table record number meets the preset threshold condition, receiving the source table data sent by one node of the source cluster end through the data transmission channel.
Step d: Importing the source table data into the initial table data according to the target table metadata to generate target table data.
In this embodiment, the implementation process and technical effects of steps c to d are similar to those of step F in the embodiment shown in fig. 2 and are not described again here.
Based on the above embodiments, in an optional embodiment of the present application, after step S306, the method further includes:
Step g: Receiving a same-table query instruction sent by the source cluster end.
Step h: Determining, according to the same-table query instruction, a target-end data table whose metadata is the same as the source table metadata, and deleting that target-end data table.
In this embodiment, the implementation principle and technical effect of steps g to h are similar to those of step G in the embodiment shown in fig. 2 and are not described again here.
In an optional embodiment of the present application, after step S306, the method further includes:
Step i: Receiving a statistical information instruction sent by the source cluster end, and collecting statistical information of the target table data according to the statistical information instruction.
In this embodiment, the statistics information instruction is configured to instruct the target cluster end to collect statistics information in a process of generating the target table data.
In summary, the data replication method provided in this embodiment provides convenience for the subsequent use of the target table data by collecting the statistical information of the target table data at the target cluster end.
Referring to fig. 4, fig. 4 is an interactive flow chart of a data replication method according to an embodiment of the present application, where the data replication method includes:
S401: The source cluster end generates replication configuration parameters and a node opening instruction in response to the data replication request.
S402: The source cluster end determines source table data of the source data according to the replication configuration parameters and determines source table metadata according to the source database system table.
S403: The source cluster end establishes a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction.
S404: The target cluster end sends the target database system table to the source cluster end.
S405: The source cluster end determines the replication mapping relationship between the nodes of the source cluster end and the nodes of the target cluster end according to the target database system table and the source database system table.
S406: The source cluster end splits the source table data according to the replication mapping relationship.
S407: The source cluster end sends the source table metadata to the target cluster end.
S408: The target cluster end generates target table metadata according to the source table metadata, and generates initial table data according to the target table metadata.
S409: The plurality of nodes of the source cluster end send the split source table data to the plurality of nodes of the target cluster end through the data transmission channel.
S4010: the target cluster end imports the source table data into the initial table data to generate target table data.
In summary, the data replication method provided in this embodiment may improve data replication efficiency by using a plurality of nodes at a source cluster end and a plurality of nodes at a target cluster end.
Referring to fig. 5, fig. 5 is a schematic structural diagram of a data copying apparatus according to an embodiment of the present application. As shown in fig. 5, the data copying apparatus 50 includes: a request response module 501, a source data processing module 502, a first transmission channel establishment module 503, a first receiving module 504, a duplication mapping relationship determination module 505, a data splitting module 506, and a first sending module 507.
Specifically, the request response module 501 is configured to generate a replication configuration parameter and a node activation instruction in response to a data replication request.
The source data processing module 502 is configured to determine source table data of source data according to the replication configuration parameter information; source table metadata is determined from the source database system table.
The first transmission channel establishment module 503 is configured to establish a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction.
The first receiving module 504 is configured to receive a target database system table sent by the target cluster side.
The replication mapping relation determining module 505 is configured to determine a replication mapping relation between a node of the source cluster end and a node of the target cluster end according to the target database system table and the source database system table.
The data splitting module 506 is configured to split the source table data according to the replication mapping relationship.
The first sending module 507 is configured to send the source table metadata to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata, and generates initial table data according to the target table metadata.
The first sending module 507 is further configured to send the split source table data from the plurality of nodes of the source cluster end to the plurality of nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into the initial table data according to the target table metadata, and generates the target table data.
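The multi-node sending performed by the first sending module 507 can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `send_split_data`, the dictionary shapes, and the `send` callback are hypothetical, and a thread pool merely stands in for the per-node data transmission channels.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Sequence


def send_split_data(
    split_data: Dict[str, Sequence[tuple]],
    mapping: Dict[str, str],
    send: Callable[[str, Sequence[tuple]], int],
) -> int:
    """Send each source node's slice to its mapped target node in parallel.

    split_data: source node -> rows held by that node (assumed shape).
    mapping:    source node -> target node (the replication mapping).
    send:       hypothetical transport callback returning rows imported.
    Returns the total number of rows the targets report as imported.
    """
    workers = max(1, len(split_data))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(send, mapping[source_node], rows)
            for source_node, rows in split_data.items()
        ]
        # result() re-raises any exception raised inside a transfer.
        return sum(future.result() for future in futures)
```

In this sketch one transfer is submitted per source node, so the slices move concurrently rather than through a single coordinating node; the returned total could serve as the target record number used in the later record number comparison.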
In an alternative embodiment of the present application, the first transmission channel establishment module 503 is specifically configured to:
open each node of the source cluster end according to the node opening instruction; send the node opening instruction to the target cluster end, so that the target cluster end opens the nodes corresponding to all nodes of the source cluster end according to the node opening instruction; determine database super-user whitelist information according to the node opening instruction; send the database super-user whitelist information to the target cluster end, so that the target cluster end determines a network access corresponding relation according to the database super-user whitelist information; and receive the network access corresponding relation sent by the target cluster end, and establish a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the network access corresponding relation.
In an optional embodiment of the present application, the source table metadata includes metadata database schema definition language DDL information, and the first sending module 507 is specifically configured to send the metadata DDL information to the target cluster end, so that the target cluster end executes a pre-stored synchronization execution tool according to the database super-user whitelist information, synchronizes the source table metadata from the source cluster end to the target cluster end, generates target table metadata according to the metadata DDL information, and generates initial table data according to the target table metadata.
In an optional embodiment of the present application, the replication mapping relationship determination module 505 is specifically configured to:
determine the internet protocol (IP) information of each node of the source cluster end according to the source database system table; determine the IP information of each node of the target cluster end according to the target database system table; and determine the replication mapping relationship between each node of the source cluster end and each node of the target cluster end according to the IP information of each node of the source cluster end and the IP information of each node of the target cluster end.
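One possible way to derive a node-to-node mapping from the two lists of node IP addresses is sketched below in Python. The application does not state the pairing rule, so the rule used here (sort both lists numerically and pair them in order, wrapping around when the target has fewer nodes) is an assumption, as is the function name `build_replication_mapping`.

```python
import ipaddress
from typing import Dict, List


def build_replication_mapping(
    source_ips: List[str], target_ips: List[str]
) -> Dict[str, str]:
    """Pair every source node IP with one target node IP.

    Assumed rule: order both lists by numeric IP value and pair them
    position by position, reusing target nodes round-robin if the
    source cluster end has more nodes than the target cluster end.
    """
    if not source_ips or not target_ips:
        raise ValueError("each cluster end needs at least one node")
    # Sort numerically so that 10.0.0.2 comes before 10.0.0.10.
    ordered_sources = sorted(source_ips, key=ipaddress.ip_address)
    ordered_targets = sorted(target_ips, key=ipaddress.ip_address)
    return {
        source_ip: ordered_targets[index % len(ordered_targets)]
        for index, source_ip in enumerate(ordered_sources)
    }
```

In practice the IP lists would be read from the source database system table and the target database system table; here they are passed in directly to keep the sketch self-contained.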
In an alternative embodiment of the present application, the request response module 501 is further configured to generate statistical information of the source data after responding to the data replication request.
In an alternative embodiment of the present application, the statistical information of the source data includes a source table record number. Correspondingly, the first sending module 507 is further specifically configured to: receive a target record number sent by the target cluster end, where the target record number is generated when the target cluster end imports the source table data into the initial table data; compare the target record number with the source table record number to generate a record number comparison result; if the record number comparison result meets a preset error reporting condition, return to the step of generating the replication configuration parameters and the node opening instruction in response to the data replication request; and if the record number comparison result meets a preset success condition, return a data replication success result.
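The record number check and the return to the first step can be sketched as a retry loop. This Python sketch is illustrative only: the application leaves the preset success and error reporting conditions unspecified, so equality of the two counts is assumed as the success condition, and the `max_attempts` limit and the `run_replication` callback are hypothetical additions.

```python
from typing import Callable


def replicate_with_verification(
    run_replication: Callable[[], int],
    source_record_count: int,
    max_attempts: int = 3,
) -> bool:
    """Re-run replication until the target count matches the source count.

    run_replication: hypothetical callback that performs one full
        replication pass and returns the target record number.
    Returns True on a match (assumed success condition) and False if
    every attempt produced a mismatch (assumed error condition).
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for _ in range(max_attempts):
        target_record_count = run_replication()
        if target_record_count == source_record_count:
            return True
        # Mismatch: loop back and start the replication again.
    return False
```

Bounding the number of attempts is a design choice of this sketch; the application itself only states that the flow returns to the request response step when the error reporting condition is met.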
In an optional embodiment of the present application, the statistical information of the source data includes a source table record number. Correspondingly, the first sending module 507 is further configured to: if the source table record number meets a preset threshold condition, send the source table data from one node of the source cluster end to the target cluster end through the data transmission channel according to the replication mapping relationship, so that the target cluster end imports the source table data into the initial table data according to the target table metadata to generate the target table data.
In an alternative embodiment of the present application, the first sending module 507 is further configured to: send a same-table query instruction to the target cluster end, so that the target cluster end determines, according to the same-table query instruction, a target-end data table having the same metadata as the source table, and deletes that target-end data table.
In an alternative embodiment of the present application, the data splitting module 506 is further configured to: if the number of nodes of the source cluster end is not equal to the number of nodes of the target cluster end, perform redistribution processing on the source table data to generate redistributed source table data.
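The redistribution step for unequal node counts can be illustrated as follows. The application does not say how rows are redistributed; this Python sketch assumes hash-based redistribution on a distribution key column, and the names `redistribute_rows` and `key_index` are hypothetical.

```python
import zlib
from typing import Dict, List, Sequence


def redistribute_rows(
    rows: Sequence[tuple], key_index: int, target_node_count: int
) -> Dict[int, List[tuple]]:
    """Regroup rows into one bucket per target node.

    Assumed technique: hash the distribution key column with CRC32 and
    take the remainder modulo the number of target nodes, so rows with
    the same key always land in the same bucket.
    """
    if target_node_count < 1:
        raise ValueError("target_node_count must be at least 1")
    buckets: Dict[int, List[tuple]] = {
        node: [] for node in range(target_node_count)
    }
    for row in rows:
        key_bytes = str(row[key_index]).encode("utf-8")
        buckets[zlib.crc32(key_bytes) % target_node_count].append(row)
    return buckets
```

Each resulting bucket would then be sent to one node of the target cluster end, so the number of slices matches the target node count regardless of how the data was laid out on the source cluster end.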
In an alternative embodiment of the present application, the first sending module 507 is further configured to: send a statistical information collection instruction to the target cluster end, so that the target cluster end collects statistical information of the target table data according to the statistical information collection instruction.
Referring to fig. 6, fig. 6 is a schematic structural diagram of another data copying apparatus according to an embodiment of the present application. As shown in fig. 6, the data copying apparatus 60 includes: a second receiving module 601, a second transmission channel establishment module 602, a second sending module 603, and a data replication module 604.
The second receiving module 601 is configured to receive a node opening instruction sent by the source cluster end, where the source cluster end generates replication configuration parameters and the node opening instruction in response to a data replication request; the replication configuration parameters are used to instruct the source cluster end to determine source table data of the source data according to the replication configuration parameter information, and the source cluster end determines source table metadata according to the source database system table.
The second transmission channel establishment module 602 is configured to establish a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction.
The second sending module 603 is configured to send the target database system table to the source cluster end, so that the source cluster end determines a replication mapping relationship between a node of the source cluster end and a node of the target cluster end according to the target database system table and the source database system table, and splits the source table data according to the replication mapping relationship.
The second receiving module 601 is further configured to receive source table metadata sent by the source cluster end, generate target table metadata according to the source table metadata, and generate initial table data according to the target table metadata.
The second receiving module 601 is further configured to receive split source table data sent by the source cluster end through the data transmission channel, where the split source table data is sent from a plurality of nodes of the source cluster end to a plurality of nodes of the target cluster end.
The data replication module 604 is configured to import the source table data into the initial table data according to the target table metadata, and generate the target table data.
In an alternative embodiment of the present application, the second transmission channel establishment module 602 is specifically configured to: receive the node opening instruction sent by the source cluster end, where the node opening instruction is used by the source cluster end to open each node of the source cluster end and to determine database super-user whitelist information; open the nodes corresponding to all nodes of the source cluster end according to the node opening instruction; receive the database super-user whitelist information sent by the source cluster end, and determine a network access corresponding relation according to the database super-user whitelist information; and send the network access corresponding relation to the source cluster end, so that the source cluster end establishes a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the network access corresponding relation.
In an alternative embodiment of the present application, the source table metadata includes metadata database schema definition language (DDL) information, and the second receiving module 601 is specifically configured to: receive the metadata DDL information sent by the source cluster end, execute a pre-stored synchronization execution tool according to the database super-user whitelist information, and synchronize the source table metadata from the source cluster end to the target cluster end; and generate target table metadata according to the metadata DDL information, and generate initial table data according to the target table metadata.
In an optional embodiment of the present application, the source cluster end further generates statistical information of the source data in response to the data replication request, where the statistical information of the source data includes a source table record number. Accordingly, the data replication module 604 is further configured to: generate a target record number when the source table data is imported into the initial table data; and send the target record number to the source cluster end, so that the source cluster end compares the target record number with the source table record number to generate a record number comparison result, returns to the step of generating the replication configuration parameters and the node opening instruction in response to the data replication request if the record number comparison result meets a preset error reporting condition, and returns a data replication success result if the record number comparison result meets a preset success condition.
In an alternative embodiment of the present application, the second receiving module 601 is further configured to: when the source cluster end determines that the source table record number meets a preset threshold condition, receive the source table data sent by one node of the source cluster end through the data transmission channel. The data replication module 604 is configured to import the source table data into the initial table data according to the target table metadata to generate the target table data.
In an alternative embodiment of the present application, the second receiving module 601 is further configured to: receive a same-table query instruction sent by the source cluster end; and determine, according to the same-table query instruction, a target-end data table having the same metadata as the source table, and delete that target-end data table.
In an alternative embodiment of the present application, the second receiving module 601 is further configured to: receive a statistical information instruction sent by the source cluster end, and collect statistical information of the target table data according to the statistical information instruction.
The data copying device provided in this embodiment may be used to execute the technical solution of the foregoing method embodiment, and its implementation principle and technical effects are similar, and this embodiment will not be repeated here.
Referring to fig. 7, fig. 7 is a schematic diagram of the hardware structure of a server system according to an embodiment of the present application. As shown in fig. 7, the server system includes: at least one processor 701 and a memory 702.
The memory 702 is configured to store computer-executable instructions.
The processor 701 is configured to execute the computer-executable instructions stored in the memory 702 to perform the steps involved in the method embodiments described above. For details, reference may be made to the relevant description of the method embodiments described above.
Optionally, the memory 702 may be independent of the processor 701 or integrated with it.
When the memory 702 is provided separately, the server system further includes a bus 703 for connecting the memory 702 and the processor 701.
The embodiment of the application also provides a computer readable storage medium, wherein computer executable instructions are stored in the computer readable storage medium, and when a processor executes the computer executable instructions, the data copying method is realized.
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements a data replication method as described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division of modules is merely a division by logical function, and other divisions are possible in an actual implementation; for example, multiple modules may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection via some interfaces, devices, or modules, and may be electrical, mechanical, or in other forms.
The modules described above as separate components may or may not be physically separate, and components shown as modules may or may not be physical units, may be located in one place, or may be distributed over multiple network units. Some or all of the modules may be selected according to actual needs to implement the solution of this embodiment.
In addition, each functional module in the embodiments of the present application may be integrated in one processing unit, or each module may exist alone physically, or two or more modules may be integrated in one unit. The units formed by the modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or processor to perform some steps of the methods of the various embodiments of the present application.
It should be understood that the above processor may be a central processing unit (CPU), or may be another general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly as being executed by a hardware processor, or as being executed by a combination of hardware and software modules in a processor.
The memory may include high-speed RAM and may further include non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, or the like.
The bus may be an industry standard architecture (ISA) bus, a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. Buses may be divided into address buses, data buses, control buses, and so on. For ease of illustration, the buses in the drawings of the present application are not limited to only one bus or one type of bus.
The storage medium may be implemented by any type of volatile or nonvolatile memory device, or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk. A storage medium may be any available medium that can be accessed by a general purpose or special purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application specific integrated circuit (ASIC). The processor and the storage medium may also reside as discrete components in an electronic device or a master device.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the method embodiments described above may be performed by hardware associated with program instructions. The foregoing program may be stored in a computer readable storage medium. The program, when executed, performs steps including the method embodiments described above; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
The above embodiments are only intended to illustrate the technical solution of the present application, not to limit it. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents, and that such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (22)

1. A data replication method, applied to a source cluster end, comprising:
generating replication configuration parameters and a node opening instruction in response to a data replication request;
determining source table data of source data according to the replication configuration parameter information; determining source table metadata according to a source database system table;
establishing a data transmission channel between each node of the source cluster end and each node of a target cluster end according to the node opening instruction;
receiving a target database system table sent by the target cluster end;
determining a replication mapping relation between the nodes of the source cluster end and the nodes of the target cluster end according to the target database system table and the source database system table;
splitting the source table data according to the replication mapping relation;
sending the source table metadata to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata and generates initial table data according to the target table metadata;
and sending the split source table data from a plurality of nodes of the source cluster end to a plurality of nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into the initial table data according to the target table metadata, and generates target table data.
2. The method according to claim 1, wherein the establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction includes:
opening each node of the source cluster end according to the node opening instruction; sending the node opening instruction to the target cluster end, so that the target cluster end opens each node corresponding to all nodes of the source cluster end according to the node opening instruction; determining database super user white list information according to the node opening instruction;
sending the database super user white list information to the target cluster end, so that the target cluster end determines a network access corresponding relation according to the database super user white list information;
and receiving the network access corresponding relation sent by the target cluster end, and establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the network access corresponding relation.
3. The method of claim 2, wherein the source table metadata comprises metadata database schema definition language DDL information;
correspondingly, the sending the source table metadata to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata, and generates initial table data according to the target table metadata, including:
and sending the metadata DDL information to the target cluster end so that the target cluster end executes a pre-stored synchronous execution tool according to the database super user white list information, synchronizes the source table metadata from the source cluster end to the target cluster end, generates target table metadata according to the metadata DDL information, and generates initial table data according to the target table metadata.
4. The method of claim 1, wherein determining the replication mapping between the nodes of the source cluster side and the nodes of the target cluster side based on the target database system table and the source database system table comprises:
determining the internet protocol information of each node of the source cluster end according to the source database system table;
determining the internet protocol information of each node of the target cluster end according to the target database system table;
and determining a copy mapping relation between each node of the source cluster end and each node of the target cluster end according to the internet protocol information of each node of the source cluster end and the internet protocol information of each node of the target cluster end.
5. The method of claim 1, wherein after responding to the data replication request, further comprising:
and generating statistical information of the source data.
6. The method of claim 5, wherein the statistics of the source data include a source table record number;
correspondingly, after the sending the split source table data from the plurality of nodes of the source cluster end to the plurality of nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into initial table data according to the target table metadata and generates target table data, the method further includes:
receiving a target record number sent by the target cluster end, wherein the target record number is generated when the target cluster end imports source table data into initial table data;
performing data comparison processing on the target record number and the source table record number to generate a record number comparison result;
if the record number comparison result is judged to meet the preset error reporting condition, returning to the step of responding to the data replication request to generate replication configuration parameters and node opening instructions;
and if the record number comparison result is judged to meet the preset success condition, returning a data copying success result.
7. The method of claim 5, wherein the statistics of the source data include a source table record number;
correspondingly, the sending the source table metadata to the target cluster end, so that the target cluster end generates target table metadata according to the source table metadata, and after generating initial table data according to the target table metadata, the method further includes:
and if the source table record number meets a preset threshold condition, sending the source table data from one of the nodes of the source cluster end to the target cluster end through the data transmission channel according to the replication mapping relation, so that the target cluster end imports the source table data into initial table data according to the target table metadata to generate target table data.
8. The method of claim 1, wherein the sending the source table metadata to the target cluster side, so that the target cluster side generates target table metadata according to the source table metadata, and after generating initial table data according to the target table metadata, further comprises:
and sending a same-table query instruction to the target cluster end, so that the target cluster end determines a target end data table with the same source table metadata according to the same-table query instruction, and deletes the target end data table.
9. The method of claim 1, wherein prior to splitting the source table data according to the replication mapping, further comprising:
and if the number of the nodes of the source cluster end is not equal to the number of the nodes of the target cluster end, carrying out redistribution processing on the source table data to generate redistributed source table data.
10. The method according to any one of claims 1 to 9, wherein the sending the split source table data from the plurality of nodes of the source cluster side to the plurality of nodes of the target cluster side through the data transmission channel, so that the target cluster side imports the source table data into initial table data according to the target table metadata, and after generating target table data, further includes:
and sending a statistical information collection instruction to the target cluster end so that the target cluster end can collect the statistical information of the target table data according to the statistical information collection instruction.
11. A data replication method, applied to a target cluster end, comprising:
receiving a node opening instruction sent by a source cluster end, wherein the source cluster end generates replication configuration parameters and the node opening instruction in response to a data replication request; the replication configuration parameters are used for instructing the source cluster end to determine source table data of source data according to the replication configuration parameter information; and the source cluster end determines source table metadata according to a source database system table;
establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction;
sending a target database system table to the source cluster end, so that the source cluster end determines a replication mapping relation between the nodes of the source cluster end and the nodes of the target cluster end according to the target database system table and the source database system table, and splits the source table data according to the replication mapping relation;
receiving source table metadata sent by the source cluster end, generating target table metadata according to the source table metadata, and generating initial table data according to the target table metadata;
receiving split source table data sent by the source cluster end through the data transmission channel, wherein the split source table data is sent from a plurality of nodes of the source cluster end to a plurality of nodes of the target cluster end;
and importing the source table data into the initial table data according to the target table metadata to generate target table data.
12. The method according to claim 11, wherein the establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction includes:
receiving a node opening instruction sent by the source cluster end, wherein the node opening instruction is used for the source cluster end to open each node of the source cluster end and determine the white list information of the database super user;
opening each node corresponding to all nodes of the source cluster terminal according to the node opening instruction;
receiving the database super user white list information sent by the source cluster end, and determining a network access corresponding relation according to the database super user white list information;
and sending the network access corresponding relation to the source cluster end so that the source cluster end establishes a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the network access corresponding relation.
13. The method of claim 12, wherein the source table metadata includes metadata database schema definition language DDL information;
correspondingly, the receiving the source table metadata sent by the source cluster end, generating target table metadata according to the source table metadata, and generating initial table data according to the target table metadata, including:
receiving metadata DDL information sent by the source cluster end, executing a pre-stored synchronization execution tool according to the database super user white list information, and synchronizing the source table metadata from the source cluster end to the target cluster end;
generating target table metadata according to the metadata DDL information, and generating initial table data according to the target table metadata.
14. The method of claim 11, wherein the source cluster side further generates statistical information of the source data in response to the data replication request, the statistical information of the source data including a source table record number;
correspondingly, the importing source table data into the initial table data according to the target table metadata to generate target table data includes:
generating a target record number when the source table data is imported into the initial table data;
sending the target record number to the source cluster end, so that the source cluster end performs data comparison processing on the target record number and the source table record number to generate a record number comparison result, returns to the step of responding to the data replication request to generate replication configuration parameters and node opening instructions if the record number comparison result meets the preset error reporting condition, and returns a data replication success result if the record number comparison result meets the preset success condition.
15. The method of claim 14, wherein importing the source table data into the initial table data according to the target table metadata to generate target table data further comprises:
when the source cluster end judges that the record number of the source table meets a preset threshold condition, receiving source table data sent by one node of the source cluster end through the data transmission channel;
and importing the source table data into the initial table data according to the target table metadata to generate target table data.
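One way to read claim 15 is that small tables are sent through a single source node rather than fanned out across all nodes. The sketch below assumes that the preset threshold condition means "record number at or below a threshold"; the claim does not state the direction or the value, so `small_table_threshold` and `choose_sender_nodes` are hypothetical.

```python
def choose_sender_nodes(source_nodes, record_count, small_table_threshold=10_000):
    """Pick which source nodes send data for one table.

    Assumption: a table whose record number is at or below the threshold is
    sent by one node only; larger tables are sent by every source node.
    """
    if record_count <= small_table_threshold:
        return list(source_nodes[:1])
    return list(source_nodes)
```

With nodes `["s1", "s2", "s3"]`, a 500-record table is sent by `["s1"]` alone, while a 1,000,000-record table is sent by all three nodes.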
16. The method of claim 11, wherein generating target table metadata from the source table metadata, and generating initial table data from the target table metadata, further comprises:
receiving a same-table query instruction sent by the source cluster end;
and determining, according to the same-table query instruction, a target-end data table having the same metadata as the source table, and deleting the target-end data table.
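The clean-up step of claim 16 could look like the sketch below, which again uses SQLite as a stand-in for the target cluster end. As a simplifying assumption, "same metadata" is reduced to "same table name"; a real system would presumably compare more of the metadata. `drop_existing_table` is a hypothetical helper.

```python
import sqlite3


def drop_existing_table(target_conn, table_name):
    """Delete a pre-existing target table that matches the source table.

    Returns True if a table was found and dropped, False otherwise.
    Assumption: matching is by table name only.
    """
    row = target_conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name = ?",
        (table_name,),
    ).fetchone()
    if row is None:
        return False
    # Identifiers cannot be bound as parameters, so quote the name explicitly.
    quoted = '"{}"'.format(table_name.replace('"', '""'))
    target_conn.execute("DROP TABLE " + quoted)
    target_conn.commit()
    return True
```

Dropping the stale table first lets the subsequent DDL replay create the initial table without a name conflict.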
17. The method according to any one of claims 11 to 16, wherein, after importing source table data into initial table data according to the target table metadata to generate target table data, the method further comprises:
and receiving a statistical information instruction sent by the source cluster end, and collecting statistical information of the target table data according to the statistical information instruction.
18. A data replication device, applied to a source cluster end, comprising:
the request response module is used for responding to the data replication request and generating replication configuration parameters and node opening instructions;
the source data processing module is used for determining source table data of source data according to the replication configuration parameters, and determining source table metadata according to a source database system table;
the first transmission channel establishing module is used for establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction;
the first receiving module is used for receiving a target database system table sent by the target cluster end;
the replication mapping relation determining module is used for determining the replication mapping relation between the nodes of the source cluster end and the nodes of the target cluster end according to the target database system table and the source database system table;
the data splitting module is used for splitting the source table data according to the replication mapping relation;
the first sending module is used for sending the source table metadata to the target cluster end so that the target cluster end generates target table metadata according to the source table metadata and generates initial table data according to the target table metadata;
the first sending module is further configured to send the split source table data from the plurality of nodes of the source cluster end to the plurality of nodes of the target cluster end through the data transmission channel, so that the target cluster end imports the source table data into initial table data according to the target table metadata, and generates target table data.
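The replication mapping and data splitting modules of claim 18 could be sketched as follows. The claims do not say how the mapping is derived from the two system tables, so this sketch makes two assumptions: clusters with equal node counts are mapped one to one, and otherwise each source node may send to every target node, with rows assigned by a CRC32 hash of a key column. The names `build_replication_mapping`, `split_for_targets` and `key_index` are hypothetical.

```python
import zlib


def build_replication_mapping(source_nodes, target_nodes):
    """Map each source node to the target nodes it sends data to."""
    if len(source_nodes) == len(target_nodes):
        # Assumption: same topology, so node i copies straight to node i.
        return {s: [t] for s, t in zip(source_nodes, target_nodes)}
    # Assumption: different topology, so every source node can reach every target.
    return {s: list(target_nodes) for s in source_nodes}


def split_for_targets(local_rows, targets, key_index=0):
    """Split one source node's rows into per-target batches by key hash."""
    batches = {t: [] for t in targets}
    for row in local_rows:
        digest = zlib.crc32(str(row[key_index]).encode("utf-8"))
        batches[targets[digest % len(targets)]].append(row)
    return batches


mapping = build_replication_mapping(["s1", "s2", "s3"], ["t1", "t2"])
rows_on_s1 = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
batches_from_s1 = split_for_targets(rows_on_s1, mapping["s1"])
```

Each row lands in exactly one batch, so the batches can be sent in parallel from several source nodes to several target nodes without duplicating data.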
19. A data replication device, applied to a target cluster end, comprising:
the second receiving module is used for receiving a node opening instruction sent by a source cluster end, wherein the source cluster end generates replication configuration parameters and the node opening instruction in response to a data replication request; the replication configuration parameters are used for instructing the source cluster end to determine source table data of source data according to the replication configuration parameters, and to determine source table metadata according to a source database system table;
the second transmission channel establishing module is used for establishing a data transmission channel between each node of the source cluster end and each node of the target cluster end according to the node opening instruction;
the second sending module is used for sending a target database system table to the source cluster end so that the source cluster end determines a copy mapping relation between a node of the source cluster end and a node of the target cluster end according to the target database system table and the source database system table, and splits the source table data according to the copy mapping relation;
the second receiving module is further configured to receive source table metadata sent by the source cluster end, generate target table metadata according to the source table metadata, and generate initial table data according to the target table metadata;
the second receiving module is further configured to receive split source table data sent by the source cluster end through the data transmission channel, where the split source table data is sent from a plurality of nodes of the source cluster end to a plurality of nodes of the target cluster end;
and the data copying module is used for importing the source table data into the initial table data according to the target table metadata to generate target table data.
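The data copying module of claim 19 could be sketched as a function that loads the received batches into the already created initial table and reports how many records it imported. SQLite stands in for the target cluster end, and `import_batches`, `column_count` and the example table `t` are hypothetical; the transport over the data transmission channel is not modeled.

```python
import sqlite3


def import_batches(target_conn, table, column_count, batches):
    """Import received row batches into the initial table.

    Returns the number of imported records, which can serve as the
    target record number reported back to the source cluster end.
    """
    placeholders = ", ".join("?" * column_count)
    quoted = '"{}"'.format(table.replace('"', '""'))
    sql = "INSERT INTO {} VALUES ({})".format(quoted, placeholders)
    imported = 0
    # The connection context manager commits on success and rolls back on error.
    with target_conn:
        for rows in batches:
            rows = list(rows)
            target_conn.executemany(sql, rows)
            imported += len(rows)
    return imported


target_db = sqlite3.connect(":memory:")
target_db.execute("CREATE TABLE t (id INTEGER, v TEXT)")  # the initial table
received = [[(1, "a")], [(2, "b"), (3, "c")]]  # one batch per sending node
target_record_number = import_batches(target_db, "t", 2, received)
```

Wrapping all batches in one transaction means a failed import leaves the initial table empty, which keeps a later retry simple.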
20. A server system, comprising: at least one processor and memory;
the memory stores computer-executable instructions;
the at least one processor executes the computer-executable instructions stored in the memory, causing the at least one processor to perform the data replication method of any one of claims 1 to 10 or any one of claims 11 to 17.
21. A computer readable storage medium having stored therein computer executable instructions which, when executed by a processor, implement the data replication method of any one of claims 1 to 10 or any one of claims 11 to 17.
22. A computer program product comprising a computer program which, when executed by a processor, implements the data replication method of any one of claims 1 to 10 or any one of claims 11 to 17.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310216017.XA CN116186165A (en) 2023-03-01 2023-03-01 Data copying method, device, system and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310216017.XA CN116186165A (en) 2023-03-01 2023-03-01 Data copying method, device, system and storage medium

Publications (1)

Publication Number Publication Date
CN116186165A true CN116186165A (en) 2023-05-30

Family

ID=86436489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310216017.XA Pending CN116186165A (en) 2023-03-01 2023-03-01 Data copying method, device, system and storage medium

Country Status (1)

Country Link
CN (1) CN116186165A (en)

Similar Documents

Publication Publication Date Title
CN110147407B (en) Data processing method and device and database management server
US10831612B2 (en) Primary node-standby node data transmission method, control node, and database system
CN111049928B (en) Data synchronization method, system, electronic device and computer readable storage medium
US20200125452A1 (en) Systems and methods for cross-regional back up of distributed databases on a cloud service
EP2380090A2 (en) Data integrity in a database environment through background synchronization
WO2022134797A1 (en) Data fragmentation storage method and apparatus, a computer device, and a storage medium
CN106021566A (en) Method, device and system for improving concurrent processing capacity of single database
CN112121413A (en) Response method, system, device, terminal and medium of function service
CN113051102B (en) File backup method, device, system, storage medium and computer equipment
CN105335450B (en) Data storage processing method and device
CN109388651B (en) Data processing method and device
CN112835885A (en) Processing method, device and system for distributed table storage
CN112148206A (en) Data reading and writing method and device, electronic equipment and medium
CN112069152B (en) Database cluster upgrading method, device, equipment and storage medium
CN111459913B (en) Capacity expansion method and device of distributed database and electronic equipment
CN101483668A (en) Network storage and access method, device and system for hot spot data
CN111090782A (en) Graph data storage method, device, equipment and storage medium
CN110928911A (en) System, method and device for processing checking request and computer readable storage medium
CN116186165A (en) Data copying method, device, system and storage medium
CN115794819A (en) Data writing method and electronic equipment
CN114564210A (en) Copy deployment method, device, system, electronic equipment and storage medium
CN111782634A (en) Data distributed storage method and device, electronic equipment and storage medium
US10712959B2 (en) Method, device and computer program product for storing data
CN111400404A (en) Node initialization method, device, equipment and storage medium
CN111221857A (en) Method and apparatus for reading data records from a distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination