CN109753511B - Cross-region real-time synchronization method and system for big data platform - Google Patents

Info

Publication number
CN109753511B
Authority
CN
China
Prior art keywords
metadata
data platform
file
coordination
master
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811626088.2A
Other languages
Chinese (zh)
Other versions
CN109753511A (en
Inventor
刘垚
康金怀
王小玉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Business Intelligence Of Oriental Nations Corp ltd
Original Assignee
Business Intelligence Of Oriental Nations Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Business Intelligence Of Oriental Nations Corp ltd
Priority to CN201811626088.2A
Publication of CN109753511A
Application granted
Publication of CN109753511B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Information Transfer Between Computers (AREA)

Abstract

A cross-region real-time synchronization method and system for a big data platform are disclosed. A user connects to the big data platform and issues an operation request; the master coordination end generates an operation instruction and sends it to the slave ends of the sub data platforms. The master coordination end generates version information for the metadata, creates a metadata file according to that version information, writes the corresponding metadata into the metadata file, and sends the executed operation to the slave ends in real time. Each slave end completes the corresponding operation, creates a version number file corresponding to the version information, marks it as being in the pre-update database state, and feeds back pre-update success information to the master coordination end. When the master coordination end has received pre-update success information from all slave ends, it modifies the corresponding version file and issues an update database instruction to the slave ends; each slave end then updates its corresponding version file as instructed by the master coordination end and marks it as being in the updated database state. Metadata loss and node blocking caused by single-point downtime are avoided, metadata reads are fast, and synchronization consistency is ensured.

Description

Cross-region real-time synchronization method and system for big data platform
Technical Field
The embodiments of the invention relate to the technical field of data processing, and in particular to a cross-region real-time synchronization method and system for a big data platform.
Background
Two-phase commit is an algorithm used in computer networking and databases to keep all nodes of a distributed system consistent when a transaction is committed. It is also commonly described as a protocol with two phases: the first is the preparation phase (voting phase) and the second is the commit phase (execution phase). In a distributed system, each node knows whether its own operation succeeded or failed, but cannot know the outcome on other nodes. When a transaction spans multiple nodes, preserving its ACID properties (atomicity, consistency, isolation, and durability, the four basic requirements for correct execution of a database transaction) requires a component acting as a coordinator, which collects the operation results of all nodes (called participants) and finally tells them whether to actually commit those results (for example, by writing the updated data to disk). The idea of two-phase commit can be summarized as follows: the participants inform the coordinator whether their operations succeeded or failed, and the coordinator decides, based on the feedback from all participants, whether each participant should commit or abort.
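The voting and commit phases described above can be sketched as follows. This is a minimal in-memory illustration, not code from the patent: the `Participant` class, its `will_succeed` switch, and the `two_phase_commit` function are hypothetical names introduced only for this example, and no networking, locking, or failure handling is modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    """A node that can tentatively apply, then commit or abort, a local change."""
    name: str
    will_succeed: bool = True   # hypothetical switch used to simulate a local failure
    state: str = "idle"

    def prepare(self) -> bool:
        # Phase 1 (voting): do the work tentatively and report the local outcome.
        self.state = "prepared" if self.will_succeed else "failed"
        return self.will_succeed

    def commit(self) -> None:
        self.state = "committed"

    def abort(self) -> None:
        self.state = "aborted"


def two_phase_commit(participants: List[Participant]) -> bool:
    """Coordinator logic: commit only if every participant voted yes."""
    votes = [p.prepare() for p in participants]   # phase 1: collect all votes
    decision = all(votes)
    for p in participants:                        # phase 2: broadcast the decision
        if decision:
            p.commit()
        else:
            p.abort()
    return decision
```

In this sketch the coordinator is a plain function call, which is exactly the part that becomes a single point of failure once the participants and the coordinator run on separate machines.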
In the prior art, two-phase commit of data has the following defects:
First, a single point of failure. Because the coordinator is essential, once it fails the participants remain blocked. In particular, if the coordinator fails during the second phase, all participants still hold locks on the transaction resources and cannot complete the transaction. A new coordinator can be elected when the coordinator fails, but this does not resolve the blocking of participants caused by the original coordinator's downtime.
Second, data inconsistency. In the second phase of two-phase commit, if a local network anomaly occurs or the coordinator fails while it is sending the commit request, only some of the participants may receive the request. Those participants commit after receiving it, but the machines that did not receive it cannot commit the transaction, so the data across the distributed system becomes inconsistent.
Third, an uncertain transaction state. If the coordinator goes down after sending the commit message, and the only participant that received the message goes down at the same time, then even if a new coordinator is chosen through an election protocol, the state of the transaction is uncertain and no node knows whether it was committed.
Fourth, when metadata is read during the second phase, cross-region network access cannot be reduced while still ensuring that the latest version of the metadata is read.
Disclosure of Invention
Therefore, embodiments of the present invention provide a cross-region real-time synchronization method and system for a big data platform, which avoid metadata loss caused by single-point downtime, avoid blocked waiting, and ensure the atomicity, external access consistency, isolation, and durability of metadata modification and synchronization.
To achieve the above object, an embodiment of the present invention provides a cross-region real-time synchronization method for a big data platform, comprising the following steps:
establishing network connections between a total data platform and sub data platforms distributed in different regions, taking a total platform node of the total data platform as the master coordination end, and taking a sub-platform node of each sub data platform as a slave end;
generating an operation instruction at the master coordination end, and sending the generated operation instruction through the total data platform to the slave ends of the sub data platforms;
generating version information for the metadata at the master coordination end, creating a metadata file according to that version information, writing the corresponding metadata into the metadata file, and sending the executed operation from the master coordination end to the slave ends in real time;
the master coordination end waits for the slave ends to finish the corresponding operation and issues a pre-update database instruction to them; each slave end creates a version number file corresponding to the version information, marks it as being in the pre-update database state, and feeds back pre-update success information to the master coordination end;
when the master coordination end has received pre-update success information from all slave ends, it modifies the corresponding version file and issues an update database instruction to the slave ends; after a slave end receives the update database instruction, it updates its corresponding version file as instructed by the master coordination end and marks it as being in the updated database state.
As a preferred scheme of the cross-region real-time synchronization method of the big data platform, the master coordination end periodically updates its state through a distributed coordination service component; when the master coordination end is unresponsive, another total platform node is selected through the distributed coordination service component to notify the sub-platform nodes to perform a data recovery operation.
As a preferred scheme of the cross-region real-time synchronization method of the big data platform, when the master coordination end is unresponsive, the slave ends are notified that the operation has failed and perform a data recovery operation after receiving the notification; the operation is successful once the master coordination end has finished modifying the corresponding version file.
As a preferred scheme of the cross-region real-time synchronization method of the big data platform, the master coordination end uses a distributed coordination service component to generate the version information of the metadata, and names the metadata file using the version information of the metadata as the version number.
As a preferred scheme of the cross-region real-time synchronization method of the big data platform, when metadata is accessed through the total data platform, the version information is obtained from the name of the metadata file; when the metadata file with the corresponding version number cannot be found, the user performing the search is prompted.
As a preferred scheme of the cross-region real-time synchronization method of the big data platform, when metadata is accessed through a sub data platform, the version information and the state of the metadata are obtained from the name of the metadata file;
when the metadata file with the corresponding version number cannot be found, the user performing the search is prompted;
when the metadata file corresponding to the version number is in the pre-update database state, the latest version information is obtained from the total data platform, and the corresponding metadata is read from the sub data platform according to that latest version information;
when the metadata file corresponding to the version number is in the updated database state, the corresponding metadata is read directly from the sub data platform.
An embodiment of the invention also provides a cross-region real-time synchronization system for a big data platform, comprising:
a network building module, used for establishing network connections between the total data platform and the sub data platforms distributed in different regions, taking a total platform node of the total data platform as the master coordination end, and taking a sub-platform node of each sub data platform as a slave end;
an operation instruction generating module, used for generating an operation instruction at the master coordination end and sending the generated operation instruction through the total data platform to the slave ends of the sub data platforms;
a version information generating module, used by the master coordination end to generate the version information of the metadata;
a metadata file creating module, used by the master coordination end to create a metadata file according to the version information of the metadata;
a metadata writing module, used for writing the corresponding metadata into the metadata file;
a first state updating module, used by a slave end to create a version number file corresponding to the version information and mark it as being in the pre-update database state;
a feedback module, used by a slave end to feed back pre-update success information to the master coordination end;
an instruction issuing module, used for modifying the corresponding version file at the master coordination end and issuing an update database instruction to the slave ends after the master coordination end has received pre-update success information from all slave ends;
and a second state updating module, used by a slave end, after it receives the update database instruction issued by the master coordination end, to update its corresponding version file as instructed by the master coordination end and mark it as being in the updated database state.
As a preferred scheme of the cross-region real-time synchronization system of the big data platform, the system further comprises an anomaly monitoring module, used for monitoring the response state of the master coordination end and, when the master coordination end is unresponsive, selecting a total platform node through the distributed coordination service component to notify the sub-platform nodes to perform a data recovery operation.
As a preferred scheme of the cross-region real-time synchronization system of the big data platform, the system further comprises a first notification module, a second notification module, and a data recovery module;
the first notification module is used for notifying the slave ends of the operation failure when the master coordination end is unresponsive;
the second notification module is used for prompting the user performing the search when the metadata file with the corresponding version number cannot be found;
and the data recovery module is used for performing a data recovery operation on the sub-platform nodes when the master coordination end is unresponsive.
As a preferred scheme of the cross-region real-time synchronization system of the big data platform, the system further comprises a searching module, used for searching for the metadata file with the corresponding version number according to the version information.
The embodiments of the invention have the following advantages: when metadata is created or modified, the metadata of a sub data platform is in the committed, synchronized state in most cases and can be read directly; only in a few cases does the latest readable version number need to be obtained from the headquarters, so the number of network round trips is reduced and metadata is read quickly;
using the backup mechanism of the distributed file system avoids metadata loss caused by single-point downtime and ensures consistent metadata reads: the metadata read is the same whether it comes from the total data platform or from a sub data platform; a downtime-handling mechanism with a monitoring node is provided for the master coordination end, so that data can be recovered when downtime or similar problems occur and the data platform is not left blocked and waiting indefinitely.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart of a cross-region real-time synchronization method for a big data platform according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a cross-region real-time synchronization system of a big data platform according to an embodiment of the present invention.
In the figures: 1. network building module; 2. operation instruction generating module; 3. version information generating module; 4. metadata file creating module; 5. metadata writing module; 6. searching module; 7. first state updating module; 8. feedback module; 9. instruction issuing module; 10. second state updating module; 11. anomaly monitoring module; 12. first notification module; 13. second notification module; 14. data recovery module.
Detailed Description
The present invention is described below by way of particular embodiments, and other advantages and features of the invention will be apparent to those skilled in the art from this disclosure. The described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
In this embodiment, the metadata is stored in HDFS (Hadoop Distributed File System), making full use of the HDFS backup mechanism so that metadata is not lost when a node goes down. Two kinds of files are stored. The first is the version number file, which records the version number and related information; modifying a file name on HDFS is an atomic operation. At the headquarters, the version number file is named after the version number (Version2 in the worked example later in this description), and the headquarters needs to record the commit state. At a province, the version number file is named after the version number together with its state (Version2_precommit or Version2_commit in the same example). The second is the metadata file, which stores the content of the metadata; its name consists of the prefix Meta. followed by the version number (Meta.Version2 in the same example). The trailing number in each name is the specific number of the version.
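The naming convention can be illustrated with a few helper functions. This is a sketch under assumptions: the name patterns are inferred from the worked example later in the description (Version2, Version2_precommit, Version2_commit, Meta.Version2), and the function names are hypothetical, not part of the patent.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: "Version<N>" optionally followed by "_precommit" or "_commit".
_VERSION_RE = re.compile(r"^Version(\d+)(?:_(precommit|commit))?$")


def meta_file_name(version: int) -> str:
    """Name of the metadata file that stores the content for one version."""
    return f"Meta.Version{version}"


def version_file_name(version: int, state: Optional[str] = None) -> str:
    """Name of the version number file; `state` is None at the headquarters."""
    if state not in (None, "precommit", "commit"):
        raise ValueError(f"unknown state: {state!r}")
    return f"Version{version}" if state is None else f"Version{version}_{state}"


def parse_version_file_name(name: str) -> Tuple[int, Optional[str]]:
    """Recover (version number, state) from a version number file name."""
    match = _VERSION_RE.match(name)
    if match is None:
        raise ValueError(f"not a version number file name: {name!r}")
    return int(match.group(1)), match.group(2)
```

Because both the version and the state live in the file name, a single rename is enough to move a platform from one state to the next.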
Referring to fig. 1, a cross-region real-time synchronization method for a big data platform is provided, which includes the following steps:
S1: establishing network connections between a total data platform and sub data platforms distributed in different regions, taking a total platform node of the total data platform as the master coordination end, and taking a sub-platform node of each sub data platform as a slave end;
S2: generating an operation instruction at the master coordination end, and sending the generated operation instruction through the total data platform to the slave ends of the sub data platforms;
S3: generating version information for the metadata at the master coordination end, creating a metadata file according to that version information, writing the corresponding metadata into the metadata file, and sending the executed operation from the master coordination end to the slave ends in real time;
S4: each slave end completes the corresponding operation, creates a version number file corresponding to the version information, marks it as being in the pre-update database state, and feeds back pre-update success information to the master coordination end;
S5: when the master coordination end has received pre-update success information from all slave ends, it modifies the corresponding version file and issues an update database instruction to the slave ends; after a slave end receives the update database instruction, it updates its corresponding version file as instructed by the master coordination end and marks it as being in the updated database state.
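The flow of steps S3 to S5 can be sketched with an in-memory model. Everything here is an illustrative assumption rather than the patented implementation: the `Platform` class stands in for a platform's HDFS directory, the `reachable` flag simulates an outage, each platform is assumed to hold a single version number file, and the file names follow the worked example later in the description.

```python
from typing import Dict, List, Optional


class Platform:
    """In-memory stand-in for one platform's metadata directory."""

    def __init__(self, name: str, reachable: bool = True) -> None:
        self.name = name
        self.reachable = reachable                 # hypothetical switch to simulate an outage
        self.meta_files: Dict[str, str] = {}       # e.g. {"Meta.Version2": "..."}
        self.version_file: Optional[str] = None    # e.g. "Version2_precommit"


def synchronize(total: Platform, subs: List[Platform], version: int, metadata: str) -> bool:
    """Run S3 to S5 once; return True if the master coordination end committed."""
    meta_name = f"Meta.Version{version}"

    # S3: the master coordination end writes the metadata file.
    total.meta_files[meta_name] = metadata

    # S4: every reachable slave end applies the operation and enters the pre-update state.
    acks = 0
    for sub in subs:
        if not sub.reachable:
            continue
        sub.meta_files[meta_name] = metadata
        sub.version_file = f"Version{version}_precommit"
        acks += 1

    # S5: the master proceeds only after ALL slave ends reported success.
    if acks != len(subs):
        return False                               # recovery is not modeled in this sketch
    total.version_file = f"Version{version}"       # commit point of the whole operation
    for sub in subs:
        sub.version_file = f"Version{version}_commit"
    return True
```

The sketch deliberately leaves out recovery after a failed pre-update, which the description treats as a separate mechanism.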
In one embodiment of the cross-region real-time synchronization method of the big data platform, the master coordination end periodically updates its state through the distributed coordination service component; when the master coordination end is unresponsive, another total platform node is selected through the distributed coordination service component to notify the sub-platform nodes to perform a data recovery operation. When the master coordination end is unresponsive, the slave ends are notified that the operation has failed and perform a data recovery operation after receiving the notification; the operation is successful once the master coordination end has finished modifying the corresponding version file.
Specifically, the distributed coordination service component is ZooKeeper, an open-source distributed application coordination service. It is an open-source implementation of Google's Chubby and an important component of Hadoop and HBase, and provides configuration maintenance, naming service, distributed synchronization, group services, and the like. ZooKeeper relies on the ZAB (ZooKeeper Atomic Broadcast) protocol, a Paxos-like consensus protocol. For each DDL instruction, only one node of the total data platform and one node of each sub data platform participate in execution, and the node of the total data platform to which the user is connected is the master coordination end.
The master coordination end periodically updates its state on ZooKeeper, and the total data platform continuously monitors that state. If the master coordination end goes down, the total data platform selects a total platform node through ZooKeeper and notifies the sub data platforms to perform data recovery.
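The heartbeat-and-replacement idea can be sketched without ZooKeeper. The code below is only an in-memory approximation: `HeartbeatMonitor`, its timeout rule, and the "first live node in sorted order" selection are assumptions made for illustration, and real ZooKeeper clients would typically rely on sessions and ephemeral nodes rather than explicit timestamps.

```python
from typing import Dict, Iterable, Optional


class HeartbeatMonitor:
    """Tracks the last time each node reported its state."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.last_seen: Dict[str, float] = {}

    def beat(self, node: str, now: float) -> None:
        # Called periodically by a node to refresh its state.
        self.last_seen[node] = now

    def is_down(self, node: str, now: float) -> bool:
        # A node is treated as down if it never reported or reported too long ago.
        last = self.last_seen.get(node)
        return last is None or now - last > self.timeout

    def pick_replacement(self, candidates: Iterable[str], now: float) -> Optional[str]:
        # Hypothetical selection rule: the first live candidate in sorted order.
        alive = sorted(n for n in candidates if not self.is_down(n, now))
        return alive[0] if alive else None
```

Timestamps are passed in explicitly so the behaviour is deterministic; a deployment would use a clock and whatever election recipe the coordination service offers.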
Specifically, the master coordination end does not need to wait for all slave ends to complete the data update. Whether the operation succeeds is determined solely by whether the master coordination end performs the data update operation: if it does, the operation is successful; otherwise, the operation fails. If the total data platform detects an anomaly (such as a network disconnection, downtime, or a timeout), it reports "operation failed" to the user, notifies the sub data platforms to recover their data to the state before the operation, clears the metadata written before the failure, and terminates the whole process. Likewise, if the master coordination end encounters an error while modifying the version number file name, "operation failed" is reported, the whole process is terminated, and the sub data platforms are notified to recover their data to the state before the operation.
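The success rule and the recovery path can be sketched as follows. This is a hedged illustration: `SubPlatform`, its backup field, and the `master_rename_ok` flag are hypothetical devices for simulating a failure at the master coordination end, and the recovery shown (delete the new metadata file, restore the previous version file name) is one plausible reading of "return to the state before the operation".

```python
from typing import Dict, List, Optional


class SubPlatform:
    """In-memory stand-in for a sub data platform's metadata directory."""

    def __init__(self, version_file: str, meta_files: Dict[str, str]) -> None:
        self.version_file = version_file
        self.meta_files = dict(meta_files)
        self._previous: Optional[str] = None    # remembered for recovery

    def pre_update(self, version: int, metadata: str) -> None:
        self._previous = self.version_file
        self.meta_files[f"Meta.Version{version}"] = metadata
        self.version_file = f"Version{version}_precommit"

    def recover(self, version: int) -> None:
        # Clear the metadata written by the failed operation and restore the old state.
        self.meta_files.pop(f"Meta.Version{version}", None)
        if self._previous is not None:
            self.version_file = self._previous
            self._previous = None


def run_operation(subs: List[SubPlatform], version: int, metadata: str,
                  master_rename_ok: bool) -> str:
    for sub in subs:
        sub.pre_update(version, metadata)
    if not master_rename_ok:
        # The master could not modify its version file: the operation failed.
        for sub in subs:
            sub.recover(version)
        return "operation failed"
    # The master's update is the success point; slave ends follow afterwards.
    for sub in subs:
        sub.version_file = f"Version{version}_commit"
    return "operation successful"
```

The result reported to the user depends only on the master's step, which matches the rule that the master does not wait for every slave end to finish updating.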
In one embodiment of the cross-region real-time synchronization method of the big data platform, the master coordination end uses a distributed coordination service component to generate the version information of the metadata, and names the metadata file using the version information of the metadata as the version number. When metadata is accessed through the total data platform, the version information is obtained from the name of the metadata file; when the metadata file with the corresponding version number cannot be found, the user performing the search is prompted.
Specifically, when metadata is accessed from the total data platform, the name of the version number file is read first, the version number (for example, Version2) is obtained from that name, and the name of the metadata file, Meta.Version2, is assembled directly from the version number. If the version number file cannot be found, the user is shown an error such as "table does not exist" or "partition does not exist".
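A read on the total data platform can be sketched in a few lines. The function name, the dictionary standing in for the metadata directory, and the use of `LookupError` to carry the user-facing message are assumptions made for this illustration.

```python
from typing import Dict, Optional


def read_from_total(version_file: Optional[str], meta_files: Dict[str, str]) -> str:
    """Return the metadata content named by the headquarters version number file."""
    if version_file is None:
        # No version number file: the object was never created.
        raise LookupError("table does not exist")
    # The metadata file name is assembled directly from the version number.
    meta_name = "Meta." + version_file
    if meta_name not in meta_files:
        raise LookupError("table does not exist")
    return meta_files[meta_name]
```

No cross-region call is needed here, since the headquarters holds the authoritative version number.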
In one embodiment of the cross-region real-time synchronization method of the big data platform, when metadata is accessed through a sub data platform, the version information and the state of the metadata are obtained from the name of the metadata file;
when the metadata file with the corresponding version number cannot be found, the user performing the search is prompted;
when the metadata file corresponding to the version number is in the pre-update database state, the latest version information is obtained from the total data platform, and the corresponding metadata is read from the sub data platform according to that latest version information;
when the metadata file corresponding to the version number is in the updated database state, the corresponding metadata is read directly from the sub data platform.
Specifically, when metadata is accessed from a provincial sub data platform, the name of the version number file is read first, and the version number (for example, Version2) and the metadata state are obtained from that name.
a) If the version number file is not found, the user is shown an error such as "table does not exist";
b) if the state is the pre-update database state, the latest readable version number is obtained from the total data platform (for example, Version1), and the metadata corresponding to that version number is then read from the sub data platform;
c) if the state is the updated database state, the metadata file corresponding to Version2 is read; if there is no version number file, the user is shown an error such as "table does not exist".
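Cases a) to c) can be sketched as a single function. This is an illustration under assumptions: the `_precommit` and `_commit` suffixes follow the worked example later in the description, `fetch_latest_readable` is a hypothetical callback standing in for the one cross-region request to the total data platform, and raising `LookupError` when the metadata file itself is missing is an added assumption.

```python
from typing import Callable, Dict, Optional


def read_from_sub(version_file: Optional[str],
                  meta_files: Dict[str, str],
                  fetch_latest_readable: Callable[[], str]) -> str:
    """Return metadata content as seen from a sub data platform."""
    if version_file is None:
        raise LookupError("table does not exist")             # case a)
    version, _, state = version_file.partition("_")
    if state == "precommit":
        # case b): ask the headquarters which version is currently readable.
        version = fetch_latest_readable()
    elif state != "commit":
        raise ValueError(f"unexpected version file name: {version_file!r}")
    # case c), and the tail of case b): read locally from the sub data platform.
    meta_name = "Meta." + version
    if meta_name not in meta_files:
        raise LookupError("table does not exist")
    return meta_files[meta_name]
```

Only the pre-update state triggers a cross-region call, which is why reads are local in the common case.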
Referring to fig. 2, an embodiment of the present invention further provides a cross-region real-time synchronization system for a big data platform, comprising:
a network building module 1, used for establishing network connections between the total data platform and the sub data platforms distributed in different regions, taking a total platform node of the total data platform as the master coordination end, and taking a sub-platform node of each sub data platform as a slave end;
an operation instruction generating module 2, used for generating an operation instruction at the master coordination end and sending the generated operation instruction through the total data platform to the slave ends of the sub data platforms;
a version information generating module 3, used by the master coordination end to generate the version information of the metadata;
a metadata file creating module 4, used by the master coordination end to create a metadata file according to the version information of the metadata;
a metadata writing module 5, used for writing the corresponding metadata into the metadata file;
a first state updating module 7, used by a slave end to create a version number file corresponding to the version information and mark it as being in the pre-update database state;
a feedback module 8, used by a slave end to feed back pre-update success information to the master coordination end;
an instruction issuing module 9, used for modifying the corresponding version file at the master coordination end and issuing an update database instruction to the slave ends after the master coordination end has received pre-update success information from all slave ends;
and a second state updating module 10, used by a slave end, after it receives the update database instruction issued by the master coordination end, to update its corresponding version file as instructed by the master coordination end and mark it as being in the updated database state.
In one embodiment of the cross-region real-time synchronization system of the big data platform, the system further includes an anomaly monitoring module 11, used for monitoring the response state of the master coordination end and, when the master coordination end is unresponsive, selecting a total platform node through the distributed coordination service component to notify the sub-platform nodes to perform a data recovery operation.
In one embodiment of the cross-region real-time synchronization system of the big data platform, the system further includes a first notification module 12, a second notification module 13, and a data recovery module 14;
the first notification module 12 is used for notifying the slave ends of the operation failure when the master coordination end is unresponsive;
the second notification module 13 is used for prompting the user performing the search when the metadata file with the corresponding version number cannot be found;
the data recovery module 14 is used for performing a data recovery operation on the sub-platform nodes when the master coordination end is unresponsive.
In one embodiment of the cross-region real-time synchronization system of the big data platform, the system further includes a searching module 6, used for searching for the metadata file with the corresponding version number according to the version information.
Specifically, in a practical application of cross-region real-time synchronization of a big data platform, suppose a headquarters user creates a table on the total data platform. The metadata of the table must be synchronized in real time to the sub data platforms of all provinces so that their users can see it, for example how many columns the table has, what the column names are, and what the data types are. Throughout the process, the headquarters user sends operation instructions to the sub data platforms; the node to which the headquarters user is connected serves as the master coordination end, which is responsible for directing the sub data platforms to carry out the corresponding operations. The master coordination end periodically updates its state on ZooKeeper, and the total data platform continuously monitors that state; if the master coordination end goes down, the total data platform selects a new node through ZooKeeper and notifies the sub data platforms to perform data recovery. Network connections are established from the total data platform to the sub data platforms of all provinces, and the connected nodes of the provincial sub data platforms serve as slave ends.
The master coordination end of the headquarters' total data platform generates a version number for the metadata, Version2; version numbers are monotonically increasing and are generated using ZooKeeper. A metadata file named Meta.Version2 is created on the headquarters' total data platform and the metadata is written into it. The operation is then sent in real time to the slave ends of the sub data platforms of all provinces.
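The monotonically increasing version number can be illustrated with a local stand-in. The description states that ZooKeeper generates the number but does not say how, so the lock-protected counter below is purely an assumption for illustration; `VersionGenerator` is a hypothetical name and does not talk to ZooKeeper.

```python
import threading


class VersionGenerator:
    """Local, thread-safe stand-in for a coordination-service-backed counter."""

    def __init__(self, start: int = 0) -> None:
        self._lock = threading.Lock()
        self._current = start       # last version number handed out

    def next_version(self) -> str:
        # The lock makes increment-and-read a single step, so two callers
        # can never receive the same version number.
        with self._lock:
            self._current += 1
            return f"Version{self._current}"
```

For example, a generator created with `start=1` hands out `Version2` first, matching the version used in the walkthrough.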
After the slave end of a provincial sub data platform finishes all operations, it creates a version number file named Version2_precommit, which is equivalent to the province changing its state to the pre-update database state and recording the version number Version2; it then reports to the master coordination end that it has successfully entered the pre-update database state.
If the master coordination end of the headquarters' total data platform receives reports from the slave ends of all provinces that they have successfully entered the pre-update database state, it continues with the following operations. The master coordination end modifies the version number file, changing its name to Version2, which is equivalent to recording the version number Version2, and then notifies the slave ends of all provincial sub data platforms to update the database.
The master coordination end of the headquarters' total data platform then prompts the user that the operation was successful. The user does not need to wait for all provinces to finish updating their databases: whether the operation succeeds is determined by whether the headquarters performs the update database operation. If the headquarters performs it, the operation is successful; otherwise, the operation fails. After receiving the update database instruction, the sub data platform of each province modifies its version number file, renaming it to Version2_commit, which is equivalent to changing its state to the updated database state and recording the version number Version2. When metadata is created or modified, the metadata of a sub data platform is in the committed, synchronized state in most cases and can be read directly; only in a few cases does the latest readable version number need to be obtained from the headquarters, so metadata is read quickly. Using the backup mechanism of the distributed file system avoids metadata loss caused by single-point downtime and ensures consistent metadata reads: the metadata read is the same whether it comes from the total data platform or from a sub data platform. A downtime-handling mechanism with a monitoring node is provided for the master coordination end, so that data can be recovered when downtime or similar problems occur and the data platform is not left blocked and waiting indefinitely.
Although the invention has been described in detail above with reference to a general description and specific examples, it will be apparent to one skilled in the art that modifications or improvements may be made thereto based on the invention. Accordingly, such modifications and improvements are intended to be within the scope of the invention as claimed.

Claims (8)

1. A cross-region real-time synchronization method for a big data platform, characterized by comprising the following steps:
establishing network connections between a total data platform and sub data platforms distributed in different locations, taking a total platform node of the total data platform as a master coordination end, and taking a sub-platform node of each sub data platform as a slave end;
generating an operation instruction through the master coordination end, and sending the generated operation instruction to the slave end of the sub data platform through the total data platform;
generating version information of metadata through the master coordination end, the master coordination end creating a metadata file according to the version information of the metadata, writing the corresponding metadata into the metadata file, and sending the executed operations to the slave end in real time;
the slave end completing the corresponding operations, creating a version number file corresponding to the version information, marking the version number file as being in a pre-update database state, and feeding back pre-update database success information to the master coordination end;
when the master coordination end receives the pre-update database success information fed back by all slave ends, the master coordination end modifies the corresponding version file and issues an update database command to the slave ends; after receiving the update database command issued by the master coordination end, each slave end updates the corresponding version file in accordance with the master coordination end and marks the version file as being in an updated database state;
the state of the master coordination end is updated periodically through a distributed coordination service component, and when the master coordination end is in a non-responsive state, another total platform node is selected through the distributed coordination service component to notify the sub-platform nodes to perform a data recovery operation;
the master coordination end does not need to wait for all slave ends to finish the data update; whether the operation succeeds depends on whether the master coordination end performs the data update operation: if the master coordination end performs the data update operation, the operation succeeds; otherwise, the operation fails;
if the total data platform detects an abnormality, the total data platform returns an operation failure to the user and then notifies the sub data platforms to recover their data to the state before the operation, the metadata written before the operation failed is cleared, and the whole process is terminated; if the master coordination end makes an error when modifying the file name of the version number file, the operation ends in failure, the whole process is terminated, and the sub data platforms are notified to recover their data to the state before the operation;
when the metadata is accessed through the sub data platform, acquiring version information and the state of the metadata from the name of the metadata file;
when the metadata file with the corresponding version number cannot be found, prompting the searching user;
when the metadata file corresponding to the version number is in a pre-update database state, acquiring the state of the metadata file corresponding to the version number from the total data platform, if the metadata file is in the updated state, reading corresponding metadata from the sub data platform, and otherwise, prompting a user;
and when the metadata file corresponding to the version number is in the updated database state, directly reading the corresponding metadata from the sub data platform.
2. The cross-region real-time synchronization method for the big data platform as claimed in claim 1, wherein when the master coordination end is in a non-responsive state, the slave end is notified of the operation failure, and the slave end performs a data recovery operation after receiving the operation failure notification; and when the master coordination end finishes modifying the corresponding version file, the operation is successful.
3. The cross-region real-time synchronization method for the big data platform as claimed in claim 1, wherein the master coordination end generates the version information of the metadata by using a distributed coordination service component, and names the metadata file using the version information of the metadata as the version number.
4. The cross-region real-time synchronization method for the big data platform as claimed in claim 1, wherein when the metadata is accessed through the total data platform, the version information is obtained from the name of the metadata file; and when the metadata file with the corresponding version number cannot be found, the searching user is prompted.
5. A cross-region real-time synchronization system for a big data platform, characterized by comprising:
a network building module, used for establishing network connections between the total data platform and the sub data platforms distributed in different locations, taking a total platform node of the total data platform as a master coordination end, and taking a sub-platform node of each sub data platform as a slave end;
an operation instruction generating module, used for generating an operation instruction by the master coordination end and sending the generated operation instruction to the slave end of the sub data platform through the total data platform;
a version information generation module, used for generating the version information of the metadata by the master coordination end;
a metadata file creating module, used for creating a metadata file by the master coordination end according to the version information of the metadata;
a metadata writing module, used for writing the corresponding metadata into the metadata file;
a first state updating module, used for creating, by the slave end, a version number file corresponding to the version information and marking the version number file as being in a pre-update database state;
a feedback module, used for feeding back, by the slave end, pre-update database success information to the master coordination end;
a command issuing module, used for modifying the corresponding version file through the master coordination end and issuing an update database command to the slave ends after the master coordination end receives the pre-update database success information fed back by all slave ends;
a second state updating module, used for updating, after the slave end receives the update database command issued by the master coordination end, the corresponding version file in accordance with the master coordination end and marking the version file as being in an updated database state;
wherein the cross-region real-time synchronization system for the big data platform periodically updates the state of the master coordination end through a distributed coordination service component, and when the master coordination end is in a non-responsive state, another total platform node is selected through the distributed coordination service component to notify the sub-platform nodes to perform a data recovery operation;
the master coordination end does not need to wait for all slave ends to finish the data update; whether the operation succeeds depends on whether the master coordination end performs the data update operation: if the master coordination end performs the data update operation, the operation succeeds; otherwise, the operation fails;
if the total data platform detects an abnormality, the total data platform returns an operation failure to the user and then notifies the sub data platforms to recover their data to the state before the operation, the metadata written before the operation failed is cleared, and the whole process is terminated; if the master coordination end makes an error when modifying the file name of the version number file, the operation ends in failure, the whole process is terminated, and the sub data platforms are notified to recover their data to the state before the operation;
when the metadata is accessed through the sub data platform, acquiring version information and the state of the metadata from the name of the metadata file;
when the metadata file with the corresponding version number cannot be found, prompting the searching user;
when the metadata file corresponding to the version number is in a pre-update database state, acquiring the state of the metadata file corresponding to the version number from the total data platform, if the metadata file is in the updated state, reading corresponding metadata from the sub data platform, and otherwise, prompting a user;
and when the metadata file corresponding to the version number is in the updated database state, directly reading the corresponding metadata from the sub data platform.
6. The cross-region real-time synchronization system for the big data platform as claimed in claim 5, further comprising an anomaly monitoring module for monitoring the response state of the master coordination end and, when the master coordination end is in a non-responsive state, selecting a total platform node through the distributed coordination service component to notify the sub-platform nodes to perform a data recovery operation.
7. The cross-region real-time synchronization system for the big data platform as claimed in claim 5, further comprising a first notification module, a second notification module and a data recovery module;
the first notification module is used for notifying the slave end of the operation failure when the master coordination end is in a non-responsive state;
the second notification module is used for prompting the searching user when the metadata file with the corresponding version number cannot be found;
and the data recovery module is used for the sub-platform node to perform the data recovery operation when the master coordination end is in a non-responsive state.
8. The cross-region real-time synchronization system for the big data platform as claimed in claim 5, further comprising a search module for searching for the metadata file with the corresponding version number by means of the version information.
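The monitoring and failover behavior recited in the claims (periodically updating the state of the master coordination end and, when it stops responding, selecting another total platform node to notify the sub-platform nodes to recover) can be sketched as follows. This is an illustrative sketch under stated assumptions: it models the distributed coordination service component as a plain dictionary of last-heartbeat timestamps and calls no real coordination-service API; the function names, the timeout parameter, and the rule of picking the alphabetically first live node are all hypothetical.

```python
from typing import Dict, Optional


def is_unresponsive(heartbeats: Dict[str, float], node: str,
                    now: float, timeout: float) -> bool:
    """A node is unresponsive if it never reported or its last heartbeat is stale."""
    last = heartbeats.get(node)
    return last is None or now - last > timeout


def pick_recovery_notifier(heartbeats: Dict[str, float], coordinator: str,
                           now: float, timeout: float) -> Optional[str]:
    """Return another live total platform node that should notify the
    sub-platform nodes to recover, or None if the coordinator is responsive
    or no other node is live."""
    if not is_unresponsive(heartbeats, coordinator, now, timeout):
        return None
    live = sorted(
        n for n in heartbeats
        if n != coordinator and not is_unresponsive(heartbeats, n, now, timeout)
    )
    # Assumption: any live node may be chosen; this sketch takes the first by name.
    return live[0] if live else None
```

Passing `now` explicitly rather than reading the clock inside the function keeps the sketch deterministic and easy to test.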
CN201811626088.2A 2018-12-28 2018-12-28 Cross-region real-time synchronization method and system for big data platform Active CN109753511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811626088.2A CN109753511B (en) 2018-12-28 2018-12-28 Cross-region real-time synchronization method and system for big data platform

Publications (2)

Publication Number Publication Date
CN109753511A (en) 2019-05-14
CN109753511B (en) 2020-12-04

Family

ID=66404176

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811626088.2A Active CN109753511B (en) 2018-12-28 2018-12-28 Cross-region real-time synchronization method and system for big data platform

Country Status (1)

Country Link
CN (1) CN109753511B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110535907A (en) * 2019-07-26 2019-12-03 济南浪潮数据技术有限公司 A kind of metadata synchronization method and system
CN110517493B (en) * 2019-08-30 2022-03-25 公安部交通管理科学研究所 Cross-regional motor vehicle comprehensive information acquisition method and system
CN112835885B (en) * 2019-11-22 2023-09-01 北京金山云网络技术有限公司 Processing method, device and system for distributed form storage
CN113535391B (en) * 2021-06-28 2024-04-16 北京东方国信科技股份有限公司 Distributed cluster state information management method and system of cross-domain big data platform
CN113392074B (en) * 2021-07-13 2022-07-05 山东大学 Internet of things equipment security management method adopting memory documents
CN113448978B (en) * 2021-07-14 2024-04-16 中国银行股份有限公司 Method and device for guaranteeing data consistency in same-name image file replacement operation

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105472024A (en) * 2015-12-28 2016-04-06 北京赛思信安技术股份有限公司 Cross-region data synchronizing method based on message pushing mode
CN105491106A (en) * 2015-11-18 2016-04-13 中国石油天然气集团公司 Real-time synchronization system and method for oil well logging master-slave database systems
CN106776121A (en) * 2016-11-23 2017-05-31 中国工商银行股份有限公司 Data disaster recovery device, system and method
CN108776670A (en) * 2018-05-11 2018-11-09 阿里巴巴集团控股有限公司 Remote disaster recovery method, system and electronic device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104462192B (en) * 2011-12-01 2017-09-01 中国核工业二三建设有限公司 Method based on nuclear power construction Amulti-project management data synchronization technology device
US10545993B2 (en) * 2015-03-19 2020-01-28 Russell Sullivan Methods and systems of CRDT arrays in a datanet
CN105468727A (en) * 2015-11-20 2016-04-06 国家电网公司 Zookeeper based method for realizing MySQL strong-consistency copy
CN106980625B (en) * 2016-01-18 2020-08-04 阿里巴巴集团控股有限公司 Data synchronization method, device and system
CN106250514B (en) * 2016-08-04 2019-10-15 上海摩库数据技术有限公司 Transnational method of data synchronization based on Mysql database and SQL log
CN106874341B (en) * 2016-12-23 2022-04-05 中科星图股份有限公司 Database synchronization method
CN108121804B (en) * 2017-12-22 2020-06-05 百度在线网络技术(北京)有限公司 Cross-region distributed data storage method, device, terminal and storage medium
CN108763234A (en) * 2018-02-01 2018-11-06 宝付网络科技(上海)有限公司 A kind of real time data synchronization method and system

Also Published As

Publication number Publication date
CN109753511A (en) 2019-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant