CN116701533A - Data synchronization method, device, equipment and computer readable storage medium

Publication number
CN116701533A
Authority
CN
China
Prior art keywords
theme
data
target
cluster
normal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310672148.9A
Other languages
Chinese (zh)
Inventor
王平
张志强
赵光辉
涂宗芳
周斌
逯飞斌
焦顺志
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Bank Co Ltd
Original Assignee
China Merchants Bank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Bank Co Ltd
Priority to CN202310672148.9A
Publication of CN116701533A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27: Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G06F16/275: Synchronous replication
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/54: Interprogram communication
    • G06F9/546: Message passing systems or structures, e.g. queues
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data synchronization method, device, equipment and computer readable storage medium. The data synchronization method is applied to an open source stream processing platform that is provided with a plurality of clusters, each cluster comprising at least one topic, the topics comprising normal topics and mirror topics. The method comprises the following steps: receiving a data write request sent by a producer, wherein the data write request comprises a topic identifier and data to be written; determining, among the normal topics, a target normal topic that matches the topic identifier, and writing the data to be written into the target normal topic; and determining, among the mirror topics, a target mirror topic that matches the target normal topic, and copying the data to be written from the target normal topic into the target mirror topic, so that the target normal topic and the target mirror topic remain synchronized. The application reduces the implementation cost of data synchronization for the open source stream processing platform.

Description

Data synchronization method, device, equipment and computer readable storage medium
Technical Field
The present application relates to the field of data transmission technologies, and in particular, to a data synchronization method, apparatus, device, and computer readable storage medium.
Background
The open source stream processing platform Kafka is a high-throughput distributed publish-subscribe messaging system that can process all the action stream data of the users of a website. To ensure high availability, the data in Kafka generally needs to be backed up and synchronized. Current Kafka data synchronization schemes, however, require an additional data synchronization tool such as MirrorMaker: with such a tool, a consumer pulls the data and unpacks it, and a producer then repackages it and sends it to the backup storage address. As a result, implementing data synchronization for the open source stream processing platform is costly.
Disclosure of Invention
The main purpose of the application is to provide a data synchronization method, device, equipment and computer readable storage medium that solve the technical problem of how to reduce the implementation cost of data synchronization for an open source stream processing platform.
To achieve the above object, the present application provides a data synchronization method applied to an open source stream processing platform, wherein the open source stream processing platform is provided with a plurality of clusters, each cluster includes at least one topic, and the topics include normal topics and mirror topics. The data synchronization method includes the following steps:
receiving a data write request sent by a producer, wherein the data write request includes a topic identifier and data to be written;
determining, among the normal topics, a target normal topic that matches the topic identifier, and writing the data to be written into the target normal topic;
and determining, among the mirror topics, a target mirror topic that matches the target normal topic, and copying the data to be written from the target normal topic into the target mirror topic, so that the target normal topic and the target mirror topic remain synchronized.
Optionally, before the step of receiving the data write request sent by the producer, the method includes:
setting a cluster role for each cluster, wherein the cluster roles include primary cluster and standby cluster, and the primary clusters and the standby clusters are in one-to-one correspondence;
traversing each primary cluster in turn, and setting all topics in the traversed primary cluster as normal topics;
determining the standby cluster corresponding to the traversed primary cluster, copying all normal topics included in the traversed primary cluster into that standby cluster, and setting all topics copied into the standby cluster as mirror topics, wherein the normal topics and the mirror topics are in one-to-one correspondence.
Optionally, after the step of setting all topics copied into the standby cluster as mirror topics, the method includes:
traversing each normal topic in turn and obtaining metadata information of the traversed normal topic, wherein the metadata information includes one or more of the topic configuration, the access control list, and the number of partitions;
and determining the mirror topic corresponding to the traversed normal topic, and synchronizing the metadata information to that mirror topic.
Optionally, the step of copying the data to be written from the target normal topic into the target mirror topic includes:
determining, among the standby clusters, a target standby cluster that matches the target mirror topic, and starting a mirror fetching component in the target standby cluster;
fetching and copying the data to be written from the target normal topic by means of the mirror fetching component, to obtain a copy of the data to be written;
and writing the copy of the data to be written into the target mirror topic.
Optionally, the step of writing the data to be written into the target normal topic includes:
determining, among the primary clusters, a target primary cluster that matches the determined target normal topic, and obtaining the current availability state of the target primary cluster;
if the current availability state is a usable state, writing the data to be written into the target normal topic;
and if the current availability state is a fault state, taking the target mirror topic as a new target normal topic, and writing the data to be written into the new target normal topic.
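By way of illustration only, the branch on the availability state described above can be sketched as follows. The `Topic` class, the state strings `"usable"` and `"fault"`, and the function name `write_with_failover` are hypothetical names chosen for this sketch and do not come from the application or from Kafka itself. The in-memory lists stand in for topic logs, and the extra append in the usable branch merely stands in for the replication to the mirror topic.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    name: str
    kind: str                              # "normal" or "mirror" (hypothetical labels)
    log: list = field(default_factory=list)


def write_with_failover(primary_state, normal, mirror, record):
    """Write to the normal topic, or promote the mirror topic if the primary cluster is faulty."""
    if primary_state == "usable":
        normal.log.append(record)
        mirror.log.append(record)          # stand-in for replication to the mirror topic
        return normal
    if primary_state == "fault":
        mirror.kind = "normal"             # the mirror topic becomes the new target normal topic
        mirror.log.append(record)
        return mirror
    raise ValueError(f"unknown availability state: {primary_state!r}")


normal = Topic("orders", "normal")
mirror = Topic("orders", "mirror")
write_with_failover("usable", normal, mirror, b"m0")
target = write_with_failover("fault", normal, mirror, b"m1")
```

In this sketch the second write lands only in the promoted topic, because the original primary cluster is assumed to be unreachable.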
Optionally, before the step of obtaining the current availability state of the target primary cluster, the method includes:
acquiring usage state data of the target primary cluster, wherein the usage state data includes one or more of memory occupation information, port availability information, and process state information;
detecting whether the usage state data matches preset fault data;
if the usage state data matches the preset fault data, sending a fault verification message to a preset user terminal;
if a verification success message sent by the user terminal is received, taking the fault state as the current availability state of the target primary cluster;
and if a verification failure message sent by the user terminal is received, taking the usable state as the current availability state of the target primary cluster.
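The detection and verification flow above can be sketched as follows, purely as an illustration. The field names (`memory_used_ratio`, `port_open`, `process_alive`), the 0.95 threshold, and the `ask_user` callback that stands in for the round trip to the user terminal are all assumptions of this sketch. The application does not state what happens when no fault data is matched; the sketch assumes the cluster is then treated as usable.

```python
def matches_fault(usage, fault_rules):
    """Return True if any collected usage field matches a preset fault rule."""
    return any(rule(usage[key]) for key, rule in fault_rules.items() if key in usage)


def resolve_state(usage, fault_rules, ask_user):
    """Decide the availability state of the target primary cluster."""
    if not matches_fault(usage, fault_rules):
        return "usable"                    # assumption: no match means no fault
    # A suspected fault is only accepted after the user terminal confirms it.
    return "fault" if ask_user(usage) else "usable"


# Hypothetical preset fault data, expressed as predicates per field.
FAULT_RULES = {
    "memory_used_ratio": lambda v: v >= 0.95,
    "port_open": lambda v: v is False,
    "process_alive": lambda v: v is False,
}

healthy = {"memory_used_ratio": 0.40, "port_open": True, "process_alive": True}
broken = {"memory_used_ratio": 0.40, "port_open": False, "process_alive": True}

state_healthy = resolve_state(healthy, FAULT_RULES, ask_user=lambda u: True)
state_confirmed = resolve_state(broken, FAULT_RULES, ask_user=lambda u: True)
state_rejected = resolve_state(broken, FAULT_RULES, ask_user=lambda u: False)
```

Keeping the human confirmation as a callback makes it explicit that a matched fault pattern alone does not trigger a switch.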
Optionally, after the step of taking the fault state as the current availability state of the target primary cluster, the method includes:
determining, among the standby clusters, a target standby cluster that matches the target mirror topic, and setting the cluster role of the target standby cluster to primary cluster;
and setting the cluster role of the target primary cluster to standby cluster.
In addition, to achieve the above object, the present application further provides a data synchronization device. The data synchronization device includes an open source stream processing platform that is provided with a plurality of clusters, each cluster including at least one topic, the topics including normal topics and mirror topics, and the data synchronization device further includes:
a receiving module, configured to receive a data write request sent by a producer, wherein the data write request includes a topic identifier and data to be written;
a writing module, configured to determine, among the normal topics, a target normal topic that matches the topic identifier, and to write the data to be written into the target normal topic;
and a synchronization module, configured to determine, among the mirror topics, a target mirror topic that matches the target normal topic, and to copy the data to be written from the target normal topic into the target mirror topic, so that the target normal topic and the target mirror topic remain synchronized.
In addition, to achieve the above object, the present application also provides data synchronization equipment, including: a memory, a processor, and a data synchronization program stored in the memory and executable on the processor, wherein the data synchronization program, when executed by the processor, implements the steps of the data synchronization method described above.
In addition, to achieve the above object, the present application also provides a computer-readable storage medium on which a data synchronization program is stored, wherein the data synchronization program, when executed by a processor, implements the steps of the data synchronization method described above.
When a data write request sent by a producer is received, the data to be written is written into a target normal topic in Kafka and is also written into the target mirror topic corresponding to the target normal topic, so that the target normal topic and the target mirror topic are synchronized in real time. Data synchronization between the normal topics and the mirror topics in Kafka can thus be achieved without deploying an additional data synchronization tool, which reduces the implementation cost of Kafka data synchronization.
Drawings
The achievement of the objects, functional features and advantages of the present application will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
FIG. 1 is a schematic diagram of a terminal/device structure of a hardware operating environment according to an embodiment of the present application;
FIG. 2 is a flowchart of a data synchronization method according to a first embodiment of the present application;
FIG. 3 is a diagram illustrating the clusters in normal operation in the data synchronization method of the present application;
FIG. 4 is a diagram illustrating cluster switching in a disaster recovery situation in the data synchronization method of the present application;
FIG. 5 is a diagram illustrating the state before cluster migration in the data synchronization method of the present application;
FIG. 6 is a diagram illustrating the state after cluster migration in the data synchronization method of the present application;
FIG. 7 is a diagram illustrating replicated read topics in the data synchronization method of the present application;
FIG. 8 is another diagram illustrating replicated read topics in the data synchronization method of the present application;
FIG. 9 is a schematic diagram of the modules of the data synchronization device of the present application.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
Referring to fig. 1, fig. 1 is a schematic diagram of a data synchronization device structure of a hardware running environment according to an embodiment of the present application.
As shown in fig. 1, the data synchronization device may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display and an input unit such as a keyboard, and may optionally further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless Fidelity (Wi-Fi) interface). The memory 1005 may be a high-speed random access memory (RAM) or a stable non-volatile memory (NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage device separate from the processor 1001.
It will be appreciated by those skilled in the art that the structure shown in fig. 1 does not limit the data synchronization device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 1, the memory 1005, as a type of computer-readable storage medium, may include an operating system, a data storage module, a network communication module, a user interface module, and a data synchronization program.
In the data synchronization device shown in fig. 1, the network interface 1004 is mainly used for data communication with other devices; the user interface 1003 is mainly used for data interaction with a user; the processor 1001 and the memory 1005 in the data synchronization device of the present application may be provided in the data synchronization device, and the data synchronization device calls the data synchronization program stored in the memory 1005 through the processor 1001 and executes the data synchronization method provided by the embodiment of the present application.
Referring to fig. 2, in a first embodiment, the present application provides a data synchronization method applied to an open source stream processing platform, wherein the open source stream processing platform is provided with a plurality of clusters, each cluster includes at least one topic, and the topics include normal topics and mirror topics. The data synchronization method includes the following steps:
Step S10, receiving a data write request sent by a producer, wherein the data write request includes a topic identifier and data to be written;
The current open source data synchronization scheme uses MirrorMaker (including MM and MM2), which is based on a consumer -> producer model: a consumer pulls the data and unpacks it, and a producer then repackages it and sends it to the backup Topic (i.e., the topic). MM services therefore have to be deployed alongside Kafka using additional resources, and MirrorMaker can only guarantee that the Offsets of the message data are relatively consistent. This scheme has the following disadvantages: additional services and resources must be deployed to provide the data mirroring service; data synchronization performance in the consume -> produce mode is poor, mainly in terms of CPU utilization and transmission capacity; only data replication is provided, so clients must be manually reconfigured and restarted to switch clusters; Offset synchronization is one-way only, with no reverse synchronization; and the Offsets of the mirror Topic are not identical to those of the source Topic and can only be kept relatively consistent, so clients must manage Offsets themselves and perform Offset conversion when switching clusters.
In view of the above, this embodiment carries out secondary development on the Kafka source code to extend its data replication function. Based on the Kafka replica synchronization principle, Topic data and Offsets are replicated across clusters, so that a Topic in the primary cluster and the corresponding Topic in the standby cluster are kept synchronized in real time without deploying any additional synchronization tool, and primary/standby Topic relationships can be created repeatedly. The primary/standby Topic scheme is suitable for scenarios such as data synchronization across regions or availability zones, cluster disaster recovery switching, read-write separation, and migration, and can improve cluster availability, disaster recovery capability, and data synchronization performance.
The open source stream processing platform Kafka is a high-throughput distributed publish-subscribe messaging system. A messaging system passes data between applications or subroutines so that the applications need only be concerned with processing the data and not with how the data is shared. In a publish-subscribe system, message persistence is based on Topics. The publisher of a message is called the message producer; that is, the producer may be a terminal that publishes messages, including a client or a server. The subscriber to a message is called the message consumer; that is, the consumer may be a terminal that subscribes to messages, including a client or a server, and a consumer may subscribe to messages in one or more topics at the same time. Further, Kafka runs as a cluster of one or more servers, and a Kafka cluster stores stream data records in categories, each of which is called a Topic (i.e., a topic).
Further, a topic identifier may be set for each topic in a cluster, the topic identifier uniquely identifying the topic. When a producer needs to write data into the open source stream processing platform Kafka, it may send a data write request to Kafka, the data write request including at least the topic identifier and the data to be written. After the data write request is received, it is unpacked to obtain the topic identifier and the data to be written, the data to be written being the data that is intended to be written into a topic.
In an embodiment, before the step of receiving a data write request sent by a producer, wherein the data write request includes a topic identifier and data to be written, the method includes:
Step a, setting a cluster role for each cluster, wherein the cluster roles include primary cluster and standby cluster, and the primary clusters and the standby clusters are in one-to-one correspondence;
In an exemplary embodiment, a cluster role may be set for each cluster. It should be noted that a single standby cluster may be set for each primary cluster; the standby cluster is used to back up and synchronize the data in the primary cluster, while producers and consumers write data into and read data from the primary cluster. In particular, if the number of consumers reading data from a given topic in the primary cluster within the same preset time period exceeds a preset threshold, some of those consumers may be switched to read data from the standby cluster corresponding to the primary cluster, so as to share the read pressure on the primary cluster.
Further, the primary cluster and its corresponding standby cluster in this embodiment may be deployed in the same region or in different regions. When the clusters are deployed across regions (for example, the primary cluster in Beijing and the standby cluster in Shanghai), the bandwidth and latency problems of cross-machine-room access can be addressed and local reads and writes can be given priority. When multiple clusters are deployed in the same region, cluster switching capability is improved: the switch is transparent to the user, and neither the local configuration nor the application needs to be modified or restarted.
The backup of the primary cluster by the standby cluster may be a full data backup (all data in the primary cluster) or a partial data backup (part of the data in the primary cluster). Further, once the correspondence between a primary cluster and the standby cluster that backs up its data has been set, the primary cluster and the standby cluster are considered bound. If the standby cluster that backs up the primary cluster needs to be changed, the binding between the primary cluster and the original standby cluster is released first; after that, the correspondence between the new standby cluster and the primary cluster is established, the two are bound, and the primary cluster is backed up by the new standby cluster. In other words, the binding between a primary cluster and a standby cluster can be reset only after the existing binding has been released.
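The bind, release, and re-bind sequence described above can be sketched as a small registry that enforces the one-to-one correspondence. This is a minimal illustration only; the class name `ClusterBindings` and its methods are hypothetical and are not part of Kafka or of the application.

```python
class ClusterBindings:
    """Registry of one-to-one primary/standby bindings (illustrative only)."""

    def __init__(self):
        self._standby_of = {}   # primary cluster name -> standby cluster name
        self._primary_of = {}   # standby cluster name -> primary cluster name

    def bind(self, primary, standby):
        # A cluster that is already bound must be released before it is re-bound.
        if primary in self._standby_of or standby in self._primary_of:
            raise ValueError("release the existing binding first")
        self._standby_of[primary] = standby
        self._primary_of[standby] = primary

    def unbind(self, primary):
        standby = self._standby_of.pop(primary)
        del self._primary_of[standby]
        return standby

    def standby_of(self, primary):
        return self._standby_of[primary]


bindings = ClusterBindings()
bindings.bind("primary-1", "standby-1")
try:
    bindings.bind("primary-1", "standby-2")      # rejected: still bound to standby-1
    rebind_rejected = False
except ValueError:
    rebind_rejected = True
released = bindings.unbind("primary-1")
bindings.bind("primary-1", "standby-2")          # allowed after the release
```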
Step b, traversing each main cluster in turn, and setting all topics in the traversed main clusters as normal topics;
It should be noted that the user may set some or all of the topics in the primary cluster as normal topics according to actual needs. It can be understood that if a topic in the primary cluster is a normal topic, a mirror topic corresponding to it exists in the standby cluster corresponding to the primary cluster; if a topic in the primary cluster is a mirror topic, a normal topic corresponding to it exists in the standby cluster; and data synchronization is maintained between each normal topic and its mirror topic. Preferably, in this embodiment, all topics in the primary cluster are set as normal topics. When the clusters are switched (i.e., the primary cluster becomes the standby cluster and the standby cluster becomes the primary cluster), the normal topics in the new standby cluster (i.e., the primary cluster before switching) are switched to mirror topics, and the mirror topics in the new primary cluster (i.e., the standby cluster before switching) are switched to normal topics.
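The effect of a cluster switch on roles and topic types can be sketched as follows. This is an illustrative sketch only: clusters are modelled as plain dictionaries, and the keys `role` and `topics` as well as the function name `switch_roles` are assumptions made for the example.

```python
def switch_roles(primary, standby):
    """Swap the cluster roles and flip the type of every topic in both clusters."""
    primary["role"], standby["role"] = "standby", "primary"
    flip = {"normal": "mirror", "mirror": "normal"}
    for cluster in (primary, standby):
        cluster["topics"] = {name: flip[kind] for name, kind in cluster["topics"].items()}


cluster_a = {"role": "primary", "topics": {"orders": "normal", "payments": "normal"}}
cluster_b = {"role": "standby", "topics": {"orders": "mirror", "payments": "mirror"}}
switch_roles(cluster_a, cluster_b)
```

After the call, the former standby cluster holds the normal topics and accepts producer writes, while the former primary cluster holds the mirror topics.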
It should be noted that producers and consumers generally write data into and read data from normal topics. In particular, if the number of consumers reading data from a given normal topic within the same preset time period exceeds a preset threshold, some of those consumers may be switched to read data from the mirror topic corresponding to that normal topic, so as to share the read pressure on the normal topic. For example, referring to fig. 7, a normal Topic TopicA is in the primary cluster ClusterA; a producer P writes data into the normal Topic TopicA and a consumer C reads data from it; and a database D is connected to the normal Topic TopicA. In the figure, a solid circle represents a normal Topic and a dotted circle represents a mirror Topic. When the number of consumers reading data from the normal Topic TopicA within a given preset period exceeds the preset number, some of the consumers C can be migrated to read data from the mirror Topic TopicA in ClusterB, which is kept synchronized with the primary cluster ClusterA, so as to reduce the read pressure on the normal Topic TopicA.
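A minimal sketch of this read offloading is given below. The application only states that "a part of" the consumers may be switched; the sketch assumes, for illustration, that exactly the consumers in excess of the threshold are moved. The function name `assign_consumers` and the consumer identifiers are hypothetical.

```python
def assign_consumers(consumers, threshold):
    """Keep up to `threshold` consumers on the normal topic; move the excess to the mirror topic."""
    return {
        "normal": consumers[:threshold],
        "mirror": consumers[threshold:],   # empty when the threshold is not exceeded
    }


crowded = assign_consumers(["c1", "c2", "c3", "c4", "c5"], threshold=3)
quiet = assign_consumers(["c1", "c2"], threshold=3)
```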
In addition, a plurality of read topics can be replicated across regions so that users can read data from a nearby topic. For example, if a normal Topic is replicated as one read Topic at each of three places X, Y and Z, then user A, whose geographic position is close to X, can read data through the read Topic at X, and user B, whose geographic position is close to Y, can read data through the read Topic at Y. For example, referring to fig. 8, a primary cluster is deployed in region A, and a producer P and a consumer C write data into and read data from the normal Topic TopicA in that cluster, to which a database D is connected. When the number of consumers reading data from the normal Topic TopicA within a given preset period exceeds the preset number, one or more read topics may be replicated to clusters in region A and region B; for example, one read TopicA may be replicated to each of the cluster ClusterB in region A, the cluster ClusterC in region B, and the cluster ClusterD in region B, and the consumer C may select one Topic either on a proximity principle or at random.
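The choice of a read topic "on a proximity principle or at random" can be sketched as follows. Proximity is approximated here by an exact region match, which is an assumption of this sketch; the replica names and the function name `pick_read_topic` are likewise hypothetical.

```python
import random


def pick_read_topic(user_region, replicas, rng=random):
    """Prefer a read topic in the user's own region; otherwise choose any replica at random.

    `replicas` is a list of (replica_name, region) pairs.
    """
    local = [name for name, region in replicas if region == user_region]
    pool = local or [name for name, _ in replicas]
    return rng.choice(pool)


replicas = [("read-x", "X"), ("read-y", "Y"), ("read-z", "Z")]
choice_a = pick_read_topic("X", replicas)        # user A is close to X
choice_b = pick_read_topic("Y", replicas)        # user B is close to Y
choice_far = pick_read_topic("W", replicas)      # no local replica: random fallback
```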
Step c, determining the standby cluster corresponding to the traversed primary cluster, copying all normal topics included in the traversed primary cluster into that standby cluster, and setting all topics copied into the standby cluster as mirror topics, wherein the normal topics and the mirror topics are in one-to-one correspondence.
The topics in the primary cluster and the topics in the standby cluster should be in one-to-one correspondence. For example, if there is a normal Topic1 in a primary cluster, there is a mirror Topic1' uniquely corresponding to Topic1 in the standby cluster corresponding to that primary cluster (i.e., the cluster that is kept synchronized with the primary cluster), and Topic1 and Topic1' are kept synchronized. It can be understood that the normal topics in the primary cluster and the mirror topics in the standby cluster are set in one-to-one correspondence and, just as all topics in the primary cluster are set as normal topics, all topics copied into the standby cluster are set as mirror topics in this embodiment.
It should be noted that the topics in the standby cluster are copied from the primary cluster. That is, after the primary and standby clusters have established their correspondence, the primary cluster copies all the topics it includes into the standby cluster one by one, and the topics copied into the standby cluster are set as mirror topics, so that the standby cluster obtains mirror topics in one-to-one correspondence with the normal topics in the primary cluster.
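Steps a to c can be sketched together as follows, as an illustration only. Clusters are modelled as dictionaries mapping topic names to topic types; the function name `initialise_mirrors` and the cluster and topic names are assumptions of this sketch, and no topic data is copied here.

```python
def initialise_mirrors(clusters, pairs):
    """Assign roles, mark primary topics as normal, and create matching mirror topics.

    `clusters` maps a cluster name to {"role": ..., "topics": {topic_name: kind}};
    `pairs` maps each primary cluster name to its standby cluster name.
    """
    for primary, standby in pairs.items():
        clusters[primary]["role"] = "primary"                 # step a
        clusters[standby]["role"] = "standby"
        for topic in clusters[primary]["topics"]:
            clusters[primary]["topics"][topic] = "normal"     # step b
            clusters[standby]["topics"][topic] = "mirror"     # step c


clusters = {
    "east": {"role": None, "topics": {"orders": None, "audit": None}},
    "west": {"role": None, "topics": {}},
}
initialise_mirrors(clusters, {"east": "west"})
```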
Further, if the topic type of one topic in a pair is modified, the topic type of its counterpart is modified accordingly. For example, suppose a normal topic TopicA is in a primary cluster ClusterA and the mirror topic TopicA' corresponding to TopicA is in the standby cluster ClusterB corresponding to ClusterA. If the topic type of the normal topic TopicA is changed to mirror topic, the topic type of the original mirror topic TopicA' is correspondingly changed to normal topic; likewise, if the topic type of the mirror topic TopicA' is changed to normal topic, the topic type of the original normal topic TopicA is changed to mirror topic. The mirror Topic and the normal Topic remain identical in Topic name, number of partitions, message Offsets, configuration, and so on.
Step S20, determining, among the normal topics, a target normal topic that matches the topic identifier, and writing the data to be written into the target normal topic;
The topic identifier of each normal topic is compared with the topic identifier in the data write request, the normal topic whose identifier matches the one in the request is taken as the target normal topic, and the data to be written is written into the target normal topic. It should be noted that an external producer can usually write data directly only to a normal topic and has no permission to write directly to a mirror topic. Optionally, a mirror fetching component may be provided in the cluster, and data is written to the mirror topic once this component is started; for example, the mirror fetching component may be MirrorFetcher.
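Step S20 can be sketched as follows, by way of illustration only. The request is modelled as a dictionary with hypothetical keys `topic_id` and `data`, topics are in-memory dictionaries, and the exception name `MirrorWriteDenied` is an assumption introduced for this sketch.

```python
class MirrorWriteDenied(PermissionError):
    """Raised when an external producer tries to write directly to a mirror topic."""


def handle_write(request, topics):
    """Match the request's topic identifier and append the data to the target normal topic."""
    topic = topics.get(request["topic_id"])
    if topic is None:
        raise KeyError(f"no topic matches identifier {request['topic_id']!r}")
    if topic["kind"] != "normal":
        raise MirrorWriteDenied(request["topic_id"])
    topic["log"].append(request["data"])
    return len(topic["log"]) - 1            # offset of the record just written


topics = {
    "orders": {"kind": "normal", "log": []},
    "orders-copy": {"kind": "mirror", "log": []},
}
first_offset = handle_write({"topic_id": "orders", "data": b"m0"}, topics)
second_offset = handle_write({"topic_id": "orders", "data": b"m1"}, topics)
try:
    handle_write({"topic_id": "orders-copy", "data": b"x"}, topics)
    mirror_write_denied = False
except MirrorWriteDenied:
    mirror_write_denied = True
```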
Step S30, determining a target mirror image theme matched with the target normal theme in the mirror image themes, and copying data to be written in the target normal theme into the target mirror image theme so as to enable the target normal theme to keep data synchronization with the target mirror image theme.
Alternatively, the data to be written in the target normal theme may be copied by starting the mirror grabbing component mirrorFetcher, and the copied data to be written is written in the target mirror theme, where mirrorFetcher is used to synchronize Topic data, including common Topic (Topic created by a user) and internal Topic (Topic owned by Kafka), including the Offset and metadata (metadata) information of the synchronized Topic.
Therefore, it can be understood that no additional external service resource needs to be deployed, data synchronization is kept between the normal theme and the mirror theme, and the normal theme and the mirror theme are in different clusters, typically, the normal theme is in a main cluster, and the mirror theme is in a standby cluster, so that data synchronization across clusters is realized.
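The pull-style copy with identical Offsets can be sketched as follows. This is only an in-memory illustration of the idea and not the MirrorFetcher implementation, whose internals are not given in the application; the `Log` class, the function `mirror_fetch`, and the assumption that both logs start at the same base offset are introduced for the sketch.

```python
class Log:
    """Append-only in-memory log that starts at a given base offset."""

    def __init__(self, base_offset=0):
        self.base_offset = base_offset
        self.records = []

    @property
    def end_offset(self):
        return self.base_offset + len(self.records)

    def append(self, value):
        self.records.append(value)
        return self.end_offset - 1          # offset assigned to the new record

    def read_from(self, offset):
        start = max(offset - self.base_offset, 0)
        return [(self.base_offset + i, v)
                for i, v in enumerate(self.records[start:], start)]


def mirror_fetch(normal, mirror):
    """Copy the records the mirror log is missing, keeping each record's offset unchanged."""
    copied = 0
    for offset, value in normal.read_from(mirror.end_offset):
        if mirror.append(value) != offset:
            raise RuntimeError("offset drift between normal and mirror topic")
        copied += 1
    return copied


normal = Log(base_offset=100)
for value in (b"a", b"b", b"c"):
    normal.append(value)
mirror = Log(base_offset=100)
first = mirror_fetch(normal, mirror)        # copies the three existing records
normal.append(b"d")
second = mirror_fetch(normal, mirror)       # copies only the new record
```

Because the mirror resumes from its own end offset, a consumer that moves between the two logs can keep using the same offset without any conversion.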
In this embodiment, when a data writing request sent by a producer is received, the data to be written is written into a target normal theme in the open source stream processing platform Kafka, and the data to be written is also written into a target mirror theme corresponding to the target normal theme, so that real-time data synchronization between the target normal theme and the target mirror theme is realized, and data synchronization between the normal theme and the mirror theme in the Kafka can be realized without deploying an additional data synchronization tool, thereby reducing the realization cost of data synchronization of the open source stream processing platform.
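The flow of steps S10 to S30 can be sketched as follows. This is a minimal in-memory model and not Kafka code: the `Topic` class, the `handle_write_request` function, the request shape, and the `mirror_of` mapping are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    """Hypothetical in-memory stand-in for a single-partition Kafka topic."""
    name: str
    cluster: str
    records: list = field(default_factory=list)


def handle_write_request(request, normal_topics, mirror_of):
    """Write to the matching normal topic, then copy to its mirror.

    request:       dict with "topic_id" and "data" (assumed request shape)
    normal_topics: dict mapping topic id to the normal Topic
    mirror_of:     dict mapping topic id to the mirror Topic in the backup cluster
    """
    # S20: match the identifier in the request against the normal topics.
    target = normal_topics[request["topic_id"]]
    target.records.append(request["data"])

    # S30: copy only the records the mirror has not seen yet, so that the
    # position of each record (its offset) is the same on both sides.
    mirror = mirror_of[request["topic_id"]]
    mirror.records.extend(target.records[len(mirror.records):])
    return target, mirror


normal = {"TopicA": Topic("TopicA", "ClusterA")}
mirrors = {"TopicA": Topic("TopicA", "ClusterB")}
handle_write_request({"topic_id": "TopicA", "data": b"order-1"}, normal, mirrors)
handle_write_request({"topic_id": "TopicA", "data": b"order-2"}, normal, mirrors)
```

In this sketch the producer never touches the mirror directly; the copy in S30 is the only path by which data reaches it.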
Further, based on the first embodiment of the present application, a second embodiment of the data synchronization method of the present application is provided. In this embodiment, before step S10 of the foregoing embodiment (receiving a data writing request sent by a producer, where the data writing request includes a theme identifier and data to be written), the method includes:
Step d, traversing each normal theme in turn, and obtaining metadata information of the traversed normal theme, wherein the metadata information comprises one or more of theme setting, an access control list and partition quantity;
After the correspondence between the main cluster and the standby cluster and the correspondence between the normal theme and the mirror theme are set, the metadata information of a normal theme can be synchronized to the mirror theme corresponding to that normal theme. The metadata information includes, but is not limited to, theme setting (Topic config), access control list (ACL, Access Control List) and partition number. It should be noted that after the partitions of the main Topic (i.e., the normal Topic) are expanded, the partitions of the backup Topic (i.e., the mirror Topic) may be expanded automatically, whereas expanding the partitions of the backup Topic does not expand the partitions of the main Topic. The number of backup Topic partitions is therefore greater than or equal to the number of main Topic partitions.
Step e, determining the mirror image theme corresponding to the traversed normal theme, and synchronizing the metadata information to the mirror image theme corresponding to the traversed normal theme.
A metadata synchronization component, such as a mirrorconnector component, may be preconfigured to synchronize the metadata information of the normal Topic to the corresponding mirror Topic. It should be noted that the correspondence referred to in the first and second embodiments of the present application is a relationship in which data synchronization is maintained.
In this embodiment, metadata information of a normal theme is synchronized to a mirror theme corresponding to the normal theme, so that complete synchronization between the normal theme and the mirror theme is ensured, availability of the normal theme and the mirror theme is ensured, and high availability of a main cluster and a standby cluster and a main Topic and a standby Topic is realized.
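Steps d and e, together with the partition rule noted above, can be sketched as follows. The `TopicMetadata` class, the `sync_metadata` function, and the ACL string format are assumptions made for illustration; only `retention.ms` is a real Kafka topic setting.

```python
from dataclasses import dataclass, field


@dataclass
class TopicMetadata:
    """Hypothetical container for the three kinds of metadata named in step d."""
    config: dict = field(default_factory=dict)
    acls: list = field(default_factory=list)
    partitions: int = 1


def sync_metadata(normal_meta, mirror_meta):
    """Copy metadata from every normal topic to its mirror (steps d and e)."""
    for topic_id, src in normal_meta.items():      # step d: traverse normal topics
        dst = mirror_meta[topic_id]                # step e: find the matching mirror
        dst.config = dict(src.config)
        dst.acls = list(src.acls)
        # The mirror follows expansions of the normal topic but is never shrunk,
        # so mirror partitions >= normal partitions always holds.
        dst.partitions = max(dst.partitions, src.partitions)


normal_meta = {
    "TopicA": TopicMetadata({"retention.ms": "604800000"}, ["User:app Read"], 6)
}
mirror_meta = {"TopicA": TopicMetadata(partitions=8)}
sync_metadata(normal_meta, mirror_meta)
```

Here the mirror keeps its 8 partitions because it was expanded independently, while the normal topic stays at 6.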
In an embodiment, the step of writing the data to be written to the target mirror theme includes:
Step f, determining a target backup cluster matched with the target mirror image theme in each backup cluster, and starting a mirror grabbing component in the target backup cluster;
The theme identifiers of all mirror image themes included in each backup cluster can be recorded, so that, by obtaining the theme identifier of the target mirror image theme, the backup cluster in which that identifier is located can be found and taken as the target backup cluster matched with the target mirror image theme. A mirror grabbing component in the target backup cluster, such as a MirrorFetcher component, is then started, and the mirror grabbing component writes data into the target mirror image theme.
Step g, grabbing and copying the data to be written from the target normal theme based on the mirror grabbing component to obtain a data copy to be written;
Step h, writing the data copy to be written into the target mirror theme.
In this embodiment, the mirror grabbing component in the target backup cluster is started, the data to be written in the target normal theme is copied to obtain the data copy to be written, and the data copy is written into the target mirror theme. Because the data is not written into the target mirror theme directly by an external producer, safe writing of the mirror theme data is ensured.
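The pull-based copy of steps g and h can be sketched as follows. The class `MirrorFetcherSketch` is a hypothetical stand-in and does not reproduce the MirrorFetcher component named in the text; plain Python lists play the role of the topic logs.

```python
class MirrorFetcherSketch:
    """Hypothetical pull-based copier that runs inside the backup cluster."""

    def __init__(self, source_log, mirror_log):
        self.source_log = source_log   # log of the target normal topic
        self.mirror_log = mirror_log   # log of the target mirror topic

    def fetch_once(self, max_records=100):
        """Copy up to max_records that the mirror has not yet received."""
        next_offset = len(self.mirror_log)  # where the mirror log currently ends
        # Step g: grab the missing records and copy them.
        batch = list(self.source_log[next_offset:next_offset + max_records])
        # Step h: write the copy into the mirror topic.
        self.mirror_log.extend(batch)
        return len(batch)


source = ["r0", "r1", "r2"]
mirror = []
fetcher = MirrorFetcherSketch(source, mirror)
first = fetcher.fetch_once(max_records=2)
second = fetcher.fetch_once(max_records=2)
```

Because the fetcher resumes from the end of the mirror log, repeated calls are idempotent once the mirror has caught up.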
In one embodiment, the step of writing the data to be written into the target normal theme includes:
Step i, determining a target main cluster matched with the target normal theme in each main cluster, and acquiring the current available state of the target main cluster;
Step j, if the current available state comprises an available state, writing the data to be written into the target normal theme;
Step k, if the current available state comprises a fault state, taking the target mirror theme as a new target normal theme, and writing the data to be written into the new target normal theme.
In this embodiment, when a user determines that a certain primary cluster has failed and is unstable or unavailable, disaster recovery migration can be performed on the primary cluster, and normal read-write operation continues by rapidly switching to the backup cluster. For example, referring to fig. 3, when the primary cluster ClusterA is operating normally, producer P and consumer C write data to and read data from the normal topic TopicA in ClusterA (the producer writes data and the consumer reads data), and database D is connected to TopicA. The mirror topic TopicA in the backup cluster ClusterB keeps data synchronization with the normal topic TopicA, but no data is read from the mirror topic TopicA. The normal topic is indicated by a solid circle and the mirror topic by a dashed circle. When ClusterA fails or is unstable and the primary cluster needs to be migrated, referring to fig. 4, the data reading and writing of producer P and consumer C are switched from the normal topic TopicA in ClusterA to the topic TopicA in ClusterB, database D is also migrated into ClusterB, ClusterB is set as the primary cluster, and TopicA in ClusterB is set as the normal topic.
In this embodiment, when the target main cluster fails and is unavailable or unstable, the target mirror image theme is used as a new target normal theme, and the data to be written is written into the new target normal theme, so that the failure resistance of the cluster is improved, and when the cluster fails, successful writing of the data is still ensured.
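The branch between steps j and k can be sketched as follows. The `Availability` enum and the `choose_write_target` function are hypothetical names, and plain lists again stand in for topic logs.

```python
from enum import Enum


class Availability(Enum):
    AVAILABLE = "available"
    FAULT = "fault"


def choose_write_target(topic_id, primary_state, normal_topics, mirror_topics):
    """Return the log that the pending write should go to (steps i to k)."""
    if primary_state is Availability.AVAILABLE:
        # Step j: the primary cluster is healthy, write to the normal topic.
        return normal_topics[topic_id]
    # Step k: the primary cluster is faulty, so the mirror topic is promoted
    # and becomes the new target normal topic.
    promoted = mirror_topics.pop(topic_id)
    normal_topics[topic_id] = promoted
    return promoted


primary_log = ["r0"]
mirror_log = ["r0"]          # already synchronized with the primary
normal_topics = {"TopicA": primary_log}
mirror_topics = {"TopicA": mirror_log}

target = choose_write_target("TopicA", Availability.FAULT, normal_topics, mirror_topics)
target.append("r1")
```

After the call, the write lands in the former mirror log and the failed primary log is left untouched.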
In an embodiment, before the step of obtaining the current available state of the target primary cluster, the method includes:
Step l, acquiring the use state data of the target main cluster, wherein the use state data comprises one or more of memory occupation information, port available information and process state information;
Step m, detecting whether the use state data is matched with preset fault data;
Step n, if the use state data is successfully matched with the preset fault data, sending a fault verification message to a preset user terminal;
Step n1, if a verification success message sent by the user terminal is received, taking a fault state as the current available state of the target main cluster;
Step n2, if a verification failure message sent by the user terminal is received, taking the usable state as the current available state of the target main cluster.
In this embodiment, the usage status of each primary cluster may be detected in real time. The usage status data of each primary cluster is obtained in real time or periodically, and each piece of usage status data is matched against the preset fault data. If the matching is successful, a prompt message may be output; that is, a fault verification message is sent to a preset user terminal so that the user can check whether the cluster is actually unavailable. If the verification message returned by the user terminal confirms that the primary cluster is unavailable, a verification success message is deemed to have been received, and the current available state of the primary cluster is marked as a fault state. If the verification message returned by the user terminal reports that the primary cluster is available, the current available state of the primary cluster is marked as an available state. All primary clusters are initially marked as usable.
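Steps l to n2 can be sketched as follows. The field names in `usage`, the single threshold in `fault_rules`, and the `ask_user` callback are assumptions for illustration; the text does not specify how the preset fault data is expressed.

```python
def matches_fault(usage, fault_rules):
    """Step m: compare usage state data against preset fault data."""
    return (
        usage["memory_used_ratio"] >= fault_rules["max_memory_used_ratio"]
        or not usage["port_open"]
        or not usage["process_alive"]
    )


def current_state(usage, fault_rules, ask_user):
    """Steps l to n2: a fault is only recorded after the user confirms it.

    ask_user is a callable standing in for the fault verification message;
    it returns True when the user confirms the cluster is unavailable.
    """
    if not matches_fault(usage, fault_rules):
        return "available"          # clusters are usable unless proven otherwise
    # Step n: the data looks like a fault, so ask the user to verify.
    return "fault" if ask_user(usage) else "available"


rules = {"max_memory_used_ratio": 0.95}
healthy = {"memory_used_ratio": 0.40, "port_open": True, "process_alive": True}
broken = {"memory_used_ratio": 0.40, "port_open": False, "process_alive": True}

state_healthy = current_state(healthy, rules, lambda u: True)
state_confirmed = current_state(broken, rules, lambda u: True)
state_rejected = current_state(broken, rules, lambda u: False)
```

Keeping the user in the loop means a false positive in the automated check cannot by itself mark a primary cluster as faulty.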
In an embodiment, after the step of taking the fault state as the current available state of the target primary cluster, the method includes:
Step p, determining a target standby cluster matched with the target mirror image theme in each standby cluster, and setting the cluster role of the target standby cluster as a main cluster;
Step q, setting the cluster role of the target main cluster as a standby cluster.
In this embodiment, when the current available state of the target primary cluster is detected to be a fault state, the target primary cluster is migrated, that is, the cluster role of the target backup cluster is set as the primary cluster, the cluster role of the target primary cluster is set as the backup cluster, and the topic types in the corresponding clusters are also switched correspondingly, that is, the normal topic is switched to the mirror topic, and the mirror topic is switched to the normal topic.
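The role swap of steps p and q, including the matching switch of topic types, can be sketched as follows. The nested dictionary layout and the `switch_roles` function are hypothetical.

```python
def switch_roles(clusters, failed_primary, target_backup):
    """Swap cluster roles and flip the topic types in both clusters."""
    if clusters[failed_primary]["role"] != "primary":
        raise ValueError(f"{failed_primary} is not a primary cluster")
    if clusters[target_backup]["role"] != "backup":
        raise ValueError(f"{target_backup} is not a backup cluster")

    # Step p: the target backup cluster becomes the primary cluster.
    clusters[target_backup]["role"] = "primary"
    for name in clusters[target_backup]["topics"]:
        clusters[target_backup]["topics"][name] = "normal"

    # Step q: the failed primary cluster becomes the backup cluster.
    clusters[failed_primary]["role"] = "backup"
    for name in clusters[failed_primary]["topics"]:
        clusters[failed_primary]["topics"][name] = "mirror"


clusters = {
    "ClusterA": {"role": "primary", "topics": {"TopicA": "normal"}},
    "ClusterB": {"role": "backup", "topics": {"TopicA": "mirror"}},
}
switch_roles(clusters, "ClusterA", "ClusterB")
```

The guard clauses reject a swap between two clusters that are not currently in a primary and backup relationship.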
In a possible embodiment, factors such as a change in Topic security level or a cluster capacity limit may require a Topic to be migrated from one cluster to another, and a Topic may also be migrated between clusters at will without user cooperation. For example, referring to fig. 5, before migration the primary cluster ClusterA includes a normal topic TopicA; producer P and consumer C write data to and read data from database D of the normal topic TopicA (the producer writes data and the consumer reads data), and the mirror topic TopicA of the backup cluster ClusterB maintains data synchronization with the normal topic TopicA of the primary cluster ClusterA. After migration, referring to fig. 6, producer P, consumer C and database D of the original ClusterA are all migrated into ClusterB, and the original mirror topic TopicA in ClusterB is switched to a normal topic.
In one possible embodiment, when a service adjustment or machine room adjustment requires a cluster to be migrated to another machine room, Topic migration can be implemented between two independent clusters without user cooperation, and cross-version migration is also possible. As with the migration flow, upgrading a cluster across major versions carries a relatively large risk: upgrading a node may affect the stability of the whole cluster, and some system configuration adjustments, such as splitting the control plane (network layer) from the DataPlane (data layer), are difficult to realize in a rolling cluster upgrade. This embodiment, however, can realize incremental migration of some Topics to reduce the risk of the whole upgrade process.
In particular, if a client request (including but not limited to a data reading request from a consumer and a data writing request from a producer) is received during a cluster migration (switching) process caused by unavailability or instability of the main cluster or by other factors, the request can be forwarded or blocked. A data forwarding component can be preset to forward the request to the standby cluster, and a data blocking component can be preset to block the request. For example, a forwarding component is responsible for forwarding client requests and can forward a request to the standby cluster when switching occurs, while a Breaker component is responsible for blocking client requests so that the client performs a metadata change, thereby achieving the switch. Further, whether to forward or block a request may be determined according to an identifier carried by the client request. For example, after a client request is received, its request identifier is obtained; if the request identifier matches a preset forwarding identifier, the request is forwarded, and if it matches a preset blocking identifier, the request is blocked. The forwarding identifiers include, but are not limited to, API_VERSIONS, METADATA, FIND_COORDINATOR and INIT_PRODUCER_ID, and the blocking identifiers include, but are not limited to, PRODUCE, FETCH, OFFSETCOMMIT, OFFSETFETCH, JOINGROUP, HEARTBEAT and OFFSET_FOR_LEADER_EPOCH.
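The forward-or-block decision described above can be sketched as follows. The key sets copy the request identifiers named in the text, spelled here in Kafka's upper-case underscore style. The function name `route_during_switch` and the `"unspecified"` result for unlisted identifiers are assumptions, since the text does not say how other requests are handled.

```python
# Identifiers that are forwarded to the standby cluster during a switch.
FORWARD_KEYS = frozenset({
    "API_VERSIONS", "METADATA", "FIND_COORDINATOR", "INIT_PRODUCER_ID",
})

# Identifiers that are blocked so the client refreshes its metadata.
BLOCK_KEYS = frozenset({
    "PRODUCE", "FETCH", "OFFSETCOMMIT", "OFFSETFETCH",
    "JOINGROUP", "HEARTBEAT", "OFFSET_FOR_LEADER_EPOCH",
})


def route_during_switch(request_key):
    """Decide what to do with a client request received while switching."""
    if request_key in FORWARD_KEYS:
        return "forward"
    if request_key in BLOCK_KEYS:
        return "block"
    # Both lists are open-ended ("including but not limited to"), so the
    # handling of any other identifier is left undecided in this sketch.
    return "unspecified"


decisions = {key: route_during_switch(key) for key in ("METADATA", "PRODUCE", "LIST_OFFSETS")}
```

Under this split, lightweight discovery requests still reach a cluster, while data-path requests are held back until the client has picked up the new cluster metadata.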
In addition, referring to fig. 9, the present application further provides a data synchronization device, where the data synchronization device includes an open source stream processing platform, where the open source stream processing platform is provided with a plurality of clusters, each cluster includes at least one theme, where the theme includes a normal theme and a mirror theme, and the data synchronization device further includes:
the receiving module A10 is used for receiving a data writing request sent by a producer, wherein the data writing request comprises a theme identifier and data to be written;
the writing module A20 is used for determining a target normal theme matched with the theme identification in the normal themes, and writing the data to be written into the target normal theme;
and the synchronization module A30 is used for determining a target mirror image theme matched with the target normal theme in the mirror image themes, and copying the data to be written in the target normal theme into the target mirror image theme so as to enable the target normal theme to keep data synchronization with the target mirror image theme.
In addition, the embodiment of the application also provides a data synchronization device, which comprises a memory, a processor and a data synchronization program stored in the memory and executable on the processor, wherein the data synchronization program realizes the steps of the data synchronization method when being executed by the processor.
The specific implementation manner of the data synchronization device of the present application is substantially the same as that of each embodiment of the data synchronization method described above, and will not be repeated here.
In addition, in order to achieve the above object, the present application also provides a computer-readable storage medium, on which a data synchronization program is stored, which when executed by a processor, implements the steps of the data synchronization method as described above.
The specific implementation manner of the computer readable storage medium of the present application is basically the same as the above embodiments of the data synchronization method, and will not be repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a computer readable storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above, including several instructions for causing a terminal device (which may be a mobile phone, a computer, a cloud server, an air conditioner, or a network device, etc.) to perform the method according to the embodiments of the present application.
The foregoing description is only of the preferred embodiments of the present application, and is not intended to limit the scope of the application, but rather is intended to cover any equivalents of the structures or equivalent processes disclosed herein or in the alternative, which may be employed directly or indirectly in other related arts.

Claims (10)

1. The data synchronization method is applied to an open source stream processing platform, the open source stream processing platform is provided with a plurality of clusters, each cluster comprises at least one theme, the theme comprises a normal theme and a mirror image theme, and the data synchronization method comprises the following steps:
receiving a data writing request sent by a producer, wherein the data writing request comprises a theme identifier and data to be written;
determining a target normal theme matched with the theme identification in the normal themes, and writing the data to be written into the target normal theme;
and determining a target mirror image theme matched with the target normal theme in the mirror image themes, and copying data to be written in the target normal theme into the target mirror image theme so as to enable the target normal theme and the target mirror image theme to keep data synchronization.
2. The data synchronization method of claim 1, wherein before the step of receiving the data writing request sent by the producer, the method comprises:
setting a cluster role of each cluster, wherein the cluster role comprises a main cluster and a standby cluster, and the main clusters and the standby clusters are in one-to-one correspondence;
traversing each main cluster in turn, and setting all topics in the traversed main clusters as normal topics;
determining a backup cluster corresponding to the traversed main cluster, copying all normal topics included in the traversed main cluster into the backup cluster corresponding to the traversed main cluster, and setting all topics copied into the backup cluster as mirror image topics, wherein the normal topics and the mirror image topics are in one-to-one correspondence.
3. The data synchronization method according to claim 2, wherein after the step of setting all topics copied into the backup cluster as mirror topics, the method comprises:
traversing each normal theme in turn to obtain metadata information of the traversed normal theme, wherein the metadata information comprises one or more of theme setting, an access control list and partition quantity;
and determining the mirror image theme corresponding to the traversed normal theme, and synchronizing the metadata information to the mirror image theme corresponding to the traversed normal theme.
4. The data synchronization method according to claim 2, wherein the step of copying the data to be written in the target normal theme to the target mirror theme includes:
determining a target backup cluster matched with the target mirror image theme in each backup cluster, and starting a mirror grabbing component in the target backup cluster;
grabbing and copying the data to be written from the target normal theme based on the mirror grabbing component to obtain a data copy to be written;
and writing the data copy to be written into the target mirror theme.
5. The data synchronization method according to claim 2, wherein the step of writing the data to be written to the target normal theme includes:
determining a target main cluster matched with the target normal theme in each main cluster, and acquiring the current available state of the target main cluster;
if the current available state comprises a usable state, writing the data to be written into the target normal theme;
and if the current available state comprises a fault state, taking the target mirror theme as a new target normal theme, and writing the data to be written into the new target normal theme.
6. The data synchronization method of claim 5, wherein before the step of obtaining the current available state of the target primary cluster, the method comprises:
acquiring the use state data of the target main cluster, wherein the use state data comprises one or more of memory occupation information, port available information and process state information;
detecting whether the use state data is matched with preset fault data or not;
if the using state data is successfully matched with the preset fault data, a fault verification message is sent to a preset user terminal;
if a verification success message sent by the user terminal is received, taking the fault state as the current available state of the target main cluster;
and if the verification failure message sent by the user terminal is received, the usable state is used as the current usable state of the target main cluster.
7. The data synchronization method of claim 6, wherein after the step of taking the fault state as the current available state of the target primary cluster, the method comprises:
determining a target backup cluster matched with the target mirror image theme in each backup cluster, and setting the cluster role of the target backup cluster as a main cluster;
and setting the cluster role of the target main cluster as a standby cluster.
8. The data synchronization device is characterized by comprising an open source stream processing platform, wherein the open source stream processing platform is provided with a plurality of clusters, each cluster comprises at least one theme, the theme comprises a normal theme and a mirror image theme, and the data synchronization device further comprises:
the receiving module is used for receiving a data writing request sent by a producer, wherein the data writing request comprises a theme identifier and data to be written;
the writing module is used for determining a target normal theme matched with the theme identification in the normal themes and writing the data to be written into the target normal theme;
and the synchronization module is used for determining a target mirror image theme matched with the target normal theme in the mirror image themes, and copying the data to be written in the target normal theme into the target mirror image theme so as to enable the target normal theme to keep data synchronization with the target mirror image theme.
9. A data synchronization device, the data synchronization device comprising: memory, a processor and a data synchronization program stored on the memory and executable on the processor, which when executed by the processor, implements the steps of the data synchronization method according to any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a data synchronization program, which when executed by a processor, implements the steps of the data synchronization method according to any of claims 1 to 7.
CN202310672148.9A 2023-06-07 2023-06-07 Data synchronization method, device, equipment and computer readable storage medium Pending CN116701533A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310672148.9A CN116701533A (en) 2023-06-07 2023-06-07 Data synchronization method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310672148.9A CN116701533A (en) 2023-06-07 2023-06-07 Data synchronization method, device, equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN116701533A true CN116701533A (en) 2023-09-05

Family

ID=87840492

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310672148.9A Pending CN116701533A (en) 2023-06-07 2023-06-07 Data synchronization method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN116701533A (en)

Similar Documents

Publication Publication Date Title
CN111314479B (en) Data processing method and equipment
CN109376197B (en) Data synchronization method, server and computer storage medium
CN109151045B (en) Distributed cloud system and monitoring method
CN110581782B (en) Disaster tolerance data processing method, device and system
CN111045745A (en) Method and system for managing configuration information
CN110099084B (en) Method, system and computer readable medium for ensuring storage service availability
WO2020063600A1 (en) Data disaster recovery method and site
CN107657027B (en) Data storage method and device
CN110022338B (en) File reading method and system, metadata server and user equipment
CN108228581B (en) Zookeeper compatible communication method, server and system
CN114124650A (en) Master-slave deployment method of SPTN (shortest Path bridging) network controller
CN114900449B (en) Resource information management method, system and device
CN115658390A (en) Container disaster tolerance method, system, device, equipment and computer readable storage medium
CN115277727A (en) Data disaster recovery method, system, device and storage medium
US20240054054A1 (en) Data Backup Method and System, and Related Device
CN116701533A (en) Data synchronization method, device, equipment and computer readable storage medium
CN107404511B (en) Method and device for replacing servers in cluster
CN116389233A (en) Container cloud management platform active-standby switching system, method and device and computer equipment
CN101242201B (en) A master-slave system maintenance method, system and device
CN108429813B (en) Disaster recovery method, system and terminal for cloud storage service
CN115705269A (en) Data synchronization method, system, server and storage medium
CN115037745B (en) Method and device for electing in distributed system
CN111010448B (en) Distributed message system and data center DC
CN109542353B (en) Consistency algorithm for wide area distributed storage system
CN106878399B (en) Data sending method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination