CN111935320A - Data synchronization method, related device, equipment and storage medium - Google Patents

Data synchronization method, related device, equipment and storage medium

Info

Publication number
CN111935320A
Authority
CN
China
Prior art keywords
node
upstream
information
nodes
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011044182.4A
Other languages
Chinese (zh)
Other versions
CN111935320B (en
Inventor
秦凯悦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN202011044182.4A
Publication of CN111935320A
Application granted
Publication of CN111935320B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00: Network arrangements or protocols for supporting network services or applications
    • H04L67/01: Protocols
    • H04L67/10: Protocols in which an application is distributed across nodes in the network
    • H04L67/1095: Replication or mirroring of data, e.g. scheduling or transport for data synchronisation between network nodes

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a data synchronization method based on cloud technology, which can be applied to the field of cloud storage and relates in particular to data storage and data reading. The method comprises the following steps: acquiring a first node information set corresponding to a target node; determining M node scores according to the first node information set; determining a first upstream node from the M upstream nodes according to the M node scores; and, if the first upstream node meets the node registration condition, receiving the data sent by the first upstream node. The application also provides a related device, equipment and a storage medium. For a learner, each upstream node is scored, the optimal upstream node is selected according to the node scores, and that upstream node synchronizes data to the learner, so that data synchronization can be completed efficiently.

Description

Data synchronization method, related device, equipment and storage medium
Technical Field
The present application relates to the field of storage, and in particular, to a method for data synchronization, a related apparatus, a server, and a storage medium.
Background
With the development of the internet, cloud customers pay more and more attention to data security, and many industries require data to be stored across machine rooms and across regions. In a distributed system, to tolerate failures, data may be kept as multiple copies, with each node of the replication group keeping one copy. The replication group comprises a master node (leader) and slave nodes (followers); the leader stores the master copy, and each follower stores a slave copy.
When the volume of data reading requests is large, more nodes are needed to share the requests; however, adding more followers to the replication group to serve data reading requests places a greater burden on the leader. To solve this problem, a new node role, the learning node (learner), is added. A learner can synchronize the latest data from a follower or from an upstream learner, and can then serve data reading requests.
A learner may randomly select a follower as the node from which it synchronizes data. However, the selected follower may not be able to serve the learner efficiently, which results in inefficient data synchronization.
Disclosure of Invention
The embodiment of the application provides a data synchronization method, a related device, equipment and a storage medium. For a learner, each upstream node is scored, the optimal upstream node is selected according to the node scores, and that upstream node synchronizes data to the learner, so that data synchronization can be completed efficiently.
In view of the above, an aspect of the present application provides a method for data synchronization, including:
acquiring a first node information set corresponding to a target node, wherein the first node information set comprises first node information of M upstream nodes, the M upstream nodes are upstream nodes of the target node, and M is an integer greater than or equal to 1;
determining M node scores according to the first node information set, wherein the M node scores have a one-to-one correspondence relationship with M upstream nodes;
determining a first upstream node from the M upstream nodes according to the M node scores, wherein the node score corresponding to the first upstream node is the maximum value of the M node scores;
and if the first upstream node meets the node registration condition, receiving the data sent by the first upstream node.
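The four steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation described by the application: `NodeInfo`, `select_upstream`, and the `score` and `can_register` callables are hypothetical names, and the scoring and registration logic are passed in rather than defined here.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class NodeInfo:
    """First node information reported by one upstream node (fields are illustrative)."""
    node_id: str
    payload: float  # stand-in for whatever the scoring step consumes


def select_upstream(
    infos: Sequence[NodeInfo],
    score: Callable[[NodeInfo], float],
    can_register: Callable[[NodeInfo], bool],
) -> Optional[NodeInfo]:
    """Score the M upstream nodes, take the maximum, then check registration."""
    if not infos:
        return None
    # The first upstream node is the one whose score is the maximum of the M scores.
    best = max(infos, key=score)
    # Data is received from it only if it meets the node registration condition.
    return best if can_register(best) else None
```

Returning `None` leaves the choice of what to do next (for example, trying another candidate) to the caller.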
In one possible design, in another implementation manner of an aspect of the embodiment of the present application, the first node information includes role information of an upstream node and association information of the upstream node;
determining M node scores according to the first node information set, including:
for each piece of first node information in the first node information set, if the role information of the upstream node indicates that the upstream node is a master node, determining that the node score of the upstream node is a preset value;
and for each piece of first node information in the first node information set, if the role information of the upstream node indicates that the upstream node is a slave node or a learning node, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to the preset value.
In one possible design, in another implementation manner of an aspect of the embodiment of the present application, the first node information includes state information and role information of an upstream node, and association information of the upstream node;
determining M node scores according to the first node information set, including:
for each piece of first node information in the first node information set, if the role information of the upstream node indicates that the upstream node is a master node, or the state information of the upstream node indicates an abnormal state, determining that the node score of the upstream node is a preset value;
for each piece of first node information in the first node information set, if the role information of the upstream node indicates that the upstream node is a slave node or a learning node and the state information of the upstream node indicates a normal state, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to the preset value.
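The role and state checks in this design act as a gate in front of the association-based score. A minimal sketch, assuming a preset value of `0.0` and a hypothetical `Role` enumeration (neither is specified in the text):

```python
from enum import Enum


class Role(Enum):
    LEADER = "leader"      # master node
    FOLLOWER = "follower"  # slave node
    LEARNER = "learner"    # learning node


PRESET_SCORE = 0.0  # assumed preset value; the text does not give a number


def gated_score(role: Role, is_normal: bool, association_score: float) -> float:
    """Return the preset value for a leader or an abnormal node, else the association score."""
    if role is Role.LEADER or not is_normal:
        return PRESET_SCORE
    # The text requires the resulting score to be >= the preset value.
    return max(PRESET_SCORE, association_score)
```

Because a leader and an abnormal node both receive the lowest possible score, they are picked only when no healthy follower or learner scores higher.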
Another aspect of the present application provides a data synchronization apparatus, including:
the node information acquisition module is used for acquiring a first node information set corresponding to a target node, wherein the first node information set comprises first node information of M upstream nodes, the M upstream nodes are upstream nodes of the target node, and M is an integer greater than or equal to 1;
the determining module is used for determining M node scores according to the first node information set, wherein the M node scores and the M upstream nodes have one-to-one correspondence;
the determining module is further used for determining a first upstream node from the M upstream nodes according to the M node scores, wherein the node score corresponding to the first upstream node is the maximum value of the M node scores;
and the synchronization module is used for receiving the data sent by the first upstream node if the first upstream node meets the node registration condition.
In one possible design, in one implementation of another aspect of an embodiment of the present application,
the acquisition module is specifically used for determining M upstream nodes according to first upstream list information corresponding to the target node;
sending an information acquisition request to each upstream node in the M upstream nodes so that each upstream node responds to the information acquisition request and acquires first node information of each upstream node;
when first node information sent by each upstream node is received, a first node information set is obtained.
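The acquisition steps above can be sketched as a loop over the upstream list. The `request_info` callable stands in for the real transport and is hypothetical; recording an unreachable node as abnormal is likewise an assumption of this sketch, not something the text states.

```python
from typing import Callable, Dict, List


def collect_node_info(
    upstream_list: List[str],
    request_info: Callable[[str], dict],
) -> Dict[str, dict]:
    """Send an information acquisition request to every upstream node and gather the replies."""
    info_set: Dict[str, dict] = {}
    for node_id in upstream_list:
        try:
            info_set[node_id] = request_info(node_id)
        except ConnectionError:
            # Assumption: a node that cannot be reached is treated as abnormal.
            info_set[node_id] = {"state": "abnormal"}
    return info_set
```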
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the first node information includes state information of an upstream node and association information of the upstream node;
the determining module is specifically configured to determine, for each piece of first node information in the first node information set, that a node score of an upstream node is a preset value if the state information of the upstream node indicates an abnormal state;
and for each first node information in the first node information set, if the state information of the upstream node indicates a normal state, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to a preset value.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the first node information includes role information of an upstream node and association information of the upstream node;
the determining module is specifically configured to determine, for each piece of first node information in the first node information set, that a node score of an upstream node is a preset value if the role information of the upstream node indicates that the upstream node is a master node;
and for each piece of first node information in the first node information set, if the role information of the upstream node indicates that the upstream node is a slave node or a learning node, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to the preset value.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the first node information includes state information and role information of the upstream node, and association information of the upstream node;
the determining module is specifically configured to determine, for each piece of first node information in the first node information set, that a node score of an upstream node is a preset value if role information of the upstream node indicates that the upstream node is a master node, or if state information of the upstream node indicates an abnormal state;
for each piece of first node information in the first node information set, if the role information of the upstream node indicates that the upstream node is a slave node or a learning node and the state information of the upstream node indicates a normal state, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to the preset value.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the association information of the upstream node includes a number of mounted nodes of the upstream node, a depth of the upstream node, load information of the upstream node, and submitted log information of the upstream node;
the determining module is specifically used for calculating an intermediate score quantity according to the number of mounted nodes of the upstream node, the depth of the upstream node and the load information of the upstream node;
and calculating the node score of the upstream node according to the intermediate score quantity and the submitted log information of the upstream node, wherein the node score is positively correlated with the submitted log information of the upstream node, and negatively correlated with the number of mounted nodes of the upstream node, the depth of the upstream node and the load information of the upstream node.
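One way to satisfy the stated correlations is sketched below. The exact formula is not given in this passage, so the functional form, the weights, and the field encodings (load normalized to [0, 1], submitted log information as a log index) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssociationInfo:
    mounted_nodes: int    # number of nodes already mounted on the upstream node
    depth: int            # depth of the upstream node (assumed: hops from the leader)
    load: float           # load information (assumed normalized to [0, 1])
    committed_index: int  # submitted log information (assumed: a log index)


def intermediate_quantity(info: AssociationInfo, w_mount: float = 1.0,
                          w_depth: float = 1.0, w_load: float = 1.0) -> float:
    """Assumed form: shrinks as mounted nodes, depth, or load grow."""
    penalty = w_mount * info.mounted_nodes + w_depth * info.depth + w_load * info.load
    return 1.0 / (1.0 + penalty)


def node_score(info: AssociationInfo) -> float:
    """Assumed form: grows with the committed log index, scaled by the intermediate quantity."""
    return intermediate_quantity(info) * info.committed_index
```

With this form, an upstream node that is further ahead in its log scores higher, while one that is deeper, busier, or already serving many downstream nodes scores lower.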
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the determining module is specifically used for sorting the M node scores in descending order to obtain a sorting result;
reordering the nodes in the first upstream list information according to the sorting result to obtain second upstream list information, wherein the second upstream list information comprises the M upstream nodes;
the upstream node ranked first in the second upstream list information is determined to be the first upstream node.
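The reordering step can be sketched with a stable sort. The function name and the use of a dictionary for the scores are illustrative choices, not taken from the text.

```python
from typing import Dict, List


def reorder_upstream_list(first_list: List[str], scores: Dict[str, float]) -> List[str]:
    """Return the second upstream list: the same M nodes, highest score first."""
    # sorted() is stable, so nodes with equal scores keep their original relative order.
    return sorted(first_list, key=lambda node_id: scores[node_id], reverse=True)


first_list = ["f1", "f2", "l1"]
scores = {"f1": 2.0, "f2": 5.0, "l1": 5.0}
second_list = reorder_upstream_list(first_list, scores)
first_upstream = second_list[0]  # the node ranked first is the first upstream node
```

Keeping the full sorted list, rather than only the maximum, makes the next-best candidates immediately available if the top node cannot accept the registration.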
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the data synchronization apparatus further includes a sending module;
a sending module, configured to send a node registration request to the first upstream node after the determining module determines the first upstream node from the M upstream nodes according to the M node scores, so that the first upstream node determines the number of loadable nodes according to the node registration request;
the determining module is further used for determining that the first upstream node meets the node registration condition if the number of the loadable nodes is greater than or equal to 1;
the determining module is further configured to determine that the first upstream node does not satisfy the node registration condition if the number of loadable nodes is 0.
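The capacity-based registration check can be sketched as follows. How the upstream node computes its number of loadable nodes is not described in this passage; the fixed `max_children` limit and the class and method names below are assumptions.

```python
class UpstreamNode:
    """Upstream side of the registration exchange (illustrative only)."""

    def __init__(self, max_children: int):
        self.max_children = max_children  # hypothetical capacity limit
        self.children = set()

    def handle_registration(self, learner_id: str) -> int:
        """Return the number of loadable nodes seen by this request; admit the learner if >= 1."""
        loadable = max(0, self.max_children - len(self.children))
        if loadable >= 1:
            self.children.add(learner_id)
        return loadable


def meets_registration_condition(loadable: int) -> bool:
    """The condition is met when at least one more node can be loaded."""
    return loadable >= 1
```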
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the data synchronization apparatus further includes a sending module and a receiving module;
a sending module, configured to send a node registration request to a first upstream node after the determining module determines the first upstream node from the M upstream nodes according to the M node scores;
the receiving module is used for receiving a node registration response sent by the first upstream node, wherein the node registration response carries role information of the first upstream node;
the determining module is further used for determining that the first upstream node meets the node registration condition if the role information indicates that the first upstream node is a slave node or a learning node;
the determining module is further configured to determine that the first upstream node does not satisfy the node registration condition if the role information indicates that the first upstream node is the master node.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the determining module is further configured to determine, after determining a first upstream node from the M upstream nodes according to the M node scores, a second upstream node from the M upstream nodes according to the M node scores if the first upstream node does not satisfy the node registration condition, where a node score corresponding to the second upstream node is a second largest value of the M node scores;
and the synchronization module is further used for receiving the data sent by the second upstream node if the second upstream node meets the node registration condition.
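The fallback from the first to the second upstream node can be sketched as a walk down the candidates in score order. The text only describes falling back to the second-highest score; continuing past the second candidate is a generalization assumed here, and `try_register` is a hypothetical callable standing in for the registration exchange.

```python
from typing import Callable, Dict, Optional


def pick_registered_upstream(
    scores: Dict[str, float],
    try_register: Callable[[str], bool],
) -> Optional[str]:
    """Try candidates from highest to lowest score; return the first that accepts registration."""
    for node_id in sorted(scores, key=lambda n: scores[n], reverse=True):
        if try_register(node_id):
            return node_id
    return None  # no upstream node met the registration condition
```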
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the data synchronization apparatus further includes a sending module and a receiving module;
the receiving module is further used for receiving a data reading request sent by the network device after the synchronization module receives the data sent by the first upstream node;
and the sending module is also used for sending the metadata information to the network equipment according to the data reading request.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the data synchronization apparatus further includes a sending module and a receiving module;
the receiving module is further configured to receive a first data reading request sent by the network device after the synchronization module receives the data sent by the first upstream node;
the sending module is further configured to send the first metadata information to the network device according to the first data reading request, where the network device is further configured to send a second data reading request to other nodes except the target node, and the second data reading request is used to obtain the second metadata information.
In one possible design, in another implementation manner of another aspect of the embodiment of the present application, the data synchronization apparatus further includes a sending module and a receiving module;
the synchronization module is specifically used for receiving an initial query request sent by a first network device;
synchronizing target metadata information in the first upstream node to the target node according to the initial query request;
the receiving module is further configured to receive a data query request sent by the second network device after the synchronization module receives the data sent by the first upstream node;
and the sending module is also used for sending the target metadata information to the second network equipment according to the data query request.
In one possible design, in another implementation of another aspect of an embodiment of the present application,
the acquisition module is further configured to acquire a second node information set corresponding to the target node when the first upstream node fails after the synchronization module receives data sent by the first upstream node, where the second node information set includes second node information of N upstream nodes, the N upstream nodes are upstream nodes of the target node, and N is an integer greater than or equal to 1;
the determining module is further used for determining N node scores according to the second node information set, wherein the N node scores and the N upstream nodes have one-to-one correspondence;
the determining module is further configured to determine a third upstream node from the N upstream nodes according to the N node scores, where a node score corresponding to the third upstream node is a maximum value of the N node scores;
and the synchronization module is further used for synchronizing the data in the third upstream node to the target node if the third upstream node meets the node registration condition.
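Re-selection after an upstream failure can be sketched as a small state holder on the target node. The class and its callables are hypothetical, and the sketch assumes that the refreshed score set no longer contains the failed node (or gives it the preset score).

```python
from typing import Callable, Dict, Optional


class LearnerSync:
    """Tracks the current upstream node and re-selects when it fails (illustrative only)."""

    def __init__(self, fetch_scores: Callable[[], Dict[str, float]],
                 try_register: Callable[[str], bool]):
        self._fetch_scores = fetch_scores  # returns fresh scores for the current upstream nodes
        self._try_register = try_register  # stands in for the node registration exchange
        self.upstream: Optional[str] = None

    def select(self) -> Optional[str]:
        scores = self._fetch_scores()
        if not scores:
            self.upstream = None
            return None
        best = max(scores, key=lambda n: scores[n])
        self.upstream = best if self._try_register(best) else None
        return self.upstream

    def on_upstream_failed(self, failed_id: str) -> Optional[str]:
        # Only a failure of the node currently synchronized from triggers a new selection.
        if failed_id == self.upstream:
            return self.select()
        return self.upstream
```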
Another aspect of the present application provides a computer device, comprising: a memory, a transceiver, a processor, and a bus system;
wherein, the memory is used for storing programs;
a processor for executing the program in the memory, the processor for performing the above-described aspects of the method according to instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
Another aspect of the present application provides a computer-readable storage medium having stored therein instructions, which when executed on a computer, cause the computer to perform the method of the above-described aspects.
In another aspect of the application, a computer program product or computer program is provided, the computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions to cause the computer device to perform the method provided by the above aspects.
According to the technical scheme, the embodiment of the application has the following advantages:
in the embodiment of the application, a data synchronization method is provided, in which a target node first obtains a first node information set corresponding to the target node and then determines M node scores according to the first node information set; the target node then determines a first upstream node from the M upstream nodes according to the M node scores and, if the first upstream node meets the node registration condition, receives the data sent by the first upstream node. In this way, for a learner, a scoring mechanism is used to score each upstream node, the optimal upstream node is selected according to the node scores, and that upstream node synchronizes data to the learner, so that data synchronization can be completed efficiently.
Drawings
FIG. 1 is a block diagram of an embodiment of a data synchronization system;
FIG. 2 is a schematic diagram of an embodiment of implementing data synchronization based on a dynamic scoring mechanism in an embodiment of the present application;
FIG. 3 is a schematic diagram of an embodiment of a method for data synchronization in an embodiment of the present application;
FIG. 4 is a schematic diagram illustrating a target node acquiring first node information in an embodiment of the present application;
FIG. 5 is a schematic diagram of a target node selecting an upstream node for synchronizing data in an embodiment of the present application;
FIG. 6 is a schematic diagram illustrating data synchronization based on a data reading scenario in an embodiment of the present application;
FIG. 7 is another diagram illustrating data synchronization based on a data reading scenario in an embodiment of the present application;
FIG. 8 is a schematic diagram illustrating data synchronization based on a data query scenario in an embodiment of the present application;
FIG. 9 is a diagram illustrating a dynamic adjustment of a scoring policy in an embodiment of the present application;
FIG. 10 is a schematic diagram of an embodiment of a data synchronization apparatus in an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a server in an embodiment of the present application.
Detailed Description
The embodiment of the application provides a data synchronization method, a related device, equipment and a storage medium. For a learner, each upstream node is scored, the optimal upstream node is selected according to the node scores, and that upstream node synchronizes data to the learner, so that data synchronization can be completed efficiently.
The terms "first," "second," "third," "fourth," and the like (if any) in the description, the claims, and the drawings of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application described herein can, for example, be implemented in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "corresponding to," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
With the rapid development of internet enterprises, requirements for data storage keep rising and differ from one business to another. For example, a shopping website stores a large number of commodity pictures, which are small files in very large numbers. A video service website stores a large number of video files in the background, each typically tens of megabytes to several gigabytes in size. Conventional file systems cannot serve these application scenarios. A distributed file system can store data on a plurality of physically dispersed storage nodes, uniformly manage and allocate the resources of these nodes, and provide a file system access interface for users, thereby overcoming the limits that a local file system places on file size, number of open files, and the like.
In a distributed system, to tolerate failures, data may be kept as multiple copies, with each node of the replication group keeping one copy. The replication group comprises a master node (leader) and slave nodes (followers). When the volume of data reading requests is large, more nodes are needed to share the requests; however, adding more followers to the replication group to serve data reading requests places a greater burden on the leader. To solve this problem, a new node role, the learning node (learner), is added. A learner can synchronize the latest data from a follower or from an upstream learner, and can then serve data reading requests.
Based on this, the present application provides a data synchronization method implemented based on cloud technology, which scores each upstream node, selects the optimal upstream node, and has that upstream node synchronize data to the learner, thereby completing data synchronization efficiently. The data synchronization method provided by the application relates in particular to cloud storage technology. Cloud storage and cloud technology are introduced below. Cloud storage is a new concept extended and developed from the concept of cloud computing. A distributed cloud storage system (hereinafter referred to as a storage system) is a storage system that, through functions such as cluster application, grid technology, and a distributed storage file system, integrates a large number of storage devices of various types in a network (storage devices are also referred to as storage nodes) so that they work together through application software or application interfaces and jointly provide data storage and service access functions to the outside.
At present, a storage system stores data as follows. Logical volumes are created, and each logical volume is allocated physical storage space when it is created; the physical storage space may be composed of disks of one storage device or of several storage devices. A client stores data on a logical volume, that is, on a file system. The file system divides the data into a plurality of parts, each of which is an object; an object contains not only the data but also additional information such as a data identification (ID). The file system writes each object into the physical storage space of the logical volume and records the storage location information of each object, so that when the client requests access to the data, the file system can let the client access the data according to the storage location information of each object. The storage system allocates physical storage space to a logical volume as follows: the physical storage space is divided in advance into stripes according to the estimated capacity of the objects to be stored in the logical volume (the estimate often leaves a large margin over the capacity of the objects actually stored) and the Redundant Array of Independent Disks (RAID) grouping; one logical volume can be understood as one stripe, and physical storage space is thereby allocated to the logical volume.
Cloud technology is a hosting technology that unifies hardware, software, network, and other resources in a wide area network or a local area network to realize the computation, storage, processing, and sharing of data. Cloud technology is also a general term for the network technology, information technology, integration technology, management platform technology, application technology, and the like that are applied based on the cloud computing business model; these can form a resource pool that is used on demand, flexibly and conveniently. Cloud computing technology will become an important support for this. Background services of technical network systems, such as video websites, picture websites, and web portals, require large amounts of computing and storage resources. With the rapid development and application of the internet industry, each item may come to have its own identification mark that needs to be transmitted to a background system for logic processing; data of different levels will be processed separately, and all kinds of industry data require strong system background support, which can only be realized through cloud computing.
For ease of understanding, refer to FIG. 1, which is a schematic diagram of the architecture of the data synchronization system in an embodiment of the present application. As shown in the figure, the data synchronization system may be understood as a part of a distributed system and includes a plurality of servers and a plurality of clients. Some of these servers are master devices and some are slave devices. In short, a device on which a master node (leader) is deployed is a master device; server 1 in FIG. 1 is a master device. A device on which a slave node (follower) is deployed is a slave device; server 2 and server 3 in FIG. 1 are both slave devices of server 1. A device on which a learning node (learner) is deployed is also a slave device; server 4 and server 5 in FIG. 1 are both slave devices of server 3. A client may be deployed on a terminal device or on an application server, which is not limited herein. One or more nodes may be deployed on a server; this application takes one node per server as an example, which should not be construed as limiting the application.
In the data synchronization system shown in FIG. 1, the leader and the followers form a replication group, within which consistency between the leader and the followers needs to be ensured. When the leader goes down, the replication group can quickly elect a new leader. The learner is a new replica role: it does not participate in voting and does not join the replication group. A learner can be a downstream node of a follower or a downstream node of another learner.
In fig. 1, server 2 and server 3 synchronize data from server 1, and server 4 and server 5 may synchronize data from server 3. The client 1 and the client 2 respectively send data reading requests to the server 2, and the server 2 respectively feeds back corresponding metadata information to the client 1 and the client 2. The client 3 may send a data reading request to the server 1, and the server 1 feeds back corresponding metadata information to the client 3, and the client 4 sends a data writing request to the server 1, where the data may be written to the server 1 concurrently, or may be written to the server 1 in batch, and after the data writing is completed, the writing result is notified in a callback function manner. The client 5 and the client 6 respectively send data reading requests to the server 4, and the server 4 respectively feeds back corresponding metadata information to the client 5 and the client 6. It will be appreciated that there may be delays in the data on different wells and different learners because the data is not yet fully synchronized, there is a time difference, but the data read by the leader is accurate.
It should be noted that the server related to the present application may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), and a big data and artificial intelligence platform. The terminal device may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a palm computer, a personal computer, a smart television, a smart watch, and the like. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, and the application is not limited herein. The number of servers and terminal devices is not limited.
A data synchronization process provided by the present application will be described with reference to fig. 2, which is a schematic diagram of an embodiment of implementing data synchronization based on a dynamic scoring mechanism in the embodiment of the present application. As shown in the figure, the metadata storage module is a replication group formed by three nodes, namely a master node A (leader A), a slave node A (follower A), and a slave node B (follower B). However, there may be many storage nodes and access nodes that need to access the metadata storage module. To prevent the metadata storage module from being unable to support a high number of queries per second (QPS), a metadata cache layer is designed, which synchronizes data from the metadata nodes using the learner technology provided by the Simple Consensus Algorithm Library (SCAL). Data consistency is guaranteed by the SCAL, and the metadata storage layer can be expanded as required. Based on a dynamic scoring mechanism, the metadata cache layer selects an optimized upstream node, so that a more efficient and reliable reading service is provided.
Specifically, how to select the optimized upstream node will be described below by taking the newly added learning node D (learner D) and learning node E (learner E) as examples, respectively. For learner D, specifically:
in step A1, learner D collects node information of follower A;
in step A2, learner D collects node information of follower B;
in step A3, learner D scores follower A and follower B according to the collected node information of follower A and the collected node information of follower B, respectively;
in step A4, learner D registers with the upstream nodes in sequence according to the node scores from high to low. Assuming that the score of follower B is greater than the score of follower A, learner D registers with follower B, and after the registration succeeds, learner D can synchronize data from follower B and start to receive the synchronization log of follower B.
For learner E, specifically:
in step B1, learner E collects node information of learner A;
in step B2, learner E collects node information of learner B;
in step B3, learner E scores learner A and learner B according to the collected node information of learner A and the collected node information of learner B, respectively;
in step B4, learner E registers with the upstream nodes in sequence according to the node scores from high to low. Assuming that the score of learner A is greater than the score of learner B, learner E registers with learner A, and after the registration succeeds, learner E synchronizes data from learner A and starts to receive the synchronization log of learner A.
In view of the fact that this application relates to certain terms, these terms will be described below to facilitate a better understanding of the concepts provided herein.
1. Simple Consensus Algorithm Library (SCAL): SCAL is a method used between servers to reach agreement on a certain problem.
2. Replica: to withstand failures (disk failures, single-machine failures, and the like), data in a distributed system is kept in multiple copies, each of which is referred to as a replica.
3. Leader: the master replica in a replica group.
4. Follower: a slave replica in a replica group.
5. Learner: a new replica role that does not participate in voting and does not join the replica group; it may be mounted under a follower or under another learner.
6. Upstream and downstream: the two ends of a synchronized data stream, where the upstream end has the most recent data and the downstream end catches up with the data.
7. Depth (depth): represents the depth of a node, where the depth of the leader is 0, the depth of a follower is 1, the depth of a learner mounted under a follower is 2, the depth of a learner mounted under a first-layer learner is 3, and so on.
8. Upstream list information: the list of upstream nodes (SourceList) of a learner.
9. Term (term): a new term is entered whenever an old master replica fails. Each term usually consists of an election phase and a phase of providing external service, but in some extreme cases, if a new master replica cannot be elected, the present term ends immediately and the next term begins. The term is the logical clock of the replica group.
10. Committed log information (LastCommitId), also rendered herein as submitted log information: the number of the synchronized log, which consists of a term and an index (index), both of which are monotonically increasing numbers that never fall back.
11. State information of the node: mainly includes an uninitialized state, a normal state, a data backup and recovery (recovery) state, an error state, and a shutdown (shutdown) state.
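The depth definition in item 7 can be read as "the number of synchronization hops between a node and the leader". The following Python sketch illustrates that reading under stated assumptions: the `upstream_of` mapping and the `node_depth` helper are hypothetical names introduced here for illustration and do not appear in the application.

```python
from typing import Dict

# Hypothetical topology: each node maps to the upstream node it synchronizes from.
# The leader has no upstream, so it does not appear as a key.
upstream_of: Dict[str, str] = {
    "follower A": "leader A",
    "learner A": "follower A",
    "learner E": "learner A",
}


def node_depth(node: str, topology: Dict[str, str]) -> int:
    """Count the hops from a node up to the leader (which has depth 0)."""
    depth = 0
    while node in topology:
        node = topology[node]
        depth += 1
    return depth


for name in ("leader A", "follower A", "learner A", "learner E"):
    print(name, node_depth(name, upstream_of))
```

Under this reading, a learner mounted under a first-layer learner is three hops from the leader, which agrees with the depth of 3 given in the definition.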
With reference to fig. 3, an embodiment of a method for data synchronization in the present application includes:
101. A target node acquires a first node information set corresponding to the target node, wherein the first node information set comprises first node information of M upstream nodes, the M upstream nodes are upstream nodes of the target node, and M is an integer greater than or equal to 1;
In this embodiment, the target node first needs to acquire a first node information set, where the first node information set includes first node information of M upstream nodes, and the M upstream nodes are upstream nodes of the target node. The target node may be a newly added learner, and the upstream node may be another learner or a follower, which is not limited herein.
The first node information set contains M pieces of first node information, each of which corresponds to one upstream node. For example, if the upstream node is follower A, the first node information is the node information of follower A. For another example, if the upstream node is learner A, the first node information is the node information of learner A.
102. The target node determines M node scores according to the first node information set, wherein the M node scores and the M upstream nodes have one-to-one correspondence;
In this embodiment, the target node calculates a node score for each upstream node according to each piece of first node information in the first node information set, so as to obtain M node scores.
103. The target node determines a first upstream node from the M upstream nodes according to the M node scores, wherein the node score corresponding to the first upstream node is the maximum value of the M node scores;
In this embodiment, after the target node acquires the node score corresponding to each upstream node, it selects the maximum value from the node scores and determines the upstream node corresponding to that node score as the first upstream node. Assuming that the M node scores are "0", "60", and "100", respectively, the maximum value of the M node scores is "100", and the upstream node whose node score is "100" is determined as the first upstream node.
104. If the first upstream node meets the node registration condition, the target node receives the data sent by the first upstream node.
In this embodiment, the target node or the first upstream node determines whether the node registration condition is satisfied. If the node registration condition is satisfied, the first upstream node is the most suitable node for providing the data to be synchronized to the target node, so the target node may receive the data sent by the first upstream node and store the data locally on the target device, thereby implementing data synchronization.
In the embodiment of the application, a data synchronization method is provided, where a target node first obtains a first node information set corresponding to the target node, and then determines M node scores according to the first node information set, so that the target node may continue to determine a first upstream node from M upstream nodes according to the M node scores, and if the first upstream node meets a node registration condition, receive data sent by the first upstream node. Through the method, for the learners, a scoring mechanism is adopted to score each upstream node, the optimal upstream node is screened out according to the node scores of the upstream nodes, and the data is synchronized to the learners by the upstream nodes, so that the data synchronization can be efficiently completed.
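Steps 102 to 104 can be summarized in a few lines of Python. This is a minimal sketch rather than the implementation of the application: `select_first_upstream`, `score_fn`, and `can_register` are hypothetical names, and the scores reuse the illustrative values "0", "60", and "100" from step 103.

```python
from typing import Callable, Dict, Optional


def select_first_upstream(
    node_infos: Dict[str, dict],
    score_fn: Callable[[dict], int],
    can_register: Callable[[str], bool],
) -> Optional[str]:
    """Return the highest-scoring upstream node if it accepts registration."""
    # Step 102: compute one node score per upstream node.
    scores = {name: score_fn(info) for name, info in node_infos.items()}
    # Step 103: the first upstream node has the maximum node score.
    first_upstream = max(scores, key=lambda name: scores[name])
    # Step 104: synchronize from it only if the registration condition holds.
    return first_upstream if can_register(first_upstream) else None


# Hypothetical node information carrying a precomputed score for brevity.
infos = {
    "follower A": {"score": 0},
    "follower B": {"score": 60},
    "follower C": {"score": 100},
}
chosen = select_first_upstream(infos, lambda info: info["score"], lambda name: True)
print(chosen)
```

When the registration condition fails, this sketch simply returns `None`; falling back to the next-best upstream node is covered by a later optional embodiment.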
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in this application embodiment, the obtaining, by the target node, the first node information set corresponding to the target node specifically includes the following steps:
the target node determines M upstream nodes according to first upstream list information corresponding to the target node;
the target node sends an information acquisition request to each upstream node in the M upstream nodes so that each upstream node responds to the information acquisition request and acquires first node information of each upstream node;
when the target node receives the first node information sent by each upstream node, the target node acquires a first node information set.
In this embodiment, a manner in which a target node collects a first node information set is introduced. The target node needs to acquire the first upstream list information first. The first upstream list information is pre-stored in the target node in the form of a configuration file, and when the target node is started, the configuration file is loaded, so that the corresponding first upstream list information is obtained. It will be appreciated that the target node also provides an interface for modification of the upstream list information by which relevant information in the first upstream list information can be modified.
It should be noted that the first upstream list information may be updated in real time. For example, during one period of time, the first upstream list information acquired by the target node includes follower A and follower B, and after a period of time elapses, the first upstream list information acquired by the target node includes follower A, follower B, and follower C. In this application, the list information currently acquired by the target node is taken as the first upstream list information.
For ease of understanding, the process by which the target node acquires the first node information will be described below with reference to fig. 4, which is a schematic diagram of a target node acquiring first node information in an embodiment of the present application. For example, referring to (a) in fig. 4, assuming that the target node is learner D, the first upstream list information of learner D is shown in table 1.
TABLE 1
Upstream node    Depth
follower A       1
follower B       1
As can be seen from table 1, the M upstream nodes of the target node are follower A and follower B. Based on this, in step C1, the target node sends an information acquisition request to follower A, and in step C2, follower A obtains its own first node information in response to the information acquisition request and then sends the first node information of follower A to the target node. Similarly, in step C3, the target node sends an information acquisition request to follower B, and in step C4, follower B obtains its own first node information in response to the information acquisition request and then sends the first node information of follower B to the target node. It should be noted that there is no fixed execution sequence between step C1 and step C3. After the target node receives the first node information of follower A and the first node information of follower B, the target node has acquired the first node information set. Exemplarily, referring to (b) in fig. 4, assuming that the target node is learner E, the first upstream list information of learner E is shown in table 2.
TABLE 2
Upstream node    Depth
learner A        2
learner B        2
As can be seen from table 2, the M upstream nodes of the target node are learner A and learner B. Based on this, in step D1, the target node sends an information acquisition request to learner A, and in step D2, learner A obtains its own first node information in response to the information acquisition request and then sends the first node information of learner A to the target node. Similarly, in step D3, the target node sends an information acquisition request to learner B, and in step D4, learner B obtains its own first node information in response to the information acquisition request and then sends the first node information of learner B to the target node. It should be noted that there is no fixed execution sequence between step D1 and step D3. After the target node receives the first node information of learner A and the first node information of learner B, the target node has acquired the first node information set.
Secondly, in the embodiment of the present application, a manner is provided in which the target node collects the first node information set, and through the manner, the target node may actively request corresponding node information from each upstream node in the first upstream list information, and the upstream node issues the node information according to the current condition of the upstream node, so that the real-time performance of the node information can be ensured, and a more accurate node score can be obtained by calculation.
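Because the requests to different upstream nodes have no fixed execution sequence, the collection step can be issued concurrently. The Python sketch below shows one possible arrangement using a thread pool; `collect_node_infos` and `fake_fetch` are hypothetical names, and `fake_fetch` merely stands in for the real request and response exchange, which the application does not specify at the protocol level.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def collect_node_infos(
    upstream_list: List[str],
    fetch_info: Callable[[str], dict],
) -> Dict[str, dict]:
    """Request node information from every upstream node in the list."""
    # The requests are independent, so they are submitted concurrently.
    with ThreadPoolExecutor(max_workers=max(1, len(upstream_list))) as pool:
        # pool.map returns results in the same order as upstream_list.
        results = list(pool.map(fetch_info, upstream_list))
    return dict(zip(upstream_list, results))


def fake_fetch(name: str) -> dict:
    # Stand-in for the information acquisition request (hypothetical values).
    return {"name": name, "state": "normal", "depth": 1}


first_node_info_set = collect_node_infos(["follower A", "follower B"], fake_fetch)
print(first_node_info_set)
```

The set is complete only once every upstream node has answered, which mirrors the condition that the target node acquires the first node information set after receiving the information from each upstream node.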
Optionally, on the basis of the respective embodiments corresponding to fig. 3, in another optional embodiment provided in the embodiments of the present application, the first node information includes state information of an upstream node and association information of the upstream node;
the target node determines M node scores according to the first node information set, and the method specifically comprises the following steps:
for each first node information in the first node information set, if the state information of the upstream node indicates an abnormal state, the target node determines that the node score of the upstream node is a preset value;
and for each piece of first node information in the first node information set, if the state information of the upstream node indicates a normal state, the target node determines a node score according to the association information of the upstream node, wherein the node score is greater than or equal to a preset value.
In this embodiment, a method of determining a node score based on state information is described. The first node information of each upstream node includes state information and association information, where the state information indicates whether the upstream node is in a normal (normal) state or a non-normal state. The non-normal state includes, but is not limited to, an uninitialized state, a recovery state, an error state, and a shutdown state. Only an upstream node in the normal state can synchronize data normally; if a node in a non-normal state were used as the upstream node of the target node, subsequent data could not be synchronized.
Specifically, assuming that the target node has two upstream nodes, namely, a follower a and a follower B, the state information of the follower a is determined based on the first node information of the follower a, and the state information of the follower B is determined based on the first node information of the follower B. If the state information of the follower A is in an error state, the node score of the follower A is directly set as a preset value. Assuming that the state information of the follower B is "normal state", the association information of the follower B is determined based on the first node information of the follower B, and the node score of the follower B is calculated from the association information. The association information of the upstream node is used to indicate information related to the upstream node, and includes, but is not limited to, the number of mounted nodes of the upstream node, the depth, the load information, the submitted log information, and the like.
Optionally, in this application, it may also be configured that the leader cannot provide the data synchronization service for a learner. Based on this, the first node information of each upstream node may further include role information corresponding to the upstream node, where the role information indicates the type to which the upstream node belongs, for example, leader, follower, or learner. If the upstream node is a leader, the node score of the upstream node is the preset value. If the upstream node is a follower or a learner, the node score is determined according to the association information of the upstream node.
It should be noted that the preset value may be 0 or a negative number, and the preset value is less than or equal to the node scores of the other nodes. The value range of the node score is uint64_t, that is, the node score can be represented by 8 bytes.
Secondly, in the embodiment of the application, a mode for determining the node score based on the state information is provided, and through the mode, for the node in the abnormal state, the node score corresponding to the node in the abnormal state is not required to be calculated by adopting the associated information, but the preset value is directly used as the node score of the node, so that the calculation amount is reduced, and the calculation efficiency of the node score is improved.
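The state check described in this embodiment amounts to a short-circuit before the score calculation. The following Python sketch illustrates it under stated assumptions: the preset value is taken to be 0, the enumeration names are chosen here for illustration, and clamping the result to the preset value is an assumption made so that the score is never below the preset value.

```python
from enum import Enum
from typing import Callable

PRESET_SCORE = 0  # assumed preset value; the text allows 0 or a negative number


class NodeState(Enum):
    UNINITIALIZED = "uninitialized"
    NORMAL = "normal"
    RECOVERY = "recovery"
    ERROR = "error"
    SHUTDOWN = "shutdown"


def node_score(
    state: NodeState,
    association_info: dict,
    score_from_association: Callable[[dict], int],
) -> int:
    """Return the preset value for non-normal nodes, else the computed score."""
    if state is not NodeState.NORMAL:
        # Non-normal nodes skip the calculation entirely.
        return PRESET_SCORE
    # Assumption: clamp so that the score is never below the preset value.
    return max(PRESET_SCORE, score_from_association(association_info))


print(node_score(NodeState.ERROR, {"load": 10}, lambda info: 500))
print(node_score(NodeState.NORMAL, {"load": 10}, lambda info: 500))
```

The optional role check could be added in the same place, returning `PRESET_SCORE` when the upstream node is a leader.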
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided by the embodiment of the present application, the association information of the upstream node includes the number of mounted nodes of the upstream node, the depth of the upstream node, load information of the upstream node, and submitted log information of the upstream node;
the target node determines a node score according to the association information of the upstream node, and specifically comprises the following steps:
the target node calculates a score intermediate quantity according to the number of the mounting nodes of the upstream node, the depth of the upstream node and the load information of the upstream node;
and the target node calculates and obtains the node score of the upstream node according to the intermediate score and the submitted log information of the upstream node, wherein the node score is positively correlated with the submitted log information of the upstream node, and the node score is negatively correlated with the number of the mounted nodes of the upstream node, the depth of the upstream node and the load information of the upstream node.
In this embodiment, a method of calculating a node score based on node association information is described. If the upstream node is in a normal state, the target node can calculate a node score according to the associated information of the upstream node, wherein the associated information of each upstream node comprises the number of mounted nodes, the depth, the load information and the submitted log information. The manner in which the node scores are calculated will be described below.
Specifically, the node score of the upstream node may be calculated by the target node according to the following formula:
Score = (10000 - Depth * 100 - LearnerCount * 10 - Load) + LastCommitId;
where Score represents the node score of the upstream node, (10000 - Depth * 100 - LearnerCount * 10 - Load) represents the score intermediate quantity of the upstream node, Depth represents the depth of the upstream node, LearnerCount represents the number of mounted nodes of the upstream node, Load represents the load information of the upstream node, and LastCommitId represents the submitted log information of the upstream node. Therefore, the node score is positively correlated with the submitted log information of the upstream node, and negatively correlated with the number of mounted nodes of the upstream node, the depth of the upstream node, and the load information of the upstream node. The higher the node score of the upstream node, the more suitable the upstream node is as a node for data synchronization. Based on this, the content and definition of the association information will be described below, respectively.
1. The depth of the upstream node indicates the hierarchy of the upstream node, and the deeper the hierarchy, the slower the synchronization to the latest data is indicated, so that the node score is inversely related to the depth of the upstream node, i.e. the deeper the depth, the smaller the node score.
2. The mount node number of the upstream node indicates the number of nodes already mounted under the upstream node, and the more downstream nodes mounted by one upstream node, the greater the pressure borne by the upstream node. Therefore, the node score is negatively correlated with the number of mounted nodes of the upstream node, i.e., the larger the number of mounted nodes, the smaller the node score.
3. The load information of the upstream node indicates the load condition of the upstream node, and the load of the upstream node is too high, which affects the efficiency of data synchronization. The load information may be denoted as QPS, indicating how many requests the upstream node has processed in one second of time. Therefore, the node score is negatively correlated with the load information of the upstream node, i.e., the larger the load information, the smaller the node score.
4. The submitted log information of the upstream node: each log corresponds to a continuous and incremental log identifier (log id), and the larger the submitted log information, the newer the log to which the upstream node has synchronized. The submitted log information is an integer that is incremented sequentially from 0. Illustratively, when the upstream node is a follower, assume that a replication group includes node A, node B, and node C, where node A is leader A, node B is follower B, and node C is follower C. Assume that the log id synchronized by leader A is 100, the log id synchronized by follower B is 100, and the log id synchronized by follower C is 97. Since the log id to which most nodes have synchronized is 100, the submitted log information of leader A is 100, the submitted log information of follower B is 100, and the submitted log information of follower C is 97. As a further example, assume that the log id synchronized by leader A is 100, the log id synchronized by follower B is 99, and the log id synchronized by follower C is 93. The log with log id 100 on leader A has not been synchronized by most nodes and is therefore uncommitted, so the submitted log information of leader A is 99. For follower B, the log id to which most nodes have synchronized is 99, so the submitted log information of follower B is 99. Follower C has synchronized only up to log id 93, so the submitted log information of follower C is 93.
Illustratively, when the upstream node is a learner, the submitted log information is the latest log id of the upstream node. Therefore, the node score is positively correlated with the submitted log information of the upstream node, that is, the larger the submitted log information, the larger the node score.
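The two replication-group examples in item 4 are consistent with a simple rule: the group-wide committed log id is the highest log id that a majority of nodes have synchronized, and each node reports the smaller of that value and its own synchronized log id. The Python sketch below encodes this reading. It is an interpretation of the examples, not a rule stated in the application, and `committed_log_ids` is a hypothetical name.

```python
from typing import Dict


def committed_log_ids(synced: Dict[str, int]) -> Dict[str, int]:
    """Per-node committed log id for one replication group (illustrative)."""
    ordered = sorted(synced.values(), reverse=True)
    majority = len(ordered) // 2 + 1
    # Highest log id that at least a majority of nodes have synchronized.
    group_commit = ordered[majority - 1]
    # A node cannot report a committed id beyond what it has synchronized.
    return {node: min(log_id, group_commit) for node, log_id in synced.items()}


first_example = committed_log_ids({"leader A": 100, "follower B": 100, "follower C": 97})
second_example = committed_log_ids({"leader A": 100, "follower B": 99, "follower C": 93})
print(first_example)
print(second_example)
```

In the second example the group-wide value is 99, so leader A reports 99 even though it has synchronized log id 100, while follower C reports 93 because it has not yet caught up.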
It should be noted that the types of parameters included in the association information are only an example. In practical applications, the association information may further include response time (RT), throughput (TPS), the number of concurrent users, and the like, where RT refers to the time the system takes to respond to a request and TPS refers to the number of requests processed by the system per unit of time. It will be appreciated that the node score is negatively correlated with the RT, the TPS, and the number of concurrent users.
In the embodiment of the application, the number of mounted nodes, the depth, the load information and the submitted log information are used together as a basis for evaluating the node scores, the mounting capacity of upstream nodes is reflected from different dimensions, and the calculation is facilitated to obtain more accurate node scores, so that the target node is ensured to select the optimal upstream node, and the high efficiency of synchronous data is ensured.
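The scoring formula of this embodiment translates directly into code. In the Python sketch below, the function and parameter names are chosen for illustration, the input values are hypothetical, and LastCommitId is treated as a plain integer even though the term definitions describe it as composed of a term and an index.

```python
def association_score(
    depth: int,
    learner_count: int,
    load: int,
    last_commit_id: int,
) -> int:
    """Score = (10000 - Depth * 100 - LearnerCount * 10 - Load) + LastCommitId."""
    # Score intermediate quantity: penalizes depth, mounted nodes, and load.
    intermediate = 10000 - depth * 100 - learner_count * 10 - load
    # A newer committed log raises the score.
    return intermediate + last_commit_id


# Hypothetical upstream nodes: a lightly loaded follower and a deeper learner.
follower_score = association_score(depth=1, learner_count=2, load=300, last_commit_id=100)
learner_score = association_score(depth=2, learner_count=0, load=50, last_commit_id=97)
print(follower_score, learner_score)
```

With these inputs the follower scores 10000 - 100 - 20 - 300 + 100 = 9680 and the learner scores 10000 - 200 - 0 - 50 + 97 = 9847, so the deeper but less loaded learner would be preferred. This shows how the weights trade depth against load.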
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the determining, by the target node, a first upstream node from the M upstream nodes according to the scores of the M nodes specifically includes the following steps:
the target node carries out sorting processing on the scores of the M nodes according to the sequence from big to small to obtain a sorting result;
the target node performs sorting processing on the nodes in the first upstream list information according to the sorting result to obtain second upstream list information, wherein the second upstream list information comprises M upstream nodes;
the target node determines the upstream node ranked first in the second upstream list information as the first upstream node.
In this embodiment, a method for sorting M upstream nodes based on node scores is introduced. Firstly, respectively calculating by a target node to obtain M node scores, then, sequencing the M node scores by the target node according to a descending order to obtain a sequencing result, wherein the sequencing result is used for expressing the occurrence order of the node scores. And finally, the target node carries out sorting processing on the nodes in the first upstream list information according to the sorting result to obtain second upstream list information. Therefore, in the subsequent node registration process, the target node can sequentially search the upstream nodes to be registered by directly using the sequenced second upstream list information.
It should be noted that, the present application is described by taking sequencing M node scores from large to small as an example, however, in an actual situation, the M node scores may also be sequenced according to a sequence from small to large, and details are not described here.
Specifically, assume that the target node has 5 upstream nodes, namely follower A, follower B, follower C, follower D, and follower E, and that the first upstream list information of the target node is shown in table 3.
TABLE 3
Upstream node    Depth
follower A       1
follower B       1
follower C       1
follower D       1
follower E       1
It is assumed that the node score of the follower a is 280, the node score of the follower B is 520, the node score of the follower C is 0, the node score of the follower D is 440, and the node score of the follower E is 100. Based on this, after the scores of the M nodes are sorted in descending order, the second upstream list information of the target node is obtained as shown in table 4.
TABLE 4
Upstream node    Depth
follower B       1
follower D       1
follower A       1
follower E       1
follower C       1
The target node may directly determine the upstream node ranked first in the second upstream list information as the first upstream node. Taking table 4 as an example, follower B is the first upstream node.
Secondly, in the embodiment of the application, a mode for sequencing M upstream nodes based on node scores is provided, and through the mode, the target node can sequence the M node scores, so that the corresponding upstream nodes can be searched more conveniently and rapidly, and the efficiency of synchronizing data is improved.
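The reordering from table 3 to table 4 can be reproduced with a single descending sort. The Python sketch below uses the node scores given in this embodiment; `sort_upstream_list` is a hypothetical helper name.

```python
from typing import Dict, List

first_upstream_list: List[str] = [
    "follower A", "follower B", "follower C", "follower D", "follower E",
]
node_scores: Dict[str, int] = {
    "follower A": 280,
    "follower B": 520,
    "follower C": 0,
    "follower D": 440,
    "follower E": 100,
}


def sort_upstream_list(upstream_list: List[str], scores: Dict[str, int]) -> List[str]:
    """Order upstream nodes by node score, largest first."""
    # sorted() is stable, so nodes with equal scores keep their original order.
    return sorted(upstream_list, key=lambda name: scores[name], reverse=True)


second_upstream_list = sort_upstream_list(first_upstream_list, node_scores)
first_upstream_node = second_upstream_list[0]
print(second_upstream_list)
```

The resulting order is follower B, follower D, follower A, follower E, follower C, which matches table 4, and the entry at index 0 is the first upstream node.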
Optionally, on the basis of each of the embodiments corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, after the target node determines the first upstream node from the M upstream nodes according to the scores of the M nodes, the method further includes the following steps:
the target node sends a node registration request to the first upstream node so that the first upstream node determines the mountable quantity according to the node registration request;
if the mountable quantity is greater than or equal to 1, the target node determines that the first upstream node meets the node registration condition;
if the mountable quantity is 0, the target node determines that the first upstream node does not satisfy the node registration condition.
In this embodiment, a method for determining whether an upstream node satisfies the node registration condition based on the mountable quantity of the upstream node is described. After the target node sorts the M upstream nodes according to the node scores, the second upstream list information is obtained. The target node then traverses the upstream nodes in the second upstream list information and registers with them in sequence until a registration succeeds. In this way the optimal available upstream node is selected, and the target node starts to receive the synchronization log from that upstream node, so that an efficient data reading service is provided.
Specifically, the second upstream list information is obtained by sorting the M node scores in descending order, so the upstream node ranked first in the second upstream list information is taken as the first upstream node. Based on this, the target node sends a node registration request to the first upstream node, and the first upstream node determines its current mountable quantity according to the node registration request. Since a maximum mountable quantity may be preset for the first upstream node, the first upstream node determines the currently remaining mountable quantity according to its own maximum mountable quantity. If the currently remaining mountable quantity is greater than or equal to 1, a new learner can still be mounted; therefore, the target node determines that the first upstream node satisfies the node registration condition, that is, the target node successfully registers with the first upstream node. Conversely, if the currently remaining mountable quantity is equal to 0, no new learner can be mounted; therefore, the target node determines that the first upstream node does not satisfy the node registration condition, that is, the target node fails to register with the first upstream node. The target node may then select the upstream node with the next largest node score from the second upstream list information and continue to determine whether that upstream node satisfies the node registration condition, which is not described herein again.
For ease of understanding, please refer to fig. 5, which is a schematic diagram of a target node selecting an upstream node for synchronizing data in the embodiment of the present application. As shown in the figure, taking learner D as the target node, assume that the upstream nodes of learner D are learner A and learner B. Specifically:
In step E1, learner D sends an information collection request to learner B.
In step E2, learner D sends an information collection request to learner A. It should be noted that there is no fixed execution sequence between step E1 and step E2.
In step E3, learner B obtains its own first node information in response to the information collection request sent by learner D, and then sends the first node information of learner B to learner D.
In step E4, learner A obtains its own first node information in response to the information collection request sent by learner D, and then sends the first node information of learner A to learner D. It should be noted that there is no fixed execution sequence between step E3 and step E4.
In step E5, learner D scores learner A and learner B according to the collected first node information of learner A and of learner B, obtaining a node score for learner A and a node score for learner B. Assuming that the node score of learner A is greater than the node score of learner B, learner D preferentially sends a node registration request to learner A.
In step E6, if learner A does not satisfy the node registration condition, learner D needs to request node registration from learner B.
In step E7, learner D requests node registration from learner B, and if learner B satisfies the node registration condition, learner D can synchronize data from learner B.
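Because steps E1 to E4 have no fixed order, the information collection requests can be issued concurrently. The following is a minimal sketch under stated assumptions: `NODE_INFO` stands in for the replies of the upstream nodes, and the scoring rule in `score` is purely hypothetical (the scoring formula itself is not given in this passage).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for the node information each upstream node reports.
NODE_INFO = {
    "learner A": {"mounted": 1},
    "learner B": {"mounted": 3},
}


def collect_info(upstream: str) -> dict:
    # Steps E1-E4: send an information collection request and receive the reply.
    # A real system would perform a network call here.
    return NODE_INFO[upstream]


def score(info: dict) -> int:
    # Hypothetical scoring rule: fewer mounted learners gives a higher score.
    return 100 - 10 * info["mounted"]


def rank_upstreams(upstreams: list) -> list:
    # The requests are issued concurrently because their order does not matter.
    with ThreadPoolExecutor() as pool:
        infos = list(pool.map(collect_info, upstreams))
    scored = zip(upstreams, (score(i) for i in infos))
    # Step E5: order the upstream nodes by descending node score.
    return [name for name, _ in sorted(scored, key=lambda p: p[1], reverse=True)]


ranking = rank_upstreams(["learner B", "learner A"])
```

With these assumed values learner A ranks ahead of learner B, so learner D would send its node registration request to learner A first (steps E5 to E7).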
Secondly, in the embodiment of the present application, a manner of determining whether an upstream node satisfies a node registration condition based on the number of nodes already registered with the upstream node is provided. In this manner, the number of learners mounted on an upstream node can be limited: a maximum mountable quantity may be preset so that no single upstream node mounts too many learners, which helps maintain the stability and reliability of the data synchronization system.
Optionally, on the basis of each of the embodiments corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, after the target node determines the first upstream node from the M upstream nodes according to the scores of the M nodes, the method further includes the following steps:
a target node sends a node registration request to a first upstream node;
a target node receives a node registration response sent by a first upstream node, wherein the node registration response carries role information of the first upstream node;
if the role information indicates that the first upstream node is a slave node or a learning node, the target node determines that the first upstream node meets the node registration condition;
and if the role information indicates that the first upstream node is the main node, the target node determines that the first upstream node does not meet the node registration condition.
In this embodiment, a manner of determining whether an upstream node satisfies a node registration condition based on role information is described. After the target node sorts the M upstream nodes according to the node scores, second upstream list information can be obtained, and therefore the target node traverses each upstream node in the second upstream list information, registers to the upstream nodes in sequence until the registration is successful, selects the optimal upstream node, and starts to receive the synchronous log from the upstream node, so that efficient data reading service is provided.
Specifically, the second upstream list information is obtained by sorting the M node scores in descending order, so the upstream node ranked first in the second upstream list information is taken as the first upstream node. Based on this, the target node sends a node registration request to the first upstream node, and the first upstream node feeds back a node registration response to the target node, where the node registration response carries the role information of the first upstream node. The role information has three types, namely leader, follower, and learner. Mounting on the leader allows data to be synchronized more promptly but puts greater pressure on the leader; weighing these advantages and disadvantages, it can be stipulated in advance that a target node cannot be mounted on the leader. Data synchronization is slower when mounted on a follower, and is also slower when mounted on a learner, but the pressure on the leader is avoided.
Therefore, the target node determines whether the first upstream node is the leader according to the role information of the first upstream node. If the first upstream node is not the leader but a follower or a learner, the target node determines that the first upstream node satisfies the node registration condition, i.e., the target node successfully registers with the first upstream node. Conversely, if the first upstream node is the leader, the target node determines that the first upstream node does not satisfy the node registration condition, i.e., the target node fails to register with the first upstream node. The target node may then select the upstream node with the next largest node score from the second upstream list information and continue to determine whether that upstream node satisfies the node registration condition, which is not described again here.
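The role-based registration condition can be illustrated with the short sketch below. The names `Role`, `satisfies_registration_condition`, and `pick_upstream` are hypothetical, and the role attached to each node is assumed to be the one carried in that node's registration response.

```python
from enum import Enum
from typing import Optional


class Role(Enum):
    LEADER = "leader"
    FOLLOWER = "follower"
    LEARNER = "learner"


def satisfies_registration_condition(role: Role) -> bool:
    # A target node may mount on a follower or a learner, never on the leader.
    return role in (Role.FOLLOWER, Role.LEARNER)


def pick_upstream(ranked: list) -> Optional[str]:
    """ranked: (name, role) pairs in descending node-score order."""
    for name, role in ranked:
        if satisfies_registration_condition(role):
            return name
    return None


# Assumed ranking in which the highest-scoring upstream node is the leader.
chosen = pick_upstream([
    ("node 1", Role.LEADER),
    ("node 2", Role.FOLLOWER),
    ("node 3", Role.LEARNER),
])
```

Here the leader is skipped even though it has the highest score, and the target node registers with the follower ranked next.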
It should be noted that the flow of selecting the upstream node for synchronizing data by the target node is shown in fig. 5, and reference may be made to fig. 5 and the related description of fig. 5, which is not repeated herein.
Secondly, in the embodiment of the present application, a manner of determining whether an upstream node satisfies a node registration condition based on role information is provided. In this manner, the leader can be prevented from providing a mount for a learner. Because the node registration condition is judged from the role information, the leader does not directly provide synchronized data to a learner, which helps maintain the stability and reliability of the data synchronization system.
Optionally, on the basis of each of the embodiments corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, after the target node determines the first upstream node from the M upstream nodes according to the scores of the M nodes, the method further includes the following steps:
if the first upstream node does not meet the node registration condition, the target node determines a second upstream node from the M upstream nodes according to the M node scores, wherein the node score corresponding to the second upstream node is the second largest value of the M node scores;
and if the second upstream node meets the node registration condition, the target node receives the data sent by the second upstream node.
In this embodiment, a way of handling an upstream node that does not satisfy the node registration condition is introduced. The target node or the first upstream node judges whether the first upstream node satisfies the node registration condition; if it does, the first upstream node is the most suitable node to provide the data to be synchronized to the target node. Conversely, if the first upstream node does not satisfy the node registration condition, the next largest value is determined from the M node scores.
Specifically, assume that the M node scores are "0", "60", and "100". The maximum value among the M node scores is then "100" and the next largest value is "60"; that is, the upstream node corresponding to the maximum value "100" is the first upstream node, and the upstream node corresponding to the next largest value "60" is the second upstream node. When the first upstream node is determined not to satisfy the node registration condition, the target node or the second upstream node judges whether the second upstream node satisfies the node registration condition; if it does, the second upstream node is the most suitable node to provide the data to be synchronized to the target node. The target node may therefore receive the data sent by the second upstream node and store the data locally on the target device, thereby achieving data synchronization.
Secondly, in the embodiment of the present application, a way of handling an upstream node that does not satisfy the node registration condition is provided. In this way, the learner scores each upstream node with a scoring mechanism and screens out the optimal upstream node according to the node scores; once the target node cannot register with the highest-scoring upstream node, it tries the upstream node with the next highest node score, and so on, until registration is completed. This ensures the feasibility of the scheme while allowing data synchronization to be completed efficiently.
Optionally, on the basis of the foregoing embodiments corresponding to fig. 3, in another optional embodiment provided in this application embodiment, after the target node receives the data sent by the first upstream node, the method further includes the following steps:
a target node receives a data reading request sent by network equipment;
and the target node sends the metadata information to the network equipment according to the data reading request.
In this embodiment, a manner of implementing data reading in a data reading scenario is introduced, where a network device reads data by sending a data reading request to the target node. Based on the learner function of the SCAL design, a complete read-only copy of the metadata storage module can be created. The read-only copy synchronizes the written metadata information from the metadata storage module of the upstream node and provides a data reading service externally, which reduces the pressure on the replication group.
Specifically, the network device may read data directly from a learner. For convenience of description, the present application takes the network device reading data from the target node as an example; however, this should not be construed as limiting the present application. Referring to fig. 6, fig. 6 is a schematic diagram of data synchronization based on a data reading scenario in the embodiment of the present application. As shown in the figure, a replication group includes three metadata storage modules, where one metadata storage module is the leader and the other two are followers, and a metadata cache module is a learner. An upstream node is selected in the start-up phase; in addition, when the upstream node fails or cannot synchronize the metadata information in time, an upstream node needs to be reselected for registration. The following takes the selection of an upstream node in the start-up phase of the metadata cache module as an example.
First, during the start-up phase, the upstream nodes selectable by the target node are configured. Then, the optimal upstream node (i.e., the one from which the metadata information can be synchronized most promptly) is selected from the configured upstream list information, and registration is completed. At this point, the metadata storage module acting as the upstream node knows of the downstream metadata cache module. After the metadata storage module receives a data write request, synchronization is performed within the replication group. When the data has been synchronized to the upstream node of the metadata cache module, the upstream node synchronizes the data downstream, and the network device can then read the corresponding metadata information through the metadata cache module.
The method and the device have the advantages that the upstream node selecting strategy based on the dynamic scoring mechanism can ensure that the upstream node which is most beneficial to synchronizing data is selected, so that the timeliness and the high efficiency of metadata information synchronization are ensured, and better reading service is provided. It should be noted that the network device may be a terminal device or a server, and is not limited herein.
Further, in the embodiment of the application, a method for realizing data reading in a data reading scene is provided, and by the method, a more efficient and reliable reading service can be provided in a large-scale data reading scene, so that the feasibility of a scheme is improved.
Optionally, on the basis of the foregoing embodiments corresponding to fig. 3, in another optional embodiment provided in this application embodiment, after the target node receives the data sent by the first upstream node, the method further includes the following steps:
a target node receives a first data reading request sent by network equipment;
and the target node sends the first metadata information to the network equipment according to the first data reading request, wherein the network equipment is also used for sending a second data reading request to other nodes except the target node, and the second data reading request is used for acquiring the second metadata information.
In this embodiment, another manner of implementing data reading in a data reading scenario is introduced, where a network device reads data by sending a data reading request to the target node. Based on the learner function of the SCAL design, a complete read-only copy of the metadata storage module can be created. The read-only copy synchronizes the written metadata information from the metadata storage module of the upstream node and provides a data reading service externally, which reduces the pressure on the replication group.
Specifically, the network device may read data directly from a plurality of learners. For convenience of description, the present application takes as an example the network device reading data from two learners, one of which is the target node; however, this should not be construed as limiting the present application. Referring to fig. 7, fig. 7 is another schematic diagram of data synchronization based on a data reading scenario in the embodiment of the present application. As shown in the figure, a replication group includes three metadata storage modules, where one metadata storage module is the leader and the other two are followers, and a metadata cache module is a learner. An upstream node is selected in the start-up phase; in addition, when the upstream node fails or cannot synchronize the metadata information in time, an upstream node needs to be reselected. The following takes the selection of an upstream node in the start-up phase of the metadata cache module as an example.
First, during the start-up phase, the upstream nodes selectable by the target node are configured. Then, the optimal upstream node (i.e., the one from which the metadata information can be synchronized most promptly) is selected from the configured upstream list information, and registration is completed. At this point, the metadata storage module acting as the upstream node knows of the downstream metadata cache module. After the metadata storage module receives a data write request, synchronization is performed within the replication group. When the data has been synchronized to the upstream node of the metadata cache module, the upstream node synchronizes the data downstream.
Illustratively, the network device may send a first data reading request to learner 1. Assuming that learner 1 is the target node, the target node feeds back the first metadata information to the network device. If the first metadata information acquired by the network device is incomplete, the network device may further send a second data reading request to learner 2, and learner 2 feeds back the second metadata information to the network device, so that the network device acquires both the first metadata information and the second metadata information.
Illustratively, the network device may simultaneously send a first data reading request to learner 1 and a second data reading request to learner 2; learner 1 feeds back the first metadata information to the network device based on the first data reading request, and learner 2 feeds back the second metadata information to the network device based on the second data reading request. It should be noted that this embodiment uses two learners providing metadata information to a network device for illustration; in practical applications, three or more learners may also provide metadata information to the same network device, and this should not be construed as limiting the present application.
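Reading from two learners at the same time can be sketched as below. This is an assumption-laden illustration rather than the application's implementation: `LEARNERS` models the read-only copies as in-memory dictionaries, `read_from` stands in for a data reading request, and the round-robin split of keys in `scattered_read` is one hypothetical way of dividing a read into sub-requests.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical read-only copies held by two learners (identical once synchronized).
LEARNERS = {
    "learner 1": {"k1": "v1", "k2": "v2", "k3": "v3", "k4": "v4"},
    "learner 2": {"k1": "v1", "k2": "v2", "k3": "v3", "k4": "v4"},
}


def read_from(learner: str, keys: list) -> dict:
    # Stand-in for a data reading request sent to one learner.
    store = LEARNERS[learner]
    return {k: store[k] for k in keys}


def scattered_read(keys: list) -> dict:
    names = list(LEARNERS)
    # Split the key list round-robin into one sub-request per learner.
    parts = [keys[i::len(names)] for i in range(len(names))]
    merged = {}
    with ThreadPoolExecutor() as pool:
        # The first and second data reading requests are sent concurrently.
        for part in pool.map(read_from, names, parts):
            merged.update(part)
    return merged


metadata = scattered_read(["k1", "k2", "k3", "k4"])
```

Each learner serves only half of the keys in this sketch, which corresponds to dispersing the read volume across learners.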
The method and the device have the advantages that the upstream node selecting strategy based on the dynamic scoring mechanism can ensure that the upstream node which is most beneficial to synchronizing data is selected, so that the timeliness and the high efficiency of metadata information synchronization are ensured, and better reading service is provided. It should be noted that the network device may be a terminal device or a server, and is not limited herein.
Further, in the embodiment of the application, another way for realizing data reading in a data reading scene is provided, and by the way, a more efficient and reliable reading service can be provided in a large-scale data reading scene, so that the feasibility of the scheme is improved. Meanwhile, one data reading request is divided into a plurality of data reading sub-requests, the reading amount of data is dispersed, namely the data is respectively read from different learners, so that the data reading efficiency is improved, and the load on the learners is reduced.
Optionally, on the basis of each embodiment corresponding to fig. 3, in another optional embodiment provided in the embodiment of the present application, the receiving, by the target node, data sent by the first upstream node specifically includes the following steps:
a target node receives an initial query request sent by first network equipment;
the target node synchronizes target metadata information in the first upstream node to the target node according to the initial query request;
after the target node receives the data sent by the first upstream node, the method further comprises the following steps:
the target node receives a data query request sent by second network equipment;
and the target node sends target metadata information to the second network equipment according to the data query request.
In this embodiment, a manner of implementing data query in a data query scenario is introduced, where a network device queries data by sending a data query request to the target node. Based on the learner function of the SCAL design, a complete read-only copy of the metadata storage module can be created. The read-only copy synchronizes the written metadata information from the metadata storage module of the upstream node and provides a data reading service externally, which reduces the pressure on the replication group.
Specifically, the network device may query data directly from the target node. For convenience of description, the network device querying data from the target node is taken as an example; however, this should not be construed as limiting the present application. Referring to fig. 8, fig. 8 is a schematic diagram of data synchronization based on a data query scenario in the embodiment of the present application. As shown in the figure, a replication group includes three metadata storage modules, where one metadata storage module is the leader and the other two are followers, and a metadata cache module is a learner. An upstream node is selected in the start-up phase; in addition, when the upstream node fails or cannot synchronize the metadata information in time, an upstream node needs to be reselected. The following takes the selection of an upstream node in the start-up phase of the metadata cache module as an example.
For a metadata cache module that needs to obtain metadata information frequently, obtaining the data through query requests may return gigabytes (GB) of data each time. Therefore, the metadata cache module may act as a learner (e.g., learner 1) of the metadata storage module, i.e., as the target node, and synchronize the written metadata information in real time. For example, first, during the start-up phase, the upstream nodes selectable by the target node may be configured. Then, the optimal upstream node (i.e., the one from which the metadata information can be synchronized most promptly) is selected from the configured upstream list information, and registration is completed. At this point, the metadata storage module acting as the upstream node knows of the downstream metadata cache module. After the metadata storage module receives a data write request, synchronization is performed within the replication group. When a first network device sends an initial query request to the target node, the target node synchronizes the target metadata information in the first upstream node (e.g., follower 1) to the target node according to the initial query request. Then, when another network device (e.g., a second network device) needs to query the target metadata information, it sends a data query request to the target node, and the target node sends the target metadata information directly to the second network device according to the data query request. This is equivalent to accessing only the memory of the target node, which helps improve data query efficiency.
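The query path just described, where the first query triggers synchronization from the upstream node and later queries touch only the target node's memory, can be sketched as follows. The class `LearnerCache`, the counter `upstream_fetches`, and the key `"volume/42"` are hypothetical and serve only to make the behavior observable.

```python
class LearnerCache:
    """Hypothetical target node that keeps synchronized metadata in memory."""

    def __init__(self, upstream: dict):
        self.upstream = upstream      # stand-in for the first upstream node's store
        self.memory = {}              # metadata already synchronized to this node
        self.upstream_fetches = 0     # how often the upstream node was contacted

    def query(self, key: str) -> str:
        if key not in self.memory:
            # Initial query: synchronize the target metadata from upstream.
            self.memory[key] = self.upstream[key]
            self.upstream_fetches += 1
        # Later queries are answered from the target node's memory only.
        return self.memory[key]


follower_1 = {"volume/42": "metadata-for-42"}
learner_1 = LearnerCache(follower_1)
first = learner_1.query("volume/42")    # initial query from the first network device
second = learner_1.query("volume/42")   # data query from the second network device
```

In this sketch the upstream node is contacted once, no matter how many network devices later query the same target metadata information.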
The method and the device have the advantages that the upstream node selecting strategy based on the dynamic scoring mechanism can ensure that the upstream node which is most beneficial to synchronizing data is selected, so that the timeliness and the high efficiency of metadata information synchronization are ensured, and better reading service is provided. It should be noted that the network device may be a terminal device or a server, and is not limited herein.
Further, in the embodiment of the present application, a data query method in a data query scenario is provided, and through the above method, for a network device that needs to frequently obtain target metadata information, the target metadata information may be stored locally in a target node in advance, when the network device requests the target metadata information again, only a memory of the target node needs to be accessed, and the target node does not need to request related data from an upstream node again, so that data query efficiency is improved.
Optionally, on the basis of the foregoing embodiments corresponding to fig. 3, in another optional embodiment provided in this application embodiment, after the target node receives the data sent by the first upstream node, the method further includes the following steps:
when a first upstream node fails, a target node acquires a second node information set corresponding to the target node, wherein the second node information set comprises second node information of N upstream nodes, the N upstream nodes are upstream nodes of the target node, and N is an integer greater than or equal to 1;
the target node determines N node scores according to the second node information set, wherein the N node scores have a one-to-one correspondence relationship with the N upstream nodes;
the target node determines a third upstream node from the N upstream nodes according to the N node scores, wherein the node score corresponding to the third upstream node is the maximum value of the N node scores;
and if the third upstream node meets the node registration condition, the target node synchronizes the data in the third upstream node to the target node.
In this embodiment, a method for adjusting node scores based on a dynamic scoring policy is introduced. To enable the target node to select the optimal upstream node in different time periods, the node scores of the upstream nodes need to be updated. In the present application, if there is a log to synchronize, the upstream node sends the log; if there is none, the upstream node sends a heartbeat signal to the downstream node at short intervals (e.g., every 1 second). If the downstream node receives neither a heartbeat signal nor a log from the upstream node within a period of time (e.g., 10 seconds), the upstream node may have failed, and the downstream node automatically recalculates the node score of each upstream node and re-registers with the upstream nodes in order of node score.
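The failure detection described above amounts to a timeout on the last message received from the upstream node. A minimal sketch follows, assuming the 10-second example value from the text; the class `UpstreamMonitor` and its injectable `clock` parameter are hypothetical and exist only so the behavior can be exercised without waiting.

```python
import time


class UpstreamMonitor:
    """Tracks when the upstream node was last heard from."""

    def __init__(self, timeout_s: float = 10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = clock()

    def on_message(self) -> None:
        # Called for every heartbeat signal or synchronized log received.
        self.last_seen = self.clock()

    def upstream_suspected_failed(self) -> bool:
        # True once neither a heartbeat nor a log arrived within the timeout.
        return self.clock() - self.last_seen > self.timeout_s


# Usage with a fake clock so that no real waiting is needed.
now = [0.0]
monitor = UpstreamMonitor(timeout_s=10.0, clock=lambda: now[0])
now[0] = 1.0
monitor.on_message()                                   # heartbeat at t = 1 s
now[0] = 9.0
healthy = not monitor.upstream_suspected_failed()      # 8 s of silence
now[0] = 12.0
must_reselect = monitor.upstream_suspected_failed()    # 11 s of silence
```

When `must_reselect` becomes true, the downstream node would start recalculating node scores and registering again.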
Specifically, when the first upstream node fails, the target node needs to acquire a second node information set, where the second node information set includes second node information of N upstream nodes. Illustratively, in one case, if the target node has not updated the first upstream list information, the target node again sends an information collection request to N upstream nodes, where N is equal to M. The N upstream nodes each acquire their second node information and feed it back to the target node. When the target node has received the second node information sent by the N upstream nodes, the second node information set is obtained. In another case, if the target node has updated the first upstream list information, the target node sends an information collection request to N upstream nodes according to the updated upstream list information, where N and M may or may not be equal. The N upstream nodes each acquire their second node information and feed it back to the target node. When the target node has received the second node information sent by the N upstream nodes, the second node information set is obtained.
Based on this, the target node recalculates the node score of each upstream node according to each piece of second node information in the second node information set, obtaining N node scores. After the target node has obtained the node score of each upstream node, it selects the maximum value among them and determines the upstream node with that node score as the third upstream node. Assuming that the N node scores are "0", "80", and "150", the maximum value of the N node scores is "150", and the upstream node with the node score "150" is determined as the third upstream node. The target node or the third upstream node judges whether the node registration condition is satisfied; if the third upstream node satisfies the node registration condition, it is the most suitable node to provide the data to be synchronized to the target node. The target node can therefore receive the data sent by the third upstream node and store the data locally on the target device, thereby achieving data synchronization.
For ease of understanding, please refer to fig. 9, which is a schematic diagram of a dynamically adjusted scoring policy in the embodiment of the present application. As shown in the figure, assume that the target node is learner D and that the target node has two upstream nodes, learner A and learner B. Specifically:
In step F1, learner D sends an information collection request to learner B and collects the first node information of learner B.
In step F2, learner D sends an information collection request to learner A and collects the first node information of learner A.
In step F3, learner D scores learner A and learner B according to the collected first node information of learner A and of learner B.
In step F4, learner D registers with the upstream nodes in order from the highest node score to the lowest. Assuming that the score of learner A is greater than the score of learner B, learner D registers with learner A; after the registration succeeds, learner D can synchronize data from learner A and starts receiving the synchronization log of learner A.
Based on this, in the first time period, learner A is the first upstream node. If the score update condition is currently satisfied, learner D may further obtain a second node information set, where the second node information included in the second node information set may differ from the first node information included in the first node information set. For example, the number of mounted nodes in the first node information of learner A is 2, and the number of mounted nodes in the second node information of learner A is 3. Specifically:
In step G1, learner D sends an information collection request to learner B and collects the second node information of learner B.
In step G2, learner D sends an information collection request to learner A and collects the second node information of learner A.
In step G3, learner D scores learner A and learner B according to the collected second node information of learner A and of learner B.
In step G4, learner D registers with the upstream nodes in order from the highest node score to the lowest. Assuming that the score of learner B is greater than the score of learner A, learner D registers with learner B; after the registration succeeds, learner D can synchronize data from learner B and starts receiving the synchronization log of learner B.
Based on this, in the second time period, learner B is the third upstream node.
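The two rounds F1 to F4 and G1 to G4 can be illustrated with a small sketch in which the same scoring rule is applied to two snapshots of node information. The scoring rule in `score` is hypothetical, and so are the mounted counts for learner B; only learner A's change from 2 to 3 mounted nodes is taken from the example above.

```python
def score(info: dict) -> int:
    # Hypothetical rule: each mounted learner lowers the node score.
    return 100 - 20 * info["mounted"]


def best_upstream(info_set: dict) -> str:
    # The upstream node with the maximum node score is tried first.
    return max(info_set, key=lambda name: score(info_set[name]))


# First node information set (learner B's value is assumed).
first_info = {"learner A": {"mounted": 2}, "learner B": {"mounted": 3}}
# Second node information set, collected after the score update condition is met.
second_info = {"learner A": {"mounted": 3}, "learner B": {"mounted": 1}}

first_choice = best_upstream(first_info)    # steps F1-F4
second_choice = best_upstream(second_info)  # steps G1-G4
```

Under these assumed values the preferred upstream node changes from learner A to learner B between the two time periods, which is the effect the dynamic scoring policy is meant to capture.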
Further, in the embodiment of the present application, a method for adjusting node scores based on a dynamic scoring policy is provided, and in the above manner, a target node may periodically adjust corresponding node scores according to state information, role information, and association information of an upstream node, so as to avoid a situation that a selected upstream node is not suitable for continuing to provide data synchronization services after a period of time, thereby increasing feasibility and operability of a scheme, and facilitating provision of more efficient data synchronization services.
Referring to fig. 10, fig. 10 is a schematic diagram of an embodiment of a data synchronization apparatus in an embodiment of the present application, in which the data synchronization apparatus 20 includes:
an obtaining module 201, configured to obtain a first node information set corresponding to a target node, where the first node information set includes first node information of M upstream nodes, where the M upstream nodes are upstream nodes of the target node, and M is an integer greater than or equal to 1;
a determining module 202, configured to determine M node scores according to the first node information set, where the M node scores and M upstream nodes have a one-to-one correspondence relationship;
the determining module 202 is further configured to determine, according to the M node scores, a first upstream node from the M upstream nodes, where a node score corresponding to the first upstream node is a maximum value of the M node scores;
the synchronization module 203 is configured to receive data sent by the first upstream node if the first upstream node meets the node registration condition.
In the embodiment of the application, a data synchronization device is provided, where a target node first obtains a first node information set corresponding to the target node, and then determines M node scores according to the first node information set, so that the target node may continue to determine a first upstream node from M upstream nodes according to the M node scores, and if the first upstream node meets a node registration condition, receive data sent by the first upstream node. By adopting the device, for the learners, a scoring mechanism is adopted to score each upstream node, the optimal upstream node is screened out according to the node scores of the upstream nodes, and the data is synchronized to the learners by the upstream node, so that the data synchronization can be efficiently completed.
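The flow handled by the obtaining, determining, and synchronization modules can be illustrated with a minimal sketch. The class `UpstreamInfo`, its fields, and the function names are assumptions introduced only for illustration; the node registration condition is reduced to a boolean flag here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class UpstreamInfo:
    """Hypothetical container for what the target node knows about one upstream node."""
    name: str
    score: float        # node score already derived from the first node information
    can_register: bool  # stands in for the node registration condition


def pick_first_upstream(infos: Sequence[UpstreamInfo]) -> UpstreamInfo:
    """Return the upstream node whose node score is the maximum of the M node scores."""
    if not infos:
        raise ValueError("M must be greater than or equal to 1")
    return max(infos, key=lambda info: info.score)


def choose_sync_source(infos: Sequence[UpstreamInfo]) -> Optional[str]:
    """Return the name of the node to receive data from, or None if registration fails."""
    first = pick_first_upstream(infos)
    return first.name if first.can_register else None
```

In this sketch a failed registration simply yields `None`; other embodiments describe falling back to the node with the second largest score.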
Alternatively, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application,
an obtaining module 201, configured to determine M upstream nodes according to first upstream list information corresponding to a target node;
sending an information acquisition request to each upstream node in the M upstream nodes so that each upstream node responds to the information acquisition request and acquires first node information of each upstream node;
when first node information sent by each upstream node is received, a first node information set is obtained.
In the embodiment of the application, a data synchronization device is provided, and by adopting the device, a target node can actively request corresponding node information from each upstream node in first upstream list information, and the upstream node issues the node information according to the current self condition, so that the real-time performance of the node information can be ensured, more accurate node score can be obtained by calculation, the target node is ensured to select the optimal upstream node, and the high efficiency of synchronization data is ensured.
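As an illustration of how the first node information set might be assembled, the following sketch sends one information acquisition request per upstream node in the first upstream list information and collects the replies. The transport is abstracted as a caller-supplied `request_info` function, which is a hypothetical stand-in and not an interface defined by this application.

```python
from typing import Any, Callable, Dict, List

NodeInfo = Dict[str, Any]


def collect_node_info(
    upstream_list: List[str],
    request_info: Callable[[str], NodeInfo],
) -> Dict[str, NodeInfo]:
    """Request node information from every upstream node and gather the replies."""
    info_set: Dict[str, NodeInfo] = {}
    for node in upstream_list:
        # Each upstream node answers with its current state, so the set stays fresh.
        info_set[node] = request_info(node)
    return info_set
```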
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the first node information includes state information of an upstream node and association information of the upstream node;
a determining module 202, configured to specifically determine, for each piece of first node information in the first node information set, if the state information of the upstream node indicates an abnormal state, that a node score of the upstream node is a preset value;
and for each first node information in the first node information set, if the state information of the upstream node indicates a normal state, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to a preset value.
In the embodiment of the application, the data synchronization device is provided, and by adopting the device, for the node in the abnormal state, the corresponding node score is not required to be calculated by adopting the associated information, but the preset value is directly used as the node score of the node, so that the calculation amount is reduced, and the calculation efficiency of the node score is favorably improved.
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the first node information includes role information of an upstream node and association information of the upstream node;
the determining module 202 is specifically configured to, for each piece of first node information in the first node information set, determine that a node score of an upstream node is a preset value if role information of the upstream node indicates that the upstream node is a master node;
and for each piece of first node information in the first node information set, if the role information of the upstream node indicates that the upstream node is a slave node or a learning node, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to a preset value.
In the embodiment of the application, a data synchronization device is provided. By using the device, when the upstream node is the master node, the corresponding node score does not need to be calculated from the association information; instead, the preset value is used directly as the node score of the node, which reduces the amount of calculation and improves the efficiency of node score calculation.
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the first node information includes state information and role information of an upstream node and association information of the upstream node;
the determining module 202 is specifically configured to, for each piece of first node information in the first node information set, determine that a node score of an upstream node is a preset value if role information of the upstream node indicates that the upstream node is a master node, or if state information of the upstream node indicates an abnormal state;
for each piece of first node information in the first node information set, if the role information of the upstream node indicates that the upstream node is a slave node or a learning node and the state information of the upstream node indicates a normal state, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to a preset value.
In the embodiment of the application, a data synchronization device is provided. By using the device, for a node in an abnormal state, or when the upstream node is the master node, the node score does not need to be calculated from the association information; instead, the preset value is used directly as the node score of the node, which reduces the amount of calculation and improves the efficiency of node score calculation.
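The short-circuit described above can be sketched as follows. The value of `PRESET_SCORE`, the dictionary keys, and the string labels for roles and states are all assumptions made for illustration; the text only requires that a score computed from the association information be greater than or equal to the preset value.

```python
from typing import Any, Callable, Mapping

PRESET_SCORE = 0.0  # assumed preset value, chosen only for this sketch


def node_score(
    info: Mapping[str, Any],
    score_from_association: Callable[[Any], float],
) -> float:
    """Return the preset value for a master or abnormal node, else a computed score."""
    if info["role"] == "master" or info["state"] == "abnormal":
        # No association-based calculation is needed in these cases.
        return PRESET_SCORE
    # Clamp so that the computed score is never below the preset value.
    return max(PRESET_SCORE, score_from_association(info["association"]))
```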
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the association information of the upstream node includes the number of mounted nodes of the upstream node, the depth of the upstream node, the load information of the upstream node, and the submitted log information of the upstream node;
the determining module 202 is specifically configured to calculate a score intermediate quantity according to the number of mounted nodes of the upstream node, the depth of the upstream node, and the load information of the upstream node;
and calculating the node score of the upstream node according to the score intermediate quantity and the submitted log information of the upstream node, wherein the node score is positively correlated with the submitted log information of the upstream node, and the node score is negatively correlated with the number of mounted nodes of the upstream node, the depth of the upstream node, and the load information of the upstream node.
In the embodiment of the application, the data synchronization device is provided, and by adopting the device, the number, the depth, the load information and the submitted log information of the mounted nodes are jointly used as the basis for evaluating the node scores, the mounting capacity of upstream nodes is reflected from different dimensions, so that more accurate node scores can be obtained through calculation, the optimal upstream nodes can be selected by target nodes, and the high efficiency of synchronous data is ensured.
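The following sketch shows one possible scoring function that satisfies the stated correlations. The concrete formula (an unweighted sum as the score intermediate quantity, divided into the committed log index) is a hypothetical choice for illustration only and is not asserted to be the formula used in this application.

```python
def score_intermediate(mounted_nodes: int, depth: int, load: float) -> float:
    """Hypothetical intermediate quantity that grows with mounts, depth, and load."""
    return 1.0 + mounted_nodes + depth + load


def node_score(mounted_nodes: int, depth: int, load: float, committed_log_index: int) -> float:
    """Score rises with the committed log index and falls as the intermediate grows."""
    return committed_log_index / score_intermediate(mounted_nodes, depth, load)
```

Under this assumed formula, an upstream node that has committed more log entries scores higher, while a node that is deeper in the tree, more heavily loaded, or already serving more mounted nodes scores lower.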
Alternatively, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application,
the determining module 202 is specifically configured to perform sorting processing on the scores of the M nodes in a descending order to obtain a sorting result;
sequencing the nodes in the first upstream list information according to the sequencing result to obtain second upstream list information, wherein the second upstream list information comprises M upstream nodes;
the upstream node ranked first in the second upstream list information is determined to be the first upstream node.
In the embodiment of the application, a data synchronization device is provided. By using the device, the target node can sort the M node scores, which makes it more convenient to look up the corresponding upstream node and improves the efficiency of data synchronization.
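A minimal sketch of the sorting step, assuming the first upstream list information is a list of node names and the node scores are held in a dictionary (both representations are assumptions):

```python
from typing import Dict, List


def sort_upstream_list(first_upstream_list: List[str], scores: Dict[str, float]) -> List[str]:
    """Return the second upstream list: the same M nodes in descending score order."""
    # sorted() is stable, so nodes with equal scores keep their original order.
    return sorted(first_upstream_list, key=lambda node: scores[node], reverse=True)
```

The first element of the returned list then plays the role of the first upstream node.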
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the data synchronization apparatus 20 further includes a sending module 204;
a sending module 204, configured to send a node registration request to a first upstream node after the determining module 202 determines the first upstream node from the M upstream nodes according to the M node scores, so that the first upstream node determines the mountable quantity according to the node registration request;
the determining module 202 is further configured to determine that the first upstream node meets the node registration condition if the mountable quantity is greater than or equal to 1;
the determining module 202 is further configured to determine that the first upstream node does not satisfy the node registration condition if the mountable quantity is 0.
In the embodiment of the present application, a data synchronization apparatus is provided. By using the apparatus, the number of nodes mounted on an upstream node can be limited: to prevent too many learners from being mounted on the same upstream node, a maximum mountable quantity can be preset, which helps maintain the stability and reliability of the data synchronization system.
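The capacity check can be sketched as two small functions, one evaluated on the upstream node and one on the target node. The parameter `max_mounts` (the preset maximum) and the function names are hypothetical.

```python
def mountable_quantity(max_mounts: int, mounted: int) -> int:
    """Upstream side: how many more nodes can still be mounted (never negative)."""
    return max(0, max_mounts - mounted)


def meets_registration_condition(mountable: int) -> bool:
    """Target side: registration is allowed only if at least one slot remains."""
    return mountable >= 1
```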
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the data synchronization apparatus 20 further includes a sending module 204 and a receiving module 205;
a sending module 204, configured to send a node registration request to a first upstream node after the determining module 202 determines the first upstream node from the M upstream nodes according to the M node scores;
a receiving module 205, configured to receive a node registration response sent by a first upstream node, where the node registration response carries role information of the first upstream node;
the determining module 202 is further configured to determine that the first upstream node satisfies the node registration condition if the role information indicates that the first upstream node is a slave node or a learning node;
the determining module 202 is further configured to determine that the first upstream node does not satisfy the node registration condition if the role information indicates that the first upstream node is the master node.
In the embodiment of the application, a data synchronization device is provided. By using the device, the leader can be prevented from serving as a mount point for a learner: mounting a learner on the leader puts considerable pressure on the leader, and the leader itself already carries a high load, so the leader is not suitable as a mount target for learners. Judging whether the upstream node meets the node registration condition based on the role information avoids the situation in which the leader directly provides synchronization data to the learner, which helps maintain the stability and reliability of the data synchronization system.
Alternatively, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application,
the determining module 202 is further configured to, after determining a first upstream node from the M upstream nodes according to the M node scores, determine a second upstream node from the M upstream nodes according to the M node scores if the first upstream node does not satisfy the node registration condition, where a node score corresponding to the second upstream node is a second largest value of the M node scores;
the synchronization module 203 is further configured to receive data sent by the second upstream node if the second upstream node meets the node registration condition.
In the embodiment of the application, a data synchronization device is provided, and for learners, a scoring mechanism is used for scoring each upstream node, an optimal upstream node is screened out according to node scores of the upstream nodes, and once a target node cannot be registered to the upstream node with the highest score, the target node registers to the upstream node with the second highest node score in sequence until the registration is completed. Therefore, the feasibility of the scheme is guaranteed, and meanwhile, data synchronization can be efficiently completed.
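The fallback behavior can be illustrated with a short loop that tries upstream nodes in descending order of node score until one accepts the registration. The callable `try_register` is a hypothetical stand-in for sending a node registration request and evaluating the node registration condition.

```python
from typing import Callable, Dict, Optional


def register_with_fallback(
    scores: Dict[str, float],
    try_register: Callable[[str], bool],
) -> Optional[str]:
    """Try the highest-scoring node first, then the next highest, and so on."""
    for node in sorted(scores, key=lambda name: scores[name], reverse=True):
        if try_register(node):
            return node
    return None  # no upstream node satisfied the node registration condition
```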
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the data synchronization apparatus 20 further includes a sending module 204 and a receiving module 205;
the receiving module 205 is further configured to receive a data reading request sent by the network device after the synchronization module 203 receives the data sent by the first upstream node;
the sending module 204 is further configured to send metadata information to the network device according to the data reading request.
In the embodiment of the application, a data synchronization device is provided, and by adopting the device, more efficient and reliable reading service can be provided in a large-scale data reading scene, so that the feasibility of a scheme is improved.
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the data synchronization apparatus 20 further includes a sending module 204 and a receiving module 205;
the receiving module 205 is further configured to receive a first data reading request sent by the network device after the synchronization module 203 receives the data sent by the first upstream node;
the sending module 204 is further configured to send the first metadata information to the network device according to the first data reading request, where the network device is further configured to send a second data reading request to other nodes except the target node, and the second data reading request is used to obtain the second metadata information.
In the embodiment of the application, a data synchronization device is provided, and by adopting the device, more efficient and reliable reading service can be provided in a large-scale data reading scene, so that the feasibility of a scheme is improved. Meanwhile, one data reading request is divided into a plurality of data reading sub-requests, the reading amount of data is dispersed, namely the data is respectively read from different learners, so that the data reading efficiency is improved, and the load on the learners is reduced.
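One way a network device might divide a read request into sub-requests across several learners is a round-robin assignment of keys. The partitioning strategy and the names below are assumptions for illustration; the application does not prescribe how the request is divided.

```python
from typing import Dict, List, Sequence


def split_read_request(keys: Sequence[str], learners: Sequence[str]) -> Dict[str, List[str]]:
    """Assign each requested key to a learner in round-robin order."""
    if not learners:
        raise ValueError("at least one learner is required")
    plan: Dict[str, List[str]] = {learner: [] for learner in learners}
    for index, key in enumerate(keys):
        plan[learners[index % len(learners)]].append(key)
    return plan
```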
Optionally, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application, the data synchronization apparatus 20 further includes a sending module 204 and a receiving module 205;
a synchronization module 203, specifically configured to receive an initial query request sent by a first network device;
synchronizing target metadata information in the first upstream node to the target node according to the initial query request;
the receiving module 205 is further configured to receive a data query request sent by a second network device after the synchronization module 203 receives data sent by the first upstream node;
the sending module 204 is further configured to send the target metadata information to the second network device according to the data query request.
In the embodiment of the application, a data synchronization device is provided, and by using the device, for a network device that needs to frequently obtain target metadata information, the target metadata information can be stored locally at a target node in advance, when the network device requests the target metadata information again, only a memory of the target node needs to be accessed, and the target node does not need to request related data from an upstream node again, so that the data query efficiency is improved.
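The local-storage behavior can be sketched as a small in-memory cache on the target node. The class `TargetNodeCache` and the callable `fetch_from_upstream` are hypothetical names; the sketch only shows that the upstream node is contacted for the initial query and not for later data queries.

```python
from typing import Any, Callable, Dict


class TargetNodeCache:
    """Hypothetical in-memory store of target metadata on the target node."""

    def __init__(self, fetch_from_upstream: Callable[[str], Any]) -> None:
        self._fetch = fetch_from_upstream
        self._memory: Dict[str, Any] = {}

    def handle_initial_query(self, key: str) -> None:
        """Synchronize the target metadata from the upstream node into local memory."""
        self._memory[key] = self._fetch(key)

    def handle_data_query(self, key: str) -> Any:
        """Answer a later query from local memory without contacting the upstream node."""
        return self._memory[key]
```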
Alternatively, on the basis of the embodiment corresponding to fig. 10, in another embodiment of the data synchronization apparatus 20 provided in the embodiment of the present application,
the obtaining module 201 is further configured to, after the synchronization module 203 receives data sent by a first upstream node, obtain, when the first upstream node fails, a second node information set corresponding to the target node, where the second node information set includes second node information of N upstream nodes, the N upstream nodes are upstream nodes of the target node, and N is an integer greater than or equal to 1;
the determining module 202 is further configured to determine N node scores according to the second node information set, where the N node scores and N upstream nodes have a one-to-one correspondence relationship;
the determining module 202 is further configured to determine a third upstream node from the N upstream nodes according to the N node scores, where a node score corresponding to the third upstream node is a maximum value of the N node scores;
the synchronization module 203 is further configured to synchronize data in the third upstream node to the target node if the third upstream node satisfies the node registration condition.
In the embodiment of the application, a data synchronization device is provided, and by using the device, a target node can periodically adjust a corresponding node score according to state information, role information and association information of an upstream node, so that the situation that the selected upstream node is not suitable for continuously providing data synchronization service after a period of time is avoided, the feasibility and operability of a scheme are improved, and the data synchronization device is beneficial to providing more efficient data synchronization service.
Fig. 11 is a schematic diagram of a server 300 according to an embodiment of the present application, where the server 300 may have a relatively large difference due to different configurations or performances, and may include one or more Central Processing Units (CPUs) 322 (e.g., one or more processors) and a memory 332, and one or more storage media 330 (e.g., one or more mass storage devices) for storing applications 342 or data 344. Memory 332 and storage media 330 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 330 may include one or more modules (not shown), each of which may include a series of instruction operations for the server. Still further, the central processor 322 may be configured to communicate with the storage medium 330 to execute a series of instruction operations in the storage medium 330 on the server 300.
The server 300 may also include one or more power supplies 326, one or more wired or wireless network interfaces 350, one or more input-output interfaces 358, and/or one or more operating systems 341, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, and so on.
The steps performed by the server in the above embodiment may be based on the server structure shown in fig. 11.
In the embodiment of the present application, the CPU 322 included in the server further has the following functions:
acquiring a first node information set corresponding to a target node, wherein the first node information set comprises first node information of M upstream nodes, the M upstream nodes are upstream nodes of the target node, and M is an integer greater than or equal to 1;
determining M node scores according to the first node information set, wherein the M node scores have a one-to-one correspondence relationship with M upstream nodes;
determining a first upstream node from the M upstream nodes according to the M node scores, wherein the node score corresponding to the first upstream node is the maximum value of the M node scores;
and if the first upstream node meets the node registration condition, receiving the data sent by the first upstream node.
Optionally, the CPU 322 is specifically configured to execute the following steps:
determining M upstream nodes according to first upstream list information corresponding to the target node;
sending an information acquisition request to each upstream node in the M upstream nodes so that each upstream node responds to the information acquisition request and acquires first node information of each upstream node;
when first node information sent by each upstream node is received, a first node information set is obtained.
Optionally, the CPU 322 is specifically configured to execute the following steps:
for each first node information in the first node information set, if the state information of the upstream node indicates an abnormal state, determining the node score of the upstream node as a preset value;
and for each first node information in the first node information set, if the state information of the upstream node indicates a normal state, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to a preset value.
Optionally, the CPU 322 is specifically configured to execute the following steps:
calculating a score intermediate quantity according to the number of mounting nodes of the upstream nodes, the depth of the upstream nodes and the load information of the upstream nodes;
and calculating to obtain the node score of the upstream node according to the intermediate score quantity and the submitted log information of the upstream node, wherein the node score is positively correlated with the submitted log information of the upstream node, and the node score is negatively correlated with the number of mounted nodes of the upstream node, the depth of the upstream node and the load information of the upstream node.
Optionally, the CPU 322 is specifically configured to execute the following steps:
according to the sequence from big to small, the scores of the M nodes are sorted to obtain a sorting result;
sequencing the nodes in the first upstream list information according to the sequencing result to obtain second upstream list information, wherein the second upstream list information comprises M upstream nodes;
the upstream node ranked first in the second upstream list information is determined to be the first upstream node.
Optionally, the CPU 322 is further configured to perform the following steps:
sending a node registration request to a first upstream node so that the first upstream node determines the mountable quantity according to the node registration request;
if the mountable quantity is greater than or equal to 1, determining that the first upstream node meets the node registration condition;
and if the mountable quantity is 0, determining that the first upstream node does not meet the node registration condition.
Optionally, the CPU 322 is further configured to perform the following steps:
sending a node registration request to a first upstream node;
receiving a node registration response sent by a first upstream node, wherein the node registration response carries role information of the first upstream node;
if the role information indicates that the first upstream node is a slave node or a learning node, determining that the first upstream node meets a node registration condition;
and if the role information indicates that the first upstream node is the master node, determining that the first upstream node does not meet the node registration condition.
Optionally, the CPU 322 is further configured to perform the following steps:
if the first upstream node does not meet the node registration condition, determining a second upstream node from the M upstream nodes according to the M node scores, wherein the node score corresponding to the second upstream node is the second largest value of the M node scores;
and if the second upstream node meets the node registration condition, receiving the data sent by the second upstream node.
Optionally, the CPU 322 is further configured to perform the following steps:
receiving a data reading request sent by network equipment;
and sending the metadata information to the network equipment according to the data reading request.
Optionally, the CPU 322 is further configured to perform the following steps:
receiving a first data reading request sent by network equipment;
and sending the first metadata information to the network equipment according to the first data reading request, wherein the network equipment is also used for sending a second data reading request to other nodes except the target node, and the second data reading request is used for acquiring the second metadata information.
Optionally, the CPU 322 is specifically configured to execute the following steps:
receiving an initial query request sent by first network equipment;
synchronizing target metadata information in the first upstream node to the target node according to the initial query request;
CPU 322 is also configured to perform the following steps:
receiving a data query request sent by second network equipment;
and sending the target metadata information to the second network equipment according to the data query request.
Optionally, the CPU 322 is further configured to perform the following steps:
when a first upstream node fails, acquiring a second node information set corresponding to a target node, wherein the second node information set comprises second node information of N upstream nodes, the N upstream nodes are upstream nodes of the target node, and N is an integer greater than or equal to 1;
determining N node scores according to the second node information set, wherein the N node scores have a one-to-one correspondence relationship with the N upstream nodes;
determining a third upstream node from the N upstream nodes according to the N node scores, wherein the node score corresponding to the third upstream node is the maximum value of the N node scores;
and if the third upstream node meets the node registration condition, synchronizing the data in the third upstream node to the target node.
Embodiments of the present application also provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is caused to execute the method described in the foregoing embodiments.
Embodiments of the present application also provide a computer program product including a program, which, when run on a computer, causes the computer to perform the methods described in the foregoing embodiments.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other manners. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (15)

1. A method of data synchronization, comprising:
acquiring a first node information set corresponding to a target node, wherein the first node information set comprises first node information of M upstream nodes, the M upstream nodes are upstream nodes of the target node, and M is an integer greater than or equal to 1;
determining M node scores according to the first node information set, wherein the M node scores and the M upstream nodes have one-to-one correspondence;
determining a first upstream node from the M upstream nodes according to the M node scores, wherein the node score corresponding to the first upstream node is the maximum value of the M node scores;
and if the first upstream node meets the node registration condition, receiving the data sent by the first upstream node.
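As a rough illustration of the flow in claim 1, the following Python sketch scores each upstream node, picks the node with the maximum score, and accepts it only if a registration check passes. The names `select_upstream`, `score_fn`, and `can_register` are hypothetical and do not come from the application; the scoring rule and the registration check are passed in as callables because claim 1 leaves both open.

```python
from typing import Callable, Dict, Optional


def select_upstream(
    node_info: Dict[str, dict],
    score_fn: Callable[[dict], float],
    can_register: Callable[[str], bool],
) -> Optional[str]:
    """Return the highest-scoring upstream node if it accepts registration."""
    if not node_info:
        return None
    # One score per upstream node (one-to-one correspondence).
    scores = {name: score_fn(info) for name, info in node_info.items()}
    # The first upstream node is the one whose score is the maximum.
    first = max(scores, key=lambda name: scores[name])
    # Data is received from it only if the registration condition holds.
    return first if can_register(first) else None
```

Returning `None` when the check fails is a simplification made here; what happens next in that case is the subject of claim 8.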
2. The method of claim 1, wherein the acquiring the first node information set corresponding to the target node comprises:
determining the M upstream nodes according to first upstream list information corresponding to the target node;
sending an information acquisition request to each upstream node in the M upstream nodes, so that each upstream node responds to the information acquisition request and acquires first node information of each upstream node;
and when first node information sent by each upstream node is received, acquiring the first node information set.
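The collection step of claim 2 might look like the sketch below: the target node walks its upstream list, sends one information request per node, and assembles the replies into the node information set. `collect_node_info` and `request_info` are hypothetical names, and treating an unreachable node as abnormal is an assumption added here, not something claim 2 states.

```python
from typing import Callable, Dict, List


def collect_node_info(
    upstream_list: List[str],
    request_info: Callable[[str], dict],
) -> Dict[str, dict]:
    """Build the node information set from the upstream list."""
    info_set: Dict[str, dict] = {}
    for node in upstream_list:
        try:
            # Send the information acquisition request and wait for the reply.
            info_set[node] = request_info(node)
        except ConnectionError:
            # Assumption: an unreachable node is recorded as abnormal.
            info_set[node] = {"state": "abnormal"}
    return info_set
```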
3. The method of claim 1, wherein the first node information comprises state information of an upstream node and association information of the upstream node;
the determining M node scores according to the first node information set includes:
for each piece of first node information in the first node information set, if the state information of the upstream node indicates an abnormal state, determining that the node score of the upstream node is a preset value;
and for each piece of first node information in the first node information set, if the state information of the upstream node indicates a normal state, determining a node score according to the association information of the upstream node, wherein the node score is greater than or equal to the preset value.
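The two branches of claim 3 can be sketched as a single function. The preset value of `0.0`, the string states, and the name `score_from_state` are assumptions for illustration; the claim only requires that an abnormal node receives the preset value and that a normal node never scores below it.

```python
PRESET_SCORE = 0.0  # assumed preset value; the claim does not fix a number


def score_from_state(state: str, association_score: float) -> float:
    """Return the node score given the node state and a score
    already derived from the association information."""
    if state != "normal":
        # Abnormal nodes always receive the preset value.
        return PRESET_SCORE
    # Normal nodes never score below the preset value (clamping is assumed).
    return max(PRESET_SCORE, association_score)
```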
4. The method of claim 3, wherein the association information of the upstream node comprises a number of mounted nodes of the upstream node, a depth of the upstream node, load information of the upstream node, and committed log information of the upstream node;
the determining a node score according to the association information of the upstream node includes:
calculating an intermediate score according to the number of mounted nodes of the upstream node, the depth of the upstream node, and the load information of the upstream node;
and calculating the node score of the upstream node according to the intermediate score and the committed log information of the upstream node, wherein the node score is positively correlated with the committed log information of the upstream node, and is negatively correlated with the number of mounted nodes of the upstream node, the depth of the upstream node, and the load information of the upstream node.
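Claim 4 fixes only the direction of each correlation, not a formula. One formula with the required monotonic behaviour is sketched below; the reciprocal form, the unit weights, and the use of a committed log index as the "committed log information" are all assumptions chosen for illustration.

```python
def intermediate_quantity(mounted: int, depth: int, load: float) -> float:
    """Shrinks as the mounted-node count, depth, or load grows."""
    return 1.0 / (1.0 + mounted + depth + load)


def association_score(
    mounted: int, depth: int, load: float, committed_index: int
) -> float:
    """Grows with the committed log index, shrinks with the other inputs."""
    return committed_index * intermediate_quantity(mounted, depth, load)
```

Under this formula, a node that has committed more log entries is preferred, while a node that is deeper in the tree, already serves many mounted nodes, or is heavily loaded is penalized.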
5. The method of claim 1, wherein said determining a first upstream node from said M upstream nodes based on said M node scores comprises:
sorting the M node scores in descending order to obtain a sorting result;
sorting the nodes in the first upstream list information according to the sorting result to obtain second upstream list information, wherein the second upstream list information comprises the M upstream nodes;
determining the upstream node ranked first in the second upstream list information as the first upstream node.
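The reordering in claim 5 amounts to sorting the upstream list by score in descending order and taking the head of the result. A minimal sketch follows, with `reorder_upstream_list` as a hypothetical name; keeping the original list order for equal scores is a property of Python's stable sort, not a requirement of the claim.

```python
from typing import Dict, List


def reorder_upstream_list(
    first_list: List[str], scores: Dict[str, float]
) -> List[str]:
    """Return the second upstream list: same nodes, highest score first."""
    # sorted() is stable, so nodes with equal scores keep their original order.
    return sorted(first_list, key=lambda node: scores[node], reverse=True)
```

The first upstream node is then simply element `0` of the returned list.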
6. The method of claim 1, wherein after determining a first upstream node from the M upstream nodes based on the M node scores, the method further comprises:
sending a node registration request to the first upstream node, so that the first upstream node determines a mountable number according to the node registration request;
if the mountable number is greater than or equal to 1, determining that the first upstream node meets the node registration condition;
and if the mountable number is 0, determining that the first upstream node does not satisfy the node registration condition.
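Seen from the upstream side, the check in claim 6 could be sketched as follows. The class `UpstreamNode`, the fixed capacity `max_mounts`, and the way the mountable number is derived from it are assumptions; the claim only says that the upstream node determines a mountable number and that registration succeeds when it is at least 1.

```python
from typing import Set


class UpstreamNode:
    def __init__(self, max_mounts: int) -> None:
        self.max_mounts = max_mounts      # assumed capacity limit
        self.mounted: Set[str] = set()    # downstream nodes already mounted

    def mountable_number(self) -> int:
        """How many more downstream nodes can still be mounted."""
        return max(0, self.max_mounts - len(self.mounted))

    def handle_registration(self, downstream_id: str) -> bool:
        """Accept the request only if at least one slot is free."""
        if self.mountable_number() >= 1:
            self.mounted.add(downstream_id)
            return True
        return False
```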
7. The method of claim 1, wherein after determining a first upstream node from the M upstream nodes based on the M node scores, the method further comprises:
sending a node registration request to the first upstream node;
receiving a node registration response sent by the first upstream node, wherein the node registration response carries role information of the first upstream node;
if the role information indicates that the first upstream node is a slave node or a learning node, determining that the first upstream node meets the node registration condition;
and if the role information indicates that the first upstream node is the master node, determining that the first upstream node does not meet the node registration condition.
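The role-based check in claim 7 reduces to a membership test on the role carried in the registration response. The enum below is a hypothetical encoding of the three roles the claim names (the master, or main, node; the slave node; the learning node).

```python
from enum import Enum


class Role(Enum):
    MASTER = "master"    # the main node in the claim's wording
    SLAVE = "slave"
    LEARNER = "learner"  # the learning node in the claim's wording


def meets_registration_condition(role: Role) -> bool:
    """Only slave and learner nodes may serve as a synchronization source."""
    return role in (Role.SLAVE, Role.LEARNER)
```

Excluding the master node from the set of acceptable sources keeps downstream synchronization traffic off the node that handles writes, which appears to be the motivation for this condition.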
8. The method of claim 1, wherein after determining a first upstream node from the M upstream nodes based on the M node scores, the method further comprises:
if the first upstream node does not meet the node registration condition, determining a second upstream node from the M upstream nodes according to the M node scores, wherein the node score corresponding to the second upstream node is the second largest value of the M node scores;
and if the second upstream node meets the node registration condition, receiving data sent by the second upstream node.
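The fallback in claim 8 can be sketched as a walk down the ranked list. Claim 8 only covers the second-highest node, so the default of two attempts mirrors the claim; the `max_attempts` parameter that allows trying further candidates is an extension assumed here, and all names are hypothetical.

```python
from typing import Callable, Dict, Optional


def pick_with_fallback(
    scores: Dict[str, float],
    can_register: Callable[[str], bool],
    max_attempts: int = 2,
) -> Optional[str]:
    """Try candidates from highest to lowest score until one registers."""
    ranked = sorted(scores, key=lambda node: scores[node], reverse=True)
    for node in ranked[:max_attempts]:
        if can_register(node):
            return node
    return None
```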
9. The method according to any of claims 1 to 8, wherein after receiving the data sent by the first upstream node, the method further comprises:
receiving a data reading request sent by network equipment;
and sending metadata information to the network equipment according to the data reading request.
10. The method according to any of claims 1 to 8, wherein after receiving the data sent by the first upstream node, the method further comprises:
receiving a first data reading request sent by network equipment;
and sending first metadata information to the network equipment according to the first data reading request, wherein the network equipment is further used for sending a second data reading request to other nodes except the target node, and the second data reading request is used for acquiring second metadata information.
11. The method according to any one of claims 1 to 8, wherein the receiving the data sent by the first upstream node comprises:
receiving an initial query request sent by first network equipment;
synchronizing target metadata information in the first upstream node to the target node according to the initial query request;
after receiving the data sent by the first upstream node, the method further includes:
receiving a data query request sent by second network equipment;
and sending the target metadata information to the second network equipment according to the data query request.
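Claim 11 describes on-demand synchronization: the first query for a piece of metadata pulls it from the upstream node, and later queries from other devices are answered from the target node's local copy. The sketch below models the upstream node as a plain dictionary; `TargetNode`, `sync_count`, and the key used in the usage note are hypothetical.

```python
from typing import Any, Dict


class TargetNode:
    def __init__(self, upstream_store: Dict[str, Any]) -> None:
        self.upstream_store = upstream_store  # stands in for the upstream node
        self.local: Dict[str, Any] = {}       # metadata already synchronized
        self.sync_count = 0                   # number of upstream fetches

    def query(self, key: str) -> Any:
        """Serve a query, synchronizing from upstream on first access."""
        if key not in self.local:
            # Initial query: synchronize the target metadata to this node.
            self.local[key] = self.upstream_store[key]
            self.sync_count += 1
        # Later queries are answered from the local copy.
        return self.local[key]
```

For example, if a first device queries key `"vol-a"` and a second device queries the same key afterwards, only the first call reaches the upstream store.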
12. The method according to any of claims 1 to 8, wherein after receiving the data sent by the first upstream node, the method further comprises:
when the first upstream node fails, acquiring a second node information set corresponding to the target node, wherein the second node information set comprises second node information of N upstream nodes, the N upstream nodes are upstream nodes of the target node, and N is an integer greater than or equal to 1;
determining N node scores according to the second node information set, wherein the N node scores and the N upstream nodes have one-to-one correspondence;
determining a third upstream node from the N upstream nodes according to the N node scores, wherein the node score corresponding to the third upstream node is the maximum value of the N node scores;
and if the third upstream node meets the node registration condition, synchronizing the data in the third upstream node to the target node.
13. A data synchronization apparatus, comprising:
an obtaining module, configured to obtain a first node information set corresponding to a target node, where the first node information set includes first node information of M upstream nodes, where the M upstream nodes are upstream nodes of the target node, and M is an integer greater than or equal to 1;
a determining module, configured to determine M node scores according to the first node information set, where the M node scores and the M upstream nodes have a one-to-one correspondence relationship;
the determining module is further configured to determine a first upstream node from the M upstream nodes according to the M node scores, where a node score corresponding to the first upstream node is a maximum value of the M node scores;
and the synchronization module is used for receiving the data sent by the first upstream node if the first upstream node meets the node registration condition.
14. A server, comprising: a memory, a transceiver, a processor, and a bus system;
wherein the memory is used for storing programs;
the processor is used for executing the program in the memory, including performing the method of any one of claims 1 to 12 according to instructions in the program code;
the bus system is used for connecting the memory and the processor so as to enable the memory and the processor to communicate.
15. A computer-readable storage medium comprising instructions that, when executed on a computer, cause the computer to perform the method of any of claims 1 to 12.
CN202011044182.4A 2020-09-28 2020-09-28 Data synchronization method, related device, equipment and storage medium Active CN111935320B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011044182.4A CN111935320B (en) 2020-09-28 2020-09-28 Data synchronization method, related device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011044182.4A CN111935320B (en) 2020-09-28 2020-09-28 Data synchronization method, related device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111935320A true CN111935320A (en) 2020-11-13
CN111935320B CN111935320B (en) 2021-01-05

Family

ID=73334716

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011044182.4A Active CN111935320B (en) 2020-09-28 2020-09-28 Data synchronization method, related device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111935320B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113835930A (en) * 2021-09-26 2021-12-24 杭州谐云科技有限公司 Cache service recovery method, system and device based on cloud platform
CN113946287A (en) * 2021-09-08 2022-01-18 广州虎牙科技有限公司 Distributed storage system and data processing method and related device thereof
CN116095096A (en) * 2023-01-05 2023-05-09 中国联合网络通信集团有限公司 Data synchronization method, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170344904A1 (en) * 2016-05-31 2017-11-30 International Business Machines Corporation Coordinated version control system, method, and recording medium for parameter sensitive applications
CN107919977A (en) * 2016-10-11 2018-04-17 阿里巴巴集团控股有限公司 A kind of on-line rapid estimation of the distributed consensus system based on Paxos agreements, the method and apparatus of online capacity reducing
CN110427433A (en) * 2019-08-08 2019-11-08 上海中通吉网络技术有限公司 A kind of block chain common recognition method and storage medium
US20190392072A1 (en) * 2018-06-22 2019-12-26 Ebay Inc. Key-value replication with consensus protocol
CN110990154A (en) * 2019-11-28 2020-04-10 曙光信息产业股份有限公司 Big data application optimization method and device and storage medium
CN111083192A (en) * 2019-11-05 2020-04-28 北京字节跳动网络技术有限公司 Data consensus method and device and electronic equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108347455B (en) * 2017-01-24 2021-03-26 阿里巴巴集团控股有限公司 Metadata interaction method and system
EP3709229A1 (en) * 2019-03-13 2020-09-16 Ricoh Company, Ltd. Learning device and learning method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113946287A (en) * 2021-09-08 2022-01-18 广州虎牙科技有限公司 Distributed storage system and data processing method and related device thereof
CN113835930A (en) * 2021-09-26 2021-12-24 杭州谐云科技有限公司 Cache service recovery method, system and device based on cloud platform
CN113835930B (en) * 2021-09-26 2024-02-06 杭州谐云科技有限公司 Cache service recovery method, system and device based on cloud platform
CN116095096A (en) * 2023-01-05 2023-05-09 中国联合网络通信集团有限公司 Data synchronization method, device and storage medium
CN116095096B (en) * 2023-01-05 2024-05-03 中国联合网络通信集团有限公司 Data synchronization method, device and storage medium

Also Published As

Publication number Publication date
CN111935320B (en) 2021-01-05

Similar Documents

Publication Publication Date Title
US11360854B2 (en) Storage cluster configuration change method, storage cluster, and computer system
US9906598B1 (en) Distributed data storage controller
US10496669B2 (en) System and method for augmenting consensus election in a distributed database
US8918392B1 (en) Data storage mapping and management
US8108634B1 (en) Replicating a thin logical unit
US8122284B2 (en) N+1 failover and resynchronization of data storage appliances
US11314444B1 (en) Environment-sensitive distributed data management
CN111813760B (en) Data migration method and device
US20110153570A1 (en) Data replication and recovery method in asymmetric clustered distributed file system
CN102904949B (en) Replica-based dynamic metadata cluster system
US8930364B1 (en) Intelligent data integration
US10331625B2 (en) Managing sequential data store
JP6491210B2 (en) System and method for supporting persistent partition recovery in a distributed data grid
US20150095282A1 (en) Multi-site heat map management
CN111935320B (en) Data synchronization method, related device, equipment and storage medium
US20060190460A1 (en) Method and mechanism of handling reporting transactions in database systems
CN107220271B (en) Method and system for storage processing and management of distributed digital resources
CN113987064A (en) Data processing method, system and equipment
CN110022338A (en) File reading, system, meta data server and user equipment
CN117677943A (en) Data consistency mechanism for hybrid data processing
CN109726211B (en) Distributed time sequence database
US11134121B2 (en) Method and system for recovering data in distributed computing system
Das Scalable and elastic transactional data stores for cloud computing platforms
CN111752892B (en) Distributed file system and implementation method, management system, equipment and medium thereof
CN111522688B (en) Data backup method and device for distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant