WO2020001060A1 - Distributed system member change method and distributed system - Google Patents

Distributed system member change method and distributed system

Info

Publication number
WO2020001060A1
WO2020001060A1 (PCT/CN2019/076844)
Authority
WO
WIPO (PCT)
Prior art keywords
node
address
distributed system
target node
master node
Prior art date
Application number
PCT/CN2019/076844
Other languages
English (en)
French (fr)
Inventor
白杨 (Bai Yang)
陈雷 (Chen Lei)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to EP19827288.2A (published as EP3817290A4)
Publication of WO2020001060A1
Priority to US17/125,318 (published as US11445013B2)

Classifications

    • H04L 67/1046 — Joining mechanisms (group management in peer-to-peer [P2P] networks)
    • H04L 41/0889 — Techniques to speed-up the configuration process
    • H04L 41/0886 — Fully automatic configuration
    • H04L 67/1048 — Departure or maintenance mechanisms (group management in P2P networks)
    • H04L 67/1097 — Protocols for distributed storage of data in networks, e.g. network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H04L 67/34 — Network arrangements or protocols involving the movement of software or configuration parameters
    • H04L 67/568 — Storing data temporarily at an intermediate stage, e.g. caching
    • H04L 67/5682 — Policies or rules for updating, deleting or replacing the stored data
    • H04L 69/40 — Recovering from a failure of a protocol instance or entity, e.g. service redundancy protocols
    • G06F 9/5072 — Grid computing (allocation of resources; partitioning or combining of resources)
    • H04L 41/0668 — Network fault recovery by dynamic selection of recovery network elements, e.g. replacement by the most appropriate element after failure

Definitions

  • This application relates to the field of computers, and in particular to distributed systems.
  • A distributed system includes multiple members. After a member is added to (or removed from) the distributed system, the remaining members need to be notified of the change so that every member in the cluster knows the latest membership. For example, a member can apply the change by adding the new member to (or deleting the departed member from) its own locally recorded member list.
  • In the prior art, member changes are mainly realized through log synchronization.
  • After a member is added (or removed), the master node of the cluster obtains the change information, records the member change instruction in its operation log, and synchronizes the log to each slave node. After receiving the member change instruction, each slave node updates its own recorded member list according to the operation log.
  • The master node informs all original members to execute the log and updates the cluster member set to {C1, C2} (that is, each cluster member stores both C1 and C2).
  • During the change, the member list held by each original member is either C1 (the update notification has not been received or the update has not completed) or {C1, C2} (the update notification has been received and the update has completed).
  • During the change, a master node can be elected only if it wins both a majority of the C1 set and a majority of the C2 set.
  • Consequently, the master node recognized by members holding the C1 list and the master node recognized by members holding the {C1, C2} list are the same node, which ensures that no dual-master scenario arises during the member change.
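As a rough illustration of this joint-majority rule from the log-based prior art, the following sketch (hypothetical names, not code from the patent) checks that a set of votes forms a majority of both the old member set C1 and the new member set C2:

```python
def joint_majority(votes, c1, c2):
    """Return True when `votes` (a set of node ids) forms a majority
    of BOTH the old member set c1 and the new member set c2."""
    def majority(voters, members):
        return len(voters & members) * 2 > len(members)
    return majority(votes, c1) and majority(votes, c2)
```

For example, during a change from C1 = {a, b, c} to C2 = {a, b, c, d, e}, approval from only {a, b} satisfies a majority of C1 but not of C2, so no master can be elected on that vote alone.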
  • According to a first aspect, a method for changing a member of a distributed system is provided, including: a first target node requests a node address set from a management server, where the node address set includes the addresses of all nodes in the distributed system, and the distributed system includes a master node and multiple slave nodes; when the address of the first target node is not in the node address set, the first target node sends a join request to the master node of the distributed system, the join request including the address of the first target node; after receiving the join request, the master node adds the address of the first target node to its local member list and instructs each node in the distributed system to add the address of the first target node to its local member list.
  • With this method, the first target node can join the distributed system on its own initiative, eliminating the need for operation and maintenance personnel to manually configure the master node. Because logs are no longer used as the technical means, the entire process is concise and efficient, with low resource occupation.
  • In a possible implementation, the first target node creates a local member list of its own, which includes the addresses of all nodes in the distributed system and the address of the first target node. This configures the first target node so that it recognizes itself as a member of the distributed system.
  • In a possible implementation, the master node instructing each of the multiple slave nodes in the distributed system to add the address of the first target node to the local member list specifically includes: the master node sends a member addition instruction to each slave node, where the member addition instruction includes the address of the first target node; the master node receives a member addition response sent by each slave node, the member addition response being the response message to the member addition instruction; the master node adds the address of the first target node to its local member list and sends a member addition validating instruction to all slave nodes, instructing each slave node, after receiving the validating instruction, to add the address of the first target node to its local member list.
  • In a possible implementation, the first target node sending the join request to the master node includes: the first target node broadcasts the join request to all addresses in the node address set, thereby delivering the join request to the master node.
  • Broadcasting ensures that the join request reaches the master node, avoiding the case where the master node fails to receive it.
  • In a possible implementation, after the first target node requests the node address set from the management server, the method further includes one of the following steps: the master node caches the next join request it receives; or the master node caches the next leave request it receives.
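The request-caching behavior described above can be sketched as a simple serializer on the master node, which processes one membership change at a time and queues anything that arrives meanwhile. This is an illustrative model only; the class and method names are invented:

```python
from collections import deque

class ChangeSerializer:
    """Hypothetical sketch: the master processes one membership change
    at a time and caches any join/leave request that arrives meanwhile."""
    def __init__(self):
        self.busy = False
        self.pending = deque()

    def submit(self, request):
        if self.busy:
            self.pending.append(request)   # cached, handled later
            return "cached"
        self.busy = True
        return "processing"

    def finish(self):
        # Current change is done; hand back the next cached request, if any.
        self.busy = False
        return self.pending.popleft() if self.pending else None
```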
  • In a possible implementation, the member addition validating instruction specifically includes a COMMIT instruction. This gives the specific content of the validating instruction.
  • In a possible implementation, the address includes at least one of, or a combination of, a node ID, a node IP address, and a node port number. This gives possible forms of the address; the address can also take other forms, as long as it can serve as a unique tag for the node.
  • In a possible implementation, after the master node receives another join request (carrying the address of a third target node), the master node records a member change flag, where the member change flag includes a change type and the address of the third target node; if the master node fails after receiving this join request but before instructing each of the multiple slave nodes in the distributed system to add the address of the third target node to the local member list, the slave nodes in the distributed system elect a new master node; after reading the change flag, the new master node instructs the slave nodes in the distributed system, according to the change type, to add the address of the third target node to the local member list.
  • This ensures that after the master node fails, the newly elected master node can perform the member addition/deletion again, guaranteeing reliable execution of member change requests.
  • In a possible implementation, the method may further include: a second target node sends a leave request to the master node; after receiving the leave request, the master node sends a member deletion instruction to all slave nodes, where the member deletion instruction includes the address of the second target node; after receiving the member deletion instruction, each slave node sends a member deletion response to the master node; after receiving the member deletion responses from the slave nodes, the master node deletes the address of the second target node from the node address set and from its local member list, and sends a member deletion validating instruction to each slave node; each slave node then deletes the address of the second target node from its local member list.
  • This solution describes how to delete a node from the distributed system.
  • According to a second aspect, a node set is provided, where the node set includes a first target node and a distributed system, the distributed system includes multiple nodes, and the distributed system may execute the method of the first aspect or any of its possible implementations.
  • According to a third aspect, a method for changing a member of a distributed system is provided, including: a second target node requests a node address set from a management server, where the distributed system includes a master node and multiple slave nodes, and the node address set includes the addresses of all nodes in the distributed system; when the address of the second target node is in the node address set, the second target node sends a leave request to the master node of the distributed system, the leave request including the address of the second target node; after receiving the leave request, the master node deletes the address of the second target node from the master node's local member list and instructs each of the multiple slave nodes in the distributed system to delete the address of the second target node from the local member list.
  • With this method, the deletion process can be initiated by the node to be deleted itself, without relying on operation and maintenance personnel to manually configure the master node. The deletion process does not involve logs, which keeps it simple and efficient.
  • In a possible implementation, the master node instructing all slave nodes in the distributed system to delete the address of the second target node from the local member list specifically includes: the master node sends a member deletion instruction to all slave nodes in the distributed system, where the member deletion instruction includes the address of the second target node; a slave node receiving the member deletion instruction sends a member deletion response to the master node; after determining that the member deletion responses from all slave nodes have been received, the master node deletes the address of the second target node from the node address set of the management server and sends a member deletion validating instruction to all slave nodes; after receiving the member deletion validating instruction, each slave node deletes the address of the second target node from its local member list.
  • This solution describes the specific process of deletion, such as the specific operation of the slave node and the master node.
  • According to a fourth aspect, a distributed system is provided, including multiple nodes.
  • The distributed system may execute the method of the third aspect described above, or any possible implementation of the third aspect.
  • FIG. 1 is a topology diagram of a distributed system embodiment of the present invention
  • FIG. 2 is a schematic diagram of adding a new member to an existing distributed system
  • FIG. 3 is a schematic diagram of deleting an existing member from an existing distributed system
  • FIG. 4 is a flowchart of a method for changing members of a distributed system.
  • A distributed system (also referred to as a cluster) includes multiple nodes with computing capability, such as computers or servers.
  • A node may also be, for example, a controller of a storage array.
  • the nodes that make up a distributed system are also called members of the distributed system.
  • the nodes of the distributed system can be divided into master nodes and slave nodes.
  • the master node has certain management functions for the slave nodes.
  • the distributed system 1 includes: a node 11, a node 12, a node 13, a node 14, and a node 15.
  • Node 11 is the master node, and the remaining nodes are slave nodes.
  • The distributed system 1 communicates with the management server 2.
  • The distributed system 1 in this embodiment may be a self-elected-master distributed system.
  • For example, a distributed system based on the Paxos algorithm, the ZAB algorithm, or the Raft algorithm.
  • "Self-elected master" means that after the master node 11 fails, the election of a new master node does not rely on any node outside the distributed system, but only on nodes inside the distributed system (that is, slave node 12, slave node 13, and slave node 14).
  • As shown in FIG. 2, node 16 is a node outside the distributed system 1 that needs to be added to the distributed system 1.
  • The entirety of node 16 and the distributed system 1 may be referred to as a node set.
  • As shown in FIG. 3, node 15 is a node in the distributed system 1 that needs to be deleted from the distributed system 1.
  • the general two-stage log method is too complicated and must be triggered manually by the administrator on the master node, which consumes a lot of resources of the distributed system and increases the burden on the administrator.
  • an embodiment of the present invention provides a method for changing members of a distributed system.
  • Step 11: The first target node (for example, node 16 in FIG. 2) requests a node address set from a management server (management server 2), where the node address set includes the addresses of all nodes in the distributed system.
  • the first target node is a node that needs to be added to the distributed system. In this step, without the administrator's participation, the first target node actively requests the management server for the node address set.
  • the management server may be outside the distributed system or a node with a storage function in the distributed system, as long as it has a non-volatile storage function.
  • the management server may also be the first target node.
  • the node address set includes the addresses of all nodes in the distributed system.
  • the node address is used to distinguish different nodes and can be regarded as the label of the node.
  • The node address is one of, or a combination of, a node ID, a node IP address, and a node port number.
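As an illustration of these address forms, a node address might be modeled as follows (a hypothetical representation, not the patent's own data structure; the example ID, IP, and port are invented):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class NodeAddress:
    """Illustrative only: the patent allows a node ID, an IP address,
    a port number, or any combination that uniquely tags the node."""
    node_id: str
    ip: str = ""
    port: int = 0

# Hypothetical example address for the node to be added.
addr = NodeAddress(node_id="node16", ip="192.0.2.16", port=7000)
```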
  • Step 12: After acquiring the address set, the first target node determines whether the address of the first target node is in the address set, thereby determining whether the first target node is already a node in the distributed system.
  • If the address of the first target node is in the node address set, the first target node has already joined the distributed system, and the process exits.
  • When the address of the first target node is not in the node address set, the first target node sends a join request to the master node of the distributed system.
  • the join request includes an address of the first target node.
  • There are several ways in which the first target node may send the join request to the master node; examples are given below.
  • When the first target node cannot learn the address of the master node, it is difficult to establish a point-to-point communication connection with the master node; in that case, the first target node can broadcast the join request to all nodes in the node address set, thereby delivering the join request to the master node. There are also other solutions.
  • For example, the first target node sends the join request to one or more nodes in the distributed system; if a node receiving the join request is not the master node, it continues to pass the join request on until the master node receives it.
  • If the address of the master node is known, the join request may be sent to the master node directly.
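The forwarding option can be sketched as follows; the topology, field names, and node ids are hypothetical:

```python
def deliver_join(request, start, nodes):
    """Hypothetical sketch of the forwarding option: a node that is not
    the master passes the join request on until the master receives it.
    `nodes` maps a node id to {"is_master": bool, "next_hop": id}."""
    addr = start
    hops = [addr]
    while not nodes[addr]["is_master"]:
        addr = nodes[addr]["next_hop"]   # pass the request down
        hops.append(addr)
    return addr, hops
```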
  • Step 13: After receiving the join request, the master node sends a member addition instruction to all slave nodes in the distributed system, so as to add the first target node to the distributed system.
  • The member addition instruction includes the address of the first target node.
  • Each node in the distributed system (including the master node and the slave nodes) has a member list, which records all members in the distributed system (or all members except the node itself); the recorded information can specifically be the member addresses.
  • Using the member addresses in its member list, the master node can send the member addition instruction to all members.
  • Step 14: A slave node that receives the member addition instruction sends a member addition response to the master node.
  • the member addition response is a response message of the member addition instruction, and is used to inform the master node that it has successfully received the member addition instruction.
  • At this point the member addition operation is not yet performed (that is, the first target node is not "added" yet); the addition is performed in step 16.
  • That is, the slave node that receives the member addition instruction may cache the instruction and send the member addition response after the caching succeeds.
  • Step 15: After receiving the member addition responses sent by all slave nodes, the master node adds the address of the first target node, carried in the member addition instruction, to the member list of the master node.
  • The master node also sends a member addition validating instruction to the slave nodes, instructing each slave node to execute the member addition instruction cached in step 14.
  • The member addition validating instruction may be a COMMIT instruction.
  • The master node may also send the address of the first target node to the management server, so that the management server writes the address of the first target node into the node address set.
  • That is, the address of the first target node is added to the management server's node address set, updating the set. After the node address set is updated, if a new node needs to join or leave the distributed cluster, it can determine whether it belongs to the cluster by querying the updated node address set.
  • Step 16: After receiving the member addition validating instruction, the slave node executes the addition of the first target node (for example, executes the cached member addition instruction), that is, adds the address of the first target node to the slave node's own local member list.
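Steps 13-16 together form a two-phase exchange: the instruction is sent and cached, acknowledged, and only applied on the validating (commit) message. A minimal sketch, with invented class and method names, might look like:

```python
class Node:
    """Toy model of a cluster node's member list (illustrative only)."""
    def __init__(self, addr, members):
        self.addr = addr
        self.members = set(members)   # local member list
        self.cached = None

    def on_member_add(self, new_addr):   # step 14: cache, then acknowledge
        self.cached = new_addr
        return "ack"

    def on_commit(self):                 # step 16: apply the cached addition
        self.members.add(self.cached)
        self.cached = None

def master_add(master, slaves, address_set, new_addr):
    # Step 13: send the member addition instruction to every slave.
    acks = [s.on_member_add(new_addr) for s in slaves]
    if all(a == "ack" for a in acks):    # step 15: all responses received
        master.members.add(new_addr)     # master updates its own list
        address_set.add(new_addr)        # management server's node address set
        for s in slaves:                 # member addition validating instruction
            s.on_commit()
```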
  • adding the address of the first target node to the local member list means that the first target node is recognized as a member of the distributed system.
  • The “local” in each embodiment of the present invention is relative to a node.
  • Take any node named node A as an example.
  • The local member list of node A belongs to node A.
  • By reading its own member list, the processor of node A can learn which members, besides node A, are in the distributed system where node A is located. This member list can be stored on node A, and in some cases outside node A.
  • Step 17: The first target node creates a local member list.
  • The local member list created by the first target node includes the addresses of all original nodes in the distributed system.
  • It may further include the address of the first target node itself.
  • After steps 15, 16 and 17 are completed, the first target node has created its own local member list, and all original nodes in the distributed system have updated their own local member lists, so the first target node has joined the distributed system. As the above steps show, the solution of this embodiment provides a member "auto-discovery" node addition technique: a new node can actively join the distributed storage system, and the original distributed system can sense the addition. This reduces the involvement of operation and maintenance personnel in member changes and makes the entire change process more automated and intelligent.
  • the following steps 18-23 describe the process of deleting the second target node from the distributed system.
  • the second target node may be any node except the master node in the distributed system (for example, node 15 in FIG. 3).
  • it may be the first target node (node 16).
  • The node deletion process may be performed before (or after) the node addition process. That is, the flow of steps 11-17 and the flow of steps 18-23 are relatively independent: the two can be executed one after the other, or only one of them may be executed, but they are not executed in parallel.
  • Step 18: The second target node requests a node address set from the management server, where the node address set includes the addresses of all nodes in the distributed system.
  • the second target node is a node in the distributed system that needs to go offline. In this step, without the administrator's participation, the second target node actively requests the management server for the node address set.
  • Step 19: After acquiring the address set, the second target node determines whether the address of the second target node is in the address set. This step is optional.
  • If the address of the second target node is not in the node address set, the second target node no longer belongs to the distributed system, and the process exits.
  • When the address of the second target node is in the node address set, the second target node sends a leave request to the master node of the distributed system.
  • the leaving request includes an address of the second target node.
  • Alternatively, the determination in step 19 may be skipped, and the second target node directly sends the leave request to the master node.
  • The second target node may send the leave request to the master node in any way that ensures delivery, such as unicast, multicast, or broadcast.
  • Step 20: After receiving the leave request, the master node sends a member deletion instruction to all slave nodes in the distributed system, so that the second target node leaves the distributed system.
  • The member deletion instruction includes the address of the second target node.
  • Using the member addresses in its member list, the master node can send the member deletion instruction to all members.
  • Step 21: A slave node that receives the member deletion instruction sends a member deletion response to the master node, so as to inform the master node that it has successfully received the instruction.
  • The member deletion response is the response message to the member deletion instruction.
  • The member deletion instruction can be cached in the memory of the slave node; the deletion operation is not executed in this step (that is, the second target node is not "deleted" yet) but only in step 23.
  • That is, the slave node receiving the member deletion instruction may cache the instruction and send the member deletion response after the caching succeeds.
  • Step 22: After receiving the member deletion responses from all slave nodes, the master node sends a member deletion validating instruction to all slave nodes.
  • the master node also deletes the address of the second target node from the member list of the master node.
  • The master node may also instruct the management server to delete the address of the second target node from the management server's node address set. After the node address set is updated, if a new node needs to join or leave the distributed cluster, it can determine whether it belongs to the cluster by querying the updated node address set.
  • The member deletion validating instruction may be a COMMIT instruction; sending it instructs the slave nodes to execute the member deletion instruction cached in step 21.
  • Step 23: After receiving the member deletion validating instruction, the slave node executes the deletion of the second target node (for example, executes the cached member deletion instruction), that is, removes the address of the second target node from its own local member list.
  • Deleting the address of the second target node from the local member list means that the second target node has exited the distributed system and is no longer a member of it.
  • Optionally, the second target node may delete its own local member list.
  • the solution of the embodiment of the present invention provides a node deletion technology of "actively deleting itself".
  • a node in a distributed system may actively exclude itself from the distributed storage system.
  • the distributed system can sense a node's deletion request and respond. It simplifies the participation of operation and maintenance personnel in the change of members, and makes the entire member change process more automated and intelligent.
  • After the master node receives a join request and before it sends the member addition validating instruction to all slave nodes, or after the master node receives a leave request and before it sends the member deletion validating instruction to all slave nodes, if the master node receives a new join request or a new leave request, it can cache the new request. This avoids executing two member change requests at the same time.
  • a member change flag may be set in the management server.
  • the member change flag records the type of change (addition or deletion) and the address of the node that needs to be changed. If the master node fails in the above process, the newly elected master node can re-execute the membership change based on the change type and the node address recorded in the member change flag.
  • the member change flag includes: a change type (join type / delete type) and the address of the member to be added (for example, the address of a third target node); the master node fails before all slave nodes in the distributed system are instructed to add the address of the third target node to their local member lists; the distributed system elects a new master node; the new master node reads the change flag, re-adds the address of the third target node to the node address set, and instructs all slave nodes in the distributed system to add the address of the third target node to their local member lists.
  • a new master node can be elected through negotiation among the slave nodes; for example, the slave node with the smallest load is selected as the new master node, or the slave node that joined the cluster earliest is selected as the new master node.
  • the new master node may restart execution from step 13 (sending the member addition instruction to all slave nodes in the distributed system), so as to re-execute steps 13-17 and thereby add the first target node to the distributed system.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Multi Processors (AREA)

Abstract

A more efficient distributed system membership change scheme, including: a first target node requests a node address set from a management server, where the node address set includes the addresses of all nodes in the distributed system; when the address of the first target node is not in the node address set, the first target node sends a join request to the master node of the distributed system; the master node adds the address of the first target node to the node address set, and instructs all slave nodes in the distributed system to add the address of the first target node to their local member lists.

Description

Distributed System Membership Change Method and Distributed System — Technical Field
This application relates to the field of computers, and in particular to distributed systems.
Background
A distributed system includes multiple members. After a member is added to (or removed from) the distributed system, the addition (or removal) needs to be notified to each member, so that the members of the cluster can learn the latest membership of the cluster. For example, a member can perform a membership change by adding the new member to (or deleting the original member from) the member list it keeps.
In the industry, membership changes are mainly implemented through log synchronization. After a member is added (or removed), the master node of the cluster obtains the information that a member has been added (or removed) and synchronizes the membership change instruction to each slave node in the form of an operation log. After receiving the membership change instruction, a slave node updates the member list it keeps according to the operation log.
In the prior art, assume the cluster member set before the membership change is C1 and the cluster member set after the change is C2. The two-phase commit used to execute the membership change is introduced below. Phase one: the master node notifies all original members to execute the log and update the cluster member set to {C1, C2} (that is, the cluster members store both C1 and C2). During this period, each original member's member list is C1 (the update notification has not yet been received, or the update has not yet completed) or {C1, C2} (the notification has been received and the update has completed). For the {C1, C2} list, the master node must simultaneously satisfy a majority of the C1 set and a majority of the C2 set. In other words, the following principle must hold: the master node recognized by the members holding the C1 list and the master node recognized by the members holding the C2 list are the same node. This guarantees that no dual-master scenario can exist during the membership change. Phase two: after phase one succeeds (phase one can be considered successful once a majority of members, or all members, have successfully updated to {C1, C2}), the master node notifies all members to update the member set to C2 by executing the log. Before the update completes, each member's member list is {C1, C2} (before completion) or C2 (after completion).
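The joint-majority rule of phase one can be illustrated with a small sketch. This is not code from the patent: the function names (`has_majority`, `can_lead_joint`) and the example node names are invented for illustration; it only shows why a candidate that must win a majority of both C1 and C2 rules out a dual-master scenario.

```python
def has_majority(votes, members):
    """True if the voters include a strict majority of `members`."""
    return len(set(votes) & set(members)) > len(members) // 2

def can_lead_joint(votes, c1, c2):
    # During the {C1, C2} phase a master must be recognized by a
    # majority of C1 AND a majority of C2, so members holding C1 and
    # members holding C2 can never recognize different masters at once.
    return has_majority(votes, c1) and has_majority(votes, c2)

c1 = {"n1", "n2", "n3"}            # member set before the change
c2 = {"n2", "n3", "n4", "n5"}      # member set after the change
print(can_lead_joint({"n2", "n3", "n4"}, c1, c2))  # True: majority of both
print(can_lead_joint({"n1", "n2", "n3"}, c1, c2))  # False: not a majority of C2
```

Because both majorities must back the same candidate, any two candidates would need overlapping voters in C1 (and in C2), which is impossible for two simultaneous masters.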
As can be seen from the above, the log-based scheme is overly complex: it costs a great deal of time, consumes a large amount of node resources, and requires manual involvement, which increases the workload of operation and maintenance personnel.
Summary
According to a first aspect, a distributed system membership change method is provided, including: a first target node requests a node address set from a management server, where the node address set includes the addresses of all nodes in the distributed system, and the distributed system includes a master node and multiple slave nodes; when the address of the first target node is not in the node address set, the first target node sends a join request to the master node of the distributed system, where the join request includes the address of the first target node; after receiving the join request, the master node adds the address of the first target node to the master node's local member list, and instructs each node in the distributed system to add the address of the first target node to its local member list. With this method, the first target node can join the distributed system actively and spontaneously, without relying on operation and maintenance personnel to manually configure the master node. Moreover, logs are no longer used as the technical means, so the whole process is concise and efficient, with low resource usage.
In a first possible implementation of the first aspect, the first target node creates the first target node's local member list, which includes the addresses of all nodes in the distributed system as well as the address of the first target node. This scheme configures the first target node so that it recognizes itself as a member of the distributed system.
In a second possible implementation of the first aspect, the master node instructing each of the multiple slave nodes in the distributed system to add the address of the first target node to its local member list specifically includes: the master node sends a member addition instruction to each slave node, where the member addition instruction includes the address of the first target node; the master node receives a member addition response sent by each slave node, where the member addition response is a response message to the member addition instruction; the master node adds the address of the first target node to its local member list, and the master node sends a member addition validating instruction to all slave nodes, instructing all slave nodes, after receiving the member addition validating instruction, to add the address of the first target node to their local member lists. This scheme describes the interaction between the master node and the slave nodes in the process of adding a slave node.
In a third possible implementation of the first aspect, the first target node sending the join request to the master node includes: the first target node broadcasts the join request to all addresses in the node address set, thereby delivering the join request to the master node. Using broadcast ensures that the join request can reach the master node, preventing the master node from missing it.
In a fourth possible implementation of the first aspect, after the first target node requests the node address set from the management server, the method further includes one of the following steps: the master node caches the next join request it receives; or the master node caches a received leave request. This scheme avoids the conflicts caused by processing multiple membership change requests in parallel, and prevents other join/leave requests from being lost.
In a fifth possible implementation of the first aspect, the member addition validating instruction specifically includes a COMMIT instruction. This scheme describes the specific content of the member addition validating instruction.
In a sixth possible implementation of the first aspect, the address includes at least one of, or a combination of, a node ID, a node IP address, and a node port number. This scheme describes possible forms of the address; the address may also take other forms, as long as it can serve as a unique identifier of the node.
In a seventh possible implementation of the first aspect, after the master node receives another join request, the master node records a member change flag, where the member change flag includes a change type and the address of a third target node; the master node fails before the master node, according to the other join request, instructs each of the multiple slave nodes in the distributed system to add the address of the third target node to its local member list; the slave nodes in the distributed system elect a new master node; after reading the change flag, the new master node, according to the change type, instructs the slave nodes in the distributed system to add the address of the third target node to their local member lists. This scheme ensures that after the master node fails, the newly elected master node can re-execute the member addition/deletion operation, thereby guaranteeing reliable execution of member addition/deletion requests.
In an eighth possible implementation of the first aspect, on the basis of the second possible implementation of the first aspect, the method may further include: a second target node sends a leave request to the master node; after receiving the leave request, the master node sends a member deletion instruction to all slave nodes, where the member deletion instruction includes the address of the second target node; after receiving the member deletion instruction, each slave node sends a member deletion response to the master node; after receiving the member deletion responses from the slave nodes, the master node deletes the address of the second target node from the node address set; the master node deletes the address of the second target node from its local member list, and the master node sends a member deletion validating instruction to each slave node; each slave node deletes the address of the second target node from its local member list. This scheme describes how a node is deleted from the distributed system.
According to a second aspect, a node set is provided. The node set includes a first target node and a distributed system, where the distributed system includes multiple nodes. The distributed system can execute the method of the first aspect above, or any of the possible implementations of the first aspect.
According to a third aspect, a distributed system membership change method is provided, including: a second target node requests a node address set from a management server, where the distributed system includes a master node and multiple slave nodes, and the node address set includes the addresses of all nodes in the distributed system; when the address of the second target node is in the node address set, the second target node sends a leave request to the master node of the distributed system, where the leave request includes the address of the second target node; after receiving the leave request, the master node deletes the address of the second target node from the master node's local member list, and instructs each of the multiple slave nodes in the distributed system to delete the address of the second target node from its local member list. The deletion procedure can be initiated by the node to be deleted itself, without relying on operation and maintenance personnel to manually configure the master node. Moreover, the deletion procedure involves no logs, making it concise and efficient and reducing system resource usage.
In a first possible implementation of the third aspect, the master node instructing all slave nodes in the distributed system to delete the address of the second target node from their local member lists specifically includes: the master node sends a member deletion instruction to all slave nodes in the distributed system, where the member deletion instruction includes the address of the second target node; a slave node that receives the member deletion instruction sends a member deletion response to the master node; after determining that it has received the member deletion responses from all slave nodes, the master node deletes the address of the second target node from the node address set of the management server and sends a member deletion validating instruction to all slave nodes; after receiving the member deletion validating instruction, a slave node deletes the address of the second target node from its local member list. This scheme describes the specific deletion procedure, for example the specific operations of the slave nodes and the master node.
According to a fourth aspect, a distributed system is provided. The distributed system includes multiple nodes and can execute the method of the third aspect above, or any of the possible implementations of the third aspect.
Brief Description of the Drawings
FIG. 1 is a topology diagram of a distributed system embodiment of the present invention;
FIG. 2 is a schematic diagram of adding a new member to an existing distributed system;
FIG. 3 is a schematic diagram of deleting an existing member from an existing distributed system;
FIG. 4 is a flowchart of a distributed system membership change method.
Detailed Description
A distributed system (also called a cluster) includes multiple nodes. A node has computing capability, for example a computer or a server; alternatively, a node is a controller of a storage array. The nodes that make up a distributed system are also called the members of the distributed system. Divided by function, the nodes of a distributed system can be classified into a master node and slave nodes, where the master node has certain management functions over the slave nodes.
Referring to FIG. 1, a distributed system 1 includes node 11, node 12, node 13, node 14, and node 15, where node 11 is the master node and the remaining nodes are slave nodes. The distributed system 1 communicates with a management server 2.
The distributed system 1 in this patent embodiment may be a self-electing distributed system, for example a distributed system based on the Paxos algorithm, the ZAB algorithm, or the Raft algorithm. Self-electing means that after master node 11 fails, the system does not rely on nodes outside the distributed system; relying on the nodes inside the distributed system (that is, slave node 12, slave node 13, and slave node 14), a new master node can be elected from among these slave nodes.
Since a distributed system has many (sometimes even a huge number of) members, the problems of bringing nodes online and offline arise. For example, if the total number of members in the distributed system is insufficient to meet business demand, new members need to be added to the distributed system, that is, new members are brought online; if the reliability of a node degrades or the node fails, or the total number of nodes far exceeds what is needed, the number of nodes in the distributed system needs to be reduced, that is, existing members are taken offline. Adding or removing members of a distributed system is collectively referred to as a distributed system membership change. Referring to FIG. 2, node 16 is a node outside the distributed system 1 that needs to be added to the distributed system 1; the whole consisting of node 16 and the distributed system 1 may be called a node set. Referring to FIG. 3, node 15 is an existing node of the distributed system 1 that needs to be deleted from the distributed system 1.
The usual two-phase log method is too complex and must be triggered manually by an administrator on the master node; it consumes a large amount of the distributed system's resources and increases the administrator's burden.
As shown in FIG. 4, an embodiment of the present invention provides a distributed system membership change method.
Step 11: A first target node (for example, node 16 in FIG. 2) requests a node address set from a management server (management server 2), where the node address set includes the addresses of all nodes in the distributed system.
The first target node is a node that needs to be added to the distributed system. In this step, without administrator involvement, the first target node actively requests the node address set from the management server.
The management server may be outside the distributed system, or it may be a node with storage capability inside the distributed system, as long as it has non-volatile storage. For example, the management server may also be the first target node.
The node address set includes the addresses of all nodes in the distributed system. A node address is used to distinguish different nodes and can be regarded as a node's label. For example, a node address is one of, or a combination of, a node ID and a node IP address, and may also be a combination of a node ID, a node IP address, and a node port number.
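For illustration only (the class and field names `NodeAddress`, `node_id`, `ip`, `port` are assumptions, not terms from the patent), such an address can be modeled as an immutable value so it can serve as a unique key in the node address set:

```python
from dataclasses import dataclass

@dataclass(frozen=True)  # frozen -> hashable, so it can live in a set
class NodeAddress:
    node_id: str
    ip: str
    port: int

# The node address set held by the management server.
addresses = {NodeAddress("n11", "10.0.0.11", 7000),
             NodeAddress("n12", "10.0.0.12", 7000)}

joiner = NodeAddress("n16", "10.0.0.16", 7000)
print(joiner in addresses)  # False -> the node is not yet a member (step 12)
```

The membership test above is exactly the check a joining node performs in step 12.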
Step 12: After obtaining the address set, the first target node determines whether the address of the first target node is in the address set, thereby determining whether the first target node is already a node of the distributed system.
If the address of the first target node is in the node address set, it means the first target node has already joined the distributed system, and this procedure exits.
If the address of the first target node is not in the node address set, the first target node sends a join request to the master node of the distributed system. The join request includes the address of the first target node.
It should be noted that the first target node may send the join request to the master node in multiple ways, exemplified below.
If the first target node cannot learn the address of the master node, it is difficult to establish a point-to-point communication connection with the master node. In that case, the join request can be delivered to the master node by using the node address set to broadcast it to all nodes in the distributed system. In another scheme, the first target node sends the join request to one or more nodes in the distributed system; if a node that receives the join request is not the master node, it keeps forwarding the join request until the master node receives it.
If the first target node can obtain the address of the master node (for example, the address of the master node is pre-stored in some server, from which the first target node can obtain it), then after obtaining the address of the master node, it can send the join request directly to the master node.
Step 13: After receiving the join request, the master node sends a member addition instruction to all slave nodes in the distributed system, so as to add the first target node to the distributed system. The member addition instruction includes the address of the first target node.
Each node in the distributed system (including the master node and the member nodes) has a member list, which is used to record all members of the distributed system (or all members except the node itself). The recorded information may specifically be member addresses.
Using the member addresses in the member list, the master node can send the member addition instruction to all members.
Step 14: After receiving the member addition instruction, a slave node sends a member addition response to the master node. The member addition response is a response message to the member addition instruction, used to inform the master node that the slave node has successfully received the member addition instruction. In this step, the member addition operation is not executed (that is, the first target node is not yet "added"); the addition is executed in step 16. A slave node that receives the member addition instruction may cache it, and send the member addition response only after caching succeeds.
Step 15: After receiving the member addition responses sent by all slave nodes, the master node adds the address of the first target node, included in the member addition instruction, to the member list located on the master node.
In addition, after determining that it has received the member addition responses from all slave nodes, the master node may also send a member addition validating instruction to the slave nodes. This instruction instructs a slave node to execute the member addition instruction cached in step 14. Specifically, the member addition validating instruction may be a commit instruction.
Furthermore, the master node may also send the address of the first target node to the management server, so that the management server writes the address of the first target node into the node address set. Adding the address of the first target node to the management server's node address set updates the node address set. After the node address set is updated, if a new node subsequently needs to join or leave the distributed cluster, it can determine whether it belongs to the distributed cluster by querying the updated member address set.
Step 16: After receiving the member addition validating instruction, a slave node executes the addition of the first target node (for example, executes the member addition instruction cached in its cache), that is, adds the address of the first target node to the slave node's own local member list.
For a node in the distributed system, adding the address of the first target node to its local member list means acknowledging that the first target node has become a member of the distributed system.
It should be noted that "local" in the embodiments of the present invention is relative to a node. For example, take any node named node A. Node A's local member list belongs to node A; by reading its own member list, node A's processor can learn which members, besides node A, are in the distributed system that node A belongs to. This member list may be stored on node A, and in some cases may also be stored outside node A.
Step 17: The first target node creates a local member list. The local member list created by the first target node includes the addresses of all original nodes in the distributed system. Optionally, it may further include the address of the first target node.
After steps 15, 16, and 17 are completed, the first target node has created its own local member list, and all nodes in the distributed system, including the original nodes, have updated their local member lists, so the first target node has joined the distributed system. As can be seen from the above steps, the solution of the embodiment of the present invention provides a node addition technique with member "auto-discovery". A new node can actively add itself to the distributed storage system, and the existing distributed system can sense the joining of the new node. This simplifies the involvement of operation and maintenance personnel in membership changes and makes the whole membership change process more automated and intelligent.
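The join flow of steps 11-17 can be sketched as a single-process simulation. This is a hedged illustration, not the patented implementation: the class and method names (`Master`, `Slave`, `on_join`, `on_commit`) are invented, and network transport, broadcast, and failure handling are omitted.

```python
class Slave:
    def __init__(self, members):
        self.members = set(members)   # local member list
        self.pending = None

    def on_member_add(self, addr):    # steps 13-14: cache, then respond
        self.pending = addr
        return "ack"                  # member addition response

    def on_commit(self):              # step 16: apply the cached addition
        self.members.add(self.pending)
        self.pending = None

class Master:
    def __init__(self, addr, slaves, address_set):
        self.slaves = slaves
        self.members = {addr} | set(slaves)   # master's local member list
        self.address_set = address_set        # kept on the management server

    def on_join(self, new_addr):
        # step 13: member addition instruction to every slave
        acks = [s.on_member_add(new_addr) for s in self.slaves.values()]
        if len(acks) == len(self.slaves):     # step 15: all slaves responded
            self.members.add(new_addr)
            self.address_set.add(new_addr)    # update the management server
            for s in self.slaves.values():
                s.on_commit()                 # member addition takes effect

address_set = {"n11", "n12", "n13"}
slaves = {"n12": Slave({"n11", "n13"}), "n13": Slave({"n11", "n12"})}
master = Master("n11", slaves, address_set)
if "n16" not in address_set:          # steps 11-12: query and check
    master.on_join("n16")             # join request reaches the master
print(sorted(slaves["n12"].members))  # ['n11', 'n13', 'n16']
```

Note how the two-round exchange (instruction/response, then commit) replaces log replication: slaves only stage the change until the master has heard from all of them.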
Steps 18 to 23 below describe the process of deleting a second target node from the distributed system. There can be many reasons for deleting the second target node, for example the reliability of the second target node has degraded, the distributed system has excess resources, or the second target node needs to be replaced. The second target node may be any node in the distributed system other than the master node (for example, node 15 in FIG. 3). For example, it may be the first target node (node 16); when the second target node is not the first target node, the deletion procedure may be executed before (or after) the addition procedure. That is, the procedure of steps 11-17 and the procedure of steps 18-23 are relatively independent: they may be executed one after the other, or only either one of them may be executed, as long as they are not executed in parallel.
Step 18: The second target node requests the node address set from the management server, where the node address set includes the addresses of all nodes in the distributed system.
The second target node is a node in the distributed system that needs to go offline. In this step, without administrator involvement, the second target node actively requests the node address set from the management server.
Step 19: After obtaining the address set, the second target node determines whether the address of the second target node is in the address set. This step is optional.
If the address of the second target node is not in the node address set, it means the second target node no longer belongs to the distributed system, and this procedure exits.
If the address of the second target node is in the node address set, the second target node sends a leave request to the master node of the distributed system. The leave request includes the address of the second target node. It should be noted that in other embodiments, the determination of step 19 may be skipped: the determination is optional, and the second target node may send the leave request to the master node of the distributed system directly.
It should be noted that the second target node may send the leave request to the master node in multiple ways, for example unicast, multicast, or broadcast, as long as the leave request can be delivered to the master node.
Step 20: After receiving the leave request, the master node sends a member deletion instruction to all slave nodes in the distributed system, so that the second target node leaves the distributed system. The member deletion instruction includes the address of the second target node.
Using the member addresses in the member list, the master node can send the member deletion instruction to all members.
Step 21: After receiving the member deletion instruction, a slave node sends a member deletion response to the master node. The member deletion response is a response message to the member deletion instruction, used to inform the master node that the slave node has successfully received the member deletion instruction. In this step, the member deletion instruction may be cached in the slave node's memory; the deletion operation is not executed in this step (that is, the second target member is not yet "deleted") but is executed in step 23.
A slave node that receives the member deletion instruction may cache it, and send the member deletion response only after caching succeeds.
Step 22: After receiving the member deletion responses from all slave nodes, the master node sends a member deletion validating instruction to all slave nodes.
In addition, the master node also deletes the address of the second target node from the master node's member list.
The master node may also instruct the management server to delete the address of the second target node from the management server's node address set. After the node address set is updated, if a new node subsequently needs to join or leave the distributed cluster, it can determine whether it belongs to the distributed cluster by querying the updated member address set.
The member deletion validating instruction may be a commit instruction; sending the member deletion validating instruction instructs a slave node to execute the member deletion instruction cached in step 21.
Step 23: After receiving the member deletion validating instruction, a slave node executes the deletion of the second target node (for example, executes the member deletion instruction cached in its cache), that is, deletes the address of the second target node from each slave node's local member list.
For a node in the distributed system, deleting the address of the second target node from its local member list means that the second target node has exited the distributed system and is no longer a member of the distributed system.
After steps 22 and 23 are executed, the second target node is no longer a member of the distributed system.
Step 24: Optionally, the second target node may delete its local member list.
As can be seen from steps 18-24 above, the solution of the embodiment of the present invention provides a node deletion technique in which a member "actively deletes itself". A node in the distributed system can actively exclude itself from the distributed storage system, and the distributed system can sense the node's deletion request and respond to it. This simplifies the involvement of operation and maintenance personnel in membership changes and makes the whole membership change process more automated and intelligent.
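The leave flow of steps 18-24 mirrors the join flow and can be sketched the same way. Again this is only an illustration under assumed names (`Slave`, `handle_leave`); the member deletion instruction is cached at each slave and applied only when the validating (commit) instruction arrives.

```python
class Slave:
    def __init__(self, members):
        self.members = set(members)       # local member list
        self.pending_delete = None

    def on_member_delete(self, addr):     # step 21: cache, then respond
        self.pending_delete = addr
        return "ack"                      # member deletion response

    def on_commit(self):                  # step 23: apply the cached deletion
        self.members.discard(self.pending_delete)
        self.pending_delete = None

def handle_leave(leaver, master_members, slaves, address_set):
    # steps 20-22 on the master side
    acks = [s.on_member_delete(leaver) for s in slaves]
    if len(acks) == len(slaves):          # all deletion responses received
        address_set.discard(leaver)       # update the management server
        master_members.discard(leaver)    # master's own local member list
        for s in slaves:
            s.on_commit()                 # member deletion takes effect

address_set = {"n11", "n12", "n13", "n15"}
slaves = [Slave({"n11", "n13", "n15"}), Slave({"n11", "n12", "n15"})]
master_members = {"n12", "n13", "n15"}
if "n15" in address_set:                  # steps 18-19: query and check
    handle_leave("n15", master_members, slaves, address_set)
print("n15" in address_set)               # False: n15 has left the system
```

After the commit round, no surviving node's member list contains the leaver, which is exactly the post-condition of step 23.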
Optionally, after the master node receives the join request and before the master node sends the member addition validating instruction to all member nodes, or after the master node receives the leave request and before the master node sends the member deletion validating instruction to all member nodes, if the master node receives a new join request or a new leave request, it may cache the request, so as to avoid executing two membership change requests at the same time.
Optionally, after the master node receives a certain join request and before the master node sends the member addition validating instruction to all member nodes, or after the master node receives a certain leave request and before the master node sends the member deletion validating instruction to all member nodes, a member change flag may also be set in the management server. The member change flag records the type of change (addition or deletion) and the address of the node that needs to be changed. If the master node fails during the above process, the newly elected master node can re-execute the membership change based on the change type and the node address recorded in the member change flag. In other words: after the master node receives a join/delete request, the master node records a member change flag. The member change flag includes the change type (join type / delete type) and the address of the member to be added (for example, the address of a third target node). If the master node fails before instructing all slave nodes in the distributed system to add the address of the third target node to their local member lists, the distributed system elects a new master node; after reading the change flag, the new master node re-adds the address of the third target node to the node address set, and instructs all slave nodes in the distributed system to add the address of the third target node to their local member lists. After the master node fails, a new master node can be elected through negotiation among the slave nodes, for example the slave node with the smallest load is selected as the new master node, or the slave node that joined the cluster earliest is selected as the new master node.
According to the above principle, for example: if the master node fails while any of steps 13-17 is being executed, then, based on the information recorded in the member change flag, the new master node can restart execution from step 13 (sending the member addition instruction to all slave nodes in the distributed system), so as to re-execute steps 13-17 and thereby add the first target node to the distributed system. Similarly, if the master node fails during steps 18-23, then, relying on the member change flag, the new master node can restart execution from step 18 (and resend the member deletion instruction to all slave nodes in the distributed system), thereby re-executing steps 18-23.
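The change-flag recovery described above can be illustrated as follows. The flag field names and the election rule (smallest load) are assumptions for this sketch; the point is only that the flag is persisted on the management server, so a newly elected master can re-drive the interrupted change.

```python
def record_flag(store, change_type, addr):
    # Persisted on the management server before the change is driven,
    # so it survives a master failure.
    store["member_change_flag"] = {"type": change_type, "addr": addr}

def recover(store, slaves):
    # A new master is elected among the slaves, e.g. by smallest load.
    new_master = min(slaves, key=lambda s: s["load"])
    flag = store.get("member_change_flag")
    if flag:  # re-execute the interrupted membership change
        action = "re-add" if flag["type"] == "join" else "re-delete"
        return new_master["id"], action, flag["addr"]
    return new_master["id"], None, None

store = {}
record_flag(store, "join", "n17")   # the old master fails after this point
slaves = [{"id": "n12", "load": 3}, {"id": "n13", "load": 1}]
print(recover(store, slaves))       # ('n13', 're-add', 'n17')
```

In the patent's terms, the returned action corresponds to restarting from step 13 (for a join flag) or step 18 (for a delete flag).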
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these shall all fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (17)

  1. A distributed system membership change method, characterized by comprising:
    a first target node requests a node address set from a management server, wherein the node address set includes the addresses of all nodes in the distributed system, and the distributed system includes a master node and multiple slave nodes;
    when the address of the first target node is not in the node address set, the first target node sends a join request to the master node of the distributed system, wherein the join request includes the address of the first target node;
    after receiving the join request, the master node adds the address of the first target node to the master node's local member list, and the master node instructs each node in the distributed system to add the address of the first target node to its local member list.
  2. The method according to claim 1, further comprising:
    the first target node creates the first target node's local member list, wherein the first target node's local member list includes the addresses of all nodes in the distributed system and the address of the first target node.
  3. The method according to claim 1 or 2, wherein the master node instructing each of the multiple slave nodes in the distributed system to add the address of the first target node to the local member list specifically comprises:
    the master node sends a member addition instruction to each slave node, wherein the member addition instruction includes the address of the first target node;
    the master node receives a member addition response sent by each slave node, wherein the member addition response is a response message to the member addition instruction;
    the master node adds the address of the first target node to its local member list, and the master node sends a member addition validating instruction to all slave nodes, instructing all slave nodes, after receiving the member addition validating instruction, to add the address of the first target node to their local member lists.
  4. The method according to claim 3, wherein the member addition validating instruction specifically comprises a COMMIT instruction.
  5. The method according to claim 3, after the master node receives the member addition responses sent by the slave nodes, further comprising:
    the master node instructs the management server to add the address of the first target node to the node address set of the management server.
  6. The method according to claim 1, wherein the first target node sending the join request to the master node comprises:
    the first target node broadcasts the join request to all addresses in the node address set, thereby sending the join request to the master node.
  7. The method according to claim 1, after the first target node requests the node address set from the management server, the method further comprising at least one of the following steps:
    the master node caches other join requests subsequently received;
    the master node caches received leave requests.
  8. The method according to claim 1, wherein the address comprises:
    at least one of, or a combination of, a node ID, a node IP address, and a node port number.
  9. The method according to any one of claims 1-8, the method further comprising:
    a second target node sends a leave request to the master node;
    after receiving the leave request, the master node sends a member deletion instruction to each of the multiple slave nodes, wherein the member deletion instruction includes the address of the second target node;
    after receiving the member deletion instruction, each slave node sends a member deletion response to the master node;
    after receiving the member deletion response from each slave node, the master node deletes the address of the second target node from the node address set; the master node deletes the address of the second target node from its local member list, and the master node sends a member deletion validating instruction to each slave node; each slave node deletes the address of the second target node from its local member list.
  10. The method according to claim 1, after the master node instructs each node in the distributed system to add the address of the first target node to the local member list, further comprising:
    after the master node receives another join request, the master node records a member change flag, wherein the member change flag includes a change type and the address of a third target node;
    the master node fails before the master node, according to the other join request, instructs each of the multiple slave nodes in the distributed system to add the address of the third target node to the local member list;
    the slave nodes in the distributed system elect a new master node;
    after reading the change flag, the new master node, according to the change type, instructs the slave nodes in the distributed system to add the address of the third target node to their local member lists.
  11. A node set, the node set comprising a first target node and a distributed system, the distributed system comprising a master node and multiple slave nodes, characterized in that:
    the first target node is configured to: request a node address set from a management server, wherein the node address set includes the addresses of all nodes in the distributed system; when the address of the first target node is not in the node address set, the first target node is further configured to: send a join request to the master node;
    the master node is configured to: after the master node receives the join request, add the address of the first target node to the master node's local member list, and instruct each of the multiple slave nodes in the distributed system to add the address of the first target node to its local member list.
  12. The node set according to claim 11, wherein the first target node is further configured to:
    create the first target node's local member list, wherein the first target node's local member list includes the addresses of all nodes in the distributed system and the address of the first target node.
  13. The node set according to claim 11, wherein the master node is specifically configured to:
    send a member addition instruction to each slave node, wherein the member addition instruction includes the address of the first target node;
    receive a member addition response sent by each slave node, wherein the member addition response is a response message to the member addition instruction;
    send a member addition validating instruction to all slave nodes, instructing all slave nodes, after receiving the member addition validating instruction, to add the address of the first target node included in the member addition instruction to their local member lists.
  14. A distributed system membership change method, characterized by comprising:
    a second target node requests a node address set from a management server, wherein the distributed system includes a master node and multiple slave nodes, and the node address set includes the addresses of all nodes in the distributed system;
    when the address of the second target node is in the node address set, the second target node sends a leave request to the master node of the distributed system, wherein the leave request includes the address of the second target node;
    after receiving the leave request, the master node deletes the address of the second target node from the master node's local member list, and instructs each of the multiple slave nodes in the distributed system to delete the address of the second target node from its local member list.
  15. The method according to claim 14, wherein the master node instructing each of the multiple slave nodes in the distributed system to delete the address of the second target node from the local member list specifically comprises:
    the master node sends a member deletion instruction to all slave nodes in the distributed system, wherein the member deletion instruction includes the address of the second target node;
    after receiving the member deletion responses of all slave nodes to the member deletion instruction, the master node deletes the address of the second target node from the master node's local member list, and sends a member deletion validating instruction to all slave nodes, so as to instruct
    the slave nodes to delete the address of the second target node from their local member lists by executing the member deletion instruction.
  16. A distributed system, the distributed system comprising a master node and multiple slave nodes, the multiple slave nodes including a second target node, characterized in that:
    the second target node is configured to request a node address set from a management server, wherein the node address set includes the addresses of all nodes in the distributed system, and the distributed system includes the master node and the multiple slave nodes;
    when the address of the second target node is in the node address set, the second target node is further configured to send a leave request to the master node of the distributed system, wherein the leave request includes the address of the second target node;
    the master node is configured to, after receiving the leave request, delete the address of the second target node from the master node's local member list, and instruct each of the multiple slave nodes in the distributed system to delete the address of the second target node from its local member list.
  17. The distributed system according to claim 16, wherein the master node is specifically configured to:
    send a member deletion instruction to all slave nodes in the distributed system, wherein the member deletion instruction includes the address of the second target node;
    after receiving the member deletion responses of all slave nodes, delete the address of the second target node from the master node's local member list, and send a member deletion validating instruction to all slave nodes, so as to instruct
    the slave nodes to delete the address of the second target node from their local member lists by executing the member deletion instruction, wherein the member deletion response is a response message to the member deletion instruction.
PCT/CN2019/076844 2018-06-30 2019-03-04 Distributed system membership change method and distributed system WO2020001060A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19827288.2A EP3817290A4 (en) 2018-06-30 2019-03-04 ELEMENT REPLACEMENT PROCEDURE FOR DISTRIBUTED SYSTEM AND DISTRIBUTED SYSTEM
US17/125,318 US11445013B2 (en) 2018-06-30 2020-12-17 Method for changing member in distributed system and distributed system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810703094.7A 2018-06-30 Distributed system membership change method and distributed system
CN201810703094.7 2018-06-30

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/125,318 Continuation US11445013B2 (en) 2018-06-30 2020-12-17 Method for changing member in distributed system and distributed system

Publications (1)

Publication Number Publication Date
WO2020001060A1 true WO2020001060A1 (zh) 2020-01-02

Family

ID=68986036

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/076844 WO2020001060A1 (zh) 2018-06-30 2019-03-04 分布式系统成员变更方法和分布式系统

Country Status (4)

Country Link
US (1) US11445013B2 (zh)
EP (1) EP3817290A4 (zh)
CN (1) CN110661637A (zh)
WO (1) WO2020001060A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112671601A * 2020-12-11 2021-04-16 航天信息股份有限公司 Zookeeper-based interface monitoring system and method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112860393B * 2021-01-20 2024-03-15 北京科技大学 Distributed task scheduling method and system
CN113660350A * 2021-10-18 2021-11-16 恒生电子股份有限公司 Distributed lock coordination method, apparatus, device, and storage medium
CN116185697B * 2023-05-04 2023-08-04 苏州浪潮智能科技有限公司 Container cluster management method, apparatus, system, electronic device, and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877858B * 2010-06-24 2012-09-26 四川平安都市通讯科技有限公司 Networking method based on a wireless distributed system
CN102984267A * 2012-12-07 2013-03-20 北京搜狐新媒体信息技术有限公司 Method and system for dynamically propagating distributed cache node updates to clients
US20130232118A1 * 2010-07-07 2013-09-05 Microsoft Corporation Shared log-structured multi-version transactional datastore with metadata to enable melding trees
CN107222520A * 2017-04-25 2017-09-29 天津大学 Distributed system middleware based on a directed diffusion algorithm

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003223444A (ja) * 2002-01-30 2003-08-08 Fuji Photo Film Co Ltd Computer device and program for controlling the computer device
US8867996B2 (en) * 2011-01-14 2014-10-21 Alcatel Lucent Area tracking systems and methods of tracking electronic devices
CN105656653B * 2014-11-14 2019-07-19 华为技术有限公司 Network access method, apparatus, and system for a node newly added to a distributed coordination system
CN106712981B * 2015-07-23 2020-03-06 阿里巴巴集团控股有限公司 Node change notification method and apparatus
CN106911728B * 2015-12-22 2019-11-29 华为技术服务有限公司 Method and apparatus for selecting a master node in a distributed system

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877858B * 2010-06-24 2012-09-26 四川平安都市通讯科技有限公司 Networking method based on a wireless distributed system
US20130232118A1 * 2010-07-07 2013-09-05 Microsoft Corporation Shared log-structured multi-version transactional datastore with metadata to enable melding trees
CN102984267A * 2012-12-07 2013-03-20 北京搜狐新媒体信息技术有限公司 Method and system for dynamically propagating distributed cache node updates to clients
CN107222520A * 2017-04-25 2017-09-29 天津大学 Distributed system middleware based on a directed diffusion algorithm

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112671601A * 2020-12-11 2021-04-16 航天信息股份有限公司 Zookeeper-based interface monitoring system and method
CN112671601B * 2020-12-11 2023-10-31 航天信息股份有限公司 Zookeeper-based interface monitoring system and method

Also Published As

Publication number Publication date
US20210136145A1 (en) 2021-05-06
EP3817290A4 (en) 2021-07-21
CN110661637A (zh) 2020-01-07
US11445013B2 (en) 2022-09-13
EP3817290A1 (en) 2021-05-05

Similar Documents

Publication Publication Date Title
WO2020001060A1 (zh) Distributed system membership change method and distributed system
US11265216B2 (en) Communicating state information in distributed operating systems
US10674486B2 (en) System, security and network management using self-organizing communication orbits in distributed networks
US20200358848A1 (en) Methods, systems, and media for providing distributed database access during a network split
US11924044B2 (en) Organizing execution of distributed operating systems for network devices
US10979286B2 (en) Method, device and computer program product for managing distributed system
US11316775B2 (en) Maintaining coherency in distributed operating systems for network devices
WO2016070375A1 (zh) Distributed storage replication system and method
US7024483B2 (en) System and method for topology manager employing finite state automata for dynamic cluster formation
US10657119B1 (en) Fleet node management system
US7139925B2 (en) System and method for dynamic cluster adjustment to node failures in a distributed data system
US11294934B2 (en) Command processing method and server
WO2017152860A1 (zh) Heartbeat information sending method and apparatus, and heartbeat sending node
US9846624B2 (en) Fast single-master failover
CN106230622B (zh) Cluster implementation method and apparatus
US7805503B2 (en) Capability requirements for group membership
US20160100006A1 (en) Transferring Data Between Sites
CN109992447B (zh) Data replication method and apparatus, and storage medium
US20220358118A1 (en) Data synchronization in edge computing networks
CN114090342A (zh) Link management method for storage disaster recovery, message execution node, and storage control cluster
WO2015045589A1 (ja) Fault-tolerant system and fault-tolerant system control method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19827288

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2019827288

Country of ref document: EP

Effective date: 20201204

NENP Non-entry into the national phase

Ref country code: DE