WO2021078294A1 - Service coordination method and apparatus for distributed storage system, and electronic device - Google Patents

Publication number
WO2021078294A1
Authority
WO
WIPO (PCT)
Prior art keywords
control server
service coordination
coordination device
servers
main control
Prior art date
Application number
PCT/CN2020/123516
Other languages
French (fr)
Chinese (zh)
Inventor
黎海兵
Original Assignee
北京金山云网络技术有限公司
北京金山云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山云网络技术有限公司, 北京金山云科技有限公司 filed Critical 北京金山云网络技术有限公司
Publication of WO2021078294A1 publication Critical patent/WO2021078294A1/en

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L65/00 Network arrangements, protocols or services for supporting real-time applications in data packet communication
    • H04L65/40 Support for services or applications
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H04L67/1046 Joining mechanisms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H04L67/1048 Departure or maintenance mechanisms
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/104 Peer-to-peer [P2P] networks
    • H04L67/1044 Group management mechanisms
    • H04L67/1051 Group master selection mechanisms

Definitions

  • This application relates to the field of distributed storage technology, and more specifically, to a service coordination method of a distributed storage system, a service coordination device of a distributed storage system, an electronic device, and a distributed storage system.
  • Distributed storage is a storage solution that distributes data to multiple independent devices.
  • the distributed network storage system adopts an expandable system structure and utilizes multiple storage servers to share the storage load. It not only improves the reliability, availability, and access efficiency of the system, it is also easy to expand.
  • a dedicated control server usually coordinates the storage of data on multiple data servers. Metadata used to describe data attributes is stored in the control server, which can realize functions such as storage location records, historical data records, and resource search.
  • the number of control servers is usually multiple, and the main control server provides control services, and the other control servers serve as backups.
  • the distributed application coordination service can be used to coordinate the operation of multiple control servers, such as notifying multiple control servers to perform master selection operations.
  • existing coordination schemes are prone to no-master and dual-master conditions, which affect the stability of the distributed storage system.
  • One purpose of this application is to provide a new technical solution for service coordination of a distributed storage system.
  • a service coordination method for a distributed storage system includes a service coordination device and a plurality of control servers.
  • the method is implemented by any of the control servers.
  • the method includes: sending a query request to the service coordination device to obtain a query result, the query result indicating whether the multiple control servers include a main control server; and determining, according to the query result, whether to send a master selection instruction to other control servers, where the master selection instruction is used to determine a main control server from the plurality of control servers.
  • sending a query request to the service coordination device to obtain a query result indicating whether a main control server is included in the plurality of control servers includes: sending a query request to the service coordination device at a preset first time interval; and receiving a query result sent by the service coordination device in response to the query request, the query result indicating whether the multiple control servers include a main control server; wherein the first time interval is less than a preset session timeout duration of the service coordination device.
  • the query result is whether the service coordination device has recorded the identification of the main control server.
  • the method further includes: when the control server is the main control server, periodically sending a connection request to the service coordination device; and if no response from the service coordination device to the connection request is received within a set time window, stopping the service provided as the main control server.
  • periodically sending a connection request to the service coordination device includes: sending the connection request to the service coordination device at a preset second time interval; wherein the second time interval is less than a preset session timeout duration of the service coordination device.
  • the set time window is less than the set session timeout duration of the service coordination device.
  • the determining whether to send a master selection instruction to other control servers according to the query result, the master selection instruction being used to determine a master control server from the plurality of control servers includes: When the query result indicates that the main control server is not included in the plurality of control servers, a master selection instruction is sent to other control servers to determine a main control server from the plurality of control servers.
  • the method further includes: after determining a main control server from the plurality of control servers, sending the determined identification of the main control server to the service coordination device.
  • a service coordination method for a distributed storage system includes a service coordination device and a plurality of control servers that implement the method described in the first aspect of the present application.
  • the method includes: receiving a query request sent by the control server; in response to the query request, obtaining a query result, the query result indicating whether the multiple control servers contain a master Control server; sending the query result to the control server.
  • the query result is whether the service coordination device has recorded the identification of the main control server.
  • the method further includes: receiving a connection request periodically sent by a main control server of the plurality of control servers; and sending a response message for the connection request to the main control server.
  • the method further includes: receiving the determined identification of the main control server sent by the control server; and recording the identification of the main control server.
  • the service coordination device provides coordination services based on the Zookeeper distributed application coordination service.
  • a service coordination device for a distributed storage system.
  • the distributed storage system includes a service coordination device and a plurality of control servers.
  • the device is applied to any of the control servers and includes:
  • the query module is configured to send a query request to the service coordination device to obtain a query result, the query result indicating whether a main control server is included in the plurality of control servers;
  • the judgment module is configured to determine, according to the query result, whether to send a master selection instruction to other control servers, where the master selection instruction is used to determine a main control server from the multiple control servers.
  • when sending a query request to the service coordination device to obtain the query result, the query module is configured to: send a query request to the service coordination device at a preset first time interval; and receive a query result sent by the service coordination device in response to the query request, the query result indicating whether the multiple control servers include a main control server; wherein the first time interval is less than a preset session timeout duration of the service coordination device.
  • the query result is whether the service coordination device has recorded the identification of the main control server.
  • the device further includes a connection detection module configured to: when the control server is the main control server, periodically send a connection request to the service coordination device; and if no response from the service coordination device to the connection request is received within a set time window, stop the service provided as the main control server.
  • when periodically sending a connection request to the service coordination device while the control server is the main control server, the connection detection module is configured to: send the connection request to the service coordination device at a preset second time interval; wherein the second time interval is less than a preset session timeout duration of the service coordination device.
  • the set time window is less than the set session timeout duration of the service coordination device.
  • when determining whether to send a master selection instruction to other control servers according to the query result, the judgment module is configured to: when the query result indicates that the plurality of control servers do not include a main control server, send a master selection instruction to the other control servers to determine a main control server from the multiple control servers.
  • the device further includes an identification sending module configured to: after a main control server is determined from the plurality of control servers, send the identification of the determined main control server to the service coordination device.
  • a service coordination device for a distributed storage system, the distributed storage system including a service coordination device and a plurality of control servers that implement the method described in the first aspect of the present application. The device is applied to the service coordination device and includes: a first receiving module configured to receive a query request sent by the control server; a result obtaining module configured to obtain a query result in response to the query request, the query result indicating whether the multiple control servers include a main control server; and a first sending module configured to send the query result to the control server.
  • the query result is whether the service coordination device has recorded the identification of the main control server.
  • the device further includes a second receiving module and a second sending module: the second receiving module is configured to receive a connection request periodically sent by a main control server among the plurality of control servers; the second sending module is configured to send a response message for the connection request to the main control server.
  • the device further includes a third receiving module and a recording module: the third receiving module is configured to receive the determined identification of the main control server sent by the control server; the recording module is configured to record the identification of the main control server.
  • the service coordination device provides coordination services based on the Zookeeper distributed application coordination service.
  • an electronic device including a processor and a memory, the memory storing machine-executable instructions that can be executed by the processor, and the processor executing the machine-executable instructions to implement the service coordination method of the distributed storage system described in the first aspect or the second aspect of the present application.
  • a distributed storage system including a user agent server, multiple storage servers, multiple control servers that implement the method described in the first aspect of the present application, and a service coordination device that implements the method described in the second aspect of the present application, wherein each control server is in communication connection with the user agent server, the plurality of storage servers, and the service coordination device.
  • the control server actively queries whether a main control server is included in the multiple control servers and determines, according to the query result, whether to send the master selection instruction to other control servers, which can avoid the situation in which the system has no master and improve the stability of the system.
  • FIG. 1 shows a schematic diagram of the hardware configuration of a distributed storage system that can be used to implement the embodiments of the present application.
  • FIG. 2 shows a schematic structural diagram of a server that can be used to implement the embodiments of the present application.
  • FIG. 3 shows a flowchart of a service coordination method of a distributed storage system according to an embodiment of the present application.
  • FIG. 4 shows a flowchart of a specific example of the implementation of the service coordination method of the distributed storage system according to the embodiment of the present application.
  • FIG. 1 shows a schematic structural diagram of a distributed storage system that can be used to implement embodiments of the present application.
  • the distributed storage system 100 includes a user agent server 1000, a storage server 2000, a control server 3000, and a service coordination device 4000.
  • the number of storage servers 2000 and control servers 3000 are both multiple (two or more).
  • the storage server 2000 is set to store target data.
  • the user agent server 1000 is configured to receive a data read/write request for target data sent by the user terminal, and forward the data read/write request to the control server 3000.
  • the control server 3000 is configured to query the storage server 2000 corresponding to the target data from the metadata stored by itself, and return the identification information of the storage server 2000 to the user agent server 1000.
  • the user agent server 1000 interacts with the corresponding storage server 2000 according to the identification information to complete the read and write operations on the target data.
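  • The read/write path described above can be sketched as follows. This is a minimal in-memory illustration, not the implementation of this application: the class names, the `locate` method, and the dictionary-based metadata are all hypothetical, and real servers would communicate over a network.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ControlServer:
    """Hypothetical control server: metadata maps a data key to a storage server id."""
    metadata: Dict[str, str] = field(default_factory=dict)

    def locate(self, key: str) -> str:
        # Return the identification information of the storage server for this key.
        return self.metadata[key]


@dataclass
class StorageServer:
    """Hypothetical storage server holding target data in memory."""
    blobs: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class UserAgentServer:
    """Hypothetical user agent server: asks the control server, then talks to storage."""
    control: ControlServer
    storage: Dict[str, StorageServer]  # storage server id -> storage server

    def write(self, key: str, value: bytes) -> None:
        server_id = self.control.locate(key)
        self.storage[server_id].blobs[key] = value

    def read(self, key: str) -> bytes:
        server_id = self.control.locate(key)
        return self.storage[server_id].blobs[key]
```

  • In this sketch the control server never touches the data itself; it only resolves which storage server to use, which mirrors the division of roles described for the control server 3000 and the storage server 2000.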
  • the service coordination device 4000 is configured to coordinate the operation of multiple control servers 3000, such as assigning an identity to the control server 3000, notifying the control server 3000 to elect a master control server, and so on.
  • the service coordination device 4000 is, for example, an electronic device installed with distributed application coordination software.
  • the distributed application coordination service can instead be deployed on the multiple control servers 3000, which can then coordinate their own services based on the distributed application coordination service, so that no additional service coordination device 4000 is needed.
  • the user agent server 1000, the storage server 2000, the control server 3000, and the service coordination device 4000 may communicate with each other through a wired network or a wireless network.
  • when the coordination service is deployed on the control servers themselves, communication between a control server and the service coordination device is in fact communication between different processes on the same device.
  • the user agent server 1000, the storage server 2000, the control server 3000, and the service coordination device 4000 all have the hardware configuration of the server 1100 as shown in FIG. 2.
  • the server 1100 may include a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160.
  • the processor 1110 may be, for example, a central processing unit CPU or the like.
  • the memory 1120 includes, for example, ROM (Read Only Memory), RAM (Random Access Memory), nonvolatile memory such as a hard disk, and the like.
  • the interface device 1130 includes, for example, a USB interface, a serial interface, and the like.
  • the communication device 1140 can perform wired or wireless communication, for example.
  • the display device 1150 is, for example, a liquid crystal display.
  • the input device 1160 may include, for example, a touch screen, a keyboard, and the like.
  • the memory 1120 of the server 1100 is configured to store instructions, which are used to control the processor 1110 to operate to support the implementation of the service coordination method according to any embodiment of this specification.
  • Technicians can design instructions according to the scheme disclosed in this specification. How the instruction controls the processor to operate is well known in the art, so it will not be described in detail here.
  • the server 1100 in the embodiment of this specification may involve only some of these devices, for example, only the processor 1110, the memory 1120, and the communication device 1140.
  • the distributed storage system 100 shown in FIG. 1 is only for explanatory purposes, and is by no means intended to limit this specification, its application, or its use.
  • This embodiment provides a service coordination method for a distributed storage system.
  • the method is implemented by, for example, any control server 3000 in FIG. 1. As shown in Figure 3, the method includes the following steps S1100-S1200:
  • Step S1100 Send a query request to the service coordination device to obtain a query result.
  • the query result represents whether a main control server is included in the multiple control servers.
  • the distributed storage system includes multiple control servers.
  • One main control server is selected from multiple control servers through elections, etc., and the main control server provides control services, and other control servers serve as backups.
  • each control server sends a registration request to the service coordination device during the initial startup phase.
  • the service coordination device assigns a unique identity to each control server according to preset rules.
  • the service coordination device coordinates the master selection process and records the identification of the main control server.
  • the query request is sent to the service coordination device.
  • the query request may ask the service coordination device which of the current control servers holds the main control server identifier, and what the status of that main control server is.
  • the system state of the current distributed storage system may be either a state in which the multiple control servers include a main control server (hereinafter the has-master state, which covers, for example, one main control server or two main control servers) or a state in which the multiple control servers do not include a main control server (hereinafter the no-master state).
  • control server actively sends a query request to the service coordination device to obtain the query result.
  • the query result indicates whether the main control server is included in the multiple control servers.
  • step S1100 further includes: sending a query request to the service coordination device at a preset first time interval; and receiving a query result sent by the service coordination device in response to the query request, the query result indicating whether the plurality of control servers include a main control server; wherein the first time interval is less than the preset session timeout duration of the service coordination device.
  • a long connection is maintained between the control server and the service coordination device.
  • when the service coordination device receives a message (such as a heartbeat packet) from the control server indicating that the control server is active, the service coordination device judges that the connection is in a normal state and continues to monitor the connection state between the two.
  • control server periodically queries the current system status at a preset first time interval.
  • the first time interval is less than the session timeout duration set by the service coordination device, which is beneficial to avoid session expiration.
  • the query result is whether the service coordination device records the identification of the main control server.
  • the identification of the main control server is, for example, the IP address or another identifier of the main control server, or a unique identification assigned by the service coordination device (for example, a specific character string assigned as the identification, or a character string added to the IP address of the main control server).
  • the process by which the control server queries whether the current system state is the no-master state includes: requesting the service coordination device to provide the current system state; and determining the current system state from the state that the service coordination device returns in response to the request.
  • the service coordination device records whether the current system is in the has-master state (a main control server exists) or the no-master state (no main control server exists) and, in the has-master state, the identity of the current main control server. After the control server sends a request for the current system state to the service coordination device, it determines which of the two states the system is in from the information returned by the service coordination device.
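  • The periodic query of step S1100 can be sketched as follows. This is a minimal illustration under stated assumptions: `poll_for_master`, its parameters, and the callable `query_master_id` (which stands in for the query request and returns the recorded main control server identification, or `None` if none is recorded) are hypothetical names, not part of this application or of any real coordination library.

```python
import time
from typing import Callable, List, Optional


def poll_for_master(
    query_master_id: Callable[[], Optional[str]],
    first_interval: float,
    session_timeout: float,
    rounds: int,
    sleep: Callable[[float], None] = time.sleep,
) -> List[bool]:
    """Query the coordinator `rounds` times; each entry is True if a master is recorded."""
    # The first time interval must be shorter than the coordinator's session timeout.
    if not 0 < first_interval < session_timeout:
        raise ValueError("first_interval must be positive and below session_timeout")
    has_master_history = []
    for _ in range(rounds):
        # The query result is whether the coordinator has recorded a master identification.
        has_master_history.append(query_master_id() is not None)
        sleep(first_interval)
    return has_master_history
```

  • The `sleep` parameter is injected only so the loop can be exercised without real waiting; a deployed control server would presumably run this loop for as long as it is alive rather than for a fixed number of rounds.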
  • Step S1200 Determine whether to send a master selection instruction to other control servers according to the query result.
  • the master selection instruction is used to determine a master control server from a plurality of control servers.
  • the control server when the query result indicates that the main control server is not included in the plurality of control servers, the control server sends a master selection instruction to other control servers to determine a main control server from the plurality of control servers.
  • the control server actively sends the master selection instruction when the queried system state is the no-master state (no main control server exists), thereby initiating the operation of electing a main control server together with the other control servers, without waiting for a notification from the service coordination device before initiating the election.
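  • The decision in step S1200 can be sketched as follows. The function name `maybe_start_election` and the `send_instruction` callback are hypothetical; the callback stands in for whatever transport carries the master selection instruction to a peer control server.

```python
from typing import Callable, Iterable, List


def maybe_start_election(
    has_master: bool,
    peers: Iterable[str],
    send_instruction: Callable[[str], None],
) -> List[str]:
    """Send a master selection instruction to every peer only if no master exists."""
    if has_master:
        # A main control server is already recorded, so nothing is sent.
        return []
    notified = []
    for peer in peers:
        send_instruction(peer)
        notified.append(peer)
    return notified
```

  • The return value lists the peers that were notified, which makes the "send or do not send" outcome easy to observe in a test.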
  • control servers other than the main control server are normally in a non-election state, and may communicate only with the service coordination device, without communicating with other control servers.
  • upon finding that no main control server exists, such a control server enters the election state, communicates with other control servers, and initiates the master election operation.
  • the process of selecting the master is, for example: each control server sends a vote and receives votes from other control servers; processes and counts votes according to preset election rules; and updates its own status according to the election results, for example, to main control server or non-master (slave) control server.
  • the multiple control servers vote according to the rule that the control server with the smallest identity is elected as the main control server.
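  • The election described above can be sketched as follows, assuming integer identities and assuming every control server knows the identities of all participants. The names `cast_vote` and `run_election` and the tie-breaking detail are illustrative assumptions; the application only states that votes are exchanged and counted and that the smallest identity is elected.

```python
from collections import Counter
from typing import Dict, Iterable, List


def cast_vote(known_ids: Iterable[int]) -> int:
    """Each server votes for the smallest identity it knows of."""
    return min(known_ids)


def run_election(server_ids: List[int]) -> Dict[int, str]:
    """Collect one vote per server, count them, and return each server's new status."""
    if not server_ids:
        raise ValueError("at least one control server is required")
    ballots = Counter(cast_vote(server_ids) for _ in server_ids)
    # Most votes wins; a tie (not expected here) falls back to the smaller identity.
    winner = min(ballots, key=lambda sid: (-ballots[sid], sid))
    return {sid: ("master" if sid == winner else "slave") for sid in server_ids}
```

  • For example, `run_election([7, 3, 5])` marks server 3 as master and servers 7 and 5 as slaves, since every server votes for the smallest identity.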
  • control servers can only communicate with the service coordination device, without mutual communication.
  • the communication between the control server and the service coordination device is manifested as communication between the process corresponding to the control service and the process corresponding to the coordination service.
  • the control server actively queries whether a main control server is included in the multiple control servers, and determines whether to send the master selection instruction to other control servers according to the query result, which helps avoid the situation in which the system has no master.
  • control servers can actively initiate the master selection based on the queried system status, thereby avoiding the system having no master status.
  • the service coordination method of the distributed storage system further includes: when the control server is the main control server, periodically sending a connection request to the service coordination device; and if no response from the service coordination device to the connection request is received within the set time window, stopping the service provided as the main control server.
  • the main control server actively connects with the service coordination device, for example, actively sends a request to obtain the connection status, and judges whether the connection is successful according to whether the service coordination device returns a corresponding message. If all connections within the set time window fail, then the main control server actively stops the service provided as the main control server, that is, withdraws from the node position of the main control server. In this way, it is helpful to avoid the dual-master state of the system.
  • for example, the current main control server of the system may be working normally while its communication connection with the service coordination device is interrupted.
  • in that case, the service coordination device clears the recorded identity of the main control server and notifies the other control servers to initiate a new master election. After the re-election, there would be two main control servers in the system.
  • if the original main control server actively withdraws from the position of master node when it cannot successfully connect to the service coordination device, the dual-master state of the system is avoided.
  • the process of the main control server actively connecting to the service coordination device includes: actively connecting to the service coordination device at a preset second time interval, where the second time interval is less than the session timeout duration set by the service coordination device .
  • the main control server actively connects to the service coordination device at a preset second time interval.
  • the second time interval is less than the session timeout duration set by the service coordination device, which is beneficial to avoid the situation that the session between the main control server and the service coordination device expires.
  • the above-mentioned set time window is also smaller than the session timeout duration set by the service coordination device.
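  • The step-down behaviour of the main control server can be sketched as follows. The class `MasterLease`, its `ping` callback (which stands in for one connection request and returns whether a response arrived), and the injected `clock` are hypothetical. The sketch treats "all connections within the set time window fail" as "no successful response for at least `window` seconds", which is one possible reading.

```python
import time
from typing import Callable


class MasterLease:
    """Tracks whether this server should keep serving as the main control server."""

    def __init__(
        self,
        ping: Callable[[], bool],
        second_interval: float,
        window: float,
        session_timeout: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        # Both the second time interval and the time window must be below the session timeout.
        if not (0 < second_interval < session_timeout and 0 < window < session_timeout):
            raise ValueError("second_interval and window must be below session_timeout")
        self.ping = ping
        self.second_interval = second_interval  # the caller waits this long between ticks
        self.window = window
        self.clock = clock
        self.serving = True
        self.last_ok = clock()

    def tick(self) -> bool:
        """Send one connection request; stop serving if the window passed without a response."""
        if not self.serving:
            return False
        if self.ping():
            self.last_ok = self.clock()
        elif self.clock() - self.last_ok >= self.window:
            self.serving = False  # withdraw from the main control server role
        return self.serving
```

  • Because the window is shorter than the session timeout, the old master in this sketch stops serving before the coordinator's session would expire, which is the ordering that keeps a re-election from producing two masters.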
  • the service coordination method of the distributed storage system further includes: after determining a master control server from the plurality of control servers, sending the determined identification of the master control server to the service coordination device.
  • the identifier of the main control server is, for example, the IP or other identifiers of the main control server, such as a unique identifier assigned by the service coordination device. In this way, it is beneficial for the service coordination device to obtain the identification of the re-selected main control server in time, thereby maintaining the normal operation of the distributed storage system.
  • This embodiment also provides another service coordination method for a distributed storage system.
  • the method is implemented by, for example, the service coordination device 4000 in FIG. 1 or, when the distributed application coordination service is deployed on the multiple control servers, by the multiple control servers.
  • the method includes the following steps: receiving a query request sent by a control server; in response to the query request, obtaining a query result, the query result indicating whether a main control server is included in a plurality of control servers; and sending the query result to the control server.
  • the query result is whether the service coordination device records the identification of the main control server.
  • the identifier of the main control server is, for example, the IP or other identifiers of the main control server, such as a unique identifier assigned by the service coordination device.
  • the service coordination method of the distributed storage system further includes: receiving a connection request periodically sent by a main control server among the multiple control servers; and sending a response message for the connection request to the main control server.
  • the main control server actively connects with the service coordination device, for example, by actively sending a request to obtain the connection status, and judges whether the connection succeeded according to whether the service coordination device returns a corresponding message. If all connections within the set time window fail, the main control server actively stops the service it provides as the main control server, that is, it withdraws from the master node position. This helps avoid a dual-master state in the system.
  • the process by which the service coordination device regularly acquires the survival status of the main control server is, for example: the service coordination device receives a message about the main control server's own active status, periodically sent by the main control server, and judges that the main control server is alive when the message is obtained.
  • the service coordination method of the distributed storage system further includes: receiving a connection request periodically sent by the main control server among the multiple control servers; receiving the determined identification of the main control server sent by the control server; and recording the identification of the main control server. In this way, it is beneficial for the service coordination device to obtain the identification of the re-selected main control server in time, thereby maintaining the normal operation of the distributed storage system.
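The behaviour of the service coordination device described above (answering queries, answering periodic connection requests, and recording the identifier of the main control server) can be sketched as follows. This is a minimal in-memory illustration, not ZooKeeper and not the applicant's implementation: the class name `CoordinationDevice`, its method names, the use of plain numbers as timestamps, and the rule that only the recorded master gets a positive reply are all assumptions made for the example.

```python
class CoordinationDevice:
    """Toy stand-in for the service coordination device (hypothetical, not ZooKeeper)."""

    def __init__(self, session_timeout):
        self.session_timeout = session_timeout
        self.master_id = None      # recorded identifier of the main control server
        self.last_contact = None   # time of the master's last connection request

    def record_master(self, master_id, now):
        """Record the identifier reported by a control server after a master selection."""
        self.master_id = master_id
        self.last_contact = now

    def handle_connection_request(self, server_id, now):
        """Answer a periodic connection request; only the recorded master refreshes its session."""
        self._expire(now)
        if server_id == self.master_id:
            self.last_contact = now
            return True
        return False

    def handle_query(self, now):
        """Query result: is a main control server identifier currently recorded?"""
        self._expire(now)
        return self.master_id is not None

    def _expire(self, now):
        # Assumption: the record is cleared once the master has been silent
        # for longer than the session timeout.
        if self.master_id is not None and now - self.last_contact > self.session_timeout:
            self.master_id = None
```

With a session timeout of 10 time units, a master that last connected at time 8 is still reported at time 17 and is no longer reported at time 19.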
  • the service coordination device is a service coordination device that provides coordination services based on a distributed application coordination service (Zookeeper).
  • Zookeeper is an open-source distributed application coordination service. It acts as the manager of a cluster and monitors the status of each node in the cluster, so that the next appropriate operation can be performed according to the feedback submitted by the nodes.
  • the service coordination device can effectively manage a cluster composed of multiple control servers, and coordinate multiple control servers to provide external control services.
  • FIG. 4 shows a specific example of the implementation of the service coordination method of the distributed storage system provided by this embodiment.
  • each control server sends a registration request to the service coordination device, that is, step S101 is executed.
  • the service coordination device allocates a globally unique identity to each control server, that is, step S102 is executed.
  • the plurality of control servers hold an election based on the identities assigned to them, and the control server with the smallest identity is elected as the main control server.
  • the main control server provides the control service externally, and the other control servers serve as backups, that is, step S103 is executed.
  • the main control server actively connects to the service coordination device periodically, and the time interval of the regular connection is less than the session timeout time set by the service coordination device, that is, step S104a is executed.
  • the slave control server also actively queries the service coordination device for the system status periodically, and the time interval of the regular query is less than the session timeout time set by the service coordination device, that is, step S104b is executed.
  • during the operation of the system, the main control server may still be able to provide services normally while its communication with the service coordination device is disconnected. In this case, the service coordination device cannot obtain the survival status reported by the main control server.
  • the record of the main control server is cleared, and the system enters a masterless state, that is, step S105 is executed.
  • after that, the slave control server learns, by periodically querying the system status, that the system is currently in the masterless state. On this basis, it sends a master selection instruction to the other control servers and actively initiates the re-election operation, and a new main control server is selected, that is, step S106 and step S107 are executed.
  • the original main control server fails to connect with the service coordination device within the set time window and, on this basis, actively exits the master node position, that is, steps S108 and S109 are executed.
  • the slave control server actively initiates the master re-election once its query shows that the system is in a masterless state, which helps avoid a situation in which the system has no master.
  • the original main control server actively withdraws from the master node position when the connection with the service coordination device fails, which prevents the new main control server and the original main control server from existing simultaneously and thus helps avoid a dual-master situation.
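The election rule of step S103 and the ordering of events in steps S105 to S109 can be illustrated with a little arithmetic. The sketch below is hypothetical: the function names, the idea of measuring both the set time window and the session timeout from the last acknowledged connection request, and the assumption of a shared clock without drift are not stated in the source.

```python
def elect_master(identities):
    """Pick the control server holding the smallest identity (the rule used in step S103)."""
    if not identities:
        raise ValueError("no control servers to elect from")
    return min(identities, key=identities.get)


def failover_timeline(last_ack, session_timeout, step_down_window, query_interval):
    """Return (step_down_at, record_cleared_at, detected_by) for a master cut off after last_ack."""
    # S108/S109: the original master leaves the master role when its window expires.
    step_down_at = last_ack + step_down_window
    # S105: the coordination device clears the master record when the session times out.
    record_cleared_at = last_ack + session_timeout
    # S106: worst case, a standby notices one full query interval after the record is cleared.
    detected_by = record_cleared_at + query_interval
    return step_down_at, record_cleared_at, detected_by
```

Under these assumptions, a set time window shorter than the session timeout means the original master has already stepped down before a standby can observe the masterless state, which is the ordering that rules out two masters at once.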
  • This embodiment provides a service coordination device for a distributed storage system.
  • the distributed storage system includes a service coordination device and a plurality of control servers.
  • the device is applied to any control server and includes a query module and a judgment module.
  • the query module is configured to send a query request to the service coordination device to obtain a query result, and the query result represents whether a main control server is included in a plurality of control servers.
  • the judgment module is configured to determine whether to send a master selection instruction to other control servers according to the query result, and the master selection instruction is used to determine a master control server from a plurality of control servers.
  • when sending a query request to the service coordination device to obtain the query result, the query module is set to: send a query request to the service coordination device at a preset first time interval; and receive the query result sent by the service coordination device in response to the query request, the query result representing whether a main control server is included in the multiple control servers; wherein the first time interval is less than the preset session timeout duration of the service coordination device.
  • the query result is whether the service coordination device records the identification of the main control server.
  • the device further includes a connection detection module, and the connection detection module is configured to: when the control server is the main control server, periodically send a connection request to the service coordination device; and if no response from the service coordination device to the connection request is received within the set time window, stop the service provided as the main control server.
  • when periodically sending a connection request to the service coordination device in the case where the control server is the main control server, the connection detection module is set to: send the connection request to the service coordination device at a preset second time interval; wherein the second time interval is less than the preset session timeout duration of the service coordination device.
  • the set time window is smaller than the session timeout duration set by the service coordination device.
  • when determining whether to send the master selection instruction to other control servers according to the query result, the judgment module is configured to: when the query result indicates that no main control server is included in the plurality of control servers, send the master selection instruction to the other control servers, so as to determine a main control server from the plurality of control servers.
  • the device further includes an identification sending module, and the identification sending module is configured to send the determined identification of the master control server to the service coordination device after determining a master control server from the plurality of control servers.
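The control-server-side modules described above (query module, judgment module, and connection detection module) might be combined roughly as follows. This is a sketch under stated assumptions: `ControlServer`, `StubCoordinator`, the `has_master()` and `ping()` methods, and the `SELECT_MASTER` message are hypothetical names; messages are appended to a list instead of being sent over a network; and, following FIG. 4, only standby servers issue queries.

```python
class StubCoordinator:
    """Hypothetical stand-in for the service coordination device, driven by two flags."""

    def __init__(self):
        self.master_recorded = True
        self.reachable = True

    def has_master(self):
        return self.master_recorded

    def ping(self, name):
        return self.reachable


class ControlServer:
    """Sketch of the query, judgment, and connection detection modules of one control server."""

    def __init__(self, name, coordinator, peers, step_down_window):
        self.name = name
        self.coordinator = coordinator
        self.peers = peers                  # names of the other control servers
        self.step_down_window = step_down_window
        self.is_master = False
        self.last_ack = None                # time of the last answered connection request
        self.sent = []                      # (peer, message) pairs kept instead of real network I/O

    def become_master(self, now):
        self.is_master = True
        self.last_ack = now

    def query_tick(self):
        """Query and judgment modules: start a master selection if no master is recorded."""
        if self.is_master or self.coordinator.has_master():
            return False
        for peer in self.peers:
            self.sent.append((peer, "SELECT_MASTER"))
        return True

    def connection_tick(self, now):
        """Connection detection module: step down once the window passes without a response."""
        if not self.is_master:
            return
        if self.coordinator.ping(self.name):
            self.last_ack = now
        elif now - self.last_ack >= self.step_down_window:
            self.is_master = False          # stop serving as the main control server
```

A scheduler would call `query_tick` at the first time interval and `connection_tick` at the second time interval; both intervals are left to the caller here.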
  • This embodiment also provides a service coordination device for a distributed storage system.
  • the distributed storage system includes a service coordination device and a plurality of control servers implementing the methods described in the method embodiments.
  • the device is applied to the service coordination device and includes a first receiving module, a result obtaining module, and a first sending module.
  • the first receiving module is configured to receive the query request sent by the control server.
  • the result obtaining module is configured to obtain the query result in response to the query request, and the query result represents whether the main control server is included in the plurality of control servers.
  • the first sending module is configured to send the query result to the control server.
  • the query result is whether the service coordination device records the identification of the main control server.
  • the device further includes a second receiving module and a second sending module: the second receiving module is configured to receive connection requests periodically sent by the main control server among the plurality of control servers; the second sending module is configured to send a response message for the connection request to the main control server.
  • the device further includes a third receiving module and a recording module: the third receiving module is set to receive the determined identification of the main control server sent by the control server; the recording module is set to record the identification of the main control server.
  • the service coordination device provides coordination services based on a distributed application coordination service (Zookeeper).
  • This embodiment provides an electronic device that includes a processor and a memory.
  • the memory stores machine-executable instructions that can be executed by the processor.
  • the processor executes the machine-executable instructions to implement the service coordination method of the distributed storage system described in the method embodiments of the present application.
  • This embodiment provides a distributed storage system, including a user agent server, multiple storage servers, multiple control servers that implement the first method described in the method embodiments of this application, and a service coordination device that implements the second method described in the method embodiments of this application, wherein each control server is communicatively connected to the user agent server, the multiple storage servers, and the service coordination device.
  • This embodiment provides a machine-readable storage medium.
  • the machine-readable storage medium stores machine-executable instructions.
  • when the machine-executable instructions are called and executed by a processor, they cause the processor to implement the service coordination method of the distributed storage system described in the method embodiments of the present application.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present application.
  • the computer-readable storage medium may be a tangible device that can hold and store instructions used by the instruction execution device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • A non-exhaustive list of computer-readable storage media includes: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, and mechanical encoding devices, such as a printer with instructions stored thereon.
  • the computer-readable storage medium used here is not to be interpreted as a transitory signal itself, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through waveguides or other transmission media (for example, light pulses through fiber optic cables), or electrical signals transmitted through wires.
  • the computer-readable program instructions described herein can be downloaded from a computer-readable storage medium to various computing/processing devices, or downloaded to an external computer or external storage device via a network, such as the Internet, a local area network, a wide area network, and/or a wireless network.
  • the network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers.
  • the network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network, and forwards the computer-readable program instructions for storage in the computer-readable storage medium in each computing/processing device.
  • the computer program instructions used to perform the operations of this application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-related instructions, microcode, firmware instructions, state-setting data, or code written in one or more programming languages.
  • The programming languages include object-oriented programming languages, such as Smalltalk and C++, and conventional procedural programming languages, such as the "C" language or similar programming languages.
  • Computer-readable program instructions can be executed entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computer (for example, through the Internet using an Internet service provider).
  • an electronic circuit, such as a programmable logic circuit, a field programmable gate array (FPGA), or a programmable logic array (PLA), can be customized by using the status information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to realize various aspects of the present application.
  • These computer-readable program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, or another programmable data processing device to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing device, produce a device that implements the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams. These computer-readable program instructions can also be stored in a computer-readable storage medium; they cause computers, programmable data processing apparatuses, and/or other devices to work in a specific manner, so that the computer-readable medium storing the instructions includes an article of manufacture that includes instructions for implementing various aspects of the functions/actions specified in one or more blocks in the flowcharts and/or block diagrams.
  • each block in the flowchart or block diagram can represent a module, program segment, or part of an instruction, and the module, program segment, or part of an instruction contains one or more executable instructions for implementing the specified logical functions.
  • the functions marked in the block may also occur in a different order than the order marked in the drawings. For example, two consecutive blocks can actually be executed in parallel, or they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram and/or flowchart, and each combination of blocks in the block diagram and/or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that realization by hardware, realization by software, and realization by a combination of software and hardware are all equivalent.
  • a distributed application coordination service (Zookeeper) is used to coordinate the operation of multiple control servers, for example, multiple control servers are notified to perform a master selection operation.
  • existing coordination schemes are prone to masterless and dual-master situations, which affect the stability of the distributed storage system.
  • the control server actively inquires whether a main control server is included in the multiple control servers, and determines whether to send a master selection instruction to other control servers according to the query result, which helps avoid a masterless system and improves the stability of the system.

Abstract

The present application relates to a service coordination method and apparatus for a distributed storage system, and an electronic device. The method comprises: sending a query request to a service coordination device to obtain a query result, wherein the query result represents whether a plurality of control servers comprise a master control server; and determining, according to the query result, whether to send a master control server selecting instruction to another control server, wherein the master control server selecting instruction is used for determining a master control server from the plurality of control servers.

Description

Service coordination method and apparatus for distributed storage system, and electronic device
This application claims priority to Chinese patent application No. 201911024570.3, filed with the Chinese Patent Office on October 25, 2019 and entitled "Service Coordination Method, Apparatus and Electronic Device for Distributed Storage System", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of distributed storage technology, and more specifically, to a service coordination method for a distributed storage system, a service coordination apparatus for a distributed storage system, an electronic device, and a distributed storage system.
Background
Distributed storage is a storage scheme that spreads data across multiple independent devices. A distributed network storage system adopts a scalable system architecture and uses multiple storage servers to share the storage load. It improves the reliability, availability, and access efficiency of the system, and is also easy to expand.
In a distributed storage system, a dedicated control server usually coordinates the storage of data on multiple data servers. The control server stores metadata describing data attributes and can implement functions such as storage location records, historical data records, and resource lookup. To improve the reliability of the system, there are usually multiple control servers; the main control server among them provides the control service, and the other control servers serve as backups.
A distributed application coordination service (Zookeeper) can be used to coordinate the operation of multiple control servers, for example, by notifying the control servers to perform a master selection operation. However, existing coordination schemes are prone to masterless and dual-master situations, which affect the stability of the distributed storage system.
Summary of the invention
One purpose of this application is to provide a new technical solution for service coordination in a distributed storage system.
According to a first aspect of the present application, a service coordination method for a distributed storage system is provided. The distributed storage system includes a service coordination device and a plurality of control servers, and the method is implemented by any one of the control servers. The method includes: sending a query request to the service coordination device to obtain a query result, the query result indicating whether the plurality of control servers include a main control server; and determining, according to the query result, whether to send a master selection instruction to the other control servers, the master selection instruction being used to determine one main control server from the plurality of control servers.
In an embodiment, sending the query request to the service coordination device to obtain the query result includes: sending the query request to the service coordination device at a preset first time interval; and receiving the query result sent by the service coordination device in response to the query request, the query result indicating whether the plurality of control servers include a main control server; wherein the first time interval is less than a preset session timeout duration of the service coordination device.
In an embodiment, the query result is whether the service coordination device has recorded an identifier of the main control server.
In an embodiment, the method further includes: when the control server is the main control server, periodically sending a connection request to the service coordination device; and if no response from the service coordination device to the connection request is received within a set time window, stopping the service provided as the main control server.
In an embodiment, periodically sending the connection request to the service coordination device when the control server is the main control server includes: sending the connection request to the service coordination device at a preset second time interval; wherein the second time interval is less than the preset session timeout duration of the service coordination device.
In an embodiment, the set time window is less than the session timeout duration set for the service coordination device.
In an embodiment, determining, according to the query result, whether to send the master selection instruction to the other control servers includes: when the query result indicates that the plurality of control servers do not include a main control server, sending the master selection instruction to the other control servers, so as to determine one main control server from the plurality of control servers.
In an embodiment, the method further includes: after one main control server is determined from the plurality of control servers, sending the identifier of the determined main control server to the service coordination device.
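The first aspect places three timing constraints on a control server: the first time interval, the second time interval, and the set time window must each be less than the session timeout duration of the service coordination device. A configuration check along these lines could catch a misconfigured deployment early. The sketch below is only an illustration; the `TimingConfig` class, its field names, and the choice of plain numbers for durations are assumptions, not part of the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimingConfig:
    query_interval: float     # "first time interval" between query requests
    connect_interval: float   # "second time interval" between connection requests
    step_down_window: float   # "set time window" after which an unanswered master steps down
    session_timeout: float    # session timeout of the service coordination device

    def violations(self):
        """List every timing constraint from the text that this configuration breaks."""
        limits = {
            "query_interval": self.query_interval,
            "connect_interval": self.connect_interval,
            "step_down_window": self.step_down_window,
        }
        return [
            f"{name} must be less than session_timeout"
            for name, value in limits.items()
            if not value < self.session_timeout
        ]
```

An empty list means all three constraints hold; each entry otherwise names the setting that needs to be reduced.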
According to a second aspect of the present application, a service coordination method for a distributed storage system is provided. The distributed storage system includes a service coordination device and a plurality of control servers that implement the method of the first aspect of the present application, and the method is implemented by the service coordination device. The method includes: receiving a query request sent by a control server; obtaining a query result in response to the query request, the query result indicating whether the plurality of control servers include a main control server; and sending the query result to the control server.
In an embodiment, the query result is whether the service coordination device has recorded an identifier of the main control server.
In an embodiment, the method further includes: receiving a connection request periodically sent by the main control server among the plurality of control servers; and sending a response message for the connection request to the main control server.
In an embodiment, the method further includes: receiving the identifier of the determined main control server sent by the control server; and recording the identifier of the main control server.
In an embodiment, the service coordination device provides coordination services based on a distributed application coordination service (Zookeeper).
According to a third aspect of the present application, a service coordination apparatus for a distributed storage system is provided. The distributed storage system includes a service coordination device and a plurality of control servers, and the apparatus is applied to any one of the control servers. The apparatus includes: a query module, configured to send a query request to the service coordination device to obtain a query result, the query result indicating whether the plurality of control servers include a main control server; and a judgment module, configured to determine, according to the query result, whether to send a master selection instruction to the other control servers, the master selection instruction being used to determine one main control server from the plurality of control servers.
In an embodiment, when sending the query request to the service coordination device to obtain the query result, the query module is configured to: send the query request to the service coordination device at a preset first time interval; and receive the query result sent by the service coordination device in response to the query request, the query result indicating whether the plurality of control servers include a main control server; wherein the first time interval is less than a preset session timeout duration of the service coordination device.
In an embodiment, the query result is whether the service coordination device has recorded an identifier of the main control server.
In an embodiment, the apparatus further includes a connection detection module, and the connection detection module is configured to: when the control server is the main control server, periodically send a connection request to the service coordination device; and if no response from the service coordination device to the connection request is received within a set time window, stop the service provided as the main control server.
In an embodiment, when periodically sending the connection request to the service coordination device in the case where the control server is the main control server, the connection detection module is configured to: send the connection request to the service coordination device at a preset second time interval; wherein the second time interval is less than the preset session timeout duration of the service coordination device.
In an embodiment, the set time window is less than the session timeout duration set for the service coordination device.
In an embodiment, when determining, according to the query result, whether to send the master selection instruction to the other control servers, the judgment module is configured to: when the query result indicates that the plurality of control servers do not include a main control server, send the master selection instruction to the other control servers, so as to determine one main control server from the plurality of control servers.
In an embodiment, the apparatus further includes an identifier sending module, and the identifier sending module is configured to: after one main control server is determined from the plurality of control servers, send the identifier of the determined main control server to the service coordination device.
根据本申请的第四方面,提供了一种分布式存储系统的服务协调装置,所述分布式存储系统包括服务协调设备和多个实施本申请第一方面所述方法的控制服务器,所述装置应用于所述服务协调设备,包括:第一接收模块,设置为接收所述控制服务器发送的查询请求;结果获取模块,设置为响应于所述查询请求,获取查询结果,所述查询结果表征所述多个控制服务器中是否包含有主控制服务器;第一发送模块,设置为向所述控制服务器发送所述查询结果。According to a fourth aspect of the present application, there is provided a service coordination device for a distributed storage system, the distributed storage system including a service coordination device and a plurality of control servers that implement the method described in the first aspect of the present application, the device Applied to the service coordination device, it includes: a first receiving module configured to receive a query request sent by the control server; a result obtaining module configured to obtain a query result in response to the query request, and the query result represents the query result. Whether the multiple control servers include a main control server; the first sending module is configured to send the query result to the control server.
In one embodiment, the query result is whether the service coordination device has recorded an identifier of a master control server.
In one embodiment, the apparatus further includes a second receiving module and a second sending module. The second receiving module is configured to receive connection requests sent periodically by the master control server among the plurality of control servers; the second sending module is configured to send the master control server a response message for each connection request.
In one embodiment, the apparatus further includes a third receiving module and a recording module. The third receiving module is configured to receive the identifier of the determined master control server sent by a control server; the recording module is configured to record the identifier of the master control server.
In one embodiment, the service coordination device provides its coordination service based on the distributed application coordination service Zookeeper.
According to a fifth aspect of the present application, an electronic device is provided, including a processor and a memory. The memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the service coordination method for a distributed storage system according to the first aspect or the second aspect of the present application.
According to a sixth aspect of the present application, a distributed storage system is provided, including a user agent server, a plurality of storage servers, a plurality of control servers that implement the method of the first aspect of the present application, and a service coordination device that implements the method of the second aspect of the present application, wherein the control servers are communicatively connected to the user agent server, to the plurality of storage servers, and to the service coordination device.
In one embodiment of the present application, a control server actively queries whether the plurality of control servers includes a master control server and determines, according to the query result, whether to send a master election instruction to the other control servers. This can avoid a situation in which the system has no master and improves the stability of the system.
Other features and advantages of the present application will become clear from the following detailed description of exemplary embodiments with reference to the accompanying drawings.
Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the present application and, together with the description, serve to explain the principles of the present application.
FIG. 1 is a schematic diagram of the hardware configuration of a distributed storage system that can be used to implement embodiments of the present application.
FIG. 2 is a schematic structural diagram of a server that can be used to implement embodiments of the present application.
FIG. 3 is a flowchart of a service coordination method for a distributed storage system according to an embodiment of the present application.
FIG. 4 is a flowchart of a specific example of the service coordination method for a distributed storage system according to an embodiment of the present application.
Detailed Description
Various exemplary embodiments of the present application will now be described in detail with reference to the accompanying drawings. It should be noted that, unless specifically stated otherwise, the relative arrangement of the components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present application.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present application, its applications, or its uses.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate they should be regarded as part of the specification.
In all examples shown and discussed here, any specific value should be interpreted as merely exemplary and not as a limitation. Other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings, so once an item has been defined in one drawing it need not be discussed further in subsequent drawings.
<Hardware Configuration>
FIG. 1 is a schematic structural diagram of a distributed storage system that can be used to implement embodiments of the present application.
As shown in FIG. 1, the distributed storage system 100 includes a user agent server 1000, storage servers 2000, control servers 3000, and a service coordination device 4000. There are multiple (two or more) storage servers 2000 and multiple (two or more) control servers 3000.
The storage servers 2000 are configured to store target data.
The user agent server 1000 is configured to receive a data read/write request for target data sent by a client and to forward the request to a control server 3000.
The control server 3000 is configured to look up, in the metadata it stores, the storage server 2000 that corresponds to the target data and to return the identification information of that storage server 2000 to the user agent server 1000. The user agent server 1000 then interacts with the corresponding storage server 2000 according to the identification information to complete the read or write operation on the target data.
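The lookup flow described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the class names, the dictionary-based metadata, and the `read` method are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class StorageServer:
    server_id: str
    blobs: dict = field(default_factory=dict)  # object key -> stored bytes


@dataclass
class ControlServer:
    # Assumed metadata layout: object key -> identifier of the storage server.
    metadata: dict = field(default_factory=dict)

    def locate(self, key: str) -> str:
        """Return the identifier of the storage server holding `key`."""
        return self.metadata[key]


@dataclass
class UserAgentServer:
    control: ControlServer
    storage: dict  # server_id -> StorageServer

    def read(self, key: str) -> bytes:
        server_id = self.control.locate(key)        # ask the control server first
        return self.storage[server_id].blobs[key]   # then read from that storage server


s1 = StorageServer("storage-1", {"report.txt": b"hello"})
agent = UserAgentServer(ControlServer({"report.txt": "storage-1"}), {"storage-1": s1})
data = agent.read("report.txt")
```

The control server only resolves where the data lives; the data itself travels between the user agent server and the storage server.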
The service coordination device 4000 is configured to coordinate the operation of the multiple control servers 3000, for example by assigning identities to the control servers 3000 and by notifying the control servers 3000 to elect a master control server. The service coordination device 4000 is, for example, an electronic device on which distributed application coordination software is installed.
It should be noted that, in some embodiments, the distributed application coordination service may be deployed on the multiple control servers 3000 themselves. In that case the control servers 3000 coordinate their own services through the distributed application coordination service, and no separate service coordination device 4000 is required.
The user agent server 1000, the storage servers 2000, the control servers 3000, and the service coordination device 4000 may communicate with one another over a wired or wireless network. When the distributed application coordination service is deployed on the multiple control servers, communication between a control server and the service coordination device is in fact communication between different processes on the same machine.
The user agent server 1000, the storage servers 2000, the control servers 3000, and the service coordination device 4000 may each have the hardware configuration of the server 1100 shown in FIG. 2. As shown in FIG. 2, the server 1100 may include a processor 1110, a memory 1120, an interface device 1130, a communication device 1140, a display device 1150, and an input device 1160. The processor 1110 may be, for example, a central processing unit (CPU). The memory 1120 includes, for example, ROM (read-only memory), RAM (random access memory), and non-volatile memory such as a hard disk. The interface device 1130 includes, for example, a USB interface and a serial interface. The communication device 1140 is capable of, for example, wired or wireless communication. The display device 1150 is, for example, a liquid crystal display. The input device 1160 may include, for example, a touch screen and a keyboard.
As applied to the embodiments of this specification, the memory 1120 of the server 1100 is configured to store instructions that control the processor 1110 to operate so as to support the service coordination method according to any embodiment of this specification. A skilled person can design the instructions according to the solutions disclosed in this specification. How instructions control a processor to operate is well known in the art and is not described in detail here.
Those skilled in the art should understand that, although FIG. 2 shows multiple devices of the server 1100, the server 1100 in the embodiments of this specification may involve only some of them, for example only the processor 1110, the memory 1120, and the communication device 1140.
The distributed storage system 100 shown in FIG. 1 is merely illustrative and is in no way intended to limit this specification, its applications, or its uses.
<Method Embodiment>
This embodiment provides a service coordination method for a distributed storage system. The method is performed, for example, by any of the control servers 3000 in FIG. 1. As shown in FIG. 3, the method includes the following steps S1100 to S1200:
Step S1100: send a query request to the service coordination device to obtain a query result, the query result indicating whether the plurality of control servers includes a master control server.
In this embodiment, the distributed storage system includes a plurality of control servers. One master control server is selected from among them, for example by election; the master control server provides the control service, and the other control servers act as backups.
In one example, during initial startup each control server sends a registration request to the service coordination device. In response to the registration requests, the service coordination device assigns each control server a unique identity according to a preset rule.
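As a rough sketch of the registration step, the coordinator below hands out identities from a counter. The application does not specify the preset rule, so the sequential numbering, the class name `ServiceCoordinator`, and the use of an address string as the registration key are all assumptions made for illustration.

```python
import itertools


class ServiceCoordinator:
    """Hypothetical coordinator that assigns a unique identity per control server."""

    def __init__(self):
        self._counter = itertools.count(1)  # assumed preset rule: 1, 2, 3, ...
        self._ids = {}                      # server address -> assigned identity

    def register(self, address: str) -> int:
        # Re-registering the same address returns the identity already assigned.
        if address not in self._ids:
            self._ids[address] = next(self._counter)
        return self._ids[address]


coordinator = ServiceCoordinator()
ids = [coordinator.register(addr) for addr in ("10.0.0.1", "10.0.0.2", "10.0.0.3")]
```

Any rule works as long as the identities are unique and can be ordered, since a later embodiment elects the server with the smallest identity.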
In this embodiment, the service coordination device coordinates the master election process and records the identifier of the master control server. In this step, a query request is sent to the service coordination device; for example, the control server may ask the service coordination device which of the current control servers holds the master control server identifier and what the state of that master control server is. From the answer it can be determined whether the distributed storage system currently has one master control server, no master control server, or two master control servers. In this embodiment, the system states of the distributed storage system may include a state in which the plurality of control servers includes a master control server (hereinafter the "has-master state", which covers, for example, one master control server or two master control servers) and a state in which the plurality of control servers does not include a master control server (hereinafter the "master-less state").
In this embodiment, the control server actively sends the query request to the service coordination device to obtain the query result. The query result indicates whether the plurality of control servers includes a master control server.
In one embodiment, step S1100 further includes: sending the query request to the service coordination device at a preset first time interval; and receiving the query result that the service coordination device sends in response to the query request, the query result indicating whether the plurality of control servers includes a master control server; wherein the first time interval is shorter than the preset session timeout of the service coordination device.
In this embodiment, a long-lived connection is maintained between the control server and the service coordination device. If, within the preset session timeout, the service coordination device receives a message from the control server indicating that the control server is active (for example, a heartbeat packet), the service coordination device considers the connection healthy and continues to monitor the connection state between the two.
In this embodiment, the control server periodically queries the current system state at the preset first time interval. Because the first time interval is shorter than the session timeout configured for the service coordination device, this helps prevent the session from expiring.
In one embodiment, the query result is whether the service coordination device has recorded an identifier of a master control server. The identifier of the master control server is, for example, its IP address or another identifier, or a unique identity assigned by the service coordination device (for example, a specific character string assigned as the identity, or a character string appended to the IP address of the master control server).
In one embodiment, the process by which the control server queries whether the current system state is the master-less state includes: requesting the service coordination device to provide the current system state; and determining the current system state according to the state that the service coordination device provides in response to the request. In one example, the service coordination device records whether the system is currently in the has-master state or the master-less state and, in the has-master state, the identity of the current master control server. After sending the request for the current system state, the control server determines whether the system is in the has-master state or the master-less state according to the information returned by the service coordination device.
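The query step can be sketched as below. This is an in-memory illustration rather than the application's implementation: the classes, the dictionary-shaped query result, and the `poll_once` method are hypothetical, and a real control server would call `poll_once` every `query_interval_s` seconds over the network.

```python
class ServiceCoordinator:
    def __init__(self, session_timeout_s: float):
        self.session_timeout_s = session_timeout_s
        self.master_id = None  # None means no master identifier is recorded

    def query(self) -> dict:
        # Assumed result shape: a flag plus the recorded identifier, if any.
        return {"has_master": self.master_id is not None,
                "master_id": self.master_id}


class ControlServer:
    def __init__(self, coordinator: ServiceCoordinator, query_interval_s: float):
        # The first time interval must be shorter than the session timeout.
        if query_interval_s >= coordinator.session_timeout_s:
            raise ValueError("query interval must be shorter than the session timeout")
        self.coordinator = coordinator
        self.query_interval_s = query_interval_s

    def poll_once(self) -> str:
        result = self.coordinator.query()
        return "has-master" if result["has_master"] else "master-less"


coordinator = ServiceCoordinator(session_timeout_s=10.0)
server = ControlServer(coordinator, query_interval_s=3.0)
state_before = server.poll_once()  # nothing recorded yet
coordinator.master_id = 1          # a master identifier gets recorded
state_after = server.poll_once()
```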
Step S1200: determine, according to the query result, whether to send a master election instruction to the other control servers, the master election instruction being used to determine one master control server from among the plurality of control servers.
In one embodiment, when the query result indicates that the plurality of control servers does not include a master control server, the control server sends the master election instruction to the other control servers so as to determine one master control server from among the plurality of control servers.
In this embodiment, when the queried system state is the master-less state, the control server sends the master election instruction on its own initiative, thereby starting an election in which it and the other control servers jointly elect a master control server. It does not need to wait for a notification from the service coordination device before starting the election.
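A minimal sketch of this decision, assuming a hypothetical message format and a caller-supplied `send` function (neither is specified in the application):

```python
def maybe_start_election(has_master, self_id, all_ids, send):
    """Send an election instruction to every other control server
    if, and only if, the query result reports no master."""
    if has_master:
        return []
    peers = [sid for sid in all_ids if sid != self_id]
    for peer in peers:
        # "ELECT_MASTER" is an illustrative message type, not a defined protocol.
        send(peer, {"type": "ELECT_MASTER", "from": self_id})
    return peers


outbox = []
notified = maybe_start_election(
    has_master=False, self_id=2, all_ids=[1, 2, 3],
    send=lambda peer, msg: outbox.append((peer, msg)),
)
```

With `has_master=True` the function sends nothing, which matches the normal case where the control server keeps talking only to the service coordination device.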
In this embodiment, while the system is running normally, the control servers other than the master control server are in a non-election state and may communicate only with the service coordination device, without communicating with the other control servers. When the queried system state is the master-less state, the control servers other than the master control server enter the election state, communicate with the other control servers, and start the election.
In this embodiment, the election proceeds, for example, as follows: each control server casts a vote and receives the votes cast by the other control servers; the votes are processed and counted according to a preset election rule; and each control server updates its own state according to the election result, for example to master control server or to non-master (slave) control server.
In one embodiment, during the election phase the control servers vote according to the rule that the server with the smallest identity is elected, and thereby select the master control server.
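The voting steps and the smallest-identity rule can be illustrated with the sketch below. The function names and the tie-breaking rule in `tally` (most votes first, then smaller identity) are assumptions; the application only states that votes are counted according to a preset rule and that the smallest identity wins.

```python
from collections import Counter


def cast_vote(known_ids):
    """Each server votes for the smallest identity it knows about."""
    return min(known_ids)


def tally(votes):
    """votes maps voter id -> candidate id. Most votes wins; ties go to the smaller id."""
    counts = Counter(votes.values())
    winner, _ = min(counts.items(), key=lambda item: (-item[1], item[0]))
    return winner


def run_election(server_ids):
    votes = {sid: cast_vote(server_ids) for sid in server_ids}
    winner = tally(votes)
    # Every server updates its own role according to the result.
    return {sid: ("master" if sid == winner else "slave") for sid in server_ids}


roles = run_election([3, 1, 2])
```

When every server sees the same set of identities, all votes agree and the server with the smallest identity becomes master.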
After the election is complete, the control servers may again communicate only with the service coordination device and need not communicate with one another. When the distributed application coordination service is deployed on the multiple control servers, communication between a control server and the service coordination device takes the form of communication between the process for the control service and the process for the coordination service. In the service coordination method provided by this embodiment, a control server actively queries whether the plurality of control servers includes a master control server and determines, according to the query result, whether to send a master election instruction to the other control servers, which helps avoid a situation in which the system has no master. For example, in one scenario the current master control server crashes and the service coordination device clears that master control server's identity but, for reasons such as a broken network connection, cannot notify the other control servers to elect a new master. In this case, with the method provided by this embodiment, the other control servers can start an election on their own initiative based on the queried system state, so the system does not remain in the master-less state.
In one embodiment, the service coordination method for a distributed storage system further includes: when the control server is the master control server, periodically sending connection requests to the service coordination device; and if no response to the connection requests is received from the service coordination device within a set time window, ceasing to provide service as the master control server.
In this embodiment, the current master control server of the system actively connects to the service coordination device, for example by sending it a request for the connection state, and determines whether the connection succeeded according to whether the service coordination device returns a corresponding message. If every connection attempt within the set time window fails, the master control server voluntarily stops providing service as the master control server, that is, it steps down from the master node position. This helps prevent the system from entering a dual-master state.
For example, in one scenario the current master control server is working normally, but its communication link with the service coordination device is interrupted. The service coordination device clears the recorded identity of the master control server and notifies the other control servers to start a new election. After the new election, the system would contain two master control servers. In this case, with the service coordination method provided by this embodiment, the original master control server cannot connect to the service coordination device and therefore voluntarily steps down from the master node position, so the system does not enter the dual-master state.
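The step-down behaviour can be sketched as a small watchdog. The class name `MasterWatchdog` and its methods are hypothetical, and timestamps are passed in explicitly so the example runs without waiting; a real master would call `tick` from a timer and `on_response` whenever the service coordination device answers a connection request.

```python
class MasterWatchdog:
    """Tracks responses from the coordinator and steps down when none
    arrives within the set time window (illustrative sketch)."""

    def __init__(self, window_s: float, now: float):
        self.window_s = window_s
        self.last_response = now
        self.serving_as_master = True

    def on_response(self, now: float) -> None:
        self.last_response = now

    def tick(self, now: float) -> bool:
        # Step down once the whole window has passed without any response.
        if self.serving_as_master and now - self.last_response >= self.window_s:
            self.serving_as_master = False
        return self.serving_as_master


watchdog = MasterWatchdog(window_s=6.0, now=0.0)
still_master_at_2 = watchdog.tick(2.0)   # 2 s since last response
watchdog.on_response(3.0)                # coordinator answered at t = 3 s
still_master_at_8 = watchdog.tick(8.0)   # 5 s since last response
still_master_at_9 = watchdog.tick(9.5)   # 6.5 s since last response
```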
In one embodiment, the process by which the master control server actively connects to the service coordination device includes: actively connecting to the service coordination device at a preset second time interval, wherein the second time interval is shorter than the session timeout configured for the service coordination device.
In this embodiment, the master control server actively connects to the service coordination device at the preset second time interval. Because the second time interval is shorter than the session timeout configured for the service coordination device, this helps prevent the session between the master control server and the service coordination device from expiring.
In one embodiment, the set time window described above is also shorter than the session timeout configured for the service coordination device.
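The two timing constraints can be checked with a small helper such as the one below. The function name and parameters are hypothetical. The returned margin reflects one plausible reading that the application does not state explicitly: if both timers start at the last successful exchange, a window shorter than the session timeout means the old master stops serving before its session can expire on the coordinator side.

```python
def validate_timing(session_timeout_s, connect_interval_s, step_down_window_s):
    """Check the constraints and return the assumed safety margin in seconds."""
    if connect_interval_s >= session_timeout_s:
        raise ValueError("connection interval must be shorter than the session timeout")
    if step_down_window_s >= session_timeout_s:
        raise ValueError("step-down window must be shorter than the session timeout")
    # Time between the old master stepping down and its session expiring,
    # assuming both timers start at the last successful exchange.
    return session_timeout_s - step_down_window_s


margin = validate_timing(session_timeout_s=10.0,
                         connect_interval_s=2.0,
                         step_down_window_s=6.0)
```

With a 10 s session timeout and a 6 s window, the old master has stepped down 4 s before its session would expire.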
In one embodiment, the service coordination method for a distributed storage system further includes: after one master control server has been determined from among the plurality of control servers, sending the identifier of the determined master control server to the service coordination device. The identifier of the master control server is, for example, its IP address or another identifier, such as a unique identity assigned by the service coordination device. This helps the service coordination device obtain the identifier of the newly elected master control server promptly, so that the distributed storage system keeps running normally.
This embodiment also provides another service coordination method for a distributed storage system. The method is performed, for example, by the service coordination device 4000 in FIG. 1 or, when the distributed application coordination service is deployed on the multiple control servers, jointly by the multiple control servers. The method includes the following steps: receiving a query request sent by a control server; obtaining, in response to the query request, a query result indicating whether the plurality of control servers includes a master control server; and sending the query result to the control server.
In one embodiment, the query result is whether the service coordination device has recorded an identifier of a master control server. The identifier of the master control server is, for example, its IP address or another identifier, such as a unique identity assigned by the service coordination device.
In one embodiment, the service coordination method for a distributed storage system further includes: receiving connection requests sent periodically by the master control server among the plurality of control servers; and sending the master control server a response message for each connection request.
In this embodiment, the current master control server of the system actively connects to the service coordination device, for example by sending it a request for the connection state, and determines whether the connection succeeded according to whether the service coordination device returns a corresponding message. If every connection attempt within the set time window fails, the master control server voluntarily stops providing service as the master control server, that is, it steps down from the master node position. This helps prevent the system from entering a dual-master state.
In this embodiment, the service coordination device periodically obtains the liveness of the master control server, for example as follows: the service coordination device receives messages that the master control server sends periodically about its own active state and, when such a message is received, determines that the master control server is alive.
In one embodiment, the service coordination method for a distributed storage system further includes: receiving connection requests sent periodically by the master control server among the plurality of control servers; receiving the identifier of the determined master control server sent by a control server; and recording the identifier of the master control server. This helps the service coordination device obtain the identifier of the newly elected master control server promptly, so that the distributed storage system keeps running normally.
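The steps performed on the service coordination device side can be sketched as three handlers. The handler names and message dictionaries are hypothetical; the application describes only what is received, recorded, and returned.

```python
class ServiceCoordinator:
    """Illustrative coordinator-side handlers (not the application's implementation)."""

    def __init__(self):
        self.master_id = None  # recorded identifier of the master, if any

    def handle_query(self) -> dict:
        # The query result is whether a master identifier is recorded.
        return {"has_master": self.master_id is not None}

    def handle_connect(self, server_id: int) -> dict:
        # Respond to a periodic connection request from the master.
        return {"type": "CONNECT_ACK", "to": server_id}

    def handle_master_report(self, master_id: int) -> None:
        # Record the identifier reported after an election.
        self.master_id = master_id


coordinator = ServiceCoordinator()
before = coordinator.handle_query()
coordinator.handle_master_report(2)
after = coordinator.handle_query()
ack = coordinator.handle_connect(2)
```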
In one embodiment, the service coordination device provides its coordination service based on the distributed application coordination service Zookeeper. Zookeeper is an open-source coordination service for distributed applications. It acts as the manager of a cluster, monitoring the state of each node in the cluster and deciding the next appropriate operation according to the feedback the nodes submit. Based on Zookeeper, the service coordination device can effectively manage the cluster formed by the multiple control servers and coordinate them in providing the control service externally.
FIG. 4 shows a specific example of the service coordination method for a distributed storage system provided by this embodiment. As shown in FIG. 4, during the initial startup of the multiple control servers, each control server sends a registration request to the service coordination device (step S101). In response to the registration requests, the service coordination device assigns each control server a globally unique identity (step S102). The control servers hold an election according to the assigned identities and select the master control server according to the rule that the smallest identity is elected; the master control server provides the control service externally, and the other control servers act as backups (step S103). While the system is running, the master control server periodically and actively connects to the service coordination device, at an interval shorter than the session timeout configured for the service coordination device (step S104a). In addition, the slave control servers periodically and actively query the service coordination device for the system state, at an interval shorter than the session timeout configured for the service coordination device (step S104b). In this example, it is assumed that while the system is running the master control server can still provide service normally but loses its connection to the service coordination device. In this case, because the service coordination device cannot obtain the liveness reported by the master control server, it clears the record of the master control server and the system enters the master-less state (step S105). A slave control server then learns from its periodic query that the system is in the master-less state, sends a master election instruction to the other control servers, and thereby actively starts a new election that selects a new master control server (steps S106 and S107). Every connection attempt that the original master control server makes to the service coordination device within the set time window fails, so it voluntarily steps down from the master node position (steps S108 and S109).
In this example, even if the service coordination device cannot notify the slave control servers to re-elect a master, a slave control server actively initiates re-election once its query shows that the system is master-less, which helps avoid a master-less system. In addition, the original main control server actively gives up the master role when its connections to the service coordination device fail, which prevents the new main control server and the original one from coexisting and so helps avoid a dual-master system.
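The FIG. 4 walkthrough can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the `Coordinator` class, its method names, the 10-second session timeout, and the timestamps are all hypothetical, and the master record is cleared lazily when a query arrives rather than by a real session mechanism.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class Coordinator:
    """Hypothetical stand-in for the service coordination device."""
    session_timeout: float
    master_id: Optional[int] = None
    _next_id: int = 1
    _last_seen: Dict[int, float] = field(default_factory=dict)

    def register(self, now: float) -> int:
        """S101/S102: assign a globally unique identifier."""
        server_id = self._next_id
        self._next_id += 1
        self._last_seen[server_id] = now
        return server_id

    def connect(self, server_id: int, now: float) -> None:
        """S104a: a successful periodic connection refreshes the session."""
        self._last_seen[server_id] = now

    def has_master(self, now: float) -> bool:
        """S104b/S105: answer a query, clearing an expired master record first."""
        if self.master_id is not None:
            if now - self._last_seen[self.master_id] > self.session_timeout:
                self.master_id = None
        return self.master_id is not None


def elect(candidate_ids: Iterable[int]) -> int:
    """S103/S107: the smallest identifier wins."""
    return min(candidate_ids)


def run_scenario() -> Tuple[int, Optional[int]]:
    coord = Coordinator(session_timeout=10.0)
    ids = [coord.register(now=0.0) for _ in range(3)]   # identifiers 1, 2, 3
    first_master = elect(ids)
    coord.master_id = first_master

    coord.connect(first_master, now=4.0)   # last successful master connection
    # From here on the master is cut off from the coordinator.
    assert coord.has_master(now=9.0)       # session has not expired yet
    if not coord.has_master(now=15.0):     # S105/S106: a slave sees a master-less system
        slaves = [i for i in ids if i != first_master]  # S108/S109: old master has stepped down
        coord.master_id = elect(slaves)    # S107: re-election among the remaining servers
        coord.connect(coord.master_id, now=15.0)
    return first_master, coord.master_id
```

Calling `run_scenario()` elects server 1 first and server 2 after the original master's session expires.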
<Control server embodiment>
This embodiment provides a service coordination apparatus for a distributed storage system. The distributed storage system includes a service coordination device and multiple control servers. The apparatus is applied to any one of the control servers and includes a query module and a judgment module.
The query module is configured to send a query request to the service coordination device to obtain a query result, where the query result indicates whether the multiple control servers include a main control server.
The judgment module is configured to determine, according to the query result, whether to send a master selection instruction to the other control servers, where the master selection instruction is used to determine one main control server from among the multiple control servers.
In one embodiment, when sending the query request to the service coordination device to obtain the query result, the query module is configured to: send the query request to the service coordination device at a preset first time interval; and receive the query result that the service coordination device sends in response to the query request, where the query result indicates whether the multiple control servers include a main control server. The first time interval is shorter than the preset session timeout of the service coordination device.
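The timing constraint on the query module can be made concrete with a short sketch. The function names below are hypothetical, and the assumption that queries fire at exact multiples of the first time interval is made only for illustration. Under that assumption, a slave notices a cleared master record no later than one interval after the record is cleared.

```python
import math


def check_query_interval(first_interval: float, session_timeout: float) -> None:
    """Reject a polling interval that is not shorter than the session timeout."""
    if not 0 < first_interval < session_timeout:
        raise ValueError("first interval must be positive and shorter than the session timeout")


def next_query_time(now: float, first_interval: float) -> float:
    """Time of the first scheduled query strictly after `now` (queries at k * interval)."""
    return (math.floor(now / first_interval) + 1) * first_interval


def detection_delay(cleared_at: float, first_interval: float) -> float:
    """How long after the master record is cleared the next query observes it."""
    return next_query_time(cleared_at, first_interval) - cleared_at
```

For example, with a 3-second interval and a master record cleared at t = 10, the next query is at t = 12, so the delay is 2 seconds.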
In one embodiment, the query result is whether the service coordination device has recorded an identifier of a main control server.
In one embodiment, the apparatus further includes a connection detection module configured to: periodically send a connection request to the service coordination device when the control server is the main control server; and, if no response to the connection request is received from the service coordination device within a set time window, stop providing service as the main control server.
In one embodiment, when periodically sending connection requests to the service coordination device while the control server is the main control server, the connection detection module is configured to send the connection request to the service coordination device at a preset second time interval, where the second time interval is shorter than the preset session timeout of the service coordination device.
In one embodiment, the set time window is shorter than the session timeout set on the service coordination device.
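A minimal sketch of the connection detection module follows. The class and field names are hypothetical, and time is passed in explicitly so the logic can be exercised without a real clock or network. One plausible reading of the constraint that the time window is shorter than the session timeout is that the original master gives up its role before the service coordination device clears its record, so a newly elected master does not overlap with the old one.

```python
from dataclasses import dataclass


@dataclass
class ConnectionDetector:
    """Hypothetical connection detection module running on the main control server."""
    second_interval: float   # spacing between connection requests
    time_window: float       # how long to tolerate missing responses
    session_timeout: float   # timeout configured on the coordination device
    is_master: bool = True
    last_response: float = 0.0

    def __post_init__(self) -> None:
        # Both the request interval and the time window must stay below the session timeout.
        if not 0 < self.second_interval < self.session_timeout:
            raise ValueError("second interval must be shorter than the session timeout")
        if not 0 < self.time_window < self.session_timeout:
            raise ValueError("time window must be shorter than the session timeout")

    def on_attempt(self, now: float, got_response: bool) -> bool:
        """Record one connection attempt and return True while still acting as master."""
        if got_response:
            self.last_response = now
        elif now - self.last_response >= self.time_window:
            # No response for a whole time window: stop serving as master.
            self.is_master = False
        return self.is_master
```

With a 2-second interval, a 6-second window, and a last response at t = 2, the attempts at t = 4 and t = 6 keep the master role and the failed attempt at t = 8 gives it up.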
In one embodiment, when determining according to the query result whether to send a master selection instruction to the other control servers, the judgment module is configured to: when the query result indicates that the multiple control servers do not include a main control server, send the master selection instruction to the other control servers, so as to determine one main control server from among the multiple control servers.
In one embodiment, the apparatus further includes an identifier sending module configured to send the identifier of the determined main control server to the service coordination device after one main control server has been determined from among the multiple control servers.
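The judgment module and the identifier sending module can be sketched as two pure functions that return the messages to be sent. The message tuple layout, the `SELECT_MASTER` and `MASTER_ID` labels, and the destination strings are hypothetical; the smallest-identifier rule is taken from the FIG. 4 example and is only one possible election rule.

```python
from typing import Iterable, List, Optional, Tuple

# (kind, destination, payload); a hypothetical wire format used only for this sketch.
Message = Tuple[str, str, Optional[int]]


def judge(has_master: bool, peer_ids: Iterable[int]) -> List[Message]:
    """Judgment module: emit master selection instructions only when no master is recorded."""
    if has_master:
        return []
    return [("SELECT_MASTER", f"server-{peer}", None) for peer in sorted(peer_ids)]


def report_master(candidate_ids: Iterable[int]) -> Message:
    """Identifier sending module: report the elected master to the coordination device."""
    winner = min(candidate_ids)  # smallest-identifier rule from the FIG. 4 example
    return ("MASTER_ID", "coordinator", winner)
```

When the query result shows a recorded master, `judge` returns an empty list and no instruction is sent; otherwise every peer receives one instruction and the winner's identifier is reported afterwards.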
This embodiment further provides a service coordination apparatus for a distributed storage system. The distributed storage system includes a service coordination device and multiple control servers that implement the method described in the method embodiment. The apparatus is applied to the service coordination device and includes a first receiving module, a result obtaining module, and a first sending module.
The first receiving module is configured to receive a query request sent by a control server.
The result obtaining module is configured to obtain a query result in response to the query request, where the query result indicates whether the multiple control servers include a main control server.
The first sending module is configured to send the query result to the control server.
In one embodiment, the query result is whether the service coordination device has recorded an identifier of a main control server.
In one embodiment, the apparatus further includes a second receiving module and a second sending module. The second receiving module is configured to receive the connection requests periodically sent by the main control server among the multiple control servers, and the second sending module is configured to send a response message for each connection request to the main control server.
In one embodiment, the apparatus further includes a third receiving module and a recording module. The third receiving module is configured to receive the identifier of the determined main control server sent by a control server, and the recording module is configured to record the identifier of the main control server.
In one embodiment, the service coordination device provides its coordination service based on a distributed application coordination service (Zookeeper).
<Electronic device embodiment>
This embodiment provides an electronic device including a processor and a memory. The memory stores machine-executable instructions that can be executed by the processor, and the processor executes the machine-executable instructions to implement the service coordination method for a distributed storage system described in the method embodiments of the present application.
<System embodiment>
This embodiment provides a distributed storage system including a user agent server, multiple storage servers, multiple control servers that implement the first method described in the method embodiments of the present application, and a service coordination device that implements the second method described in the method embodiments of the present application. Each control server is communicatively connected to the user agent server, to the multiple storage servers, and to the service coordination device.
<Machine-readable storage medium embodiment>
This embodiment provides a machine-readable storage medium storing machine-executable instructions. When called and executed by a processor, the machine-executable instructions cause the processor to implement the service coordination method for a distributed storage system described in the method embodiments of the present application.
The present application may be a system, a method, and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions that cause a processor to implement various aspects of the present application.
The computer-readable storage medium may be a tangible device that can retain and store instructions for use by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch cards or raised structures in a groove having instructions stored thereon, and any suitable combination of the foregoing. A computer-readable storage medium, as used here, is not to be construed as a transitory signal per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (for example, light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
The computer-readable program instructions described here can be downloaded from a computer-readable storage medium to respective computing/processing devices, or to an external computer or external storage device via a network, for example the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium within the respective computing/processing device.
The computer program instructions for carrying out the operations of the present application may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk and C++, and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by using state information of the computer-readable program instructions; the electronic circuit can execute the computer-readable program instructions and thereby implement various aspects of the present application.
Aspects of the present application are described here with reference to flowcharts and/or block diagrams of methods, apparatuses (systems), and computer program products according to embodiments of the present application. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks in the flowcharts and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus to produce a machine, such that the instructions, when executed by the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams. These computer-readable program instructions may also be stored in a computer-readable storage medium; the instructions cause a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, so that the computer-readable medium storing the instructions comprises an article of manufacture that includes instructions implementing aspects of the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The computer-readable program instructions may also be loaded onto a computer, another programmable data processing apparatus, or another device, so that a series of operational steps is performed on the computer, other programmable data processing apparatus, or other device to produce a computer-implemented process, such that the instructions executed on the computer, other programmable data processing apparatus, or other device implement the functions/acts specified in one or more blocks of the flowcharts and/or block diagrams.
The flowcharts and block diagrams in the accompanying drawings show the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to multiple embodiments of the present application. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of instructions that contains one or more executable instructions for implementing the specified logical function. In some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or acts, or by a combination of dedicated hardware and computer instructions. It is well known to those skilled in the art that implementation in hardware, implementation in software, and implementation in a combination of software and hardware are all equivalent.
The embodiments of the present application have been described above. The foregoing description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used here was chosen to best explain the principles of the embodiments, their practical application, or their technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed here. The scope of the present application is defined by the appended claims.
Industrial applicability
In the related art, a distributed application coordination service (Zookeeper) is used to coordinate the operation of multiple control servers, for example by notifying the control servers to perform a master election. However, existing coordination schemes are prone to master-less and dual-master situations, which undermine the stability of the distributed storage system.
To address these problems in the related art, in the embodiments of the present application a control server actively queries whether the multiple control servers include a main control server and determines, according to the query result, whether to send a master selection instruction to the other control servers. This avoids a master-less system and improves system stability.

Claims (16)

  1. A service coordination method for a distributed storage system, the distributed storage system comprising a service coordination device and a plurality of control servers, the method being implemented by any one of the control servers and comprising:
    sending a query request to the service coordination device to obtain a query result, the query result indicating whether the plurality of control servers include a main control server; and
    determining, according to the query result, whether to send a master selection instruction to the other control servers, the master selection instruction being used to determine one main control server from among the plurality of control servers.
  2. The method according to claim 1, wherein sending the query request to the service coordination device to obtain the query result, the query result indicating whether the plurality of control servers include a main control server, comprises:
    sending the query request to the service coordination device at a preset first time interval; and
    receiving the query result sent by the service coordination device in response to the query request, the query result indicating whether the plurality of control servers include a main control server;
    wherein the first time interval is shorter than a preset session timeout of the service coordination device.
  3. The method according to claim 2, wherein the query result is whether the service coordination device has recorded an identifier of a main control server.
  4. The method according to any one of claims 1 to 3, wherein the method further comprises:
    periodically sending a connection request to the service coordination device when the control server is the main control server; and
    if no response to the connection request is received from the service coordination device within a set time window, stopping the service provided as the main control server.
  5. The method according to claim 4, wherein periodically sending the connection request to the service coordination device when the control server is the main control server comprises:
    sending the connection request to the service coordination device at a preset second time interval;
    wherein the second time interval is shorter than the preset session timeout of the service coordination device.
  6. The method according to claim 4 or 5, wherein the set time window is shorter than the session timeout set on the service coordination device.
  7. The method according to any one of claims 1 to 6, wherein determining, according to the query result, whether to send the master selection instruction to the other control servers, the master selection instruction being used to determine one main control server from among the plurality of control servers, comprises:
    when the query result indicates that the plurality of control servers do not include a main control server, sending the master selection instruction to the other control servers, so as to determine one main control server from among the plurality of control servers.
  8. The method according to claim 7, wherein the method further comprises:
    after one main control server is determined from among the plurality of control servers, sending an identifier of the determined main control server to the service coordination device.
  9. A service coordination method for a distributed storage system, the distributed storage system comprising a service coordination device and a plurality of control servers implementing the method according to any one of claims 1-6, the method being implemented by the service coordination device and comprising:
    receiving a query request sent by a control server;
    obtaining a query result in response to the query request, the query result indicating whether the plurality of control servers include a main control server; and
    sending the query result to the control server.
  10. The method according to claim 9, wherein the query result is whether the service coordination device has recorded an identifier of a main control server.
  11. The method according to claim 9 or 10, wherein the method further comprises:
    receiving a connection request periodically sent by the main control server among the plurality of control servers; and
    sending a response message for the connection request to the main control server.
  12. The method according to any one of claims 9 to 11, wherein the method further comprises:
    receiving an identifier of the determined main control server sent by the control server; and
    recording the identifier of the main control server.
  13. A service coordination apparatus for a distributed storage system, the distributed storage system comprising a service coordination device and a plurality of control servers, the apparatus being applied to any one of the control servers and comprising:
    a query module configured to send a query request to the service coordination device to obtain a query result, the query result indicating whether the plurality of control servers include a main control server; and
    a judgment module configured to determine, according to the query result, whether to send a master selection instruction to the other control servers, the master selection instruction being used to determine one main control server from among the plurality of control servers.
  14. A service coordination apparatus for a distributed storage system, the distributed storage system comprising a service coordination device and a plurality of control servers implementing the method according to any one of claims 1-6, the apparatus being applied to the service coordination device and comprising:
    a first receiving module configured to receive a query request sent by a control server;
    a result obtaining module configured to obtain a query result in response to the query request, the query result indicating whether the plurality of control servers include a main control server; and
    a first sending module configured to send the query result to the control server.
  15. An electronic device comprising a processor and a memory, the memory storing machine-executable instructions executable by the processor, the processor executing the machine-executable instructions to implement the service coordination method for a distributed storage system according to any one of claims 1 to 13.
  16. A distributed storage system comprising a user agent server, a plurality of storage servers, a plurality of control servers implementing the method according to any one of claims 1-6, and a service coordination device implementing the method according to any one of claims 9-13, wherein each control server is communicatively connected to the user agent server, to the plurality of storage servers, and to the service coordination device.
PCT/CN2020/123516 2019-10-25 2020-10-26 Service coordination method and apparatus for distributed storage system, and electronic device WO2021078294A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911024570.3 2019-10-25
CN201911024570.3A CN112714143A (en) 2019-10-25 2019-10-25 Service coordination method and device of distributed storage system and electronic equipment

Publications (1)

Publication Number Publication Date
WO2021078294A1 true WO2021078294A1 (en) 2021-04-29

Family

ID=75541527

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/123516 WO2021078294A1 (en) 2019-10-25 2020-10-26 Service coordination method and apparatus for distributed storage system, and electronic device

Country Status (2)

Country Link
CN (1) CN112714143A (en)
WO (1) WO2021078294A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115242720A (en) * 2022-08-03 2022-10-25 北京达佳互联信息技术有限公司 Connection method and device for long connection service, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105306566A (en) * 2015-10-22 2016-02-03 创新科存储技术(深圳)有限公司 Method and system for electing master control node in cloud storage system
US9819541B2 (en) * 2015-03-20 2017-11-14 Cisco Technology, Inc. PTP over IP in a network topology with clock redundancy for better PTP accuracy and stability
CN107579860A (en) * 2017-09-29 2018-01-12 新华三技术有限公司 Node electoral machinery and device

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
CN101436209B (en) * 2008-12-15 2011-01-05 中兴通讯股份有限公司 Method and apparatus for synchronizing multiple databases
CN104754029B (en) * 2014-12-31 2018-04-27 北京天诚盛业科技有限公司 Method, device and system for determining a master management server
CN106533738B (en) * 2016-10-20 2019-09-10 中国民生银行股份有限公司 Method, device and system for distributed batch processing
CN106789197A (en) * 2016-12-07 2017-05-31 高新兴科技集团股份有限公司 Cluster election method and system
CN107528730B (en) * 2017-08-28 2021-08-27 北京格是菁华信息技术有限公司 Multiple redundancy method, multiple redundancy server and system
CN108717379B (en) * 2018-05-08 2023-07-25 平安证券股份有限公司 Electronic device, distributed task scheduling method and storage medium

Also Published As

Publication number Publication date
CN112714143A (en) 2021-04-27

Similar Documents

Publication Publication Date Title
JP4637842B2 (en) Fast application notification in clustered computing systems
US11249788B2 (en) Cloud management platform, and virtual machine management method and system
US9367261B2 (en) Computer system, data management method and data management program
US20150236902A1 (en) System, method and apparatus to manage services in a network
WO2018137572A1 (en) Strategy management method, device, and system
CN107666493B (en) Database configuration method and equipment thereof
US9390156B2 (en) Distributed directory environment using clustered LDAP servers
US11445013B2 (en) Method for changing member in distributed system and distributed system
US10963353B2 (en) Systems and methods for cross-regional back up of distributed databases on a cloud service
US11546228B2 (en) Zero-touch configuration of network devices using hardware metadata
US11330078B1 (en) Method and system for managing updates of a data manager
US10909009B2 (en) System and method to create a highly available quorum for clustered solutions
US20150347043A1 (en) Cluster consistent logical storage object naming
WO2021078294A1 (en) Service coordination method and apparatus for distributed storage system, and electronic device
WO2021082868A1 (en) Data management method for distributed storage system, apparatus, and electronic device
US10992770B2 (en) Method and system for managing network service
EP3570169A1 (en) Method and system for processing device failure
US10841163B2 (en) Autoinitialization of clustered storage
US20220398073A1 (en) System and method for intelligent update flow across inter and intra update dependencies
US11637737B2 (en) Network data management framework
JP6644902B2 (en) Neighbor monitoring in a hyperscale environment
US11321185B2 (en) Method to detect and exclude orphaned virtual machines from backup
US10972343B2 (en) System and method for device configuration update
US9548940B2 (en) Master election among resource managers
US20200233853A1 (en) Group membership and leader election coordination for distributed applications using a consistent database

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 20879706

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 20879706

Country of ref document: EP

Kind code of ref document: A1