WO2016074167A1 - Method and system for handling a lock server failure in a distributed system - Google Patents
Method and system for handling a lock server failure in a distributed system
- Publication number
- WO2016074167A1 (PCT/CN2014/090886)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- lock
- server
- lock server
- request
- takeover
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/203—Failover techniques using migration
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
- H04L67/1097—Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/16—Error detection or correction of the data by redundancy in hardware
- G06F11/20—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
- G06F11/202—Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements where processing functionality is redundant
- G06F11/2023—Failover techniques
- G06F11/2028—Failover techniques eliminating a faulty processor or activating a spare
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/17—Details of further file system functions
- G06F16/176—Support for shared access to files; File sharing support
- G06F16/1767—Concurrency control, e.g. optimistic or pessimistic approaches
- G06F16/1774—Locking methods, e.g. locking methods for file systems allowing shared and concurrent access to files
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/10—Protocols in which an application is distributed across nodes in the network
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
- G06F2201/805—Real-time
Definitions
- the present invention relates to storage technologies, and more particularly to a method and system for processing a lock server failure in a distributed system.
- NAS (Network Attached Storage)
- the same file can receive read and write requests sent by different application hosts.
- the lock server in the node device needs to lock the current file.
- a lock is used to implement mutually exclusive access to a shared resource by concurrent requesters.
- the lock on the file is then released.
- the correspondence between the lock permission information and the application host may be stored in each node or may be stored in a shared storage.
- the shared storage is independent of each node and is accessible to each node, not shown in FIG.
- NFS (Network File System, version 3)
- NLM (Network Lock Manager)
- NSM (Network Status Monitor)
- the IP address of the node device 1 drifts to the node device 2, that is, the IP address of the node device 1 is configured to the node device 2.
- the IP drift is transparent to the application host 1, that is, the application host 1 does not know the changes that occur between the node devices.
- the IP address of the failed node device drifts to the new node device, and the application host can then, by sending lock reclaim requests, re-apply for the lock permissions that the applications in the application host had already acquired on files.
- the lock server in the distributed system must strictly control lock requests, whether lock reclaim requests or new lock requests; otherwise, improper permission control may leave the data obtained by multiple application hosts inconsistent, and may even let multiple application hosts read and write the same data simultaneously, crashing the system.
- in the prior art, the lock servers in the distributed system are all silenced, that is, the lock server in every node device enters a silent state.
- when the protocol server in a node device receives a lock reclaim request, it sends the lock reclaim request to the corresponding lock server according to the information carried in the request or the stored lock permissions; when the protocol server receives a new lock request, it directly returns a rejection response message to the requester.
- a lock request is a request by which an application in the application host applies to the lock server for a new lock permission on a file.
- during silence the distributed system can therefore process only lock reclaim requests and cannot handle new lock requests. Although only one lock server in the distributed system has failed, every lock server is silenced, turning a local problem into a global one; normal lock requests cannot be handled, which may interrupt services and reduces the reliability of the distributed system.
- embodiments of the present invention provide a method and system for handling a lock server fault in a distributed system, solving the prior-art problem that when one lock server fails, all lock servers are silenced and cannot handle lock requests, which reduces the reliability of the distributed system.
- a method for handling a lock server failure in a distributed system that includes at least three lock servers, where each lock server stores the same lock server takeover relationship information, includes the following steps:
- a lock server that has not failed in the distributed system receives a first notification message, where the first notification message carries information that a first lock server in the distributed system has failed. After receiving the first notification message, a second lock server in the distributed system determines, according to its locally stored lock server takeover relationship information, that it is the takeover lock server of the first lock server, and the takeover lock server enters a silent state. After receiving the first notification message, a third lock server in the distributed system determines, according to its locally stored lock server takeover relationship information, that it is not the takeover lock server of the first lock server.
- the third lock server, determined not to be the takeover lock server, allocates lock permission information according to the lock request after receiving it.
- when the takeover lock server receives a lock reclaim request, it returns the corresponding lock permission information according to the lock permission information table; when the takeover lock server receives a new lock request, it returns a rejection response message.
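The takeover server's behavior during silence can be sketched as follows. This is an illustrative Python sketch, not the patent's implementation; the class and method names (`LockServer`, `handle`) and the string-based permission records are invented for the example.

```python
# Minimal sketch of a takeover lock server in the silent state:
# lock reclaim requests are answered from the lock permission table,
# while new lock requests are rejected until silence ends.

class LockServer:
    def __init__(self):
        self.silent = False
        self.permissions = {}  # file id -> lock permission info

    def handle(self, kind, file_id, host=None):
        if kind == "reclaim":
            # A reclaim re-establishes a permission the host already held;
            # it is answered even while the server is silent.
            return self.permissions.get(file_id, "REJECT")
        if kind == "lock":
            if self.silent:
                return "REJECT"  # silent: refuse new lock requests
            self.permissions[file_id] = f"lock:{file_id}:{host}"
            return self.permissions[file_id]
        raise ValueError(kind)

takeover = LockServer()
takeover.permissions["foo1.txt"] = "lock:foo1.txt:host1"  # state being reclaimed
takeover.silent = True

print(takeover.handle("reclaim", "foo1.txt"))  # answered from the table
print(takeover.handle("lock", "foo2.txt"))     # rejected while silent
```

Once the silent period ends (`silent = False`), new lock requests are granted again from the same entry point.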
- the distributed system further includes at least three protocol servers and corresponding lock agents, where each protocol server and its corresponding lock agent are located in the same node device. The method further includes: after the protocol server receives a lock request, sending the lock request to the corresponding lock agent, where the lock request is a lock reclaim request or a new lock request.
- each of the lock agents locally stores the lock server takeover relationship information and the lock server management scope information
- the method further includes: after receiving the lock request, the lock agent determines, according to the locally stored lock server management scope information, the lock server that should process the lock request;
- if the lock server so determined is marked as being in a fault state, the lock agent determines, according to the locally stored lock server takeover relationship information, the takeover lock server of the lock server in the fault state;
- the lock request is then sent to the takeover lock server.
- the lock server takeover relationship information is determined by a consistent hash ring. That the third lock server determines, according to the locally stored lock server takeover relationship information, that it is not the takeover lock server of the first lock server is specifically: the third lock server determines, according to the clockwise or counterclockwise direction of the locally stored consistent hash ring, that it is not the takeover lock server of the first lock server.
- each of the lock agents locally stores the lock server takeover relationship information and the lock server management scope information
- the lock server management scope information and the lock server takeover relationship information are determined by the consistent hash ring. After receiving the lock request, the lock agent determines, following the clockwise or counterclockwise direction of the locally stored consistent hash ring, the lock server that should process the lock request; if that lock server is marked in the locally stored consistent hash ring as being in a fault state, the lock agent determines, following the same direction of the locally stored consistent hash ring, the takeover lock server of the lock server that should process the lock request.
- a distributed system for implementing lock server fault processing includes: at least three lock servers, wherein each lock server stores the same lock server takeover relationship information;
- the lock servers that have not failed are configured to receive a first notification message, where the first notification message carries information that the first lock server is faulty. The second lock server is configured to determine, according to its locally stored lock server takeover relationship information, that it is the takeover lock server of the first lock server, whereupon the takeover lock server enters a silent state. The third lock server is configured to determine, according to its locally stored lock server takeover relationship information, that it is not the takeover lock server of the first lock server; having determined that it is not the takeover lock server, the third lock server is configured to allocate lock permission information according to a lock request after receiving it.
- a lock server for implementing fault processing in a distributed system.
- the lock server includes a receiving module 801, a processing module 803, and a storage module 805.
- the storage module 805 stores the lock server takeover relationship information;
- the receiving module 801 is configured to receive the first notification message, and send the first notification message to the processing module 803; the first notification message carries the information of the faulty lock server;
- the processing module 803 determines, according to the lock server takeover relationship information, whether this lock server is the takeover lock server of the faulty lock server; if it is, the lock server enters a silent state;
- the processing module 803 is further configured to: after a lock request is received, determine whether the lock server is in a silent state; if the lock server is not in a silent state, allocate lock permission information according to the lock request; if the lock server is in a silent state, return a rejection response message.
- a lock server for implementing fault handling in a distributed system includes: a memory 901 configured to store lock server takeover relationship information and a lock permission information table; an interface 902 configured to provide an external connection; a computer readable medium 903 configured to store a computer program; and a processor 904, coupled to the memory 901, the interface 902, and the computer readable medium 903, configured to execute the above-described lock server failure handling method by running the program.
- the lock server takeover relationship information and the lock server management scope information are determined using a consistent hash ring. That the processing module 1003 determines the lock server that processes the lock request according to the lock server management scope information stored by the storage module 1005 is specifically: the processing module 1003 determines that lock server according to the clockwise or counterclockwise direction of the consistent hash ring. That the processing module 1003 determines, according to the lock server takeover relationship information stored by the storage module 1005, the takeover lock server of the lock server that processes the lock request is specifically: the processing module 1003 determines that takeover lock server according to the same direction of the consistent hash ring.
- a lock manager for implementing fault handling in a distributed system includes the lock server as described above and the lock proxy device as described above.
- the embodiment of the present invention proposes recording the takeover relationship information of every lock server in each lock server of the distributed system, so that when one lock server fails, the takeover lock server of the faulty lock server is identified from that takeover relationship information. Only the takeover lock server is silenced; the other, non-takeover lock servers keep running normally and can continue to process lock requests. In this way, when a lock server in the distributed system fails, the affected scope is minimized: only the takeover lock server is silent, the other lock servers operate normally, normal services are not affected, and the stability of the distributed system is improved.
- FIG. 1 is a schematic structural diagram of a distributed system in the prior art
- FIG. 2 is a schematic structural diagram of a distributed system according to an embodiment of the present invention.
- FIG. 3-1 is a schematic diagram of a consistent hash ring according to an embodiment of the present invention.
- FIG. 3-3 is a schematic diagram of an updated consistent hash ring according to an embodiment of the present invention.
- FIG. 3-4 is a schematic diagram of still another consistent hash ring according to an embodiment of the present invention.
- FIG. 3-5 is a schematic diagram of an updated consistent hash ring according to an embodiment of the present invention.
- FIG. 4-1 is a schematic structural diagram of a distributed system for implementing a lock server fault handling method according to an embodiment of the present invention.
- FIG. 4-2 is a schematic flowchart of a method for handling a lock server fault in a distributed system according to an embodiment of the present invention.
- FIG. 4-3 is a schematic flowchart of a method for handling a lock server fault in another distributed system according to an embodiment of the present invention.
- FIG. 5-1 is a schematic structural diagram of a distributed system for implementing a lock server fault handling method according to an embodiment of the present invention.
- FIG. 5-2 is a schematic flowchart of a method for handling a lock server fault in a distributed system according to an embodiment of the present invention.
- FIG. 5-3 is a schematic diagram of a consistent hash ring according to an embodiment of the present disclosure.
- FIG. 6-1 is a schematic structural diagram of a distributed system for implementing a lock server fault handling method according to an embodiment of the present invention.
- FIG. 6-2 is a schematic flowchart of a method for handling a lock server fault in a distributed system according to an embodiment of the present invention.
- FIG. 7 is a schematic structural diagram of a distributed system for processing a fault of a lock server according to an embodiment of the present invention.
- FIG. 8 is a schematic structural diagram of a lock server for implementing fault processing in a distributed system according to an embodiment of the present invention.
- FIG. 9 is a schematic diagram of a lock server device for implementing fault processing in a distributed system according to an embodiment of the present invention.
- FIG. 10 is a schematic structural diagram of a lock agent for implementing fault processing in a distributed system according to an embodiment of the present invention.
- FIG. 11 is a schematic diagram of a lock agent device for implementing fault processing in a distributed system according to an embodiment of the present invention.
- FIG. 12 is a schematic structural diagram of a lock manager for implementing fault processing in a distributed system according to an embodiment of the present invention.
- FIG. 13 is a schematic diagram of a node device for implementing fault processing in a distributed system according to an embodiment of the present invention.
- the embodiment of the present invention addresses the problem that when a lock server in a distributed system fails, all lock servers must be silenced, and during the silent period only lock reclaim requests can be processed while new lock requests cannot. It proposes storing lock server takeover relationship information in each lock server of the distributed system, so that when a lock server fails, the takeover lock server of the faulty lock server is determined from that takeover relationship information, and the other, non-takeover lock servers (the lock servers in the system other than the takeover lock server) can handle lock requests normally, thereby minimizing the scope affected by the lock server failure and improving the reliability of the entire system.
- the distributed system involved in the embodiment of the present invention is as shown in FIG. 2-1.
- Multiple application hosts are connected to multiple node devices through a NAS network.
- a protocol server and a lock agent are included in each node device.
- the protocol server is a server that uses one of various protocols, for example an FC server, an NFS server, or an SMB server; it communicates with the upper-layer application hosts through the NAS network, and the basic working principles are similar.
- the protocol server is in one-to-one correspondence with the lock agent and is located in the same node device.
- each lock agent also corresponds to a lock server, which may be located in the same node device as the protocol server and the lock agent, or may be located separately in another node. In the embodiment of the present invention, the lock server is located in one node device together with the protocol server and the lock agent.
- a management node can be set up separately to control and manage each node device, and a node device can also perform management control of each node device.
- the node device that manages and controls each node device is generally a master node device, and may also be referred to as a management node device.
- the distributed system includes n node devices, and n is a natural number greater than 2.
- Each node device contains a protocol server PS and a lock agent P, respectively.
- the protocol server and the lock agent in the node device 1 are represented by PS1 and P1, respectively.
- the protocol server and lock agent in the node device 2 are represented by PS2 and P2, respectively.
- the lock server can be located in a node device with the protocol server and the lock agent, or it can be located in a different node device.
- the lock server is represented by S.
- m lock servers may be included, and m is a natural number greater than 2.
- a lock server may correspond to a plurality of lock agents in the node device, that is, the lock server S1 may correspond to the lock agent P1 in the node device 1, or may correspond to the lock agent P2 in the node device 2.
- the lock server and the protocol server and lock agent are in one node device.
- the lock server S1 and the lock agent P1 and the protocol server PS1 are located in the node device 1
- the lock server S2 and the lock agent P2 and the protocol server PS2 are located in the node device 2, and so on.
- a lock server can still correspond to multiple lock agents.
- the lock agent can share the information stored in the lock server of the node, so the lock agent can store less information in the cache.
- the lock-related services of the faulty lock server may be taken over by other lock servers in the distributed system; a lock server that takes over the lock-related services of the faulty lock server is called a takeover lock server.
- a single takeover lock server can take over the lock-related services of the faulty lock server, or multiple takeover lock servers can jointly take them over.
- the description below uses the case of a single takeover lock server.
- with multiple takeover lock servers the implementation principle is similar, except that an algorithm is needed to distribute the lock-related services of the faulty lock server among the multiple takeover lock servers; this is not described in detail here.
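The text leaves the multi-takeover allocation algorithm unspecified. One possible choice (an illustrative assumption, not the patent's method) is to hash each file identifier over the set of takeover servers so each lock lands deterministically on exactly one of them:

```python
import hashlib

# Illustrative only: when several takeover lock servers share the failed
# server's lock-related services, a deterministic rule must assign each
# lock to exactly one of them. Hashing the file identifier is one option.
def assign_takeover(file_id, takeover_servers):
    digest = hashlib.md5(file_id.encode()).hexdigest()
    return takeover_servers[int(digest, 16) % len(takeover_servers)]

servers = ["S2", "S3", "S4"]
owner = assign_takeover("foo1.txt", servers)
assert owner in servers
# The same file always maps to the same takeover server:
assert owner == assign_takeover("foo1.txt", servers)
```

Because every node computes the same hash, lock agents and takeover servers agree on the assignment without coordination.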
- the embodiment of the present invention can record the lock server takeover relationship information in two ways: one records it by means of a table (as shown in Table 1), and the other determines it by means of a consistent hash ring (e.g., FIG. 3).
- the embodiment of the present invention is described by taking four lock servers in the distributed system as an example.
- the four lock servers are respectively S1, S2, S3, and S4, and the manner of using the table to record the lock server takeover relationship information is as follows.
- Table 1: Lock server takeover relationship table
- the lock server takeover relationship record table is uniformly configured by the management node and sent to all lock servers for storage, or can be configured separately in the lock server.
| Lock server | Takeover lock server |
| ----------- | -------------------- |
| S1          | S2                   |
| S2          | S4                   |
| S4          | S3                   |
| S3          | S1                   |
- when the lock server S1 fails, the lock servers that have not failed in the distributed system each receive a notification message and determine, according to the locally stored lock server takeover relationship table, whether they are the takeover lock server of the lock server S1. As shown in Table 1, S2 determines that it is the takeover lock server of the lock server S1, while S3 and S4 each determine that they are not. Accordingly, the lock server S2 enters the silent state: it processes only lock reclaim requests and does not process new lock requests. The lock servers S3 and S4 are not silenced and can process lock requests normally.
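The table-driven decision above can be sketched as follows; `TAKEOVER` mirrors Table 1, and the function name `on_failure_notification` is hypothetical:

```python
# Table 1 as a mapping: each lock server knows who takes over for whom.
TAKEOVER = {"S1": "S2", "S2": "S4", "S4": "S3", "S3": "S1"}

def on_failure_notification(my_id, failed_id):
    """Return this server's new state after a failure notification."""
    if TAKEOVER.get(failed_id) == my_id:
        return "silent"   # I take over: answer reclaims only, reject new locks
    return "normal"       # not the takeover server: keep serving lock requests

# S1 fails: only S2 goes silent, S3 and S4 keep serving.
assert on_failure_notification("S2", "S1") == "silent"
assert on_failure_notification("S3", "S1") == "normal"
assert on_failure_notification("S4", "S1") == "normal"
```

Since every lock server stores an identical copy of the table, all servers reach the same conclusion from the same notification.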
- the embodiment of the present invention further provides another way of recording the lock server takeover relationship information, and determining the takeover relationship of each lock server in the distributed system according to a specific order of the consistency hash ring.
- the consistent hash ring can be obtained by applying a consistent hash algorithm to the node name, or by applying a consistent hash algorithm to the IP address of the node device.
- a consistent hash ring can be obtained by applying a consistent hash algorithm to the name or ID of the lock server.
- a consistent hash ring can also be obtained by applying a consistent hash algorithm to other identifiers of the lock server.
- in the following, the consistent hash ring is obtained by applying the consistent hash algorithm to the ID of the lock server.
- the management node pre-configures the consistent hash algorithm and the rule for determining the lock server takeover relationship in each lock server. In this way, after the system is initialized, each lock server performs the corresponding hash calculation according to the pre-configured information and obtains a consistent hash ring indicating the lock server takeover relationship. Since the algorithm, parameters, and determination rules are the same, the consistent hash rings calculated by the lock servers are identical. Alternatively, the management node may calculate the consistent hash ring from the pre-configured information and broadcast it to each lock server. The consistent hash ring may also be calculated by the management node and each lock server independently; since the algorithm, parameters, and determination rules are the same, the consistent hash rings, and hence the takeover lock servers, determined by the management node and by each lock server are the same.
- the following is an example of performing a hash calculation on the ID of each lock server by the lock server to obtain a consistent hash loop.
- each lock server hashes its ID using the consistent hash algorithm, and the calculation results are arranged clockwise from small to large to obtain a consistent hash ring.
- the value space of the consistent hash ring is 0 to 2^32.
- the calculation results can also be arranged counterclockwise from small to large to obtain a consistent hash ring, as shown in FIG. 3-2.
- in that case, the positions of the lock servers on the hash ring in the counterclockwise direction are lock server S4, lock server S3, lock server S1, and lock server S2.
- the takeover relationship of the lock servers may be determined by following the clockwise direction of the generated consistent hash ring, or by following the counterclockwise direction of the generated consistent hash ring.
- in FIG. 3-1, the consistent hash ring is arranged clockwise by the hash calculation results of the lock server IDs. If the lock server takeover relationship is determined in the clockwise direction of the consistent hash ring, the takeover lock server of the lock server S1 is S2, the takeover lock server of the lock server S2 is S4, the takeover lock server of the lock server S4 is S3, and the takeover lock server of the lock server S3 is S1. If the lock server takeover relationship is instead determined according to the counterclockwise direction of the consistent hash ring in FIG. 3-1, the takeover lock server of the lock server S1 is S3, the takeover lock server of the lock server S3 is S4, the takeover lock server of the lock server S4 is S2, and the takeover lock server of the lock server S2 is S1.
- the takeover lock server can likewise be determined in the clockwise or counterclockwise direction of a consistent hash ring generated in another way; the implementation principle is the same as described above and is not exemplified here.
- in short, the takeover relationship of the lock servers can be determined according to either the clockwise or the counterclockwise direction of the consistent hash ring generated from the lock server identifiers.
- the principle for determining each lock server's takeover relationship is similar to the above method; the difference is that here the consistent hash ring is obtained by applying the consistent hash algorithm to the node names. This is not described in detail again.
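The clockwise takeover rule can be sketched as follows, assuming MD5 as the hash function (the text does not name a specific hash); the "takeover server" is simply the next server encountered on the ring:

```python
import hashlib

def ring_position(server_id, space=2**32):
    # Hash each lock server's ID onto the 0..2^32 ring (hash choice is illustrative).
    return int(hashlib.md5(server_id.encode()).hexdigest(), 16) % space

def takeover_server(failed_id, server_ids):
    """Clockwise rule: the takeover server is the next server on the ring."""
    ring = sorted(server_ids, key=ring_position)   # clockwise, small to large
    i = ring.index(failed_id)
    return ring[(i + 1) % len(ring)]               # wrap around at the top

servers = ["S1", "S2", "S3", "S4"]
t = takeover_server("S1", servers)
assert t in servers and t != "S1"
# Every server computes the same ring, hence the same takeover server:
assert t == takeover_server("S1", list(reversed(servers)))
```

The counterclockwise rule is the same sketch with `(i - 1)` in place of `(i + 1)`; either direction works as long as every node uses the same one.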
- the lock agent in the node device also needs to store the lock server takeover relationship information.
- the lock server management range information needs to be stored.
- when the lock agent receives a lock request (for example, a lock reclaim request or a new lock request),
- it determines, based on the stored lock server management scope information, which lock server should process the lock request. If the lock server determined to process the lock request has failed, the takeover lock server is determined according to the lock server takeover relationship information, and the lock request is sent to the takeover lock server for processing.
- the lock server takeover relationship information stored by the lock agent is the same as the lock server takeover relationship information stored in the lock server.
- the lock server takeover relationship table is used to record the lock server takeover relationship information
- the lock server takeover relationship table is uniformly configured by the management node and sent to all lock agents for storage.
- the management node may calculate the consistent hash ring and send it to each lock agent; alternatively, the lock agents may be pre-configured through the management node so that each lock agent independently calculates the same consistent hash ring; or the same consistent hash ring may be calculated by the management node and each lock agent separately.
- the consistency hash ring of the lock agent is the same as the consistency hash ring in the lock server.
- the record mode of the lock server takeover relationship information in the lock agent is the same as that of the lock server takeover relationship information in the lock server, which has been described in detail above and will not be further described herein.
- the lock server management scope information in the lock agent is also recorded using a table (as shown in Table 2). It can be pre-configured by the management node and sent to each lock agent in the form of a lock server management scope record table.
- the lock server management range record table is as shown in Table 2.
- after the lock agent receives a lock request, it hashes the identifier of the file carried in the request using the same consistent hash algorithm; the range into which the calculated result falls determines which lock server is responsible for processing the request.
- suppose the lock request is a new lock request,
- and the file carried in the lock request is (foo1.txt).
- the lock agent performs a hash calculation on (foo1.txt);
- the obtained result is 4500, which falls in the range managed by the lock server S1,
- so the lock agent sends the lock request to the lock server S1.
- suppose the file information carried in a lock reclaim request is (foo8.txt); the lock agent performs a hash calculation on (foo8.txt), and the obtained result is 9000, which falls in the range managed by the lock server S4, so the lock agent sends the lock reclaim request to the lock server S4.
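The scope lookup in these examples can be sketched as follows, using the ranges of Table 3. The half-open boundary treatment and the wrap-around of the S4 range past the top of the ring are assumptions for illustration:

```python
# Table 3's management scopes as (low, high) ranges on the ring;
# the S4 range (8000-512) wraps past the top of the ring.
SCOPES = {"S1": (1024, 5000), "S2": (5000, 8000),
          "S4": (8000, 512), "S3": (512, 1024)}

def owner_of(hash_value):
    """Return the lock server whose management scope contains hash_value."""
    for server, (low, high) in SCOPES.items():
        if low < high:
            if low <= hash_value < high:          # ordinary range
                return server
        elif hash_value >= low or hash_value < high:  # wrapping range
            return server
    raise ValueError("value outside all scopes")

assert owner_of(4500) == "S1"   # foo1.txt's hash in the text
assert owner_of(9000) == "S4"   # foo8.txt's hash in the text
```

With this lookup the agent only needs the file's hash; the same table then tells it where to send the request.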
- the lock server management scope information and the lock server takeover relationship information may be embodied in a single table (for example, Table 3, the lock server information table), or may be stored separately as the lock server takeover relationship table and the lock server management scope record table shown in Table 1 and Table 2, respectively.
- Table 3. Lock server information table (* marks the lock server in the fault state):

  Lock server   Lock server management scope   Takeover lock server
  S1            1024-5000                      S2
  S2*           5000-8000                      S4
  S4            8000-512                       S3
  S3            512-1024                       S1
- when a lock server fails, the lock agent marks the failed lock server as faulty in the lock server management scope record table, the lock server takeover relationship table, and/or the lock server information table. After the lock agent receives a lock request, it hashes the unique identifier of the file carried in the request and determines, from the lock server management scope record table or the lock server information table, which lock server's management range the result falls into. If the determined lock server is in the fault state, the lock agent determines the takeover lock server of the failed lock server from the lock server takeover relationship table or the lock server information table, and sends the lock request to that takeover lock server for processing.
- for example, lock server S2 is faulty. The lock agent receives a lock reclaim request whose carried file information is (foo5.txt), performs a hash calculation on (foo5.txt), and obtains the result 7000, which lock server S2 should be responsible for processing. However, lock server S2 is currently in the fault state, and the takeover lock server of the failed lock server S2 is lock server S4, so the lock agent sends the lock reclaim request to the takeover lock server S4 for processing.
- after the predetermined time is reached or the second notification message from the management node is received, the lock agent updates the stored lock server takeover relationship information and lock server management scope information, so that the updated lock server takeover relationship information and lock server management scope information no longer include the failed lock server.
- the second notification message is used to notify the lock agent to update the lock server takeover relationship information and the lock server management scope information, and it carries the information of the failed lock server.
- the updated lock server takeover relationship information and the lock server management range information may also be sent by the management node to each lock agent.
- based on Table 3, the updated lock server information table is shown in Table 4:
- Table 4. Updated lock server information table:

  Lock server   Lock server management scope   Takeover lock server
  S1            1024-5000                      S4
  S4            5000-512                       S3
  S3            512-1024                       S1
- a consistent hash ring can be used to determine the management scope of each lock server.
- information such as the same consistency hash algorithm, the determination rule of the lock server management scope, and the like may be configured in each lock agent.
- the lock agent calculates a consistent hash loop according to the configured consistency hash algorithm, the determination rule of the lock server management scope, and the like.
- the consistent hash algorithm, the determination rule of the lock server management scope, and other such information configured in the lock agent are consistent with the corresponding information configured in the lock server, so that the consistent hash ring calculated in the lock agent is the same as the consistent hash ring in the lock server.
- the lock agent can determine, according to the consistent hash ring, which lock server handles a lock request, and send the lock request to the determined lock server for processing. If the determined lock server has failed, the lock agent can also determine the takeover lock server of the failed lock server according to the consistent hash ring, and send the lock request to the takeover lock server for processing.
- if the consistent hash ring is calculated by the management node, it is broadcast to each lock agent and each lock server.
- the consistent hash ring shown in Figure 3-1 is generated in the clockwise direction, so the lock server takeover relationship can be determined in the clockwise direction of the ring, and the lock server management scope also follows the clockwise direction of the ring.
- the takeover lock server of the lock server S1 is S2
- the takeover lock server of the lock server S2 is S4
- the takeover lock server of the lock server S4 is S3
- the takeover lock server of the lock server S3 is S1.
- the range (1024-5000) between lock server S3 and lock server S1 is the management range of lock server S1; that is, lock server S1 manages the clockwise range from lock server S3 to lock server S1.
- the range (5000-8000) between the lock server S1 and the lock server S2 is the management scope of the lock server S2
- the range (8000-512) between lock server S2 and lock server S4 is the management scope of lock server S4, and the range (512-1024) between lock server S4 and lock server S3 is the management range of lock server S3.
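The clockwise rule above can be sketched as follows: each lock server's takeover lock server is its clockwise successor on the ring, and each lock server manages the range from its predecessor's position to its own. The ring positions (512, 1024, 5000, 8000) are illustrative values consistent with Figure 3-1; in the described scheme they come from hashing each lock server's ID.

```python
# Sketch of deriving takeover relationships and management scopes
# from ring positions. Positions are assumed, not from the text.
POSITIONS = {"S4": 512, "S3": 1024, "S1": 5000, "S2": 8000}

ring = sorted(POSITIONS.items(), key=lambda kv: kv[1])  # clockwise order
names = [n for n, _ in ring]

def takeover_of(server):
    """The next server clockwise on the ring takes over on failure."""
    i = names.index(server)
    return names[(i + 1) % len(names)]

def management_range(server):
    """Each server manages from its predecessor's position to its own."""
    i = names.index(server)
    prev_pos = ring[(i - 1) % len(ring)][1]
    return (prev_pos, ring[i][1])

# Matches the relationships and ranges listed in the text:
assert takeover_of("S1") == "S2"
assert takeover_of("S2") == "S4"              # wraps around the ring
assert management_range("S1") == (1024, 5000)
assert management_range("S4") == (8000, 512)  # wrapping range
```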
- the lock server takeover relationship can also be determined in the counterclockwise direction of the consistent hash ring; in that case the lock server management scope is also determined in the counterclockwise direction of the ring. The determination method is the same as determining the lock server takeover relationship in the clockwise direction of the consistent hash ring and is not described separately.
- the protocol server sends a lock request to the lock agent, and the lock agent performs a hash calculation on the unique identifier of the file (for example, FSID, FID), determines according to the calculation result which range the file belongs to, and sends the lock request to the lock server managing that range for corresponding processing.
- the hash algorithm that hashes the unique identifier of a file needs to be the same as the consistent hash algorithm that generates the consistent hash ring.
- for example, the file information carried in a lock request is (foo2.txt). The lock agent hashes the file information (foo2.txt) and the result is 6500, which falls on the consistent hash ring in the range between lock server S1 and lock server S2, so the lock request is handled by lock server S2.
- when lock server S2 fails, the lock agent marks lock server S2 in the consistent hash ring as faulty. After receiving a lock request, the lock agent performs a hash calculation on the file information (foo3.txt) carried in the request; the result is 7500, which falls on the consistent hash ring in the range between lock server S1 and lock server S2, a range managed by lock server S2. Since lock server S2 is in the fault state, according to the consistent hash ring the takeover lock server of lock server S2 is lock server S4; therefore, the lock agent sends the lock request to lock server S4 for processing.
- after the predetermined time is reached or the second notification message from the management node is received, the lock agent updates the stored lock server takeover relationship information and lock server management scope information, so that the updated lock server takeover relationship information and lock server management scope information no longer include the failed lock server.
- the updated lock server takeover relationship information and the lock server management range information may also be sent by the management node to each lock agent and lock server.
- the updated consistency hash ring is as shown in Figure 3-3.
- the management scope of each lock server is updated as follows: the management scope of lock server S4 is (5000-512), the management scope of lock server S3 is (512-1024), and the management scope of lock server S1 is (1024-5000).
- the takeover lock server of the lock server S1 is S4, the takeover lock server of the lock server S4 is S3, and the takeover lock server of the lock server S3 is S1.
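Under the same illustrative ring positions as before, the update after the failure of lock server S2 can be sketched as dropping S2 from the ring and recomputing each server's neighbours, which reproduces the updated scopes and takeover links listed above (Figure 3-3).

```python
# Sketch of the ring update after S2's failure: the failed server is
# removed and its range is absorbed by its clockwise successor.
# Ring positions are the same illustrative values used earlier.
positions = {"S4": 512, "S3": 1024, "S1": 5000, "S2": 8000}
del positions["S2"]  # remove the failed lock server

ring = sorted(positions.items(), key=lambda kv: kv[1])
names = [n for n, _ in ring]

def takeover_of(server):
    i = names.index(server)
    return names[(i + 1) % len(names)]

def management_range(server):
    i = names.index(server)
    return (ring[(i - 1) % len(ring)][1], ring[i][1])

# Matches the updated relationships in the text:
assert takeover_of("S1") == "S4"
assert management_range("S4") == (5000, 512)   # S4 absorbed S2's range
assert management_range("S1") == (1024, 5000)
```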
- if the consistent hash ring is generated in the counterclockwise direction, then the lock server takeover relationship can be determined counterclockwise according to the ring; in that case the lock server management range is also determined in the counterclockwise direction of the ring. For example, as shown in FIG. 3-2, the takeover lock server of lock server S1 is S2, the takeover lock server of lock server S2 is S4, the takeover lock server of lock server S4 is S3, and the takeover lock server of lock server S3 is S1.
- the range (5000-8000) between lock server S1 and lock server S2 is managed by lock server S2, the range (8000-512) between lock server S2 and lock server S4 is managed by lock server S4, the range (512-1024) between lock server S4 and lock server S3 is managed by lock server S3, and the range (1024-5000) between lock server S3 and lock server S1 is managed by lock server S1.
- when the lock server takeover relationship is determined according to the consistent hash ring shown in FIG. 3-2, it can also be determined in the clockwise direction of the ring; in that case the lock server management range is also determined in the clockwise direction of the ring. The determination method is the same as determining the lock server takeover relationship in the counterclockwise direction of the consistent hash ring and is not described separately here.
- the method of obtaining a consistent hash ring by applying the consistent hash algorithm to the node name or the lock server ID may use existing technology, and details are not described here again.
- the management node device broadcasts a third notification message in the distributed system. The third notification message is used to notify the lock servers to update the locally stored consistent hash ring, and it carries the information of the newly added lock server.
- the lock server and the lock agent in the distributed system update the locally stored lock server takeover relationship information or the lock server management scope information according to the information of the newly added lock server carried in the third notification message.
- the lock server takeover relationship information or the lock server management scope information may be updated by the management node device and then sent to the lock server and the lock agent in the distributed system.
- when a lock server is added, the management node needs to reconfigure and update the lock server takeover relationship information or the lock server management scope information. The management node can send the updated table to each lock server and lock agent, or send a third notification message to each lock server and lock agent (that is, broadcast the third notification message in the distributed system), notifying the lock servers and lock agents to update.
- the update rule can be set according to information such as user requirements, system load, and traffic, and is not limited in the embodiment of the present invention.
- the updated lock server takeover relationship information and the lock server management range information are as shown in Table 4.
- Lock server   Lock server management scope   Takeover lock server
  S1            1024-4000                      S2
  S2            4000-7000                      S5
  S5            7000-9000                      S4
  S4            9000-512                       S3
  S3            512-1024                       S1
- when the lock server takeover relationship information or the lock server management scope information is determined by the consistent hash ring: when a new lock server is added, after the management node detects it, it broadcasts in the distributed system a third notification message carrying the ID of the new lock server (or other information used to calculate the consistent hash ring). After the lock servers (including the newly added lock server) and the lock agents receive the third notification message, each performs a hash calculation on the lock server ID carried in the message, adds the calculation result for the newly added lock server into the locally stored consistent hash ring according to the result of the hash calculation, and updates the management scope and takeover relationship of the adjacent lock servers in the consistent hash ring according to the determined rules.
- for example, the consistent hash ring shown in Figure 3-1 is calculated based on the IDs of the lock servers and the clockwise principle.
- a lock server S5 is added. The management node carries the ID of lock server S5 in a third notification message and sends it to the lock servers in the distributed system. Each lock server hashes the ID of lock server S5 with the consistent hash algorithm. If the result of the calculation is 3000, the updated consistent hash ring is as shown in Figure 3-5, and the newly added lock server S5 sits between lock server S3 and lock server S1 on the consistent hash ring.
- the takeover lock server of lock server S3 is changed to lock server S5, the takeover lock server of lock server S5 is lock server S1, the management scope of lock server S1 is changed to (3000-5000), and the management scope of lock server S5 is (1024-3000).
- the takeover relationship and management scope of other lock servers in the consistency hash ring are unchanged.
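The addition of lock server S5 can be sketched under the same illustrative ring positions: inserting S5 at its hash position 3000 (the value used in the example) changes only the neighbouring management scopes and takeover links, as the text states.

```python
# Sketch of adding lock server S5 (Fig. 3-5). Existing positions are
# the illustrative values used earlier; 3000 is the hash of S5's ID
# from the example.
positions = {"S4": 512, "S3": 1024, "S1": 5000, "S2": 8000}
positions["S5"] = 3000  # newly added lock server

ring = sorted(positions.items(), key=lambda kv: kv[1])
names = [n for n, _ in ring]

def takeover_of(server):
    return names[(names.index(server) + 1) % len(names)]

def management_range(server):
    i = names.index(server)
    return (ring[(i - 1) % len(ring)][1], ring[i][1])

# Only the neighbours of S5 change:
assert takeover_of("S3") == "S5"              # changed from S1 to S5
assert takeover_of("S5") == "S1"
assert management_range("S5") == (1024, 3000)
assert management_range("S1") == (3000, 5000)
# Links elsewhere on the ring are unchanged:
assert takeover_of("S1") == "S2"
assert management_range("S4") == (8000, 512)
```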
- the embodiment of the invention provides a method for processing a lock server fault in a distributed system, and the flow thereof is shown in FIG. 4-2.
- the method is applied to the distributed system shown in Figure 4-1.
- in the distributed system of the embodiment of the present invention there are i lock servers, where i is a natural number greater than 2.
- the lock servers are respectively identified as lock server S1, lock server S2, lock server S3... lock server Si.
- Step 401 The lock server that does not fail in the distributed system receives the first notification message, where the first notification message carries information that the first lock server is faulty.
- for example, the management node device detects that lock server S1 has failed, and broadcasts a first notification message to the lock servers in the distributed system, where the first notification message carries the information that lock server S1 has failed.
- the lock server that has not failed in the distributed system can receive the first notification message.
- Step 403: After receiving the first notification message, the second lock server in the distributed system determines, according to the locally stored lock server takeover relationship information, that it is the takeover lock server of the first lock server, and as the takeover lock server, the second lock server enters a silent state.
- All lock servers in the distributed system store lock server takeover relationship information locally. After receiving the first notification message, lock server S2 determines, according to the locally stored lock server takeover relationship information, that it is the takeover lock server of the failed lock server S1.
- the lock server takeover relationship information records the takeover relationship between the lock servers in the distributed system, that is, when a lock server fails, the takeover lock server takes over the service of the lock server that has failed.
- the takeover lock server of lock server S1 is lock server S2, so that when lock server S1 fails, its corresponding service is taken over by lock server S2, and lock server S2 is responsible for processing the service originally handled by lock server S1.
- the takeover lock server can enter the silent state by starting a silent timer, and may exit the silent state when the silent timer expires. Alternatively, the management node may notify the takeover lock server to exit the silent state, or the silent state may be exited after the lock reclaim requests are processed; this is not limited in the embodiment of the present invention.
- Step 405 After receiving the first notification message, the third lock server in the distributed system determines that it is not the takeover lock server of the first lock server according to the locally stored lock server takeover relationship information.
- the lock server S3 determines that it is not the takeover lock server of the fault lock server S1 according to the locally stored lock server takeover relationship information.
- the lock server S4 determines that it is not the takeover lock server of the fault lock server S1 according to the locally stored lock server takeover relationship information.
- the lock server Si determines that it is not the takeover lock server of the fault lock server S1 according to the locally stored lock server takeover relationship information.
- lock servers that are not the takeover lock server of the failed lock server can be collectively referred to as non-takeover lock servers.
- when a lock server fails, the service of the failed lock server is taken over by the takeover lock server; the services of the non-takeover lock servers are not affected and can be processed normally.
- the lock server takeover relationship information may be recorded in the form of a lock server takeover relationship table; alternatively, a consistent hash algorithm may be used to hash the IDs of the lock servers to obtain a consistent hash ring of the lock servers according to a certain rule, and the takeover relationship between lock servers is then determined according to the consistent hash ring.
- the related content has been described in detail in the foregoing, and will not be described again here.
- the lock server S2 is the takeover lock server of the fault lock server S1.
- the other lock servers are non-takeover lock servers with respect to the fault lock server S1.
- Step 407 When the third lock server that is the non-takeover lock server receives the lock request, the lock authority information is allocated according to the lock request.
- the lock server S3 is not the takeover lock server of the fault lock server S1, that is, the lock server S3 is the non-takeover lock server of the fault lock server S1, and is not silent. Therefore, when the lock server S3 receives the lock request, the lock authority information is allocated according to the lock request.
- the lock server S4 is a non-takeover lock server of the fail lock server S1, and can handle the lock request.
- the non-takeover lock server of the fault lock server does not need to be silent. Therefore, when the non-takeover lock server receives the lock request, the lock authority information is allocated according to the lock request.
- the processing after the non-takeover lock server receives the lock request is the same as the prior art, for example, checking whether there is a mutually exclusive lock request in the lock permission information table, thereby determining the assigned lock authority information; it is not described further here.
- in this way, the affected range is minimized: the service of the non-takeover lock servers of the failed lock server is not affected, their lock requests can be processed normally, and the problems of service interruption and instability in the distributed system are avoided.
- Step 409: After receiving a lock reclaim request, the second lock server, as the takeover lock server, returns the corresponding lock authority information according to the lock authority information table.
- while the takeover lock server is in the silent state, it can handle lock reclaim requests.
- after receiving a lock reclaim request, the takeover lock server S2 returns the corresponding lock authority information according to the lock authority information table.
- in NFS V3, after the takeover lock server processes a lock reclaim request, it returns an OK response message according to the lock permission information.
- the lock permission information table stores information such as the accessed file information, the granted lock permission information, and the accessing client information; processing of the lock reclaim request can use the existing implementation scheme and is not described further in the embodiment of the present invention.
- the lock permission information table may be stored in shared storage, where each lock server can access it; alternatively, a lock permission information table may be stored locally in each lock server, with the management node responsible for information synchronization. This is not limited in the embodiment of the present invention.
- the lock reclaim request is sent by the client, and the specific triggering process is the same as the current implementation; it is not described in detail in the embodiment of the present invention.
- Step 411: After receiving a lock request, the second lock server, as the takeover lock server, returns a rejection response message.
- while the takeover lock server is in the silent state, it can only handle lock reclaim requests and cannot handle lock requests. In the embodiment of the present invention, if the takeover lock server S2 receives a lock request while in the silent state, it returns a rejection response message. Of course, after the takeover lock server satisfies certain conditions and exits the silent state, it can process lock requests; its processing at that time is the same as the way a non-takeover lock server processes lock requests and is not described in detail here.
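Steps 407 through 411 can be sketched as follows. The request and table shapes are illustrative assumptions; only the dispatch rule (serve lock reclaim requests while silent, reject new lock requests) comes from the text.

```python
# Minimal sketch of a lock server's behaviour during the silent
# period. The dict-based request format and the "exclusive"
# permission value are illustrative, not from the source.
class LockServer:
    def __init__(self, name):
        self.name = name
        self.silent = False
        self.permissions = {}  # file id -> granted lock permission info

    def handle(self, request):
        if request["type"] == "reclaim":
            # Lock reclaim requests are answered from the lock
            # permission table even while silent (step 409).
            return ("OK", self.permissions.get(request["file"]))
        if self.silent:
            # New lock requests get a rejection response (step 411).
            return ("REJECTED", None)
        # Non-silent servers allocate lock permission info (step 407).
        self.permissions[request["file"]] = "exclusive"
        return ("GRANTED", "exclusive")

s2 = LockServer("S2")   # takeover lock server, currently silent
s2.silent = True
assert s2.handle({"type": "lock", "file": "foo.txt"})[0] == "REJECTED"
assert s2.handle({"type": "reclaim", "file": "foo.txt"})[0] == "OK"

s3 = LockServer("S3")   # non-takeover server processes normally
assert s3.handle({"type": "lock", "file": "foo.txt"})[0] == "GRANTED"
```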
- the distributed system also includes a protocol server and a lock agent.
- the protocol server and the lock agent are in one-to-one correspondence, and the corresponding protocol server and lock agent are located in one node device.
- the lock server can be associated with multiple lock agents, and can be located in the same node device as the protocol server and the lock agent, or can be set in other node devices.
- the embodiment of the present invention further provides a method for processing a lock server fault in a distributed system, and the applicable system is as shown in FIG. 5-1, and the flow thereof is as shown in FIG. 5-2.
- Each node device includes a protocol server, a lock agent, and a lock server.
- the protocol server PS1, the lock agent P1, the lock server S1 are in the node device 1, the protocol server PS2, the lock agent P2, the lock server S2 are in the node device 2, and so on.
- the protocol server has a one-to-one correspondence with the lock agent, that is, the protocol server PS1 sends the lock request to the lock agent P1; the protocol server PS2 sends the lock request to the lock agent P2; and so on.
- the lock server can correspond to multiple lock agents, that is, the lock server S1 can receive the lock request sent by the lock agent P1, and can also receive the lock request sent by other lock agents (such as the lock agent P2).
- Step 501: After receiving a lock request, the protocol server sends the lock request to the corresponding lock agent; the lock request may be a lock reclaim request or a new lock request.
- the protocol server is connected to an external protocol client for receiving the request of the protocol client and returning the processing result to the protocol client.
- the protocol server may be an NFS (Network File System) protocol server, or may be an SMB (Server Message Block) protocol server, which is not limited in the embodiment of the present invention.
- after receiving the lock request sent by the protocol client, the protocol server performs protocol conversion processing, converts the received lock request into a DLM (Distributed Lock Manager) lock request, and sends it to the corresponding lock agent.
- the protocol server and the corresponding lock agent are located in one node device; therefore, the protocol server sends the converted lock request to the lock agent in the same node. For example, after receiving a lock request, protocol server PS2 sends the converted lock request to lock agent P2.
- in the prior art, since the lock servers in the distributed system enter a silent state when a lock server fails, the protocol server only forwards lock reclaim requests to the corresponding lock agent after receiving a lock request; for a new lock request, the protocol server directly returns a rejection response message. That is to say, in the prior-art distributed system, when a lock server fails, the lock servers enter a silent state and the distributed system cannot handle lock requests, which affects the normal operation of distributed system services and greatly reduces the stability of the distributed system.
- in the embodiment of the present invention, after receiving a lock request (including new lock requests and lock reclaim requests), the protocol server sends it to the corresponding lock agent, and the lock agent sends it to the lock server responsible for processing the lock request according to a certain rule.
- only the takeover lock server in the silent state cannot handle new lock requests; the other non-takeover lock servers can handle lock requests, and lock reclaim requests can still be processed. Therefore, most of the lock requests in the distributed system can be processed, greatly improving the stability of the distributed system.
- Step 503 After receiving the lock request, the lock agent determines a lock server that processes the lock request according to the locally stored lock server management scope information; and sends the received lock request to the determined lock server.
- the lock server management scope information is stored locally in each lock proxy. After the lock proxy receives the lock request sent by the protocol server, the lock proxy determines the lock server responsible for processing the lock request according to the locally stored lock server management scope information. The lock request is sent to the determined lock server for processing.
- after the lock agent receives the lock request, it hashes the file information (such as the file ID) carried in the lock request; the lock server whose management scope the calculated result falls into is the lock server responsible for processing the lock request, and the lock agent sends the lock request to the determined lock server for processing.
- when the lock server management scope information is determined by the consistent hash ring, after receiving the lock request, the lock agent performs a hash calculation on the file information (such as the file ID) carried in the lock request; the lock server whose management scope the calculated result falls into is the lock server responsible for handling the lock request, and the lock agent sends the lock request to the determined lock server for processing.
- the hash algorithm for hashing the file information carried in the lock request is the same as the consistent hash algorithm for generating the consistent hash ring.
- when the management node finds that a lock server is faulty, it broadcasts the first notification message in the distributed system, and the first notification message carries the information of the failed lock server.
- both the lock agents and the lock servers receive the first notification message. After receiving it, the lock agent marks the faulty lock server as faulty in the locally stored lock server management scope information. Therefore, after the lock agent receives a lock request, the lock server determined from the lock server management scope information as responsible for processing the request may be a lock server that has failed; in that case, the lock agent determines the takeover lock server of the failed lock server according to the stored lock server takeover relationship information, as described in step 505.
- Step 505: After receiving the lock request, the lock agent determines, according to the locally stored lock server management scope information, the lock server that processes the lock request. When that lock server is in the fault state according to the lock server management scope information, the lock agent determines, according to the locally stored lock server takeover relationship information, the takeover lock server of the lock server in the fault state, and sends the received lock request to the takeover lock server.
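Steps 503 and 505 together can be sketched as follows, reusing the Table 3 data with lock server S2 failed; the helper names and the single combined `dispatch` function are assumptions for illustration.

```python
# Combined sketch of the lock agent's dispatch path: resolve the
# managing lock server from the range table (step 503), then follow
# the takeover link while that server is marked faulty (step 505).
RANGES = {"S1": (1024, 5000), "S2": (5000, 8000),
          "S4": (8000, 512), "S3": (512, 1024)}
TAKEOVER = {"S1": "S2", "S2": "S4", "S4": "S3", "S3": "S1"}
faulty = {"S2"}  # S2 is in the fault state

def covers(value, start, end):
    """Clockwise (start, end] range test, wrapping past the origin."""
    return (start < value <= end) if start < end else (value > start or value <= end)

def dispatch(hash_value):
    """Return the lock server that should actually process the request."""
    server = next(s for s, (a, b) in RANGES.items() if covers(hash_value, a, b))
    while server in faulty:          # redirect to the takeover lock server
        server = TAKEOVER[server]
    return server

assert dispatch(4500) == "S1"   # healthy server handles its range directly
assert dispatch(7000) == "S4"   # S2's range, redirected to takeover S4
```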
- the lock server takeover relationship information is also recorded using the lock server takeover relationship table.
- the lock server management scope information and the lock server takeover relationship information can be recorded by a lock server information table.
- the lock server information table is shown in Table 3. The table records the management scope of each lock server and the takeover relationships between them, and can be pre-configured according to the needs of users.
- the lock server information table and the update status have been described in detail before, and will not be described again.
- the lock server takeover relationship information is also determined by the same consistent hash ring. As described above, if the lock server management range is determined in the clockwise direction of the consistent hash ring, the lock server takeover relationship is also determined in the clockwise direction of the ring; similarly, if the lock server management scope information is determined in the counterclockwise direction of the ring, the lock server takeover relationship is also determined in the counterclockwise direction of the ring.
- for example, the range between lock server S4 and lock server S1 is managed by lock server S1, the range between lock server S1 and lock server S2 is managed by lock server S2, the range between lock server S2 and lock server S3 is managed by lock server S3, and the range between lock server S3 and lock server S4 is managed by lock server S4;
- the takeover lock server of the lock server S1 is the lock server S2, and the takeover lock server of the lock server S2 is the lock server S3.
- the takeover lock server of the lock server S3 is the lock server S4, and the takeover lock server of the lock server S4 is the lock server S1.
- the protocol server may receive a lock request (for example, a lock reclaim request or a new lock request) and send the received lock request to the lock agent.
- the lock agent determines the lock server according to the stored lock server management scope information, and sends the lock request to the lock server for processing. If the determined lock server is a failed lock server, the lock agent also needs to determine the takeover lock server of the failed lock server according to the lock server takeover relationship information, and send the lock request to the takeover lock server for processing.
- the takeover lock server can only handle lock reclaim requests during the silent period and cannot handle new lock requests. In this way, the impact of the lock server failure is minimized, so that most lock requests can be processed in time, preventing service interruption caused by lock requests not being processed in time, or system crashes due to business conflicts, and improving the stability of the distributed system.
- the lock server fault processing method in the distributed system may further include the following steps:
- Step 507 After receiving the first notification message, the lock servers that have not failed in the distributed system identify the first lock server in the locally stored consistency hash ring as being in a fault state; after a predetermined time has elapsed, the non-failed lock servers update the locally stored consistency hash ring, and the first lock server is not included in the updated consistency hash ring.
- the lock servers that have not failed in the distributed system receive the first notification message sent by the management node, and the first notification message carries the identifier of the failed lock server; after receiving the first notification message, the non-failed lock servers identify the faulty lock server in the locally stored consistency hash ring as being in a fault state.
- a lock server that has not failed can also start a timer. After the predetermined time has elapsed, the non-failed lock server updates the locally stored consistency hash ring, and the updated consistency hash ring does not include the faulty lock server.
- the lock servers that have not failed in the distributed system may also update the locally stored consistency hash ring after receiving a notification from the management node. Similarly, the updated consistency hash ring does not include the faulty lock server; the specific method is as described in step 509.
- Step 509 The non-failed lock server receives a second notification message, where the second notification message is used to notify the lock server to update the locally stored consistency hash ring, and the second notification message carries the information of the faulty lock server; the non-failed lock server updates the locally stored consistency hash ring, and the faulty lock server is not included in the updated consistency hash ring.
- Step 511 After the lock agent receives the first notification message, it identifies the faulty lock server in the locally stored consistency hash ring as being in a fault state; after the predetermined time has elapsed, it updates the locally stored consistency hash ring, and the first lock server is not included in the updated consistency hash ring.
- Step 511 and step 507 or 509 do not have a strict sequence, and step 511 can be performed in parallel with step 507 or 509.
- the embodiment of the present invention may further include the following steps:
- Step 513 The lock server that has not failed receives the third notification message, and updates the locally stored consistency hash ring.
- the updated consistency hash ring includes the newly added lock server.
- the third notification message is used to notify the lock server to update the locally stored consistency hash ring, and the third notification message carries the information of the newly added lock server.
- Step 515 The lock agent receives the third notification message; updates the locally stored consistency hash ring, and the updated consistency hash ring includes the newly added lock server.
- Steps 513 and 515 can be performed in parallel without strict prioritization.
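- steps 507 through 515 amount to a mark-then-remove update of each locally stored view: mark the faulty lock server on the first notification, remove it after the timer or the second notification, and add a newly joined lock server on the third notification. A minimal sketch, with assumed method and state names:

```python
class LocalRingView:
    """Illustrative local view held by a lock server or lock agent."""
    def __init__(self, servers):
        self.servers = {s: "OK" for s in servers}

    def on_first_notification(self, failed):
        # Step 507/511: mark the faulty lock server; routing can now
        # divert its range to the takeover lock server.
        self.servers[failed] = "FAULT"

    def on_timer_or_second_notification(self, failed):
        # Step 507 (timer) or 509: rebuild the local view without the
        # faulty lock server.
        self.servers.pop(failed, None)

    def on_third_notification(self, added):
        # Step 513/515: a newly added lock server joins the ring.
        self.servers[added] = "OK"
```

marking before removing lets in-flight requests still resolve the takeover relationship from the old ring during the transition.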
- the embodiment of the present invention provides a method for processing a lock server fault in another distributed system, the flow of which is shown in FIG. 6-2. The method is applied to the distributed system shown in FIG. 6-1.
- there are four node devices, each of which includes a protocol server, a lock agent, and a lock server.
- the protocol server PS1, the lock agent P1, the lock server S1 are in the node device 1
- the protocol server PS2, the lock agent P2, the lock server S2 are in the node device 2, and so on.
- the protocol server has a one-to-one correspondence with the lock agent in the node, and the lock agent can correspond to multiple lock servers.
- the lock server can also be separately set in another node device, for example, four lock servers are disposed in the node device 5 (not shown).
- the lock server takeover relationship information and the lock server management range information are all determined in accordance with the clockwise direction of the consistency hash ring.
- Both the lock agent and the lock server store a consistent hash ring locally, and the stored consistent hash ring is the same.
- the consistent hash ring in the lock agent P1 is represented by HP1
- the consistent hash ring in the lock server S1 is represented by HS1, and so on.
- the consistency hash ring in this embodiment is calculated according to the name of the node device. As described above, the consistency hash ring can also be calculated according to the ID of the lock server or the IP address of the node device, and details are not described herein.
- the consistency hash ring and its update have been described in detail in the foregoing and will not be described here.
- Step 601 The lock server that has not failed receives the first notification message, where the first notification message carries information that the lock server S1 has failed.
- when the management node in the distributed system detects that the lock server S1 is faulty, it broadcasts a first notification message to the lock agents and the lock servers in the distributed system, notifying the lock agents and the lock servers that the lock server S1 has failed.
- the management node may be located in one of the node devices, or a node device may be separately provided to perform the management function of the distributed system. The location of the management node does not affect the implementation of the embodiment; therefore, the location of the management node is not limited, and the management node is not shown in FIG. 6-1.
- the management node broadcasts the first notification message in a distributed system, and the lock agent in the distributed system and the lock server that has not failed receive the first notification message.
- Step 602 The lock server that has not failed determines whether it is the takeover lock server of the faulty lock server S1 according to the local consistency hash ring.
- the lock server S2 determines that it is the takeover lock server of the faulty lock server S1 according to the local consistency hash ring HS2.
- the lock server S3 determines that it is not the takeover lock server of the faulty lock server S1 according to the local consistency hash ring HS3;
- the lock server S4 determines that it is not the takeover lock server of the faulty lock server S1 according to the local consistency hash ring HS4.
- the lock server S3 and the lock server S4 are both referred to as a non-takeover lock server in this embodiment, and the lock server S2 is referred to as a takeover lock server.
- the lock server S1 is a faulty server.
- Step 603 The lock server that has not failed identifies the lock server S1 in the locally stored consistency hash ring as being in a fault state.
- for example, the lock server S2 identifies the lock server S1 in the locally stored consistency hash ring HS2 as being in a fault state.
- the lock server S2 can also start a timer, and after the timer expires, the local consistency hash ring HS2 is updated.
- the lock server S3 and the lock server S4 perform the same operation.
- Step 603 and step 602 have no strict sequence.
- the lock server that has not failed may first identify the lock server S1 in the locally stored consistency hash ring as being in a fault state, or may first determine, according to the locally stored consistency hash ring, whether it is the takeover lock server of the faulty lock server S1.
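- the determination in step 602 reduces to finding the clockwise successor of the faulty lock server on the local ring. A sketch under the simplifying assumption that the ring order is given directly as a list (function names are illustrative):

```python
def clockwise_successor(ring_order, server):
    # ring_order lists the server names in clockwise ring order.
    i = ring_order.index(server)
    return ring_order[(i + 1) % len(ring_order)]

def is_takeover(my_name, failed, ring_order):
    # A server is the takeover lock server of the faulty server exactly
    # when it is the faulty server's clockwise successor.
    return clockwise_successor(ring_order, failed) == my_name

# With the ring order of Figure 6-1 (S1 -> S2 -> S3 -> S4 clockwise):
order = ["S1", "S2", "S3", "S4"]
assert is_takeover("S2", "S1", order)       # S2 silences itself
assert not is_takeover("S3", "S1", order)   # S3 keeps serving lock requests
```

the same check with `i - 1` instead of `i + 1` gives the counterclockwise variant mentioned earlier.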
- Step 604 The lock server S2 determines that it is the takeover lock server of the faulty lock server S1, and enters a silent state.
- the takeover lock server that enters the silent state can only process lock reclaim requests and cannot process new lock requests.
- the lock server S2 can also start a silent timer. When the silent timer ends, the lock server S2 exits the silent state.
- the lock server S3 or the lock server S4 confirms that it is not the takeover lock server of the faulty lock server S1, and keeps its current state unchanged.
- the non-takeover lock server can handle the lock request normally.
- after the lock server S2 determines that it is the takeover lock server of the faulty lock server S1, it can return a response message to the management node, notifying the management node that it is the takeover lock server of the faulty lock server S1.
- the management node can also store the same consistent hash ring locally.
- when the management node detects that the lock server S1 is faulty, it confirms that the lock server S2 is the takeover lock server of the faulty lock server S1 according to the locally stored consistency hash ring.
- when a lock server in the distributed system fails, only the takeover lock server of the faulty lock server needs to be silent, and the other non-takeover lock servers work normally, so that the lock requests handled by the non-takeover lock servers can still be processed, minimizing the impact range and improving the reliability of the distributed system.
- Step 605 The lock agent receives the first notification message, and identifies the lock server S1 in the locally stored consistency hash ring as being in a fault state.
- because the management node broadcasts the first notification message in the distributed system, the lock agents in the distributed system also receive the first notification message.
- after receiving the first notification message, the lock agent identifies the lock server S1 in the locally stored consistency hash ring as being in a fault state.
- for example, the lock agent P2 in the distributed system identifies the lock server S1 in the locally stored consistency hash ring HP2 as being in a fault state.
- the lock agent can also start a timer, and after the timer expires, update the locally stored consistency hash ring.
- the duration of the timer here is equal to the duration of the timer in step 603, and may be slightly longer than, or equal to, the duration of the silence timer in step 604.
- step 605 and the previous steps are not strictly sequential. In general, the operation of the lock agent and the operation of the lock server can be performed simultaneously.
- Step 606 After receiving the lock request, the protocol server sends the lock request to the corresponding lock agent, and the protocol server and the corresponding lock agent are located in the same node device.
- the lock request may be a lock reclaim request or a new lock request. Since the distributed lock system only silences the takeover lock server, the other lock servers can work normally. Therefore, after receiving the lock request, the protocol server sends the lock request to the corresponding lock agent in the node device. For example, after receiving a lock request, the protocol server PS1 sends the lock request to the lock agent P1.
- the lock request may be sent by the client to the protocol server through the NAS network, and the format of the lock request is the same as that of the existing implementation, and details are not described herein.
- the lock reclaim request may be initiated by the client to the protocol server after the client receives a notification from the protocol server.
- the protocol server can notify the specific client to initiate the lock reclaim request.
- the notification can be made as in the prior art, and its format is the same as that of the existing implementation, which is not described in the embodiment of the present invention.
- Step 607 After receiving the lock request, the lock agent determines which lock server should process the lock request according to the locally stored consistency hash ring, and sends the lock request to the determined lock server for processing.
- the lock request in this step may be a new lock request or a lock reclaim request.
- after receiving the lock request, the lock agent hashes the unique identifier (FSID, FID) of the file carried in the lock request, determines, according to the calculated result and the locally stored consistency hash ring, which lock server is responsible for processing the lock request, and sends the lock request to that lock server. If the lock server determined to process the lock request is in a fault state in the consistency hash ring, it is also necessary to determine, according to the consistency hash ring, the takeover lock server of the lock server that processes the lock request, and the lock agent sends the lock request to the takeover lock server for processing.
- the lock agent P1 receives the lock request sent by the protocol server PS1
- the file identifier in the lock request is calculated by using a consistent hash algorithm, and the obtained result falls within the range between the lock server S4 and the lock server S1. The lock agent P1 determines, according to the clockwise direction of the locally stored consistency hash ring, that the received lock request should be processed by the lock server S1, but the lock server S1 is in a fault state in the locally stored consistency hash ring; therefore, the lock agent P1 determines that the lock server S2 is the takeover lock server of the lock server S1 according to the clockwise direction of the locally stored consistency hash ring, and the lock agent P1 sends the lock request to the lock server S2.
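- the routing in step 607 can be sketched as follows (greatly simplified Python: ring positions are reduced to slot indices, only a single fault is diverted, and the function name and parameters are assumptions):

```python
import hashlib

def route_lock_request(file_key, ring_order, fault_state):
    # Reduce the hash of the file's unique identifier (e.g. FSID+FID)
    # to a slot index on the ring; ring_order lists servers clockwise.
    slot = int(hashlib.md5(file_key.encode()).hexdigest(), 16) % len(ring_order)
    target = ring_order[slot]      # manager found in the clockwise direction
    if fault_state.get(target):
        # The manager is marked faulty in the local ring: divert the
        # request to its clockwise takeover lock server.
        target = ring_order[(slot + 1) % len(ring_order)]
    return target
```

a request whose key maps to a non-faulty server is routed unchanged; only keys in the faulty server's range are diverted, which is the "minimal impact" property the embodiment relies on.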
- Step 608 After receiving the lock request, the lock server S2 performs corresponding processing. If the lock server S2 receives a lock reclaim request, it returns, according to the information recorded in the lock permission information table, the lock permission information corresponding to the unique identifier of the file carried in the lock reclaim request. For NFS v3, when the lock server S2 receives the lock reclaim request, it returns an OK response message according to the information recorded in the lock permission information table.
- the lock server S2 may also store the correspondence between the newly assigned lock permission information and the unique file identifier carried in the lock request in the lock permission information table, for processing subsequent lock reclaim requests.
- the lock permission information table can be uniformly managed by the management node, and then sent to each lock server after being updated. It can also be stored uniformly by the management node, and the required information can be obtained from the management node when the lock server needs it.
- the lock permission information table can also be independently stored and managed by each lock server; in this case, when a lock server allocates new lock permission information, it also needs to notify the other lock servers to update their stored lock permission information tables.
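- the third option above, where each lock server stores its own lock permission information table and notifies its peers on every new grant, can be sketched as follows (illustrative names; a real peer notification would be a network message rather than a direct write):

```python
class LockPermissionTable:
    """Per-server table with peer update notification (illustrative)."""
    def __init__(self):
        self.table = {}   # file unique identifier -> lock permission info
        self.peers = []   # the other lock servers to notify

    def grant(self, file_id, perm):
        self.table[file_id] = perm
        for peer in self.peers:
            # Keep every server's copy consistent so any of them can
            # answer a later lock reclaim request for this file.
            peer.table[file_id] = perm
```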
- the lock permission information is returned by the lock server to the lock agent that sent the lock request, then returned by the lock agent to the protocol server in the node, and the protocol server returns it to the client that initiated the lock request.
- the lock server S2 returns the lock authority information to the lock agent P1
- the lock agent P1 returns the lock authority information to the protocol server PS1
- the protocol server PS1 returns the lock authority information to the client that initiated the lock request.
- Step 609 After receiving the lock request, the non-takeover lock server performs corresponding processing. For example, when the lock server S3 receives a new lock request, it assigns new lock permission information for the lock request and returns it to the client. Similarly, the lock server S3 can also store the correspondence between the newly assigned lock permission information and the unique file identifier carried in the lock request in the lock permission information table, for processing subsequent lock reclaim requests.
- the lock server S3 at this time is not the takeover lock server of the faulty lock server S1, does not enter the silent state, and can process the lock request.
- the lock server S3 may be the takeover lock server of another faulty lock server. Therefore, when the lock server S3 receives a lock request, it also needs to check whether it is in the silent state. If it is in the silent state, the processing is the same as in step 608, that is, the new lock request cannot be processed, and a rejection response message is directly returned. In this step, in order to make the different handling of the takeover lock server and the non-takeover lock servers of the faulty lock server clearer, the step of the non-takeover server checking whether it is in the silent state is omitted.
- Step 610 The lock server that has not failed updates the locally stored consistency hash ring.
- the second notification message may be sent by the management node, where the second notification message is used to notify the lock server to update the locally stored consistency hash ring, and the second notification message carries the information of the faulty lock server.
- the management node may send the update notification after the lock permission information of the faulty lock server has been reclaimed, or may send the update notification a period of time after detecting the faulty lock server.
- the consistency hash ring is obtained by hash calculation according to the name of the node device where the lock server is located, and the takeover lock server of the faulty lock server is determined according to the clockwise direction of the consistency hash ring. Therefore, when a lock server fails, updating the consistency hash ring only affects the lock server adjacent to the faulty lock server on the consistency hash ring, and the scope of influence is small.
- Step 611 The lock agent updates the consistency hash ring. After the predetermined time has elapsed, for example, after the timer expires, the lock agent updates the locally stored consistency hash ring, and the updated consistency hash ring does not include the faulty lock server.
- the update notification can also be sent by the management node; after the lock agent receives the update notification, it updates the locally stored consistency hash ring, and the faulty lock server is not included in the updated consistency hash ring.
- the method embodiment may further include:
- Step 612 The lock server that has not failed receives the third notification message and updates the locally stored consistency hash ring, and the updated consistency hash ring includes the newly added lock server; the third notification message is used to notify the lock server to update the locally stored consistency hash ring, and the third notification message carries the information of the newly added lock server.
- when the management node detects that a new lock server has joined, it notifies the non-failed lock servers in the distributed system to update the locally stored consistency hash ring, and the updated consistency hash ring includes the newly added lock server.
- the consistency hash ring is obtained by hash calculation according to the name of the node device where the lock server is located, and the takeover lock server of the faulty lock server is determined according to the clockwise direction of the consistency hash ring. Therefore, when a lock server is newly added, it only affects the lock server adjacent to the newly added lock server on the consistency hash ring, and the scope of influence is small.
- Step 613 The lock agent receives the third notification message and updates the locally stored consistency hash ring.
- when the management node detects that a new lock server has been added, it notifies the lock agents in the distributed system to update the locally stored consistency hash ring, and the updated consistency hash ring includes the newly added lock server.
- there is no strict sequence requirement between steps 612 and 613, which can be performed simultaneously. In addition, there is no strict sequence between steps 612 and 613 and the aforementioned steps 601-611.
- the above descriptions of the embodiments of the present invention are merely illustrative and do not strictly limit the flow of the method of the present invention.
- when a lock server in a distributed system fails, only the takeover lock server determined by the specific direction of the consistency hash ring needs to be silent, and the other non-takeover lock servers in the distributed system do not need to be silent and can handle lock requests normally, so that most of the lock requests received by the distributed system can be processed, the impact of the lock server failure is minimized, and the reliability of the distributed system is improved.
- the embodiment of the invention further provides a distributed system for handling a fault of a lock server, the structure of which is shown in FIG.
- the distributed system includes four lock servers.
- the lock server is respectively identified as a lock server S1, a lock server S2, a lock server S3, and a lock server S4.
- Each lock server locally stores lock server takeover relationship information.
- the lock server that does not fail in the distributed system receives the first notification message, where the first notification message carries information that the first lock server is faulty;
- the second lock server in the distributed system determines that it is the takeover lock server of the first lock server according to the locally stored lock server takeover relationship information; the takeover lock server enters a silent state.
- the third lock server in the distributed system determines that it is not the takeover lock server of the first lock server according to the locally stored lock server takeover relationship information; when the third lock server receives a lock request, it allocates lock permission information according to the lock request.
- if the second lock server that is the takeover lock server receives a lock reclaim request, it returns the corresponding lock permission information according to the lock permission information table; if the second lock server receives a new lock request, it returns a rejection response message.
- the takeover lock server entering the silent state may exit the silent state after receiving a notification from the management node, or may exit the silent state after a predetermined time has elapsed.
- the lock servers that have not failed in the distributed system may also identify the faulty lock server in the locally stored lock server management scope information or lock server takeover relationship information as being in a fault state, and after a set time or after receiving the second notification message of the management node, update the locally stored lock server management scope information or lock server takeover relationship information; the updated lock server management scope information or lock server takeover relationship information does not include the information of the faulty lock server.
- the lock servers that have not failed in the distributed system may also update the locally stored consistency hash ring after receiving the third notification message sent by the management node, and the updated consistency hash ring includes the newly added lock server.
- the third notification message is used to notify the lock server to update the locally stored consistency hash ring, and the third notification message carries the information of the newly added lock server.
- the distributed system further includes four protocol servers and corresponding lock agents.
- the protocol server and the corresponding lock agent are located in one node device; the lock server can be located in the same node device as the protocol server and the lock agent, or can be separately set in another node device.
- in this embodiment, the case where the lock server, the protocol server, and the lock agent are located in one node device is described as an example.
- Each node device includes a protocol server, a lock agent, and a lock server.
- the protocol server PS1, the lock agent P1, the lock server S1 are in the node device 1, the protocol server PS2, the lock agent P2, the lock server S2 are in the node device 2, and so on.
- the protocol server has a one-to-one correspondence with the lock agent, that is, the protocol server PS1 sends the lock request to the lock agent P1; the protocol server PS2 sends the lock request to the lock agent P2; and so on.
- the lock server can correspond to multiple lock agents, that is, the lock server S1 can receive the lock request sent by the lock agent P1, and can also receive the lock request sent by other lock agents (such as the lock agent P2).
- the lock request may be either a lock reclaim request or a new lock request.
- after receiving the lock request, the protocol server sends the lock request to the corresponding lock agent.
- the protocol server and the corresponding lock agent are located in the same node device.
- after receiving the lock request, the lock agent determines the lock server that processes the lock request according to the locally stored lock server management scope information; if the determined lock server that handles the lock request is not in a fault state, the lock agent sends the received lock request to the lock server that handles the lock request.
- after receiving the lock request, the lock agent determines, according to the locally stored lock server management scope information, the lock server that processes the lock request; when the lock server determined to handle the lock request is in a fault state in the locally stored lock server management scope information, the lock agent determines, according to the locally stored lock server takeover relationship information, the takeover lock server of the lock server that processes the lock request, and sends the received lock request to the takeover lock server.
- the lock server management scope information or the lock server takeover relationship information mentioned here may be recorded in a tabular manner, and may also be determined by means of a consistency hash ring. The specific implementation manner has been detailed in the foregoing description and will not be repeated here.
- the lock server management scope information or the lock server takeover relationship information in the lock agent is recorded in the same manner as that in the lock server, and the update rules of the lock server management scope information or the lock server takeover relationship information in the lock agent and the lock server are also the same. Therefore, the lock server management scope information or the lock server takeover relationship information obtained after the lock agent and the lock server are updated is also the same.
- after receiving the first notification message, the non-failed lock server in the distributed system determines, in the clockwise direction of the locally stored consistency hash ring, whether it is the takeover lock server of the faulty lock server. As described above, the lock server that has not failed can also determine whether it is the takeover lock server of the faulty lock server according to the counterclockwise direction of the locally stored consistency hash ring.
- if the second lock server determines that it is the takeover lock server of the first lock server according to the locally stored consistency hash ring, the second lock server enters a silent state; if the second lock server, as the takeover lock server, receives a lock reclaim request, it returns the corresponding lock permission information according to the lock permission information table; if the second lock server, as the takeover lock server, receives a new lock request, it returns a rejection response message.
- if the third lock server that has not failed in the distributed system determines that it is not the takeover lock server of the first lock server according to the locally stored consistency hash ring, the third lock server, upon receiving a lock request, allocates lock permission information according to the lock request.
- after receiving the lock request, the lock agent determines the lock server that processes the lock request according to the locally stored consistency hash ring.
- the consistency hash ring locally stored by each lock agent is the same, and the consistency hash ring locally stored by the lock agent is the same as the consistency hash ring locally stored by the lock server.
- the lock agent may also identify the first lock server in the locally stored lock server management scope information or lock server takeover relationship information as being in a fault state, and after a set time or after receiving the second notification message of the management node, update the locally stored lock server management scope information or lock server takeover relationship information; the information of the first lock server is not included in the updated lock server management scope information or lock server takeover relationship information.
- the lock agent may further receive a third notification message sent by the management node, where the third notification message is used to notify the update of the locally stored information and carries the information of the newly added lock server; the lock agent then updates the locally stored lock server management scope information or lock server takeover relationship information, and the updated lock server management scope information or lock server takeover relationship information contains the information of the newly added lock server.
- each lock server determines whether it is the takeover lock server of the faulty lock server according to the locally stored lock server takeover relationship information. When a lock server determines that it is not the takeover lock server of the faulty lock server, that is, when it is a non-takeover lock server, it can normally allocate lock permission information for a lock request after receiving the lock request. If a lock server determines that it is the takeover lock server of the faulty lock server, it enters the silent state and, after receiving a new lock request, returns a rejection response message. In this way, only the takeover lock server is silent, and the other non-takeover lock servers can handle lock requests normally; the impact of the lock server failure is minimized, and most of the lock services can proceed normally. This prevents service interruption caused by the inability to process lock requests, and prevents system crashes due to lock permission conflicts, thereby improving the stability of the distributed system.
- the embodiment of the invention further provides a lock server 8 for implementing fault processing in a distributed system, the structure of which is shown in FIG.
- the lock server 8 includes a receiving module 801, a processing module 803, and a storage module 805.
- the receiving module 801 is configured to receive the first notification message, and send the first notification message to the processing module 803.
- the first notification message carries the information of the faulty lock server.
- the processing module 803 determines whether the lock server is the takeover lock server of the faulty lock server according to the lock server takeover relationship information stored in the storage module 805. If it is the takeover lock server of the faulty lock server, the lock server enters a silent state, and the processing module 803 is further configured to start the silence timer. If it is not the takeover lock server of the faulty lock server, the processing module 803 takes no action. After the processing module 803 starts the silence timer, the lock server 8 enters the silent state.
- the receiving module 801 is further configured to receive a lock request sent by the lock agent, and send the received lock request to the processing module 803.
- the processing module 803 is further configured to, after receiving a lock reclaim request, return the corresponding lock permission information according to the lock permission information table stored in the storage module 805.
- the processing module 803 is further configured to determine, after receiving a locking request, whether the lock server is in the silent state by checking whether the silence timer is running. If the silence timer is not running, the lock server is not silent, and the processing module 803 allocates lock permission information for the locking request and sends the lock permission information to the receiving module 801. If the silence timer is running, the lock server is in the silent state, and the processing module 803 sends a rejection response message to the receiving module 801.
- the allocated lock permission information and the file information carried in the locking request may also be stored in the lock permission information table in the storage module 805.
- the receiving module 801 is further configured to return the received lock permission information or the rejection response message to the lock agent.
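The request dispatch described above (reclaim requests answered from the stored table, new locking requests rejected while silent) can be illustrated with a small sketch. Names are hypothetical and the silence timer is modeled as a boolean flag:

```python
# Hypothetical sketch of modules 801/803/805: a lock reclaim request is
# served from the stored lock permission table, whereas a new locking
# request is rejected while the silence timer is running.
def handle(request, lock_table, silence_timer_running):
    kind, file_id = request
    if kind == "RECLAIM":
        # Reclaim: look up the previously granted permission in the table.
        return lock_table.get(file_id, "NOT_FOUND")
    if silence_timer_running:
        return "REJECT"                       # silent: refuse new locks
    lock_table[file_id] = f"lock:{file_id}"   # grant and record the permission
    return lock_table[file_id]

table = {"fileA": "lock:fileA"}
assert handle(("RECLAIM", "fileA"), table, True) == "lock:fileA"
assert handle(("LOCK", "fileB"), table, True) == "REJECT"
assert handle(("LOCK", "fileB"), table, False) == "lock:fileB"
```

Note that reclaim requests are served even during silence, which is what lets clients of the faulty server re-establish their existing locks on the takeover server.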
- the lock server takeover relationship information may be determined by using a lock server takeover relationship table (as shown in Table 1), or may be determined by a consistent hash ring (as shown in FIG. 3-1).
- specifically, the processing module 803 may determine, in the clockwise direction of the consistent hash ring stored in the storage module 805, whether it is the takeover lock server of the faulty lock server; alternatively, it may make the same determination in the counterclockwise direction of the consistent hash ring.
- the receiving module 801 is further configured to receive the second notification message and send the second notification message to the processing module 803, where the second notification message is used to notify the lock server to update the consistent hash ring and carries the information of the faulty lock server.
- the processing module 803 is further configured to update the consistent hash ring stored in the storage module 805, so that the faulty lock server is no longer included in the updated consistent hash ring.
- the receiving module 801 is further configured to receive the third notification message and send the third notification message to the processing module 803, where the third notification message is used to notify the lock server to update the consistent hash ring and also carries information about the newly added lock server.
- the processing module 803 is further configured to update the consistent hash ring stored in the storage module 805, so that the updated consistent hash ring includes the newly added lock server.
- the processing module 803 is further configured to, after receiving the first notification message, mark the faulty lock server in the consistent hash ring in the storage module 805 as being in a fault state.
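The three ring transitions described above (mark on the first notification, remove on the second, add on the third) can be sketched with a trivial ring-state map. This is a hypothetical illustration; the real ring also carries hash positions:

```python
# Hypothetical sketch of ring maintenance across the three notifications.
def mark_faulty(ring_state, server):
    # First notification: flag the server but keep it on the ring,
    # so lock agents can still resolve its takeover server.
    ring_state[server] = "FAULT"

def remove_server(ring_state, server):
    # Second notification: rebuild the ring without the faulty server.
    ring_state.pop(server, None)

def add_server(ring_state, server):
    # Third notification: the updated ring includes the new server.
    ring_state[server] = "OK"

state = {"S1": "OK", "S2": "OK", "S3": "OK", "S4": "OK"}
mark_faulty(state, "S2")
assert state["S2"] == "FAULT"
remove_server(state, "S2")
assert "S2" not in state
add_server(state, "S5")
assert state["S5"] == "OK"
```

Keeping the faulty server marked (rather than removing it immediately) is what allows requests originally destined for it to be redirected deterministically during the silence window.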
- the storage module 805 is configured to store the information required by the lock server 8, such as the lock server takeover relationship information and the lock permission information table.
- the lock server includes: a memory 901 configured to store the lock server takeover relationship information and the lock permission information table; an interface 902 configured to provide an external connection; a computer readable medium 903 configured to store a computer program; and a processor 904 coupled to the memory 901, the interface 902, and the computer readable medium 903, and configured to execute the lock server fault processing method described above by running the program.
- the embodiment of the invention further provides a lock agent device 10 for implementing fault processing in a distributed system, the structure of which is shown in FIG.
- the lock agent includes a receiving module 1001, a processing module 1003, a storage module 1005, and a sending module 1007.
- the receiving module 1001 is configured to receive a lock request sent by the protocol server and send the received lock request to the processing module 1003, where the lock request may be a lock reclaim request or a locking request.
- the processing module 1003 is configured to, after receiving the lock request, determine the lock server that processes the lock request according to the lock server management range information stored in the storage module 1005, and send the received lock request to the sending module 1007.
- the processing module 1003 is further configured to, when the lock server determined from the lock server management range information to process the lock request is marked as being in a fault state, determine the takeover lock server of that lock server according to the lock server takeover relationship information stored in the storage module 1005, and send the received lock request to the sending module 1007.
- the sending module 1007 is configured to send the lock request to the determined lock server, or, when a takeover lock server has been determined, to the determined takeover lock server that processes the lock request.
- the lock server takeover relationship information and the lock server management range information may be determined by a lock server information table (as shown in Table 3) or by a consistent hash ring (as shown in FIG. 3-1).
- specifically, the processing module 1003 may determine the lock server that processes the lock request in either the clockwise direction or the counterclockwise direction of the consistent hash ring stored in the storage module 1005.
- the processing module 1003 then determines the takeover lock server of the lock server that processes the lock request in the same direction of the consistent hash ring. That is, if the processing module 1003 determined the lock server that processes the lock request in the clockwise direction of the consistent hash ring, it also determines that lock server's takeover lock server in the clockwise direction; if it determined the lock server in the counterclockwise direction, it also determines the takeover lock server in the counterclockwise direction. For the specific determination, refer to the earlier detailed description, which is not repeated here.
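The lock agent's routing rule above can be sketched as a single lookup on the ring: hash the request key, walk clockwise to the first server, and if that server is marked faulty keep walking in the same direction. This is a hypothetical illustration; the ring positions below are chosen so that each server sits at the upper bound of its management range from Table 3:

```python
from bisect import bisect_right

def route(ring, key_hash, faulty=frozenset()):
    # ring: (position, server) pairs sorted by position.  The first server
    # clockwise from key_hash manages the key; if that server is marked
    # faulty, the NEXT server in the SAME (clockwise) direction takes over.
    positions = [p for p, _ in ring]
    idx = bisect_right(positions, key_hash) % len(ring)
    for step in range(len(ring)):
        _, server = ring[(idx + step) % len(ring)]
        if server not in faulty:
            return server
    raise RuntimeError("no available lock server")

# S3 manages 512-1024, S1 manages 1024-5000, S2 manages 5000-8000,
# S4 manages 8000-512 (wrapping), mirroring Table 3.
ring = [(512, "S4"), (1024, "S3"), (5000, "S1"), (8000, "S2")]
assert route(ring, 3000) == "S1"           # inside S1's range
assert route(ring, 6000, {"S2"}) == "S4"   # S2 faulty -> clockwise successor S4
```

Because both the owner and the takeover server are derived from the same direction of the same ring, every lock agent independently redirects a given request to the same takeover server.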
- the receiving module 1001 is further configured to receive the first notification message, and send the first notification message to the processing module 1003.
- the first notification message carries information of the faulty lock server.
- the processing module 1003 is further configured to, after receiving the first notification message, mark the faulty lock server in the consistent hash ring in the storage module 1005 as being in a fault state.
- the receiving module 1001 is further configured to receive the second notification message and send the second notification message to the processing module 1003, where the second notification message is used to notify the lock server to update the consistent hash ring.
- the processing module 1003 is further configured to update the consistent hash ring stored in the storage module 1005, so that the faulty lock server is no longer included in the updated consistent hash ring.
- the receiving module 1001 is further configured to receive a third notification message and send the third notification message to the processing module 1003, where the third notification message is used to notify the lock server to update the consistent hash ring and also carries the identifier of the newly added lock server.
- the processing module 1003 is further configured to update the consistent hash ring stored in the storage module 1005, so that the newly added lock server is included in the updated consistent hash ring.
- the storage module 1005 is configured to store the information required by the lock agent device, such as the lock server takeover relationship information, the lock server management range information, and the lock permission information table.
- the embodiment of the invention further provides a lock agent device for implementing fault processing in a distributed system, and the structure thereof is as shown in FIG.
- the lock proxy device includes: a memory 1101 configured to store the lock server takeover relationship information and the lock server management range information; an interface 1102 configured to provide an external connection; a computer readable medium 1103 configured to store a computer program; and a processor 1104 coupled to the memory 1101, the interface 1102, and the computer readable medium 1103, and configured to execute the lock server fault processing method described above by running the program. The method flows described in Figures 4-3, 5-2, and 6-2 and the corresponding text are not repeated here.
- the embodiment of the present invention further provides a lock manager for implementing fault processing in a distributed system.
- the structure is as shown in FIG. 12, and includes a lock agent 1201 and a lock server 1203.
- the structure of the lock server 1203 and the functions implemented are as shown in Figures 8 and 9 and the corresponding textual description.
- the structure of the lock agent 1201 and the functions implemented are as shown in Figures 10 and 11 and the corresponding textual descriptions.
- the lock manager performs the lock server fault processing method described above, including the method flows described in Figures 4-2, 4-3, 5-2, and 6-2 and the corresponding text, which are not repeated here.
- the embodiment of the invention also provides a protocol server for implementing fault processing in a distributed system.
- the protocol server is configured to receive a lock request sent by an external protocol client, and send the lock request to the lock agent.
- the lock request may be a lock reclaim request or a locking request.
- the embodiment of the present invention further provides a node device 13 for implementing fault processing in a distributed system.
- the structure of the node device 13 is as shown in FIG. 13 and includes a protocol server 1305, a lock agent 1301, and a lock server 1303.
- the protocol server 1305 is configured to receive a lock request sent by an external protocol client, and send the lock request to the lock agent 1301.
- the lock request may be a lock reclaim request or a locking request.
- the lock agent 1301 is configured to determine, according to the locally stored lock server management range information, the lock server that processes the lock request, and to send the lock request to the determined lock server.
- the lock agent 1301 is further configured to, when the determined lock server that processes the lock request is faulty, determine the takeover lock server according to the locally stored lock server takeover relationship information and send the lock request to the takeover lock server.
- the lock server 1303 is configured to, when the received lock request is a lock reclaim request, feed back the corresponding lock permission information according to the stored lock permission information table; the lock server 1303 is further configured to, when a locking request is received, check whether it is in the silent state: if it is in the silent state, it returns a rejection response message; if it is not, it allocates lock permission information.
- the structure of the lock server 1303 and the functions implemented are as shown in Figures 8 and 9 and the corresponding textual description.
- the structure of the lock agent 1301 and the functions implemented are as shown in Figures 10 and 11 and the corresponding textual descriptions.
- the node device performs the lock server fault processing method described above, including the method flows described in Figures 4-2, 4-3, 5-2, and 6-2 and the corresponding text, which are not repeated here.
- aspects of the invention, or possible implementations of the various aspects, may be embodied as a system, a method, or a computer program product.
- aspects of the invention, or possible implementations of the various aspects, may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, and so on), or an embodiment combining software and hardware aspects, collectively referred to herein as a "circuit", "module", or "system".
- aspects of the invention, or possible implementations of the various aspects, may take the form of a computer program product, that is, computer readable program code stored in a computer readable medium.
- the computer readable medium can be a computer readable signal medium or a computer readable storage medium.
- the computer readable storage medium includes, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing, such as a random access memory (RAM), a read only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, or a portable compact disc read-only memory (CD-ROM).
- the processor in the computer reads the computer readable program code stored in the computer readable medium, so that the processor can perform the functional actions specified in each step, or combination of steps, in the flowcharts, and an apparatus that implements the functions specified in each block, or combination of blocks, of the block diagrams is produced.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Hardware Redundancy (AREA)
- Computer And Data Communications (AREA)
Abstract
Description
Lock server | Takeover lock server |
S1 | S2 |
S2 | S4 |
S4 | S3 |
S3 | S1 |
Lock server | Lock server management range |
S1 | 1024-5000 |
S2 | 5000-8000 |
S4 | 8000-512 |
S3 | 512-1024 |
Lock server | Lock server management range | Takeover lock server |
S1 | 1024-5000 | S2 |
S2* | 5000-8000 | S4 |
S4 | 8000-512 | S3 |
S3 | 512-1024 | S1 |
Lock server | Lock server management range | Takeover lock server |
S1 | 1024-5000 | S4 |
S4 | 5000-512 | S3 |
S3 | 512-1024 | S1 |
Lock server | Lock server management range | Takeover lock server |
S1 | 1024-4000 | S2 |
S2 | 4000-7000 | S5 |
S5 | 7000-9000 | S4 |
S4 | 9000-512 | S3 |
S3 | 512-1024 | S1 |
Claims (33)
- A method for processing a lock server fault in a distributed system, wherein the distributed system comprises at least three lock servers, each of which stores the same lock server takeover relationship information, and the method comprises: receiving, by a lock server that has not failed in the distributed system, a first notification message, where the first notification message carries information that a first lock server in the distributed system has failed; determining, by a second lock server in the distributed system after receiving the first notification message and according to locally stored lock server takeover relationship information, that the second lock server is the takeover lock server of the first lock server, the takeover lock server entering a silent state; determining, by a third lock server in the distributed system after receiving the first notification message and according to locally stored lock server takeover relationship information, that the third lock server is not the takeover lock server of the first lock server; and allocating, by the third lock server when receiving a locking request, lock permission information according to the locking request.
- The method according to claim 1, further comprising: returning, by the takeover lock server when receiving a lock reclaim request, corresponding lock permission information according to a lock permission information table; and returning, by the takeover lock server when receiving a locking request, a rejection response message.
- The method according to claim 1 or 2, wherein the distributed system further comprises at least three protocol servers and corresponding lock agents, a protocol server and its corresponding lock agent being located in a same node device, and the method further comprises: sending, by the protocol server after receiving a lock request, the lock request to the corresponding lock agent, where the lock request is a lock reclaim request or a locking request.
- The method according to claim 3, wherein each lock agent locally stores the lock server takeover relationship information and lock server management range information, and the method further comprises: determining, by the lock agent after receiving a lock request and according to the locally stored lock server management range information, the lock server that processes the lock request; if the lock server determined from the lock server management range information to process the lock request is marked as being in a fault state, determining, by the lock agent according to the locally stored lock server takeover relationship information, the takeover lock server of the lock server in the fault state; and sending the received lock request to the takeover lock server.
- The method according to claim 3, wherein the lock server takeover relationship information is determined by a consistent hash ring, and the determining, by the third lock server according to the locally stored lock server takeover relationship information, that it is not the takeover lock server of the first lock server specifically comprises: determining, by the third lock server in the clockwise direction or the counterclockwise direction of the locally stored consistent hash ring, that it is not the takeover lock server of the first lock server.
- The method according to claim 5, wherein after the lock server that has not failed receives the first notification message, the method further comprises: marking, by the lock server that has not failed, the first lock server in the locally stored consistent hash ring as being in a fault state; and after a predetermined time is reached, updating the locally stored consistent hash ring, where the updated consistent hash ring does not include the first lock server.
- The method according to any one of claims 5 to 6, wherein each lock agent locally stores the lock server takeover relationship information and lock server management range information, and the lock server management range information and the lock server takeover relationship information are determined by the consistent hash ring; after receiving a lock request, the lock agent determines, in the clockwise direction or the counterclockwise direction of the locally stored consistent hash ring, the lock server that processes the lock request; and if the lock server that processes the lock request is marked as being in a fault state in the locally stored consistent hash ring, the lock agent determines, in the same direction of the locally stored consistent hash ring, the takeover lock server of the lock server that processes the lock request.
- The method according to claim 7, further comprising: receiving, by the lock agent, the first notification message; marking, by the lock agent, the first lock server in the locally stored consistent hash ring as being in a fault state; and after a predetermined time is reached, updating the locally stored consistent hash ring, where the updated consistent hash ring does not include the first lock server.
- The method according to claim 7, further comprising: receiving, by the lock server that has not failed, a second notification message, and updating the locally stored consistent hash ring, where the updated consistent hash ring does not include the first lock server, the second notification message is used to notify the lock server to update the locally stored consistent hash ring, and the second notification message carries the information of the first lock server; and receiving, by the lock agent, the second notification message, and updating the locally stored consistent hash ring, where the updated consistent hash ring does not include the first lock server.
- The method according to any one of claims 6 to 9, further comprising: receiving, by the lock server that has not failed, a third notification message, and updating the locally stored consistent hash ring, where the updated consistent hash ring includes a newly added lock server, the third notification message is used to notify the lock server to update the locally stored consistent hash ring, and the third notification message carries information of the newly added lock server; and receiving, by the lock agent, the third notification message, and updating the locally stored consistent hash ring, where the updated consistent hash ring includes the newly added lock server.
- A distributed system for implementing lock server fault processing, comprising at least three lock servers, each of which stores the same lock server takeover relationship information, wherein: a lock server that has not failed among the at least three lock servers is configured to receive a first notification message, where the first notification message carries information that a first lock server has failed; a second lock server is configured to determine, according to locally stored lock server takeover relationship information, that it is the takeover lock server of the first lock server, the takeover lock server entering a silent state; a third lock server is configured to determine, according to locally stored lock server takeover relationship information, that it is not the takeover lock server of the first lock server; and the third lock server is configured to, after receiving a locking request, allocate lock permission information according to the locking request.
- The system according to claim 11, wherein the takeover lock server is configured to: when receiving a lock reclaim request, return corresponding lock permission information according to a lock permission information table; and when receiving a locking request, return a rejection response message.
- The system according to claim 11 or 12, wherein the distributed system further comprises at least three protocol servers and lock agents, a protocol server and its corresponding lock agent being located in one node device, and wherein the protocol server is configured to, after receiving a lock request, send the lock request to the corresponding lock agent, the lock request being a lock reclaim request or a locking request.
- The system according to claim 13, wherein each lock agent stores the lock server takeover relationship information and lock server management range information, and wherein: the lock agent is configured to, after receiving a lock request, determine, according to the locally stored lock server management range information, the lock server that processes the lock request; if the lock server that processes the lock request is marked as being in a fault state in the lock server management range information, the lock agent is further configured to determine, according to the locally stored lock server takeover relationship information, the takeover lock server of the lock server that processes the lock request, and to send the received lock request to that takeover lock server.
- The system according to claim 13, wherein the lock server takeover relationship information is determined by a consistent hash ring, and the third lock server is further configured to determine, in the clockwise direction or the counterclockwise direction of the locally stored consistent hash ring, that it is not the takeover lock server of the first lock server.
- The system according to claim 15, wherein the lock server that has not failed is further configured to, after receiving the first notification message, mark the first lock server in the locally stored consistent hash ring as being in a fault state, and, after a predetermined time is reached, update the locally stored consistent hash ring, where the updated consistent hash ring does not include the first lock server.
- The system according to claim 15 or 16, wherein each lock agent stores the lock server takeover relationship information and lock server management range information, and the lock server management range information and the lock server takeover relationship information are determined by the consistent hash ring; the lock agent is further configured to, after receiving a lock request, determine, in the clockwise direction or the counterclockwise direction of the locally stored consistent hash ring, the lock server that processes the lock request; and if the lock server that processes the lock request is marked as being in a fault state in the locally stored consistent hash ring, the lock agent is further configured to determine, in the same direction of the locally stored consistent hash ring, the takeover lock server of the lock server that processes the lock request.
- The system according to claim 17, wherein the lock agent is further configured to: receive the first notification message; mark the first lock server in the locally stored consistent hash ring as being in a fault state; and, after a predetermined time is reached, update the locally stored consistent hash ring, where the updated consistent hash ring does not include the first lock server.
- The system according to claim 17, wherein the lock server that has not failed is further configured to receive a second notification message and update the locally stored consistent hash ring, where the updated consistent hash ring does not include the first lock server, the second notification message is used to notify the lock server to update the locally stored consistent hash ring, and the second notification message carries the information of the first lock server; and the lock agent is further configured to receive the second notification message and update the locally stored consistent hash ring, where the updated consistent hash ring does not include the first lock server.
- The system according to any one of claims 16 to 19, wherein the lock server that has not failed is further configured to receive a third notification message and update the locally stored consistent hash ring, where the updated consistent hash ring includes a newly added lock server, the third notification message is used to notify the lock server to update the locally stored consistent hash ring, and the third notification message carries information of the newly added lock server; and the lock agent is further configured to receive the third notification message and update the locally stored consistent hash ring, where the updated consistent hash ring includes the newly added lock server.
- A lock server for implementing fault processing in a distributed system, wherein the lock server comprises a receiving module 801, a processing module 803, and a storage module 805, the storage module 805 storing lock server takeover relationship information; the receiving module 801 is configured to receive a first notification message and send the first notification message to the processing module 803, where the first notification message carries information of a faulty lock server; the processing module 803 is configured to, after receiving the first notification message, determine, according to the lock server takeover relationship information, whether it is the takeover lock server of the faulty lock server, and if it is the takeover lock server of the faulty lock server, the lock server enters a silent state; the processing module 803 is further configured to, after receiving a locking request, determine whether the lock server is in the silent state, allocate lock permission information according to the locking request if the lock server is not in the silent state, return a rejection response message if the lock server is in the silent state, and send the allocated lock permission information or the rejection response message to the receiving module 801; and the receiving module 801 is further configured to return the received lock permission information or rejection response message to a lock agent.
- The lock server according to claim 21, wherein the processing module 803 is further configured to, after receiving a lock reclaim request, return corresponding lock permission information according to a lock permission information table stored in the storage module 805.
- The lock server according to claim 21 or 22, wherein the lock server takeover relationship information may be determined by a consistent hash ring; and the determining, by the processing module 803 according to the lock server takeover relationship information stored in the storage module 805, whether it is the takeover lock server of the faulty lock server specifically comprises: determining, by the processing module 803 in the clockwise direction or the counterclockwise direction of the consistent hash ring, whether it is the takeover lock server of the faulty lock server.
- The lock server according to claim 23, wherein the processing module 803 is further configured to, after receiving the first notification message, mark the faulty lock server in the consistent hash ring in the storage module 805 as being in a fault state.
- The lock server according to claim 23 or 24, wherein the receiving module 801 is further configured to receive a second notification message and send the second notification message to the processing module 803, where the second notification message is used to notify the lock server to update the consistent hash ring in the storage module 805 and carries the information of the faulty lock server; and the processing module 803 is further configured to update the consistent hash ring stored in the storage module 805 so that the updated consistent hash ring does not include the faulty lock server.
- The lock server according to any one of claims 23 to 25, wherein the receiving module 801 is further configured to receive a third notification message and send the third notification message to the processing module 803, where the third notification message is used to notify the lock server to update the consistent hash ring and further carries an identifier of a newly added lock server; and the processing module 803 is further configured to update the consistent hash ring stored in the storage module 805 so that the updated consistent hash ring includes the newly added lock server.
- A lock agent device for implementing fault processing in a distributed system, comprising a receiving module 1001, a processing module 1003, a storage module 1005, and a sending module 1007, wherein: the receiving module 1001 is configured to receive a lock request sent by a protocol server and send the received lock request to the processing module 1003, where the lock request is a lock reclaim request or a locking request; the processing module 1003 is configured to, after receiving the lock request, determine, according to lock server management range information stored in the storage module 1005, the lock server that processes the lock request, and send the received lock request to the sending module 1007; and the sending module 1007 is configured to send the lock request to the determined lock server.
- The lock agent device according to claim 27, wherein the processing module 1003 is further configured to, when the lock server determined from the lock server management range information to process the lock request is marked as being in a fault state, determine, according to lock server takeover relationship information stored in the storage module 1005, the takeover lock server of the lock server that processes the lock request.
- The lock agent device according to claim 28, wherein the lock server takeover relationship information and the lock server management range information are determined by a consistent hash ring; the determining, by the processing module 1003 according to the lock server management range information stored in the storage module 1005, the lock server that processes the lock request specifically comprises: determining, by the processing module 1003 in the clockwise direction or the counterclockwise direction of the consistent hash ring, the lock server that processes the lock request; and the determining, by the processing module 1003 according to the lock server takeover relationship information stored in the storage module 1005, the takeover lock server of the lock server that processes the lock request specifically comprises: determining, by the processing module 1003 in the same direction of the consistent hash ring, the takeover lock server of the lock server that processes the lock request.
- The lock agent device according to claim 29, wherein the receiving module 1001 is further configured to receive a first notification message and send the first notification message to the processing module 1003, where the first notification message carries information of a faulty lock server; and the processing module 1003 is further configured to, after receiving the first notification message, mark the faulty lock server in the consistent hash ring in the storage module 1005 as being in a fault state.
- The lock agent device according to claim 29, wherein the receiving module 1001 is further configured to receive a second notification message and send the second notification message to the processing module 1003, where the second notification message is used to notify the lock server to update the locally stored consistent hash ring and carries the information of the faulty lock server; and the processing module 1003 is further configured to update the consistent hash ring stored in the storage module 1005 so that the updated consistent hash ring does not include the faulty lock server.
- The lock agent device according to any one of claims 29 to 31, wherein the receiving module 1001 is further configured to receive a third notification message and send the third notification message to the processing module 1003, where the third notification message is used to notify the lock server to update the consistent hash ring and further carries an identifier of a newly added lock server; and the processing module 1003 is further configured to update the consistent hash ring stored in the storage module 1005 so that the updated consistent hash ring includes the newly added lock server.
- A lock manager for implementing fault processing in a distributed system, comprising the lock server according to any one of claims 21 to 26 and the lock agent device according to any one of claims 27 to 32.
Priority Applications (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP14906065.9A EP3059932B1 (en) | 2014-11-12 | 2014-11-12 | Lock server malfunction processing method and system thereof in distribution system |
CN201480064790.8A CN105794182B (zh) | 2014-11-12 | 2014-11-12 | 分布式系统中锁服务器故障的处理方法及其系统 |
PCT/CN2014/090886 WO2016074167A1 (zh) | 2014-11-12 | 2014-11-12 | 分布式系统中锁服务器故障的处理方法及其系统 |
CN201711118701.5A CN108023939B (zh) | 2014-11-12 | 2014-11-12 | 分布式系统中锁服务器故障的处理方法及其系统 |
HUE14906065A HUE042424T2 (hu) | 2014-11-12 | 2014-11-12 | Zárolás kiszolgáló meghibásodásának feldolgozási eljárása és rendszere egy elosztott rendszerben |
JP2016539939A JP6388290B2 (ja) | 2014-11-12 | 2014-11-12 | 分散システムにおけるロック・サーバの故障を処理するための方法およびシステム |
US15/592,217 US9952947B2 (en) | 2014-11-12 | 2017-05-11 | Method and system for processing fault of lock server in distributed system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2014/090886 WO2016074167A1 (zh) | 2014-11-12 | 2014-11-12 | 分布式系统中锁服务器故障的处理方法及其系统 |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US15/592,217 Continuation US9952947B2 (en) | 2014-11-12 | 2017-05-11 | Method and system for processing fault of lock server in distributed system |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016074167A1 true WO2016074167A1 (zh) | 2016-05-19 |
Family
ID=55953572
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2014/090886 WO2016074167A1 (zh) | 2014-11-12 | 2014-11-12 | 分布式系统中锁服务器故障的处理方法及其系统 |
Country Status (6)
Country | Link |
---|---|
US (1) | US9952947B2 (zh) |
EP (1) | EP3059932B1 (zh) |
JP (1) | JP6388290B2 (zh) |
CN (2) | CN108023939B (zh) |
HU (1) | HUE042424T2 (zh) |
WO (1) | WO2016074167A1 (zh) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10846185B2 (en) | 2015-12-30 | 2020-11-24 | Huawei Technologies Co., Ltd. | Method for processing acquire lock request and server |
Families Citing this family (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108352995B (zh) * | 2016-11-25 | 2020-09-08 | 华为技术有限公司 | 一种smb业务故障处理方法和存储设备 |
CN110580232B (zh) * | 2018-06-08 | 2021-10-29 | 杭州宏杉科技股份有限公司 | 一种锁管理的方法及装置 |
US10999361B2 (en) * | 2018-09-07 | 2021-05-04 | Red Hat, Inc. | Consistent hash-based load balancer |
CN111125048B (zh) * | 2019-12-06 | 2022-04-22 | 浪潮电子信息产业股份有限公司 | 一种故障通知方法、装置、设备及计算机可读存储介质 |
CN114710976B (zh) * | 2020-10-16 | 2023-06-16 | 华为技术有限公司 | 一种锁重申方法、锁管理方法以及服务器 |
CN113034752B (zh) * | 2021-05-25 | 2021-08-03 | 德施曼机电(中国)有限公司 | 一种智能锁故障处理方法、装置及计算机可读存储介质 |
CN115277379B (zh) * | 2022-07-08 | 2023-08-01 | 北京城市网邻信息技术有限公司 | 分布式锁容灾处理方法、装置、电子设备及存储介质 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6173293B1 (en) * | 1998-03-13 | 2001-01-09 | Digital Equipment Corporation | Scalable distributed file system |
CN101252603A (zh) * | 2008-04-11 | 2008-08-27 | 清华大学 | 基于存储区域网络san的集群分布式锁管理方法 |
Family Cites Families (27)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5943422A (en) | 1996-08-12 | 1999-08-24 | Intertrust Technologies Corp. | Steganographic techniques for securely delivering electronic digital rights management control information over insecure communication channels |
US6990606B2 (en) | 2000-07-28 | 2006-01-24 | International Business Machines Corporation | Cascading failover of a data management application for shared disk file systems in loosely coupled node clusters |
JP4478321B2 (ja) * | 2000-11-27 | 2010-06-09 | 富士通株式会社 | ストレージシステム |
US6601070B2 (en) * | 2001-04-05 | 2003-07-29 | Hewlett-Packard Development Company, L.P. | Distribution of physical file systems |
US7765329B2 (en) * | 2002-06-05 | 2010-07-27 | Silicon Graphics International | Messaging between heterogeneous clients of a storage area network |
US7289992B2 (en) * | 2003-05-01 | 2007-10-30 | International Business Machines Corporation | Method, system, and program for lock and transaction management |
US7356531B1 (en) * | 2003-07-25 | 2008-04-08 | Symantec Operating Corporation | Network file system record lock recovery in a highly available environment |
US7162666B2 (en) | 2004-03-26 | 2007-01-09 | Emc Corporation | Multi-processor system having a watchdog for interrupting the multiple processors and deferring preemption until release of spinlocks |
US7962915B2 (en) * | 2005-03-18 | 2011-06-14 | International Business Machines Corporation | System and method for preserving state for a cluster of data servers in the presence of load-balancing, failover, and fail-back events |
US7619761B2 (en) * | 2005-06-30 | 2009-11-17 | Microsoft Corporation | Extensible and distributed job execution service in a server farm |
JP4371321B2 (ja) * | 2006-03-10 | 2009-11-25 | 富士通株式会社 | Nfsサーバ、nfsサーバ制御プログラム、nfsサーバ制御方法 |
CN100432940C (zh) * | 2006-10-19 | 2008-11-12 | 华为技术有限公司 | 计算机集群系统中共享资源锁分配方法与计算机及集群系统 |
US8990954B2 (en) * | 2007-06-20 | 2015-03-24 | International Business Machines Corporation | Distributed lock manager for file system objects in a shared file system |
US8244846B2 (en) | 2007-12-26 | 2012-08-14 | Symantec Corporation | Balanced consistent hashing for distributed resource management |
US8276013B2 (en) * | 2008-02-13 | 2012-09-25 | Broadcom Corporation | System and method for reducing a link failure detection delay using a link energy signal while in a low power idle mode |
US8296599B1 (en) * | 2009-06-30 | 2012-10-23 | Symantec Corporation | System and method for implementing clustered network file system lock management |
US9026510B2 (en) * | 2011-03-01 | 2015-05-05 | Vmware, Inc. | Configuration-less network locking infrastructure for shared file systems |
US8533171B2 (en) * | 2011-04-08 | 2013-09-10 | Symantec Corporation | Method and system for restarting file lock services at an adoptive node during a network filesystem server migration or failover |
EP2686805A4 (en) | 2011-10-31 | 2016-02-24 | Hewlett Packard Development Co | Maintaining a File Capture |
US8661005B2 (en) | 2011-12-08 | 2014-02-25 | International Business Machines Corporation | Optimized deletion and insertion for high-performance resizable RCU-protected hash tables |
US8856583B1 (en) * | 2012-01-20 | 2014-10-07 | Google Inc. | Failover operation on a replicated distributed database system while maintaining access invariance |
US20140019429A1 (en) | 2012-07-12 | 2014-01-16 | Volker Driesen | Downtime reduction for lifecycle management events |
US8874535B2 (en) | 2012-10-16 | 2014-10-28 | International Business Machines Corporation | Performance of RCU-based searches and updates of cyclic data structures |
CN103812685B (zh) * | 2012-11-15 | 2018-02-27 | 腾讯科技(深圳)有限公司 | 同时在线统计系统及统计方法 |
CN103297268B (zh) * | 2013-05-13 | 2016-04-06 | 北京邮电大学 | 基于p2p技术的分布式数据一致性维护系统和方法 |
US10049022B2 (en) * | 2013-06-24 | 2018-08-14 | Oracle International Corporation | Systems and methods to retain and reclaim resource locks and client states after server failures |
US9304861B2 (en) * | 2013-06-27 | 2016-04-05 | International Business Machines Corporation | Unobtrusive failover in clustered network-attached storage |
-
2014
- 2014-11-12 CN CN201711118701.5A patent/CN108023939B/zh active Active
- 2014-11-12 HU HUE14906065A patent/HUE042424T2/hu unknown
- 2014-11-12 CN CN201480064790.8A patent/CN105794182B/zh active Active
- 2014-11-12 WO PCT/CN2014/090886 patent/WO2016074167A1/zh active Application Filing
- 2014-11-12 EP EP14906065.9A patent/EP3059932B1/en active Active
- 2014-11-12 JP JP2016539939A patent/JP6388290B2/ja active Active
-
2017
- 2017-05-11 US US15/592,217 patent/US9952947B2/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6173293B1 (en) * | 1998-03-13 | 2001-01-09 | Digital Equipment Corporation | Scalable distributed file system |
CN101252603A (zh) * | 2008-04-11 | 2008-08-27 | 清华大学 | 基于存储区域网络san的集群分布式锁管理方法 |
Non-Patent Citations (2)
Title |
---|
CHAI, MENGZHU.: "Design and implementation of fault tolerant mechanism of lock server based on distributed system", JOURNAL OF NANCHANG COLLEGE OF EDUCATION, vol. 28, no. 10, 31 December 2013 (2013-12-31), pages 195 and 196, XP008182655 * |
See also references of EP3059932A4 * |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10846185B2 (en) | 2015-12-30 | 2020-11-24 | Huawei Technologies Co., Ltd. | Method for processing acquire lock request and server |
Also Published As
Publication number | Publication date |
---|---|
EP3059932B1 (en) | 2018-09-19 |
US9952947B2 (en) | 2018-04-24 |
EP3059932A4 (en) | 2016-12-21 |
JP2017500653A (ja) | 2017-01-05 |
CN105794182A (zh) | 2016-07-20 |
JP6388290B2 (ja) | 2018-09-12 |
CN105794182B (zh) | 2017-12-15 |
EP3059932A1 (en) | 2016-08-24 |
CN108023939B (zh) | 2021-02-05 |
HUE042424T2 (hu) | 2019-07-29 |
CN108023939A (zh) | 2018-05-11 |
US20170242762A1 (en) | 2017-08-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016074167A1 (zh) | 分布式系统中锁服务器故障的处理方法及其系统 | |
US11445019B2 (en) | Methods, systems, and media for providing distributed database access during a network split | |
US10657012B2 (en) | Dynamically changing members of a consensus group in a distributed self-healing coordination service | |
JP6210987B2 (ja) | クラスタ化クライアントのフェイルオーバ | |
US10846185B2 (en) | Method for processing acquire lock request and server | |
WO2017190594A1 (zh) | 分布式锁管理的方法、装置及系统 | |
US10963353B2 (en) | Systems and methods for cross-regional back up of distributed databases on a cloud service | |
WO2020207078A1 (zh) | 数据处理方法、装置和分布式数据库系统 | |
CN116361073A (zh) | 数据安全管理方法、装置、设备及存储介质 | |
NZ622122B2 (en) | Clustered client failover |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
REEP | Request for entry into the european phase |
Ref document number: 2014906065 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 2014906065 Country of ref document: EP |
|
ENP | Entry into the national phase |
Ref document number: 2016539939 Country of ref document: JP Kind code of ref document: A |
|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 14906065 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |