WO2023000534A1 - Communication method and apparatus between cluster nodes - Google Patents


Info

Publication number
WO2023000534A1
Authority
WO
WIPO (PCT)
Prior art keywords
cluster node
cluster
message
information
distributed lock
Application number
PCT/CN2021/127507
Other languages
French (fr)
Chinese (zh)
Inventor
李宏伟
颜秉珩
Original Assignee
苏州浪潮智能科技有限公司
Application filed by 苏州浪潮智能科技有限公司
Publication of WO2023000534A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0662 Virtualisation aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/56 Provisioning of proxy services
    • H04L67/568 Storing data temporarily at an intermediate stage, e.g. caching
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L69/00 Network arrangements, protocols or services independent of the application payload and not provided for in the other groups of this subclass
    • H04L69/16 Implementation or adaptation of Internet protocol [IP], of transmission control protocol [TCP] or of user datagram protocol [UDP]
    • H04L69/161 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields
    • H04L69/162 Implementation details of TCP/IP or UDP/IP stack architecture; Specification of modified or new header fields involving adaptations of sockets based mechanisms

Definitions

  • The present application relates to the field of network transmission, and more specifically, to a communication method and device between cluster nodes.
  • Because a cluster file system can be shared and mounted by multiple servers at the same time, it is often used as a bridge between multiple computing nodes and centralized storage.
  • A cluster file system provides concurrent file access control, integrity assurance, high availability, redundancy, and so on, and is used by virtualization systems to store virtual machine images and to share storage pools.
  • The distributed lock manager (DLM) is a key component of the cluster file system and manages concurrent access to shared resources; it mainly solves the problem of disk cache consistency between cluster nodes and improves the efficiency of shared file access.
  • Common cluster file systems such as GFS, VMFS, OpenVMS Files, and ocfs2 have implemented their own DLM.
  • The purpose of the embodiments of the present application is to propose a communication method and device between cluster nodes that can avoid contention in DLM communication, reduce the read and write overhead and delay of data in large-scale clusters, and improve the availability and practicability of the shared disk communication system in the cluster.
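  • The first aspect of the embodiments of the present application provides a communication method between cluster nodes, including performing the following steps:
  • dividing a message communication area for each cluster node on the shared disk of the cluster, dividing each message communication area into a plurality of channels corresponding to the respective cluster nodes, and generating in each channel a plurality of buffers corresponding to the channel depth;
  • in response to a first cluster node of the cluster detecting that its socket connection with a second cluster node in the same cluster is interrupted, switching the communication mode of the first cluster node from the socket communication mode to the network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk;
  • in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, writing the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk;
  • in response to the first cluster node detecting second distributed lock manager information in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calling the information processing function to process the second distributed lock manager information, popping the buffer where the second distributed lock manager information is located, and sending a message reply to the second cluster node.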
  • In some embodiments, after the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the first cluster node continues to monitor that channel on the shared disk.
  • In some embodiments, the second cluster node calls the information processing function to process the first distributed lock manager information, pops the buffer where the first distributed lock manager information is located, and writes a first message reply to the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
  • In response to the first cluster node detecting the first message reply to the first distributed lock manager information in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the first cluster node feeds back to the first distributed lock manager that information processing is complete and stops listening to that channel.
  • In other embodiments, the second cluster node calls the information processing function to process the first distributed lock manager information, pops the buffer where the first distributed lock manager information is located, and writes the first message reply to the first distributed lock manager information into the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk.
  • In response to the first cluster node detecting the first message reply to the first distributed lock manager information in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, the first cluster node feeds back to the first distributed lock manager that information processing is complete.
  • In response to the communication mode of the first cluster node being switched from the network communication mode back to the socket communication mode, the first cluster node stops listening to any area on the shared disk, wherein the listening includes periodic polling of the shared disk.
  • Generating a plurality of buffers corresponding to the channel depth in each channel includes: acquiring a predetermined channel depth based on the required concurrent message processing capability, and generating in each channel a number of buffers positively correlated with the channel depth, wherein each buffer is configured with disk space for storing one piece of first distributed lock manager information, one piece of second distributed lock manager information, or one message reply.
  • For a third cluster node added to the cluster, a message communication area for the third cluster node is divided on the shared disk, that area is divided into a plurality of channels corresponding to the respective cluster nodes, and a plurality of buffers corresponding to the channel depth is generated in each channel; at the same time, a channel corresponding to the third cluster node is divided in the message communication area of each existing cluster node.
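The buffer generation and node addition described in the preceding two embodiments can be sketched as follows. This is a minimal illustration and not the applicant's implementation: the in-memory dictionary stands in for the shared disk, the function names `format_areas` and `add_node` are hypothetical, the unused self-channel is simply omitted, and the number of buffers is taken to be equal to the channel depth (the application only requires a positive correlation).

```python
def format_areas(node_ids, depth):
    """Create one message area per node, one channel per peer,
    and `depth` empty buffers per channel (assumed: buffers == depth)."""
    return {
        owner: {peer: [None] * depth for peer in node_ids if peer != owner}
        for owner in node_ids
    }


def add_node(areas, new_id, depth):
    """Add an area for `new_id` and a channel for it in every existing area."""
    if new_id in areas:
        raise ValueError("node already present")
    # A channel for the new node inside each existing node's area.
    for channels in areas.values():
        channels[new_id] = [None] * depth
    # The new node's own area, with one channel per existing node.
    areas[new_id] = {peer: [None] * depth for peer in areas}
    return areas
```

For example, formatting for nodes 1 and 2 and then calling `add_node(areas, 3, depth)` leaves every area with one channel per peer, so no two senders ever share a channel.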
  • The second aspect of the embodiments of the present application provides a communication device between cluster nodes, including a processor and a controller.
  • The controller stores program code executable by the processor, and when running the program code the processor performs the steps of the communication method of the first aspect, including its optional embodiments described above.
  • The present application has the following beneficial technical effects. In the method and device for communication between cluster nodes provided by the embodiments of the present application, a message communication area is divided for each cluster node on the shared disk of the cluster, each message communication area is divided into a plurality of channels corresponding to the respective cluster nodes, and a plurality of buffers corresponding to the channel depth is generated in each channel; in response to the first cluster node of the cluster detecting that its socket connection with the second cluster node in the same cluster is interrupted, the communication mode of the first cluster node is switched from the socket communication mode to the network communication mode, and the message communication area of the first cluster node on the shared disk is continuously monitored; in response to the first cluster node sending the first distributed lock manager information to the second cluster node in the network communication mode, the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk; and in response to the first cluster node detecting the second distributed lock manager information in the channel corresponding to the second cluster node in its own message communication area, the information processing function is called to process that information, the buffer where it is located is popped, and a message reply is sent to the second cluster node. This technical solution can avoid contention in DLM communication, reduce the read and write overhead and delay of data in large-scale clusters, and improve the practicability and availability of the shared disk communication system in the cluster.
  • FIG. 1 is a schematic flow diagram of a communication method between cluster nodes provided by the present application
  • Fig. 2 is the space division diagram of the shared disk of the communication method between cluster nodes provided by the present application
  • FIG. 3 is a communication flowchart of the communication method between cluster nodes provided by the present application.
  • The first aspect of the embodiments of the present application proposes an embodiment of a communication method between cluster nodes that avoids contention in DLM communication, reduces the data read and write overhead and delay in large-scale clusters, and improves the practicability of the shared disk communication system in the cluster.
  • FIG. 1 shows a schematic flowchart of a communication method between cluster nodes provided by the present application.
  • the communication method between cluster nodes includes the following steps:
  • Step S101: dividing a message communication area for each cluster node on the shared disk of the cluster, dividing each message communication area into a plurality of channels corresponding to the respective cluster nodes, and generating in each channel a plurality of buffers corresponding to the channel depth;
  • Step S103: in response to the first cluster node of the cluster detecting that the socket connection between the first cluster node and the second cluster node in the same cluster is interrupted, switching the communication mode of the first cluster node from the socket communication mode to the network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk;
  • Step S105: in response to the first cluster node sending the first distributed lock manager information to the second cluster node in the network communication mode, writing the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk;
  • Step S107: in response to the first cluster node detecting the second distributed lock manager information in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calling the information processing function to process the second distributed lock manager information, popping the buffer where the second distributed lock manager information is located, and sending a message reply to the second cluster node.
  • The program can be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the above-mentioned methods.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a random access memory (RAM) and the like.
  • the computer program embodiments can achieve the same or similar effects as any of the corresponding foregoing method embodiments.
  • This solution presents a distributed lock manager implementation method based on shared disk multi-channel communication.
  • First, an area is specified in the shared disk, and an address space is reserved for each node in the cluster as its message communication area.
  • Each communication area is composed of M channels, where M is the maximum number of nodes supported by the cluster, and each channel is composed of buffers with a depth of N.
  • The DLM scheme based on shared disk multi-channel communication optimizes the selection of the communication area in disk communication. By introducing a multi-message-channel mechanism, it avoids contention for the message sending area, greatly improves the efficiency of the disk communication scheme, reduces the IO pressure on the disk, and further improves the application range and practicability of the disk communication scheme. This solution is applicable to IP-SAN storage and FC-SAN storage.
  • The communication area is composed of M channels, where M is the maximum number of nodes supported by the cluster, and each channel is composed of slots (buffers) with a depth of N.
  • In the example shown in Figure 2, the cluster consists of 5 nodes, so communication areas for 5 nodes are reserved in the formatting stage; each communication area consists of M channels, and each channel consists of slots (buffers) with a depth of N. It should be noted that the diagonal area is not used, since nodes do not send messages to themselves.
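The layout above implies that a location written as [node, chan, slot] (area owner, sending node, buffer index) maps to a fixed position on the shared disk. The sketch below shows one way such an offset could be computed. It is only an illustration under stated assumptions: the slot size of 512 bytes, the base offset, the zero-based indices, and the names `Layout` and `slot_offset` are all hypothetical and do not come from the application.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    max_nodes: int          # M: maximum number of nodes, hence channels per area
    depth: int              # N: slots (buffers) per channel
    slot_size: int = 512    # assumed bytes per slot (hypothetical value)
    base_offset: int = 0    # assumed start of the reserved region (hypothetical)

    @property
    def channel_size(self) -> int:
        return self.depth * self.slot_size

    @property
    def area_size(self) -> int:
        return self.max_nodes * self.channel_size

    def slot_offset(self, node: int, chan: int, slot: int) -> int:
        """Byte offset of [node, chan, slot]; `node` owns the area, `chan` is the sender."""
        if node == chan:
            raise ValueError("diagonal channel is unused: a node does not message itself")
        if not (0 <= node < self.max_nodes and 0 <= chan < self.max_nodes):
            raise IndexError("node or channel index out of range")
        if not 0 <= slot < self.depth:
            raise IndexError("slot index out of range")
        return (self.base_offset
                + node * self.area_size
                + chan * self.channel_size
                + slot * self.slot_size)
```

Because each sender has its own channel inside every receiver's area, two senders never compute the same offset, which is how the layout avoids contention for the message sending area.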
  • the cluster file system also supports dynamically adding nodes to the cluster; when adding nodes, the message communication area of the nodes will be correspondingly increased.
  • When node A senses that the network connection with node B is disconnected, it switches from the socket communication mode to the network communication mode, in which messages are exchanged through the shared disk. A writes the DLM message to channel A in the communication area of node B; at the same time, because the socket connection is bidirectional, B also perceives that the connection with A is disconnected, so B listens to channel A in its own message area.
  • The monitoring process is realized by polling. When B detects a valid message, it processes the message and writes the reply message back to the same area. Similarly, when B wants to send a message to A, it writes the message into channel B in the message area of A, thus realizing two-way message communication. In addition, to reduce the IO pressure of disk polling, a node adds the channel corresponding to another node to its polling channel list only after it senses that it is disconnected from that node, which avoids unnecessary IO overhead.
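The polling channel list described above can be sketched as a small bookkeeping structure. This is a hedged illustration only: the class name `PollList`, its method names, and the read-count estimate are hypothetical, and it assumes one disk read per slot per polling cycle.

```python
class PollList:
    """Tracks which receive channels a node polls (hypothetical helper)."""

    def __init__(self, self_id):
        self.self_id = self_id
        self.channels = set()   # (area_owner, channel) pairs to poll

    def on_socket_disconnected(self, peer):
        # Start polling the peer's channel inside this node's own message area.
        self.channels.add((self.self_id, peer))

    def on_socket_restored(self, peer):
        # Back in socket mode for this peer: stop polling its channel.
        self.channels.discard((self.self_id, peer))

    def reads_per_cycle(self, depth):
        # Assumed cost model: one read per slot of every polled channel.
        return len(self.channels) * depth
```

With all sockets healthy the list is empty and the node issues no polling IO at all; each disconnected peer adds only its own channel to the polling cost.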
  • When node 3 sends a message to node 1, the process mainly includes the following steps:
  • Node 3 writes the DLM message to [node 1, chan 3, slot x];
  • Node 3 adds [node 1, chan 3, slot x] to its message sending monitoring list and periodically polls it to check for the reply;
  • Because node 1 has received the event that its socket connection with node 3 is disconnected, it puts the corresponding message receiving channel [node 1, chan 3, slot x] into its message receiving monitoring list and periodically polls it for messages sent by the other node;
  • Node 1 receives the message from node 3 by polling;
  • Node 1 calls the information processing function to process the message;
  • Node 1 finishes processing the message and writes an ACK message back to the message channel [node 1, chan 3, slot x], indicating that the message has been processed;
  • Node 3 receives the message reply from node 1 by polling and removes [node 1, chan 3, slot x] from its message sending monitoring list, which completes one message sending process.
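The exchange above can be traced with a small in-memory simulation. It is a sketch under stated assumptions rather than the applicant's code: the `SharedDisk` class replaces real disk IO with a dictionary, the slot states `EMPTY`, `MSG` and `ACK` are hypothetical encodings, and slot reuse after the ACK is left out.

```python
EMPTY, MSG, ACK = "empty", "msg", "ack"   # assumed slot states (hypothetical)


class SharedDisk:
    """In-memory stand-in for the reserved region of the shared disk."""

    def __init__(self):
        self.slots = {}

    def read(self, node, chan, slot):
        return self.slots.get((node, chan, slot), (EMPTY, None))

    def write(self, node, chan, slot, state, payload=None):
        self.slots[(node, chan, slot)] = (state, payload)


def send(disk, sender, receiver, slot, payload, watch):
    # Sender writes into [receiver, chan sender, slot] and watches it for the reply.
    disk.write(receiver, sender, slot, MSG, payload)
    watch.add((receiver, sender, slot))


def poll_receive(disk, me, disconnected_peers, depth, handler):
    # Receiver polls the channels of disconnected peers in its own area,
    # processes each message, and writes the ACK back into the same slot.
    handled = []
    for peer in disconnected_peers:
        for slot in range(depth):
            state, payload = disk.read(me, peer, slot)
            if state == MSG:
                handler(payload)
                disk.write(me, peer, slot, ACK)
                handled.append((peer, slot))
    return handled


def poll_replies(disk, watch):
    # Sender polls watched locations; an ACK ends monitoring of that location.
    done = {loc for loc in watch if disk.read(*loc)[0] == ACK}
    watch.difference_update(done)
    return done
```

Running `send` for node 3, then `poll_receive` for node 1, then `poll_replies` for node 3 reproduces the seven steps in order, with [node 1, chan 3, slot x] being the only location touched.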
  • Similarly, when node 1 sends a message to node 3, it selects the channel [node 3, chan 1, slot y] for message communication, and the process is the same as above.
  • In the process above, node 3 needs to poll [node 3, chan x, slot x] to receive messages that other nodes may send during normal operation, and additionally needs to poll [node 1, chan 3, slot x] for the reply.
  • In an alternative embodiment, when node 1 finishes processing the message, it no longer writes the ACK message back to the message channel [node 1, chan 3, slot x] but writes it to [node 3, chan 1, slot x], so that node 3 only ever needs to poll [node 3, chan x, slot x]. This further reduces the polling pressure of node 3 on the disk while still avoiding contention in DLM communication.
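The difference between the two reply placements can be made concrete by listing the channels a node has to poll in each case. The function below is a hypothetical illustration; the name `polled_channels` and its parameters are not from the application, and it assumes that pending sends only target peers whose sockets are disconnected.

```python
def polled_channels(node, disconnected_peers, pending_sends, reply_in_own_area):
    """Return the set of (area_owner, channel) pairs that `node` must poll."""
    # Receive side: channels of disconnected peers inside the node's own area.
    channels = {(node, peer) for peer in disconnected_peers}
    if not reply_in_own_area:
        # Base scheme: the reply arrives in the slot that was written,
        # which lives in the receiver's area, so it must be polled too.
        channels |= {(peer, node) for peer in pending_sends}
    return channels
```

For node 3 with a pending send to node 1, the base scheme polls both (3, 1) and (1, 3), while the alternative polls only (3, 1), which is already on the receive list.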
  • the method disclosed according to the embodiment of the present application may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium.
  • When the computer program is executed by the CPU, the above functions defined in the methods disclosed in the embodiments of the present application are performed.
  • the above-mentioned method steps and system units can also be implemented by using a controller and a computer-readable storage medium for storing a computer program that enables the controller to realize the functions of the above-mentioned steps or units.
  • It can be seen from the above embodiments that the communication method between cluster nodes provided by the embodiments of the present application, by dividing the shared disk into per-node message communication areas, channels, and buffers, switching from the socket communication mode to the network communication mode when a socket connection is interrupted, writing distributed lock manager information into the channel corresponding to the sender in the receiver's message communication area, and processing received information and replying to it, can avoid contention in DLM communication, reduce the read and write overhead and delay of data in large-scale clusters, and improve the practicability and availability of the shared disk communication system in the cluster.
  • The second aspect of the embodiments of the present application proposes an embodiment of a communication device between cluster nodes that avoids contention in DLM communication, reduces the read and write overhead and delay of data in large-scale clusters, and improves the practicability of the shared disk communication system in the cluster.
  • The device includes a processor and a controller; the controller stores program code executable by the processor, and when running the program code the processor performs steps S101 to S107 of the method described above.
  • the devices and equipment disclosed in the examples of this application can be various electronic terminal equipment, such as mobile phones, personal digital assistants (PDA), tablet computers (PAD), smart TVs, etc., or large terminal equipment, such as servers, etc. Therefore, the scope of protection disclosed in the embodiments of the present application should not be limited to a specific type of device or equipment.
  • the client disclosed in the embodiments of the present application may be applied to any of the above-mentioned electronic terminal devices in the form of electronic hardware, computer software, or a combination of the two.
  • The above apparatus embodiment uses the embodiment of the communication method between cluster nodes to illustrate the working process of each module.
  • Those skilled in the art can readily conceive of applying these modules to other embodiments of the communication method between cluster nodes.
  • Because the steps in the embodiment of the communication method between cluster nodes can be interleaved, replaced, added, or deleted, such reasonable permutations and combinations applied to the apparatus should also fall within the scope of protection of the present application, and the scope of protection of the present application should not be limited to the described embodiments.
  • All or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the related hardware, and the program can be stored in a computer-readable storage medium.
  • When the program is executed, it may include the processes of the above method embodiments.
  • The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random access memory (RAM), or the like.
  • The computer program embodiments can achieve the same or similar effects as any of the corresponding foregoing method embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Multi Processors (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present application discloses a communication method and apparatus between cluster nodes. The method comprises: dividing, on a shared disk of a cluster, a message communication area for each cluster node, a plurality of channels in each area corresponding to each cluster node, and a plurality of buffers in each channel corresponding to the channel depth; switching the communication mode of a first cluster node from a socket communication mode to a network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk; writing first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of a second cluster node on the shared disk; and invoking an information processing function to process second distributed lock manager information, popping the buffer in which the second distributed lock manager information is located, and sending a message reply to the second cluster node. The present application can avoid contention in DLM communication, reduce the read/write overhead and latency of data in a large-scale cluster, and improve the practicality and availability of the shared disk communication system in the cluster.

Description

A communication method and apparatus between cluster nodes
This application claims priority to Chinese patent application No. 202110820023.7, entitled "A communication method and apparatus between cluster nodes", filed with the China Patent Office on July 20, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of network transmission, and more specifically, to a communication method and apparatus between cluster nodes.
Background
In server virtualization, a cluster file system can be shared and mounted by multiple servers at the same time, so it is often used as a bridge between multiple compute nodes and centralized storage. A cluster file system provides concurrent file access control, integrity guarantees, high availability, and redundancy, and virtualization systems use it to store virtual machine images, shared storage pools, and the like. The distributed lock manager (DLM) is a key component of a cluster file system and manages concurrent access to shared resources; it mainly solves the problem of disk cache consistency between cluster nodes and improves the efficiency of shared file access. Common cluster file systems such as GFS, VMFS, OpenVMS Files, and ocfs2 all implement their own DLM.
While running, the DLM relies on the network for inter-node communication to synchronize lock information, including operations such as lock information queries, remote lock acquisition, and lock downgrades, so the reliability of the network directly affects the efficiency and stability of the DLM. In a common DLM implementation, long-lived socket connections are established between the nodes of the cluster on a specified port, and lock messages are encapsulated and exchanged over TCP/IP. However, the network is not very stable: network fluctuations and delays affect the transmission of DLM messages, directly affect the operation of the cluster file system, and can even trigger the file system's protection mechanism (fence), paralyzing some nodes of the cluster. In server virtualization scenarios the reliability of the TCP/IP network is relatively low, so this design greatly reduces the overall reliability of the system.
To address the above problem, the prior art provides a DLM implementation method based on parallel communication over a shared disk (publication No. 109376014B), which makes the operation of the cluster file system independent of the TCP/IP network and greatly improves the reliability and availability of the system. In that scheme, however, when multiple nodes send messages to a faulty node, they must rely on the disk paxos algorithm to contend for the message sending area. The disk paxos algorithm is itself fairly complex to execute: multiple nodes initiate proposals, wait for the proposals to be accepted, and, after a failed election, use random delays to avoid conflicts. This significantly increases IO (read/write) overhead and latency, and the process itself is time-consuming. For large-scale clusters in particular, the probability of disk paxos conflicts rises, which further increases IO overhead and latency, degrades cluster performance, and limits the size of the cluster.
For the problem in the prior art that the contention mechanism of inter-node DLM communication incurs high data read/write overhead and long latency in large-scale clusters, there is currently no effective solution.
Summary of the Invention
In view of this, the purpose of the embodiments of the present application is to propose a communication method and apparatus between cluster nodes that can avoid contention in DLM communication, reduce the read/write overhead and latency of data in large-scale clusters, and improve the practicality and availability of the shared disk communication system in the cluster.
Based on the above purpose, a first aspect of the embodiments of the present application provides a communication method between cluster nodes, including performing the following steps:
dividing a message communication area for each cluster node on the shared disk of the cluster, dividing each message communication area into a plurality of channels corresponding to each cluster node, and generating within each channel a plurality of buffers corresponding to the channel depth;
in response to a first cluster node of the cluster detecting that the socket connection between the first cluster node and a second cluster node in the same cluster is interrupted, switching the communication mode of the first cluster node from the socket communication mode to the network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk;
in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, writing the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk;
in response to the first cluster node detecting that second distributed lock manager information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calling an information processing function to process the second distributed lock manager information, popping the buffer in which the second distributed lock manager information is located, and sending a message reply to the second cluster node.
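The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation of the application: the shared disk is simulated by a `bytearray`, the areas, channels, and buffers are laid out contiguously, every area holds one channel per node (including the owner itself, to keep the arithmetic simple), and `BUF_SIZE`, `CHANNEL_DEPTH`, the 3-byte buffer header, and all function names are hypothetical.

```python
import struct

BUF_SIZE = 512                 # assumed: bytes reserved for one buffer
CHANNEL_DEPTH = 4              # assumed: buffers per channel
HEADER = struct.Struct("<BH")  # assumed header: in-use flag, payload length


def buffer_offset(n_nodes, owner, sender, slot):
    """Offset of buffer `slot` in the channel reserved for `sender`
    inside the message communication area owned by `owner`."""
    channel_size = CHANNEL_DEPTH * BUF_SIZE
    area_size = n_nodes * channel_size
    return owner * area_size + sender * channel_size + slot * BUF_SIZE


def send(disk, n_nodes, src, dst, payload):
    """Write `payload` into the first free buffer of src's channel in dst's area."""
    if len(payload) > BUF_SIZE - HEADER.size:
        raise ValueError("payload does not fit in one buffer")
    for slot in range(CHANNEL_DEPTH):
        off = buffer_offset(n_nodes, dst, src, slot)
        in_use, _ = HEADER.unpack_from(disk, off)
        if not in_use:
            HEADER.pack_into(disk, off, 1, len(payload))
            start = off + HEADER.size
            disk[start:start + len(payload)] = payload
            return slot
    raise BufferError("channel is full")


def poll(disk, n_nodes, me, handler):
    """One polling pass over my own area: handle and pop each pending message."""
    handled = 0
    for sender in range(n_nodes):
        if sender == me:
            continue
        for slot in range(CHANNEL_DEPTH):
            off = buffer_offset(n_nodes, me, sender, slot)
            in_use, length = HEADER.unpack_from(disk, off)
            if in_use:
                start = off + HEADER.size
                handler(sender, bytes(disk[start:start + length]))
                HEADER.pack_into(disk, off, 0, 0)  # pop: mark the buffer free
                handled += 1
    return handled
```

Because each sender writes only into its own channel of the receiver's area, two senders never target the same buffer, which is presumably why no contention protocol is needed in this layout.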
In some embodiments, after the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the method further includes continuously monitoring the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
In some embodiments, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, an information processing function is called to process the first distributed lock manager information, the buffer in which the first distributed lock manager information is located is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
In some embodiments, in response to the first cluster node detecting that the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk has received the first message reply to the first distributed lock manager information, the first cluster node reports that processing of the first distributed lock manager information is complete and stops monitoring the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
In some embodiments, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, an information processing function is called to process the first distributed lock manager information, the buffer in which the first distributed lock manager information is located is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk.
In some embodiments, in response to the first cluster node detecting that the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk has received the first message reply to the first distributed lock manager information, the first cluster node reports that processing of the first distributed lock manager information is complete.
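The two reply paths described in these embodiments can be contrasted with a small in-memory model. This is a sketch under assumptions, not the application's code: `areas[owner][sender]` stands in for the channel reserved for `sender` in `owner`'s message communication area, and `DEPTH`, `make_areas`, `put`, `serve`, `await_reply`, and the `reply_in_place` flag are all hypothetical names.

```python
from collections import deque

DEPTH = 4  # assumed channel depth


def make_areas(nodes):
    """areas[owner][sender] models the channel for `sender` in `owner`'s area."""
    return {owner: {sender: deque() for sender in nodes} for owner in nodes}


def put(areas, owner, channel, item):
    """Append `item` to a channel, refusing to exceed the channel depth."""
    chan = areas[owner][channel]
    if len(chan) >= DEPTH:
        raise BufferError("channel is full")
    chan.append(item)


def serve(areas, me, peer, handler, reply_in_place):
    """Node `me` pops one request sent by `peer`, processes it, and replies."""
    _kind, body = areas[me][peer].popleft()  # pop the buffer holding the request
    reply = ("reply", handler(body))
    if reply_in_place:
        put(areas, me, peer, reply)   # same channel, inside my own area
    else:
        put(areas, peer, me, reply)   # my channel, inside the requester's area


def await_reply(areas, me, peer, reply_in_place):
    """Node `me` checks the channel where `peer` is expected to reply."""
    chan = areas[peer][me] if reply_in_place else areas[me][peer]
    if chan and chan[0][0] == "reply":
        return chan.popleft()[1]
    return None
```

With `reply_in_place=True` the requester has to keep watching its channel in the peer's area until the reply arrives; with `reply_in_place=False` the reply lands in the requester's own area, which it is already polling.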
In some embodiments, in response to the communication mode of the first cluster node switching from the network communication mode to the socket communication mode, the first cluster node stops monitoring any area on the shared disk, where monitoring includes periodic polling of the shared disk.
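The mode switch and its effect on polling might look like the following. This is a minimal sketch under assumptions: the class `Node`, its method names, and the idea of a periodic `tick` driven by an external timer are hypothetical, and `Mode.NETWORK` simply follows the source's name for the fallback mode.

```python
import enum


class Mode(enum.Enum):
    SOCKET = "socket"
    NETWORK = "network"  # the source's name for the fallback mode


class Node:
    def __init__(self, poll_once):
        self.mode = Mode.SOCKET
        self.poll_once = poll_once  # callable that scans the shared disk once

    def on_socket_lost(self):
        """Socket connection to a peer was interrupted: start monitoring."""
        self.mode = Mode.NETWORK

    def on_socket_restored(self):
        """Socket connection is back: stop monitoring the shared disk."""
        self.mode = Mode.SOCKET

    def tick(self):
        """Called periodically by a timer; polls only in the fallback mode."""
        if self.mode is Mode.NETWORK:
            self.poll_once()
            return True
        return False
```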
In some embodiments, generating within each channel a plurality of buffers corresponding to the channel depth includes: obtaining a channel depth predetermined based on the required concurrent message processing capability, and generating within each channel a number of buffers positively correlated with the channel depth, where each buffer is allocated disk space for storing one piece of first distributed lock manager information, one piece of second distributed lock manager information, or one message reply.
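As a worked example of how the channel depth drives disk usage, the sketch below assumes a linear relationship between depth and buffer count and a fixed buffer size. `BUF_SIZE`, the `factor` parameter, and both function names are assumptions for illustration; the text only requires that the buffer count be positively correlated with the depth.

```python
BUF_SIZE = 512  # assumed: bytes reserved for one message or reply


def buffers_per_channel(channel_depth, factor=1):
    """Buffer count that grows with the depth; `factor` is an assumed constant."""
    if channel_depth < 1 or factor < 1:
        raise ValueError("depth and factor must be positive")
    return factor * channel_depth


def disk_space(n_nodes, channel_depth, factor=1):
    """Total bytes: n areas x n channels per area x buffers x BUF_SIZE."""
    return n_nodes * n_nodes * buffers_per_channel(channel_depth, factor) * BUF_SIZE
```

For instance, under these assumptions 4 nodes with a depth of 8 need 4 x 4 x 8 x 512 = 65536 bytes of shared disk.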
In some embodiments, in response to a third cluster node joining the cluster, a message communication area is divided for the third cluster node on the shared disk, the message communication area of the third cluster node is divided into a plurality of channels corresponding to each cluster node, a plurality of buffers corresponding to the channel depth is generated within each channel, and a channel corresponding to the third cluster node is also divided in the message communication area of each other existing cluster node.
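Node join can be illustrated with a dictionary-based model of the layout. This is a sketch under assumptions: `areas[owner][sender]` represents the channel for `sender` in `owner`'s area, each channel is a list of empty buffer slots, and `make_channel` and `add_node` are hypothetical helpers, not names from the application.

```python
def make_channel(depth):
    """A channel is modeled as `depth` empty buffer slots."""
    return [None] * depth


def add_node(areas, new_node, depth):
    """Give `new_node` its own area and add its channel to every existing area."""
    if new_node in areas:
        raise ValueError("node already in cluster")
    # Existing areas each gain a channel for the new node.
    for channels in areas.values():
        channels[new_node] = make_channel(depth)
    # The new node's area holds one channel per cluster member.
    members = list(areas) + [new_node]
    areas[new_node] = {sender: make_channel(depth) for sender in members}
```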
A second aspect of the embodiments of the present application provides a communication apparatus between cluster nodes, including:
a processor;
a controller storing program code executable by the processor, the processor performing the following steps when running the program code:
dividing a message communication area for each cluster node on the shared disk of the cluster, dividing each message communication area into a plurality of channels corresponding to each cluster node, and generating within each channel a plurality of buffers corresponding to the channel depth;
in response to a first cluster node of the cluster detecting that the socket connection between the first cluster node and a second cluster node in the same cluster is interrupted, switching the communication mode of the first cluster node from the socket communication mode to the network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk;
in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, writing the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk;
in response to the first cluster node detecting that second distributed lock manager information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calling an information processing function to process the second distributed lock manager information, popping the buffer in which the second distributed lock manager information is located, and sending a message reply to the second cluster node.
In some embodiments, after the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the steps further include continuously monitoring the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
In some embodiments, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, an information processing function is called to process the first distributed lock manager information, the buffer in which the first distributed lock manager information is located is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
In some embodiments, in response to the first cluster node detecting that the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk has received the first message reply to the first distributed lock manager information, the first cluster node reports that processing of the first distributed lock manager information is complete and stops monitoring the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
In some embodiments, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, an information processing function is called to process the first distributed lock manager information, the buffer in which the first distributed lock manager information is located is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk.
In some embodiments, in response to the first cluster node detecting that the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk has received the first message reply to the first distributed lock manager information, the first cluster node reports that processing of the first distributed lock manager information is complete.
In some embodiments, in response to the communication mode of the first cluster node switching from the network communication mode to the socket communication mode, the first cluster node stops monitoring any area on the shared disk, where monitoring includes periodic polling of the shared disk.
In some embodiments, generating within each channel a plurality of buffers corresponding to the channel depth includes: obtaining a channel depth predetermined based on the required concurrent message processing capability, and generating within each channel a number of buffers positively correlated with the channel depth, where each buffer is allocated disk space for storing one piece of first distributed lock manager information, one piece of second distributed lock manager information, or one message reply.
In some embodiments, in response to a third cluster node joining the cluster, a message communication area is divided for the third cluster node on the shared disk, the message communication area of the third cluster node is divided into a plurality of channels corresponding to each cluster node, a plurality of buffers corresponding to the channel depth is generated within each channel, and a channel corresponding to the third cluster node is also divided in the message communication area of each other existing cluster node.
The present application has the following beneficial technical effects. The communication method and apparatus between cluster nodes provided by the embodiments of the present application adopt a technical solution of: dividing a message communication area for each cluster node on the shared disk of the cluster, dividing each message communication area into a plurality of channels corresponding to each cluster node, and generating within each channel a plurality of buffers corresponding to the channel depth; in response to a first cluster node of the cluster detecting that the socket connection between the first cluster node and a second cluster node in the same cluster is interrupted, switching the communication mode of the first cluster node from the socket communication mode to the network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk; in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, writing the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk; and in response to the first cluster node detecting that second distributed lock manager information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calling an information processing function to process the second distributed lock manager information, popping the buffer in which the second distributed lock manager information is located, and sending a message reply to the second cluster node. This solution can avoid contention in DLM communication, reduce the read/write overhead and latency of data in large-scale clusters, and improve the practicality and availability of the shared disk communication system in the cluster.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of the communication method between cluster nodes provided by the present application;
FIG. 2 is a diagram of the space division of the shared disk in the communication method between cluster nodes provided by the present application;
FIG. 3 is a communication flowchart of the communication method between cluster nodes provided by the present application.
具体实施方式detailed description
为使本申请的目的、技术方案和优点更加清楚明白,以下结合具体实施例,并参照附图,对本申请实施例进一步详细说明。In order to make the purpose, technical solution and advantages of the present application clearer, the embodiments of the present application will be further described in detail below in combination with specific embodiments and with reference to the accompanying drawings.
需要说明的是,本申请实施例中所有使用“第一”和“第二”的表述均是为了区分两个相同名称非相同的实体或者非相同的参量,可见“第一”“第二”仅为了表述的方便,不应理解为对本申请实施例的限定,后续实施例对此不再一一说明。It should be noted that all expressions using "first" and "second" in the embodiments of this application are to distinguish between two entities with the same name but different parameters or parameters that are not the same, see "first" and "second" It is only for the convenience of expression, and should not be construed as a limitation on the embodiments of the present application, which will not be described one by one in the subsequent embodiments.
In view of the above purpose, a first aspect of the embodiments of the present application provides an embodiment of a communication method between cluster nodes that avoids contention in distributed lock manager (DLM) communication, reduces the read/write overhead and latency of data in large-scale clusters, and improves the practicability and availability of the shared-disk communication system in the cluster. FIG. 1 shows a schematic flowchart of the communication method between cluster nodes provided by the present application.
As shown in FIG. 1, the communication method between cluster nodes includes the following steps:
Step S101: divide a message communication area for each cluster node on the shared disk of the cluster, divide each message communication area into multiple channels, one corresponding to each cluster node, and generate in each channel multiple buffers corresponding to the channel depth;
Step S103: in response to a first cluster node of the cluster detecting that its socket connection with a second cluster node in the same cluster has been interrupted, switch the communication mode of the first cluster node from the socket communication mode to the network communication mode, and continuously monitor the message communication area of the first cluster node on the shared disk;
Step S105: in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, write the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk;
Step S107: in response to the first cluster node detecting that second distributed lock manager information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, call an information processing function to process the second distributed lock manager information, pop the buffer holding the second distributed lock manager information, and send a message reply to the second cluster node.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The computer program embodiments can achieve the same or similar effects as any corresponding method embodiment described above.
In some embodiments, after the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, that channel is also continuously monitored.
In some embodiments, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the information processing function is called to process the first distributed lock manager information, the buffer holding the first distributed lock manager information is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
In some embodiments, in response to the first cluster node detecting that the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk has received the first message reply to the first distributed lock manager information, the first cluster node reports that processing of the first distributed lock manager information is complete and stops monitoring that channel.
In some embodiments, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the information processing function is called to process the first distributed lock manager information, the buffer holding the first distributed lock manager information is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk.
In some embodiments, in response to the first cluster node detecting that the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk has received the first message reply to the first distributed lock manager information, the first cluster node reports that processing of the first distributed lock manager information is complete.
In some embodiments, in response to the communication mode of the first cluster node switching from the network communication mode back to the socket communication mode, the first cluster node stops monitoring any area on the shared disk, where monitoring includes periodic polling of the shared disk.
In some embodiments, generating in each channel multiple buffers corresponding to the channel depth includes: obtaining a channel depth predetermined on the basis of the required concurrent message-processing capability, and generating in each channel a number of buffers positively correlated with the channel depth, where each buffer is allocated disk space for storing one piece of first distributed lock manager information, one piece of second distributed lock manager information, or one message reply.
In some embodiments, in response to a third cluster node joining the cluster, a message communication area is divided for the third cluster node on the shared disk, that area is divided into multiple channels, one corresponding to each cluster node, and multiple buffers corresponding to the channel depth are generated in each channel; in addition, a channel corresponding to the third cluster node is divided in the message communication area of each existing cluster node.
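The bookkeeping for a newly joined node can be sketched as follows. This is a minimal in-memory sketch in Python, not the on-disk format: the `Regions` mapping and the `add_node` helper are hypothetical names introduced for illustration, and the unused self-addressed channel is simply left out of the model.

```python
from typing import Dict, Set

# Hypothetical model: area owner -> ids of the senders that have a channel there.
Regions = Dict[int, Set[int]]


def add_node(regions: Regions, new_node: int) -> Regions:
    """Return a new layout that includes `new_node`.

    The new node gets its own message communication area with one channel per
    existing node, and every existing area gains a channel for the new node.
    """
    if new_node in regions:
        raise ValueError(f"node {new_node} is already in the cluster")
    updated = {owner: set(chans) | {new_node} for owner, chans in regions.items()}
    updated[new_node] = set(regions)  # one channel per existing node
    return updated


cluster = add_node({1: {2}, 2: {1}}, 3)
print(cluster)
```

After the call, nodes 1 and 2 each have a channel for node 3, and node 3 has channels for nodes 1 and 2.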
The various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or a combination of both. To illustrate this interchangeability of hardware and software clearly, the illustrative components, blocks, modules, circuits, and steps have been described generally in terms of their functionality. Whether such functionality is implemented as software or as hardware depends on the particular application and on the design constraints imposed on the overall system. A person skilled in the art can implement the described functionality in various ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope disclosed in the embodiments of the present application.
Specific implementations of the present application are further described below with reference to the specific embodiments shown in FIGS. 2 and 3.
This solution provides a distributed lock manager implementation based on multi-channel communication over a shared disk. First, an area is designated in the shared disk, and a segment of address space is reserved for each node in the cluster as its message communication area. Each communication area consists of M channels, where M is the maximum number of nodes the cluster supports, and each channel consists of buffers to a depth of N. Second, when node A senses that its network connection with node B has been disconnected, it switches from the socket communication mode to the network communication mode, in which messages are exchanged through the shared disk. A writes DLM messages into channel A of node B's communication area; because the socket connection is bidirectional, B also senses the disconnection from A and therefore monitors channel A in its own message area. Monitoring is implemented by polling: when B detects a valid message, it processes the message and writes the reply back to the same area. Likewise, when B wants to send a message to A, it writes the message into channel B of A's message area, which gives two-way message communication. In addition, to reduce the IO pressure of disk polling, a node adds the channel corresponding to another node to its list of polled channels only after it senses that its connection with that node has been disconnected, which avoids unnecessary IO overhead.
The DLM scheme based on shared-disk multi-channel communication optimizes the choice of communication area in disk communication. By introducing a multi-message-channel mechanism, it avoids contention for the message sending area, which improves the efficiency of the disk communication scheme, reduces disk IO pressure, and broadens the scope and practicability of the disk communication scheme. The scheme is applicable to IP-SAN storage and FC-SAN storage.
Specifically, an area is first designated in the shared disk, and a segment of address space is reserved for each node in the cluster as its message communication area. Each communication area consists of M channels, where M is the maximum number of nodes the cluster supports, and each channel consists of slots (buffers) to a depth of N. As shown in the "communication space disk layout", the cluster consists of 5 nodes, so communication areas for 5 nodes are reserved at the formatting stage; each communication area consists of M channels, and each channel consists of N slots (buffers). This partitioning is shown in detail in FIG. 2. Note that the diagonal area is unused, because a node does not send messages to itself. The cluster file system also supports dynamically adding nodes to the cluster; when a node is added, a message communication area for that node is added correspondingly.
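The layout above implies a simple address calculation for each slot. The following is a minimal Python sketch under stated assumptions: areas, channels, and slots are stored contiguously in that order, indices are zero-based, and every slot has the same fixed size. The `Layout` class, its field names, and the 512-byte slot size are hypothetical; the source does not specify the on-disk encoding.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    base: int        # byte offset where the communication space starts (assumed)
    max_nodes: int   # M: maximum number of nodes the cluster supports
    depth: int       # N: slots (buffers) per channel
    slot_size: int   # bytes per slot (assumed fixed)

    def slot_offset(self, receiver: int, sender: int, slot: int) -> int:
        """Byte offset of [node receiver, chan sender, slot]."""
        if not (0 <= receiver < self.max_nodes and 0 <= sender < self.max_nodes):
            raise ValueError("node index out of range")
        if not 0 <= slot < self.depth:
            raise ValueError("slot index out of range")
        if receiver == sender:
            raise ValueError("diagonal channel is unused: no self-messages")
        channel_size = self.depth * self.slot_size
        region_size = self.max_nodes * channel_size
        return (self.base + receiver * region_size
                + sender * channel_size + slot * self.slot_size)


layout = Layout(base=0, max_nodes=5, depth=4, slot_size=512)
print(layout.slot_offset(receiver=1, sender=3, slot=0))  # 16384
```

Because each sender writes only into its own channel of the receiver's area, two senders never compute the same offset, which is how the partitioning avoids contention for the sending area.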
When node A senses that its network connection with node B has been disconnected, it switches from the socket communication mode to the network communication mode, in which messages are exchanged through the shared disk. A writes DLM messages into channel A of node B's communication area; because the socket connection is bidirectional, B also senses the disconnection from A and therefore monitors channel A in its own message area. Monitoring is implemented by polling: when B detects a valid message, it processes the message and writes the reply back to the same area. Likewise, when B wants to send a message to A, it writes the message into channel B of A's message area, which gives two-way message communication. In addition, to reduce the IO pressure of disk polling, a node adds the channel corresponding to another node to its list of polled channels only after it senses that its connection with that node has been disconnected, which avoids unnecessary IO overhead.
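The polling-list rule in the paragraph above can be sketched as a small piece of state per node. This is a minimal Python sketch; the `PollList` class and its method names are hypothetical, and removing a channel on reconnect is an assumption based on the embodiment in which a node stops monitoring after switching back to the socket communication mode.

```python
from typing import List, Set


class PollList:
    """Channels in this node's own message area that currently need polling."""

    def __init__(self, node_id: int) -> None:
        self.node_id = node_id
        self._peers: Set[int] = set()

    def on_disconnect(self, peer: int) -> None:
        # Poll the peer's channel only once the socket link to it is down.
        if peer != self.node_id:
            self._peers.add(peer)

    def on_reconnect(self, peer: int) -> None:
        # Assumed: the socket link is back, so disk polling is no longer needed.
        self._peers.discard(peer)

    def channels(self) -> List[int]:
        return sorted(self._peers)


polls = PollList(node_id=2)
polls.on_disconnect(1)
polls.on_disconnect(4)
polls.on_reconnect(1)
print(polls.channels())  # [4]
```

With all socket links healthy the list is empty, so the node issues no polling IO against the shared disk at all.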
Again taking FIG. 2 as an example, when node 3 sends data to node 1, the channel used is channel 3 in node 1's communication area, denoted [node 1, chan 3, slot x]. Here x is the index of the buffer within the message channel; the message receiver and sender increment it according to an agreed convention, which allows messages to be sent and received concurrently.
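One possible reading of the "agreed convention" for the slot index is that both sides keep a cursor that advances by one per message and wraps at the channel depth. The source does not spell out the convention, so the wrap-around and the `SlotCursor` class below are assumptions for illustration only; a real implementation would also have to confirm that a slot is free before reusing it.

```python
class SlotCursor:
    """Per-channel slot index that advances by one and wraps at the depth."""

    def __init__(self, depth: int) -> None:
        if depth <= 0:
            raise ValueError("depth must be positive")
        self.depth = depth
        self._next = 0

    def advance(self) -> int:
        """Return the slot to use now and move to the next one."""
        slot = self._next
        self._next = (self._next + 1) % self.depth
        return slot


sender = SlotCursor(depth=4)
receiver = SlotCursor(depth=4)
sent = [sender.advance() for _ in range(6)]
received = [receiver.advance() for _ in range(6)]
print(sent)  # [0, 1, 2, 3, 0, 1]
```

Because both cursors follow the same rule, the receiver looks at the same sequence of slots that the sender fills, and up to `depth` messages can be in flight on one channel.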
Referring to the numbering in FIG. 3, sending a message from node 3 to node 1 mainly includes the following steps:
(1) node 3 writes the DLM message into [node 1, chan 3, slot x];
(2) node 3 adds chan 3 to its message sending monitoring list and polls it periodically to check for the reply;
(3) after receiving the event that its socket connection with node 3 has been disconnected, node 1 puts its message receiving channel [node 1, chan 3, slot x] into its message receiving monitoring list and polls it periodically for messages sent by the peer node;
(4) node 1 receives the message from node 3 through polling;
(5) node 1 calls the information processing function to process the message;
(6) when node 1 has finished processing the message, it writes an ACK message back to the message channel [node 1, chan 3, slot x] to indicate that the message has been processed;
(7) node 3 receives node 1's reply through polling and removes [node 1, chan 3, slot x] from its message sending monitoring list, which completes one message sending flow.
Similarly, when node 1 sends a message to node 3, it selects the channel [node 3, chan 1, slot y] for message communication, and the process is the same as above.
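The seven steps above can be walked through with an in-memory stand-in for the shared disk. This is a minimal Python sketch, not the patented implementation: `SharedDisk`, `Node`, the `"MSG"`/`"ACK"` record kinds, and the use of a list append in place of the information processing function are all hypothetical names and simplifications.

```python
from typing import Dict, List, Optional, Set, Tuple

Slot = Tuple[int, int, int]   # (area owner, channel, slot index)
Record = Tuple[str, str]      # (kind, payload); kind is "MSG" or "ACK"


class SharedDisk:
    """In-memory stand-in for the shared disk: slot address -> record."""

    def __init__(self) -> None:
        self._slots: Dict[Slot, Record] = {}

    def write(self, addr: Slot, kind: str, payload: str) -> None:
        self._slots[addr] = (kind, payload)

    def read(self, addr: Slot) -> Optional[Record]:
        return self._slots.get(addr)


class Node:
    def __init__(self, node_id: int, disk: SharedDisk) -> None:
        self.node_id = node_id
        self.disk = disk
        self.send_watch: Set[Slot] = set()   # message sending monitoring list
        self.recv_watch: Set[Slot] = set()   # message receiving monitoring list
        self.handled: List[str] = []
        self.completed: List[str] = []

    def on_disconnect(self, peer: int, slot: int) -> None:
        self.recv_watch.add((self.node_id, peer, slot))      # step 3

    def send(self, peer: int, slot: int, payload: str) -> None:
        addr = (peer, self.node_id, slot)
        self.disk.write(addr, "MSG", payload)                # step 1
        self.send_watch.add(addr)                            # step 2

    def poll(self) -> None:
        for addr in sorted(self.recv_watch):
            record = self.disk.read(addr)
            if record is not None and record[0] == "MSG":    # step 4
                self.handled.append(record[1])               # step 5 (handler stand-in)
                self.disk.write(addr, "ACK", record[1])      # step 6
        for addr in sorted(self.send_watch):
            record = self.disk.read(addr)
            if record is not None and record[0] == "ACK":    # step 7
                self.completed.append(record[1])
                self.send_watch.discard(addr)


disk = SharedDisk()
node1, node3 = Node(1, disk), Node(3, disk)
node1.on_disconnect(peer=3, slot=0)
node3.send(peer=1, slot=0, payload="lock-request")
node1.poll()
node3.poll()
print(node1.handled, node3.completed)
```

In this sketch the ACK overwrites the request in the same slot, matching step (6); clearing the slot after the sender has seen the ACK is left out because the source does not describe it.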
Note, however, that in this approach node 3 must poll [node 3, chan x, slot x] during normal operation to receive messages that other nodes may send, and after sending a message to node 1 it must additionally poll [node 1, chan 3, slot x]. In another embodiment, when node 1 finishes processing the message, it no longer writes the ACK message back to [node 1, chan 3, slot x] but instead writes it to [node 3, chan 1, slot x]. Node 3 then only ever needs to poll [node 3, chan x, slot x], which further reduces node 3's polling pressure on the disk while still avoiding contention in DLM communication.
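The difference between the two reply placements comes down to which slot carries the ACK and, as a result, which areas the sender must poll. The following minimal Python sketch makes that explicit; `ack_slot` and `areas_polled_by_sender` are hypothetical helper names, and reusing the same slot index for the reply is an assumption.

```python
from typing import Set, Tuple

Slot = Tuple[int, int, int]  # (area owner, channel, slot index)


def ack_slot(sender: int, receiver: int, slot: int,
             reply_in_sender_area: bool) -> Slot:
    """Slot where the receiver writes its ACK for a message from `sender`."""
    if reply_in_sender_area:
        return (sender, receiver, slot)   # alternative: sender's own area
    return (receiver, sender, slot)       # default: same slot as the request


def areas_polled_by_sender(sender: int, receiver: int,
                           reply_in_sender_area: bool) -> Set[int]:
    """Areas the sender polls: its own area plus wherever the ACK lands."""
    ack_owner = ack_slot(sender, receiver, 0, reply_in_sender_area)[0]
    return {sender, ack_owner}


print(ack_slot(3, 1, 0, reply_in_sender_area=False))  # (1, 3, 0)
print(ack_slot(3, 1, 0, reply_in_sender_area=True))   # (3, 1, 0)
```

With the reply written into the sender's own area, `areas_polled_by_sender(3, 1, True)` contains only node 3's area, which is the reduction in polling pressure that the alternative embodiment describes.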
Incidentally, although the present application centers on avoiding contention in DLM communication, its technical solution is not limited to DLM data. Any data transmission that aims to avoid contention in communication can apply the technical solution of the present application to obtain the same or similar technical effects.
In addition, the method disclosed in the embodiments of the present application may also be implemented as a computer program executed by a CPU, and the computer program may be stored in a computer-readable storage medium. When the computer program is executed by the CPU, it performs the functions defined in the method disclosed in the embodiments of the present application. The above method steps and system units may also be implemented by a controller together with a computer-readable storage medium that stores a computer program enabling the controller to implement the functions of those steps or units.
As can be seen from the above embodiments, the communication method between cluster nodes provided by the embodiments of the present application divides a message communication area for each cluster node on the shared disk of the cluster, divides each message communication area into multiple channels, one corresponding to each cluster node, and generates in each channel multiple buffers corresponding to the channel depth; in response to a first cluster node of the cluster detecting that its socket connection with a second cluster node in the same cluster has been interrupted, switches the communication mode of the first cluster node from the socket communication mode to the network communication mode and continuously monitors the message communication area of the first cluster node on the shared disk; in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, writes the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk; and, in response to the first cluster node detecting that second distributed lock manager information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calls the information processing function to process the second distributed lock manager information, pops the buffer holding the second distributed lock manager information, and sends a message reply to the second cluster node. This technical solution avoids contention in DLM communication, reduces the read/write overhead and latency of data in large-scale clusters, and improves the practicability and availability of the shared-disk communication system in the cluster.
It should be particularly noted that the steps in the embodiments of the above communication method between cluster nodes can be interleaved, replaced, added, or deleted. Communication methods between cluster nodes obtained through such reasonable permutations and combinations therefore also fall within the protection scope of the present application, and the protection scope of the present application should not be limited to the described embodiments.
In view of the above purpose, a second aspect of the embodiments of the present application provides an embodiment of a communication apparatus between cluster nodes that avoids contention in DLM communication, reduces the read/write overhead and latency of data in large-scale clusters, and improves the practicability and availability of the shared-disk communication system in the cluster. The apparatus includes:
a processor;
a controller storing program code executable by the processor, where the processor performs the following steps when running the program code:
dividing a message communication area for each cluster node on the shared disk of the cluster, dividing each message communication area into multiple channels, one corresponding to each cluster node, and generating in each channel multiple buffers corresponding to the channel depth;
in response to a first cluster node of the cluster detecting that its socket connection with a second cluster node in the same cluster has been interrupted, switching the communication mode of the first cluster node from the socket communication mode to the network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk;
in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, writing the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk;
in response to the first cluster node detecting that second distributed lock manager information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calling the information processing function to process the second distributed lock manager information, popping the buffer holding the second distributed lock manager information, and sending a message reply to the second cluster node.
The apparatuses and devices disclosed in the embodiments of the present application may be various electronic terminal devices, such as mobile phones, personal digital assistants (PDAs), tablet computers (PADs), and smart TVs, or large terminal devices such as servers. The protection scope disclosed in the embodiments of the present application should therefore not be limited to a particular type of apparatus or device. The client disclosed in the embodiments of the present application may be applied to any of the above electronic terminal devices in the form of electronic hardware, computer software, or a combination of the two.
In some embodiments, after the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, that channel is also continuously monitored.
In some embodiments, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the information processing function is called to process the first distributed lock manager information, the buffer holding the first distributed lock manager information is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
In some embodiments, in response to the first cluster node detecting that the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk has received the first message reply to the first distributed lock manager information, the first cluster node reports that processing of the first distributed lock manager information is complete and stops monitoring that channel.
In some embodiments, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the information processing function is called to process the first distributed lock manager information, the buffer holding the first distributed lock manager information is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk.
In some embodiments, in response to the first cluster node detecting that the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk has received the first message reply to the first distributed lock manager information, the first cluster node reports that processing of the first distributed lock manager information is complete.
In some embodiments, in response to the communication mode of the first cluster node switching from the network communication mode back to the socket communication mode, the first cluster node stops monitoring any area on the shared disk, where monitoring includes periodic polling of the shared disk.
In some embodiments, generating in each channel multiple buffers corresponding to the channel depth includes: obtaining a channel depth predetermined on the basis of the required concurrent message-processing capability, and generating in each channel a number of buffers positively correlated with the channel depth, where each buffer is allocated disk space for storing one piece of first distributed lock manager information, one piece of second distributed lock manager information, or one message reply.
In some embodiments, in response to a third cluster node joining the cluster, a message communication area is divided for the third cluster node on the shared disk, that area is divided into multiple channels, one corresponding to each cluster node, and multiple buffers corresponding to the channel depth are generated in each channel; in addition, a channel corresponding to the third cluster node is divided in the message communication area of each existing cluster node.
As the above embodiments show, the apparatus for communication between cluster nodes provided by the embodiments of the present application adopts the following technical solution: a message communication area is allocated for each cluster node on the shared disk of the cluster, each message communication area is divided into a plurality of channels corresponding to the respective cluster nodes, and a plurality of buffers corresponding to the channel depth is generated in each channel; in response to a first cluster node of the cluster detecting that its socket connection with a second cluster node in the same cluster has been interrupted, the communication mode of the first cluster node is switched from the socket communication mode to the network communication mode, and the message communication area of the first cluster node on the shared disk is continuously monitored; in response to the first cluster node sending first distributed lock manager (DLM) information to the second cluster node in the network communication mode, the first DLM information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk; and in response to the first cluster node detecting that second DLM information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, an information processing function is called to process the second DLM information, the buffer holding the second DLM information is popped, and a message reply is sent to the second cluster node. This solution avoids contention in DLM communication, reduces the read/write overhead and latency of data in large-scale clusters, and improves the practicality and availability of the shared-disk communication system in the cluster.
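The exchange summarized above (write into the peer's area, poll one's own area, process, pop, reply) can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the implementation in the application: the shared disk is replaced by Python deques, channels to oneself are omitted, a full channel simply rejects the write, and the reply is written into the original sender's area. `SharedDisk` and `poll_once` are hypothetical names.

```python
from collections import deque


class SharedDisk:
    """In-memory stand-in for the shared disk.

    regions[owner][peer] is the channel in `owner`'s message communication
    area that holds messages written by `peer`, bounded by the channel depth.
    """

    def __init__(self, node_ids, depth):
        self.depth = depth
        self.regions = {
            owner: {peer: deque() for peer in node_ids if peer != owner}
            for owner in node_ids
        }

    def write(self, sender, receiver, payload):
        """Write `payload` into `sender`'s channel inside `receiver`'s area."""
        channel = self.regions[receiver][sender]
        if len(channel) >= self.depth:
            return False  # every buffer is in use; the caller may retry later
        channel.append(payload)
        return True


def poll_once(disk, node, handler):
    """One polling pass over `node`'s own area; returns the number of messages handled."""
    handled = 0
    for peer, channel in disk.regions[node].items():
        while channel:
            message = channel[0]
            reply = handler(peer, message)  # call the information processing function
            channel.popleft()               # pop the buffer that held the message
            if reply is not None:
                disk.write(node, peer, reply)  # reply goes into the peer's area
            handled += 1
    return handled
```

A node would call `poll_once` periodically while in the fallback mode and stop once its socket connection is restored. Because each sender writes only to its own channel in the receiver's area, two senders never write to the same buffers, which is the property the text credits with avoiding contention.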
It should be noted in particular that the above apparatus embodiments use the embodiments of the method for communication between cluster nodes to describe the working process of each module, and those skilled in the art can readily apply these modules to other embodiments of the method. Since the steps in the method embodiments may be interleaved, replaced, added, or deleted, such reasonable permutations and combinations as applied to the apparatus also fall within the protection scope of the present application, and the protection scope of the present application should not be limited to the described embodiments.
Finally, it should be noted that those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like. The computer program embodiments can achieve effects identical or similar to those of any corresponding method embodiment described above.
The above are exemplary embodiments disclosed in the present application, but it should be noted that various changes and modifications may be made without departing from the scope of the disclosed embodiments as defined by the claims. The functions, steps, and/or actions of the method claims according to the disclosed embodiments described herein need not be performed in any particular order. In addition, although elements disclosed in the embodiments of the present application may be described or claimed in the singular, they may also be understood as plural unless explicitly limited to the singular.
Those of ordinary skill in the art should understand that the discussion of any of the above embodiments is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the present application (including the claims) is limited to these examples. Within the concept of the embodiments of the present application, technical features of the above embodiments or of different embodiments may also be combined, and many other variations of the different aspects of the embodiments described above exist that are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, or improvement made within the spirit and principles of the embodiments of the present application shall fall within the protection scope of the embodiments of the present application.

Claims (10)

  1. A method for communication between cluster nodes, characterized by comprising the following steps:
    allocating a message communication area for each cluster node on a shared disk of a cluster, dividing each of the message communication areas into a plurality of channels corresponding to the respective cluster nodes, and generating in each of the channels a plurality of buffers corresponding to a channel depth;
    in response to a first cluster node of the cluster detecting that a socket connection between the first cluster node and a second cluster node in the same cluster has been interrupted, switching the communication mode of the first cluster node from a socket communication mode to a network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk;
    in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, writing the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk;
    in response to the first cluster node detecting that second distributed lock manager information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calling an information processing function to process the second distributed lock manager information, popping the buffer holding the second distributed lock manager information, and sending a message reply to the second cluster node.
  2. The method according to claim 1, characterized in that, after the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk is further continuously monitored.
  3. The method according to claim 2, characterized in that, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, an information processing function is called to process the first distributed lock manager information, the buffer holding the first distributed lock manager information is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk.
  4. The method according to claim 3, characterized in that, in response to the first cluster node detecting that the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk has received the first message reply to the first distributed lock manager information, completion of processing of the first distributed lock manager information is reported, and monitoring of the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk is stopped.
  5. The method according to claim 1, characterized in that, in response to the second cluster node detecting that the first distributed lock manager information has been received in the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk, an information processing function is called to process the first distributed lock manager information, the buffer holding the first distributed lock manager information is popped, and a first message reply to the first distributed lock manager information is written into the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk.
  6. The method according to claim 5, characterized in that, in response to the first cluster node detecting that the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk has received the first message reply to the first distributed lock manager information, completion of processing of the first distributed lock manager information is reported.
  7. The method according to claim 1, characterized in that, in response to the communication mode of the first cluster node being switched from the network communication mode to the socket communication mode, the first cluster node stops monitoring any area on the shared disk; wherein the monitoring comprises periodic polling of the shared disk.
  8. The method according to claim 1, characterized in that generating in each of the channels a plurality of buffers corresponding to the channel depth comprises: obtaining the channel depth, which is predetermined based on the required concurrent message processing capability, and generating in each of the channels a number of the buffers positively correlated with the channel depth, wherein each of the buffers is configured with disk space for storing one piece of the first distributed lock manager information, one piece of the second distributed lock manager information, or one message reply.
  9. The method according to claim 1, characterized in that, in response to a third cluster node joining the cluster, a message communication area is allocated for the third cluster node on the shared disk, the message communication area of the third cluster node is divided into a plurality of channels corresponding to the respective cluster nodes, and a plurality of buffers corresponding to the channel depth is generated in each of the channels, while the channel corresponding to the third cluster node is also added to the message communication areas of the other existing cluster nodes.
  10. An apparatus for communication between cluster nodes, characterized by comprising:
    a processor;
    a controller storing program code executable by the processor, wherein the processor, when running the program code, performs the following steps:
    allocating a message communication area for each cluster node on a shared disk of a cluster, dividing each of the message communication areas into a plurality of channels corresponding to the respective cluster nodes, and generating in each of the channels a plurality of buffers corresponding to a channel depth;
    in response to a first cluster node of the cluster detecting that a socket connection between the first cluster node and a second cluster node in the same cluster has been interrupted, switching the communication mode of the first cluster node from a socket communication mode to a network communication mode, and continuously monitoring the message communication area of the first cluster node on the shared disk;
    in response to the first cluster node sending first distributed lock manager information to the second cluster node in the network communication mode, writing the first distributed lock manager information into the channel corresponding to the first cluster node in the message communication area of the second cluster node on the shared disk;
    in response to the first cluster node detecting that second distributed lock manager information has been received in the channel corresponding to the second cluster node in the message communication area of the first cluster node on the shared disk, calling an information processing function to process the second distributed lock manager information, popping the buffer holding the second distributed lock manager information, and sending a message reply to the second cluster node.
PCT/CN2021/127507 2021-07-20 2021-10-29 Communication method and apparatus between cluster nodes WO2023000534A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110820023.7 2021-07-20
CN202110820023.7A CN113676515A (en) 2021-07-20 2021-07-20 Method and device for communication among cluster nodes

Publications (1)

Publication Number Publication Date
WO2023000534A1 (en) 2023-01-26

Family

ID=78539610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/127507 WO2023000534A1 (en) 2021-07-20 2021-10-29 Communication method and apparatus between cluster nodes

Country Status (2)

Country Link
CN (1) CN113676515A (en)
WO (1) WO2023000534A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140032595A1 (en) * 2012-07-25 2014-01-30 Netapp, Inc. Contention-free multi-path data access in distributed compute systems
US20150052104A1 (en) * 2011-05-31 2015-02-19 Ori Software Development Ltd. Efficient distributed lock manager
CN108512753A (en) * 2017-02-28 2018-09-07 华为技术有限公司 The method and device that message is transmitted in a kind of cluster file system
CN109246182A (en) * 2018-07-26 2019-01-18 郑州云海信息技术有限公司 A kind of Distributed Lock Manager and its implementation
CN111756826A (en) * 2020-06-12 2020-10-09 浪潮电子信息产业股份有限公司 DLM lock information transmission method and related device

Also Published As

Publication number Publication date
CN113676515A (en) 2021-11-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21950772

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21950772

Country of ref document: EP

Kind code of ref document: A1