CN112596960B - Distributed storage service switching method and device


Info

Publication number
CN112596960B
Authority
CN
China
Prior art keywords
target
network card
intelligent network
storage server
distributed storage
Legal status
Active
Application number
CN202011344216.1A
Other languages
Chinese (zh)
Other versions
CN112596960A (en)
Inventor
钟晋明
Current Assignee
New H3C Cloud Technologies Co Ltd
Original Assignee
New H3C Cloud Technologies Co Ltd
Application filed by New H3C Cloud Technologies Co Ltd
Priority to CN202011344216.1A
Publication of CN112596960A
Application granted
Publication of CN112596960B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 - Error detection; Error correction; Monitoring
    • G06F 11/07 - Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/16 - Error detection or correction of the data by redundancy in hardware
    • G06F 11/20 - Error detection or correction of the data by redundancy in hardware using active fault-masking, e.g. by switching out faulty elements or by switching in spare elements
    • G06F 11/2053 - Error detection or correction of the data by redundancy in hardware using active fault-masking, where persistent mass storage functionality or persistent mass storage control functionality is redundant
    • G06F 11/2089 - Redundant storage control functionality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 - Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F 3/0601 - Interfaces specially adapted for storage systems
    • G06F 3/0668 - Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F 3/067 - Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The present disclosure relates to the field of data storage technologies, and in particular, to a method and an apparatus for switching distributed storage services. The method comprises the following steps: receiving a first data read-write request sent by a client, and determining a target storage server for processing the first data read-write request; sending the first data read-write request to a target intelligent network card corresponding to the target storage server, so that the target intelligent network card performs data processing on the first data read-write request based on a locally running distributed storage service; and if a failure of the target intelligent network card is detected, starting the distributed storage service that is deployed on the target storage server and set to a to-be-started state, so that the target storage server, based on the locally running distributed storage service, performs data processing on a second data read-write request that is sent by the client and needs to be processed by the target storage server.

Description

Distributed storage service switching method and device
Technical Field
The present disclosure relates to the field of data storage technologies, and in particular, to a method and an apparatus for switching distributed storage services.
Background
Distributed storage means storing data in a decentralized manner on multiple independent devices. A traditional network storage system uses a centralized storage server to store all data; the storage server becomes a bottleneck of system performance as well as a weak point for reliability and security, and cannot meet the requirements of large-scale storage applications. A distributed network storage system adopts a scalable system architecture, uses multiple storage servers to share the storage load and uses location servers to locate stored information, thereby improving the reliability, availability and access efficiency of the system while remaining easy to expand.
The traditional ServerSAN (software-defined storage) distributed storage architecture is as follows: the server is equipped with an ordinary network card, and the distributed storage software runs on the x86 CPU of the host. Communication between nodes uses the TCP protocol and passes through the kernel network protocol stack. The distributed storage instance on each node manages the NVMe (PCIe) devices local to the physical host. The nodes work together as a cluster to provide storage service externally and to guarantee the consistency of the stored data.
However, in the conventional distributed storage architecture, the distributed storage software running on the physical machine occupies the CPU, memory and other system resources of the physical machine; the physical machine's CPU resources are shared with virtual machines, so the number of CPU cores occupied by the distributed storage software affects the number of virtual machines that can be created. The performance of distributed storage running on a physical machine is also affected by the physical machine's OS and by the operation of the virtual machines. In addition, the distributed storage software running on the physical machine manages its local storage resources through system calls to the abstract block layer, which consumes a further portion of system resources.
To solve the above problems, NVMe disks, an intelligent network card and an NVMe-oF (NVMe over Fabrics) target may be configured on the storage server. The intelligent network card has a CPU independent of the physical server, such as an ARM CPU, and the distributed storage software runs on that ARM CPU; the data disks managed by the distributed storage software are exposed as network data disks through the NVMe-oF target connected to the host. Multiple such ARM storage nodes form a distributed storage cluster.
However, the reliability of a distributed storage cluster based on intelligent network cards is limited by the number of ARM nodes, i.e. intelligent network cards; for example, the minimum number of nodes of a distributed storage cluster is 3. In a cluster formed by 3 intelligent network cards, if one intelligent network card fails, only 2 nodes remain in the cluster and there is a risk of split-brain; if two or more of the 3 intelligent network cards fail, the cluster can no longer provide service.
For a distributed storage cluster with a sufficient number of nodes, when a small number of nodes go down and the number of remaining storage nodes still meets the requirement, the conventional handling is to trigger data rebalancing. However, when only the intelligent network card fails and the disks are still healthy, the node cannot read or write its disks normally because of the network card failure; the cluster can still provide storage service, but new storage I/O is scattered onto the disks of other nodes, which eventually leads to data imbalance across the whole cluster.
Disclosure of Invention
The application provides a distributed storage service switching method and device, which are used for solving the problem that the distributed storage service is unavailable due to the failure of an intelligent network card in the prior art.
In a first aspect, the present application provides a distributed storage service switching method, applied to a distributed storage system, where each storage server in the distributed storage system is configured with a corresponding intelligent network card, each intelligent network card runs a distributed storage service, each intelligent network card establishes an RDMA channel with a controller for managing local storage resources on the corresponding storage server, and each storage server is deployed with the distributed storage service set to a to-be-started state, where the method includes:
receiving a first data read-write request sent by a client, and determining a target storage server for processing the first data read-write request;
the first data read-write request is sent to a target intelligent network card corresponding to a target storage server, so that the target intelligent network card processes the first data read-write request based on a distributed storage service running locally;
and if a failure of the target intelligent network card is detected, starting the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on the locally running distributed storage service, performs data processing on a second data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the step of the target intelligent network card performing data processing on the first data read-write request based on the locally operated distributed storage service includes:
the target intelligent network card sends the first data read-write request to a controller on the target storage server for managing local storage resources, wherein the controller processes the first data read-write request through a corresponding RDMA channel.
Optionally, during normal operation, the target intelligent network card writes heartbeat counting information into a first designated position in the memory of the target storage server based on a preset period;
the step of detecting the fault of the target intelligent network card comprises the following steps:
and when the heartbeat count maintained at the first appointed position in the memory of the target storage server is detected not to be increased within the preset time length, determining that the fault of the target intelligent network card is detected.
Optionally, after the distributed storage service deployed on the target storage server and set to the to-be-started state is started, the target storage server writes heartbeat count information into a second designated position in the memory based on a preset period;
when it is detected that the target intelligent network card has recovered, if the state of the distributed storage service deployed on the target storage server is to-be-started and/or the heartbeat count maintained at the second designated position in the memory of the target storage server has not increased within the preset duration, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the method further comprises:
when it is detected that the target intelligent network card has recovered, if the distributed storage service deployed on the target storage server is running normally, the target intelligent network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets the locally running distributed storage service to the to-be-started state, starts the RDMA channel between itself and the target intelligent network card, and sends a switching completion instruction to the target intelligent network card; and if the target intelligent network card has not received the switching completion instruction when the timer expires, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the method further comprises:
when the distributed storage service which is deployed on the target storage server and is set to be in a to-be-started state is started, the target storage server processes the second data read-write request based on the metadata stored in the third designated position.
In a second aspect, the present application provides a distributed storage service switching device, applied to a distributed storage system, where each storage server in the distributed storage system is configured with a corresponding intelligent network card, each intelligent network card runs a distributed storage service, each intelligent network card establishes an RDMA channel with a controller for managing local storage resources on the corresponding storage server, and each storage server is deployed with a distributed storage service set to a to-be-started state, where the device includes:
the receiving unit is used for receiving a first data read-write request sent by the client and determining a target storage server for processing the first data read-write request;
the sending unit is used for sending the first data read-write request to a target intelligent network card corresponding to a target storage server, so that the target intelligent network card processes the first data read-write request based on a distributed storage service running locally;
and the switching unit is configured to, when a failure of the target intelligent network card is detected, start the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on the locally running distributed storage service, performs data processing on a second data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the step of the target intelligent network card performing data processing on the first data read-write request based on the locally operated distributed storage service includes:
the target intelligent network card sends the first data read-write request to a controller on the target storage server for managing local storage resources, wherein the controller processes the first data read-write request through a corresponding RDMA channel.
Optionally, during normal operation, the target intelligent network card writes heartbeat counting information into a first designated position in the memory of the target storage server based on a preset period;
when the fault of the target intelligent network card is detected, the switching unit is specifically configured to:
and when the heartbeat count maintained at the first appointed position in the memory of the target storage server is detected not to be increased within the preset time length, determining that the fault of the target intelligent network card is detected.
Optionally, after the distributed storage service deployed on the target storage server and set to the to-be-started state is started, the target storage server writes heartbeat count information into a second designated position in the memory based on a preset period;
And the switching unit is further configured to, when it is detected that the target intelligent network card has recovered, if the state of the distributed storage service deployed on the target storage server is to-be-started and/or the heartbeat count maintained at the second designated position in the memory of the target storage server has not increased within a preset duration, have the target intelligent network card start the distributed storage service and perform data processing on a third data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the switching unit is further configured to: when it is detected that the target intelligent network card has recovered, if the distributed storage service deployed on the target storage server is running normally, the target intelligent network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets the locally running distributed storage service to the to-be-started state, starts the RDMA channel between itself and the target intelligent network card, and sends a switching completion instruction to the target intelligent network card; and if the target intelligent network card has not received the switching completion instruction when the timer expires, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the apparatus further comprises:
and the loading unit is configured to load, when the distributed storage service is started on the target intelligent network card, metadata stored in the storage resources of the target storage server to a third designated position in the memory of the target storage server, wherein when a failure of the target intelligent network card is detected and the distributed storage service that is deployed on the target storage server and set to the to-be-started state is started, the target storage server performs data processing on the second data read-write request based on the metadata stored in the third designated position.
In a third aspect, an embodiment of the present application provides a distributed storage service switching apparatus, including:
a memory for storing program instructions;
a processor, configured to call the program instructions stored in the memory and to perform, according to the obtained program instructions, the steps of the method according to any one of the first aspects above.
In a fourth aspect, embodiments of the present application also provide a computer-readable storage medium storing computer-executable instructions for causing a computer to perform the steps of the method according to any one of the first aspects.
As can be seen from the above, in the distributed storage service switching method provided by the embodiment of the present application, a first data read-write request sent by a client is received, and a target storage server for processing the first data read-write request is determined; the first data read-write request is sent to a target intelligent network card corresponding to the target storage server, so that the target intelligent network card performs data processing on the first data read-write request based on a locally running distributed storage service; and if a failure of the target intelligent network card is detected, the distributed storage service that is deployed on the target storage server and set to the to-be-started state is started, so that the target storage server, based on the locally running distributed storage service, performs data processing on a second data read-write request that is sent by the client and needs to be processed by the target storage server.
With the distributed storage service switching method provided by the embodiment of the present application, when the intelligent network card fails and can no longer provide the distributed storage service, the distributed storage service deployed in advance on the storage server can be started. This avoids the problem that the distributed storage system becomes unavailable due to the failure of the intelligent network card, improves the reliability of the distributed storage system, and enhances the user experience.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments described in the present application, and a person of ordinary skill in the art may obtain other drawings from these drawings.
Fig. 1 is a schematic structural diagram of a distributed storage system according to an embodiment of the present application;
fig. 2 is a detailed flowchart of a distributed storage service switching method according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a distributed storage service switching device according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of another distributed storage service switching device according to an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to any or all possible combinations including one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in embodiments of the present application to describe various information, this information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of the present application, a first message may also be referred to as a second message, and similarly, a second message may also be referred to as a first message. Furthermore, depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The following describes the structure of the distributed storage system provided in the embodiment of the present application in detail in connection with a specific application scenario. As an example, referring to fig. 1, which is a schematic structural diagram of a distributed storage system provided in the present application, the distributed storage system includes 3 storage servers (storage server 1, storage server 2 and storage server 3). It should be noted that, in the embodiment of the present application, a distributed storage service set to the to-be-started state is deployed on each storage server. Each storage server is configured with a corresponding intelligent network card (intelligent network card 1, intelligent network card 2 and intelligent network card 3), and each storage server comprises storage resources for storing data and a controller for managing the local storage resources. A distributed storage service runs on each intelligent network card, and a remote direct memory access (RDMA) channel is established between the intelligent network card configured for each storage server and the controller that manages the local storage resources on the corresponding storage server. The storage resources managed by each controller may consist of multiple disks. A disk may specifically be a disk that complies with the Non-Volatile Memory Express (NVMe) host controller interface specification, referred to herein as an NVMe disk.
That is, for a storage server, when its corresponding intelligent network card is normal, the intelligent network card provides the distributed storage service and is used to process the data read-write requests that are sent by the client and need to be processed by that storage server; when the intelligent network card fails, the distributed storage service deployed locally on the storage server is used instead to process the data read-write requests that are sent by the client and need to be processed by that storage server.
It should be noted that the above system configuration, the number of devices, the number of disks, and the like are only illustrative, and are not limiting.
The intelligent network card may have an independent central processing unit (CPU). The intelligent network card may also be called an accelerator card; it has certain network and storage acceleration capabilities and can generally be implemented by a field-programmable gate array (FPGA). The intelligent network card can perform distributed-storage-based data input/output processing, so that the processor of the storage server is freed from this work, while the intelligent network card is dedicated to data input/output processing.
As an example, referring to fig. 2, a detailed flowchart of a distributed storage service switching method provided in an embodiment of the present application is shown, where the method is applied to a distributed storage system, each storage server in the distributed storage system is configured with a corresponding intelligent network card, each intelligent network card runs a distributed storage service, each intelligent network card establishes an RDMA channel with a controller on the corresponding storage server for managing a local storage resource, and each storage server is deployed with the distributed storage service set to a to-be-started state, and the method includes the following steps:
Step 200: and receiving a first data read-write request sent by the client, and determining a target storage server for processing the first data read-write request.
Specifically, after the distributed storage system receives a data read-write request sent by a client, the management device serving as a management node distributes the data read-write request to one storage server in the distributed storage system for processing, that is, the management device sends the data read-write request to an intelligent network card corresponding to the storage server for processing the data read-write request. It should be noted that, the target storage server is any storage server in the distributed storage system.
Further, in this embodiment of the present application, the distributed storage service needs to be run in advance on the intelligent network card. Specifically, the intelligent network card performs an initialization operation based on the user configuration, so that the intelligent network card runs the distributed storage service, and an RDMA data channel is established between the intelligent network card and the controller that manages the local storage resources on the storage server corresponding to the intelligent network card.
Specifically, in the embodiment of the present application, the intelligent network card performs an initialization operation based on user configuration, so that when the intelligent network card runs the distributed storage service, a preferred implementation manner is that the intelligent network card runs the distributed storage service locally based on a configuration instruction issued by a user, and forms a distributed storage service cluster with other intelligent network cards running the distributed storage service.
That is, in the embodiment of the present application, the distributed storage service is run on the intelligent network card.
Further, when the intelligent network card performs the initialization operation based on the user configuration so that an RDMA data channel is established between the intelligent network card and the controller that manages the local storage resources on the corresponding storage server, a preferred implementation is that the intelligent network card, acting as the initiator end, establishes the RDMA data channel with the controller that manages the local storage resources on the corresponding storage server based on a configuration command issued by the user, wherein that controller is configured as the target end of the NVMe over Fabrics protocol.
Specifically, when the intelligent network card, acting as the initiator end, establishes the remote direct memory access (RDMA) data channel with the controller that manages the local storage resources on the corresponding storage server based on the configuration command issued by the user, and the NVMe over Fabrics protocol is configured locally, a preferred implementation is as follows: the intelligent network card, as the initiator end, locally configures an RDMA IP address based on the configuration command issued by the user, and uses the initiator tool of the NVMe over Fabrics protocol to connect to the controller that manages the local storage resources on the corresponding storage server; the controller on the storage server corresponding to the intelligent network card configures an RDMA IP address on a local RDMA-capable network card interface based on the configuration command issued by the user, and configures the NVMe over Fabrics protocol as the target end on a network card chip with NVMe-oF target offload capability. That is, the network card chip included in the controller on each storage server is a network card chip integrating RDMA and NVMe-oF target offload capabilities.
For example, the intelligent network cards and the controllers corresponding to the storage servers are initialized, the NVMe over Fabrics protocol is configured for each intelligent network card and each controller, each intelligent network card acts as an initiator end and connects to its controller acting as the target end, and the distributed storage service is started.
Optionally, the initialization process may specifically include the following steps:
step 1: each intelligent network card can be configured with an area for storing metadata corresponding to the storage data in the corresponding storage server, so that the area can also be called as a metadata area. In the initialization process, metadata stored in the storage resource is loaded into the metadata area.
In another preferred implementation, an area for storing metadata corresponding to storage data in a storage resource on the storage server is configured in a memory of the storage server, so that the area may also be referred to as a metadata area. Then, during the initialization process, metadata in the storage resource is loaded into the metadata area of the memory.
Step 2: each controller may configure an internet protocol address (Internet Protocol, IP) of RDMA and configure NVMe over Fabrics protocol as a target side.
Step 3: each intelligent network card can be configured with an IP address of RDMA and a NVMe over Fabrics protocol, and is used as an initiator terminal, and an initiator tool in NVMe over Fabrics protocol is used to connect with a corresponding controller to start the distributed storage service.
The method for starting the distributed storage service specifically includes: each intelligent network card locally marks and records the disks allocated to it, and the metadata corresponding to the data stored on the allocated disks is written, through the RDMA protocol, into the metadata area configured for the intelligent network card.
Each intelligent network card can then perform input/output on the stored data through the RDMA protocol and the metadata corresponding to the stored data.
In the embodiments of the present application, when the RDMA data channel of each storage server is enabled, a preferred implementation is to configure an IP address, such as 1.1.1.2 or 1.1.2.2, for the RDMA-capable network card interface of each storage server.
When enabling the RDMA data channel of each intelligent network card, a preferred implementation is to configure an IP address, such as 1.1.1.1, 1.1.2.1, ..., for the RDMA-capable network card interface on each intelligent network card, using the RoCEv2 protocol.
That is, each storage server enables its RDMA data channel by configuring an IP address (e.g. IP11) for the RDMA-capable network card interface on storage server 1, and each intelligent network card enables its RDMA data channel by configuring an IP address (e.g. IP21) for the RDMA-capable network card interface on intelligent network card 1. The intelligent network card corresponding to storage server 1 is intelligent network card 1, so intelligent network card 1 can act as the initiator end, and an RDMA data channel is established between the network card interface configured with IP21 and the network card interface on storage server 1 configured with IP11.
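To make the ordering of these initialization steps concrete, the following Python sketch walks through them for one node. It is an illustration only, not the patented implementation: the address plan (1.1.x.y values), the subsystem name and the run callable are assumptions of the sketch, and a real deployment would use the platform's own RDMA and NVMe over Fabrics tooling instead of printed steps.

    from dataclasses import dataclass

    @dataclass
    class NodePlan:
        """Address plan for one storage server / intelligent network card pair (illustrative values)."""
        server_rdma_ip: str   # RDMA-capable interface on the storage server (NVMe-oF target side)
        nic_rdma_ip: str      # RDMA-capable interface on the intelligent network card (initiator side)
        target_nqn: str       # NVMe-oF subsystem name exported by the controller (assumed name)

    def init_node(plan: NodePlan, run=print) -> None:
        """Sketch of the per-node initialization order described in the text.

        `run` is a stand-in for whatever configuration channel is used; here it
        just prints the intended step so the sketch stays self-contained.
        """
        # 1. Storage-server side: RDMA address on the offload-capable NIC chip,
        #    then expose the local NVMe disks as an NVMe over Fabrics target.
        run(f"server: assign RDMA address {plan.server_rdma_ip} (RoCEv2)")
        run(f"server: export local NVMe disks as target {plan.target_nqn}")

        # 2. Intelligent-network-card side: RDMA address, then connect as the
        #    initiator to the target, so the server's data disks become network disks.
        run(f"nic: assign RDMA address {plan.nic_rdma_ip} (RoCEv2)")
        run(f"nic: connect initiator -> {plan.server_rdma_ip} / {plan.target_nqn}")

        # 3. Load the on-disk metadata into the metadata area and start the
        #    distributed storage service on the intelligent network card.
        run("nic: load on-disk metadata into the metadata area, start the storage service")

    if __name__ == "__main__":
        init_node(NodePlan("1.1.1.2", "1.1.1.1", "nqn.example:server1"))

Running the sketch simply prints the three configuration phases in order, which mirrors the sequence given in Steps 1 to 3 above.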
Step 210: and sending the first data read-write request to a target intelligent network card corresponding to a target storage server, so that the target intelligent network card processes the first data read-write request based on a distributed storage service running locally.
Specifically, the management device in the distributed storage system sends the first data read-write request to a target intelligent network card corresponding to a target storage server for processing the data read-write request, and when the target intelligent network card receives the first data read-write request, the target intelligent network card processes the first data read-write request based on a distributed storage service running locally.
For example, when the distributed storage service running on the target intelligent network card is started, if the running state of the distributed storage service deployed on the target storage server is determined to be the to-be-started state, metadata can be loaded from a disk into the memory of the target storage server in an RDMA writing mode, and after the starting is completed, the distributed storage service is provided.
In this embodiment of the present application, when the target intelligent network card performs data processing on the first data read-write request based on a locally running distributed storage service, one preferred implementation manner is that the target intelligent network card sends the first data read-write request to a controller on the target storage server for managing local storage resources, where the controller performs data processing on the first data read-write request through an RDMA channel corresponding to the controller.
For example, after receiving the first data read-write request, the target intelligent network card sends it to the controller in the corresponding target storage server. When the controller receives the first data read-write request, it parses the request, initiates a DMA operation through the RDMA data channel established with the target intelligent network card, and returns the data read-write result to the target intelligent network card. In this way the target intelligent network card directly accesses the storage resources on the target storage server and performs read-write operations on them.
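The following minimal Python sketch shows only the shape of this forwarding path. The class and field names are assumptions of the sketch, and the RDMA/DMA transfer is abstracted into an ordinary method call backed by a dictionary, so the example runs anywhere; it is not the patent's data plane.

    from dataclasses import dataclass

    @dataclass
    class ReadWriteRequest:
        op: str          # "read" or "write"
        lba: int         # logical block address
        data: bytes = b""

    class Controller:
        """Stand-in for the controller that manages the server's local disks.

        A real controller would satisfy the request with a DMA transfer over the
        RDMA channel; here the 'disk' is a dict so the sketch stays self-contained.
        """
        def __init__(self):
            self._blocks: dict[int, bytes] = {}

        def handle(self, req: ReadWriteRequest) -> bytes:
            if req.op == "write":
                self._blocks[req.lba] = req.data
                return b"OK"
            return self._blocks.get(req.lba, b"")

    class SmartNic:
        """The intelligent network card runs the distributed storage service and
        forwards I/O for its server to the controller over the (abstracted) RDMA channel."""
        def __init__(self, controller: Controller):
            self._controller = controller

        def process(self, req: ReadWriteRequest) -> bytes:
            # Metadata lookup and replication would happen here in the real service.
            return self._controller.handle(req)

    if __name__ == "__main__":
        nic = SmartNic(Controller())
        nic.process(ReadWriteRequest("write", 7, b"hello"))
        assert nic.process(ReadWriteRequest("read", 7)) == b"hello"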
Step 220: and if a failure of the target intelligent network card is detected, starting the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on the locally running distributed storage service, performs data processing on a second data read-write request that is sent by the client and needs to be processed by the target storage server.
In the embodiment of the present application, a first designated location is preset in a memory of a target storage server, and when the target intelligent network card operates normally, heartbeat count information is written into the first designated location in the memory of the target storage server based on a preset period; of course, a second designated location is further preset in the memory of the target storage server, and after the distributed storage service deployed on the target storage server and set to the to-be-started state is started, the target storage server writes heartbeat count information into the second designated location in the memory based on a preset period.
Further, the running state (to-be-started / running) of the distributed storage service deployed on the target storage server may also be written into the above-described second designated location. In practical applications, the first designated location and the second designated location may be accessed by means of memory access to write/read data.
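As an illustration of how such designated locations can be laid out, the Python sketch below packs a heartbeat counter and a service state into fixed offsets of a byte buffer that stands in for the shared host memory. The offsets, field sizes and state codes are assumptions made for the sketch; the patent only requires that both sides agree on the locations.

    import struct

    # Illustrative layout of the designated locations in host memory
    # (offsets and sizes are assumptions of this sketch, not values from the patent).
    FIRST_LOC = 0      # 8-byte heartbeat counter written by the intelligent network card
    SECOND_LOC = 16    # 1-byte service state + 8-byte heartbeat written by the host-side service
    THIRD_LOC = 4096   # metadata area shared between the network card and the host-side service

    STATE_TO_BE_STARTED = 0
    STATE_RUNNING = 1

    def write_nic_heartbeat(mem: bytearray, count: int) -> None:
        struct.pack_into("<Q", mem, FIRST_LOC, count)

    def read_nic_heartbeat(mem: bytearray) -> int:
        return struct.unpack_from("<Q", mem, FIRST_LOC)[0]

    def write_host_status(mem: bytearray, state: int, count: int) -> None:
        struct.pack_into("<BQ", mem, SECOND_LOC, state, count)

    def read_host_status(mem: bytearray) -> tuple:
        return struct.unpack_from("<BQ", mem, SECOND_LOC)

    if __name__ == "__main__":
        mem = bytearray(8192)          # stand-in for the shared host memory region
        write_nic_heartbeat(mem, 41)
        write_host_status(mem, STATE_TO_BE_STARTED, 0)
        assert read_nic_heartbeat(mem) == 41
        assert read_host_status(mem) == (STATE_TO_BE_STARTED, 0)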
Then, when detecting the failure of the target intelligent network card, a preferred implementation manner is to determine that the failure of the target intelligent network card is detected when the heartbeat count maintained at the first designated location in the memory of the target storage server is detected not to increase within a preset time period.
That is, when the distributed storage service on the target intelligent network card operates normally, the heartbeat count is written into the first designated position in the memory of the target storage server periodically, that is, the heartbeat count of the first designated position increases with time, and then when the heartbeat count is detected not to increase within the preset time period, it is determined that the target intelligent network card fails.
For example, when the target intelligent network card fails (e.g., its software crashes), the target intelligent network card can no longer write the heartbeat count to the first designated location. The target storage server periodically checks the heartbeat count at the first designated location, and if it determines that the heartbeat count at the first designated location has not increased within a certain time, it determines that the target intelligent network card is abnormal and a switchover procedure needs to be entered.
Specifically, the switchover procedure is as follows: the target storage server stops the NVMe-oF target service; it then starts the locally deployed distributed storage service that was set to the to-be-started state. At this point, because the target intelligent network card has already written the latest metadata into the memory of the target storage server, the target storage server does not need to load metadata from disk when starting the distributed storage service. If the start is determined to be successful, the heartbeat count is periodically written into the second designated location (this can be done through memory access), and the switchover is determined to have succeeded; further, the target storage server may also write the running state of the locally deployed distributed storage service into the second designated location. If the start is determined to be unsuccessful, the heartbeat count is not periodically written into the second designated location, and the switchover is determined to have failed.
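A minimal sketch of this detection-and-switchover logic on the host side is given below. The polling scheme, timeout handling and the injected callables (stop_nvmeof_target, start_local_service, publish_heartbeat) are assumptions of the sketch; they only mark where the real NVMe-oF target shutdown, service start-up and heartbeat publication would occur.

    import time

    def nic_has_failed(read_heartbeat, timeout_s: float, poll_s: float = 0.5) -> bool:
        """Return True if the heartbeat at the first designated location does not
        advance within `timeout_s`. `read_heartbeat` is a callable that reads the counter."""
        baseline = read_heartbeat()
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            time.sleep(poll_s)
            if read_heartbeat() != baseline:
                return False       # heartbeat advanced, the network card is considered healthy
        return True

    def switch_to_host_service(stop_nvmeof_target, start_local_service, publish_heartbeat) -> bool:
        """Switchover steps described in the text, with each step injected as a callable
        so the sketch stays independent of any real storage stack."""
        stop_nvmeof_target()       # stop exporting the disks to the failed network card
        ok = start_local_service() # metadata is already current in host memory,
                                   # so no reload from disk is required
        if ok:
            publish_heartbeat()    # begin writing to the second designated location
        return ok

    if __name__ == "__main__":
        # A heartbeat that never advances is declared failed after the timeout.
        print(nic_has_failed(lambda: 5, timeout_s=1.0, poll_s=0.2))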
In this embodiment of the present application, when the distributed storage service is started on the target intelligent network card, metadata stored in a storage resource of the target storage server is loaded to a third designated location in a memory of the target storage server, where when the failure of the target intelligent network card is detected, and when the distributed storage service that is deployed on the target storage server and is set to a to-be-started state is started, the target storage server performs data processing on the second data read-write request based on the metadata stored in the third designated location.
That is, when the distributed storage service is started on the intelligent network card, the metadata on the disks is loaded into a designated location in the memory of the storage server, and the intelligent network card updates the metadata content while processing data read-write requests based on the metadata stored at that designated location. When an intelligent network card failure is detected, the distributed storage service is switched over to the storage server; since the metadata stored at the designated location in the memory of the storage server is already the latest metadata, the storage server does not need to perform a metadata loading operation and can directly use the metadata stored at that location to process subsequently received data read-write requests.
Further, when it is detected that the target intelligent network card has recovered, if the state of the distributed storage service deployed on the target storage server is to-be-started and/or the heartbeat count maintained at the second designated location in the memory of the target storage server has not increased within a preset duration, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request that is sent by the client and needs to be processed by the target storage server.
For example, after the failure of the target intelligent network card has been repaired, it is judged whether the distributed storage service deployed on the target storage server is running normally. Specifically, the running state of the distributed storage service deployed on the target storage server is acquired from the second designated location; if the state is to-be-started, the target intelligent network card directly starts its local distributed storage service. If the state is running (started), the heartbeat count written by the target storage server is further acquired from the second designated location, and whether the distributed storage service deployed on the target storage server is running normally is judged according to that heartbeat count; if it is not, the target intelligent network card directly starts its local distributed storage service and uses it to process the subsequent data read-write requests that need to be processed by the target storage server.
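The decision the recovered network card derives from the second designated location can be reduced to a small predicate, sketched below. The state codes repeat the illustrative values assumed earlier and are not taken from the patent.

    STATE_TO_BE_STARTED = 0
    STATE_RUNNING = 1

    def nic_should_take_over_again(host_state: int, host_heartbeat_advanced: bool) -> bool:
        """If the host-side service never left the to-be-started state, or its heartbeat
        has stopped advancing, the recovered network card simply restarts its own
        distributed storage service without any further handshake."""
        return host_state == STATE_TO_BE_STARTED or not host_heartbeat_advanced

    # The two branches described above, with illustrative values:
    assert nic_should_take_over_again(STATE_TO_BE_STARTED, False) is True
    assert nic_should_take_over_again(STATE_RUNNING, True) is False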
Further, when it is detected that the target intelligent network card has recovered, if the distributed storage service deployed on the target storage server is running normally, the target intelligent network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets the locally running distributed storage service to the to-be-started state, starts the RDMA channel between itself and the target intelligent network card, and sends a switching completion instruction to the target intelligent network card. If the target intelligent network card receives the switching completion instruction, or the timer expires without the instruction being received, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request that is sent by the client and needs to be processed by the target storage server.
For example, after the failure of the target intelligent network card has been repaired and it is judged that the distributed storage service deployed on the target storage server is running normally, the target intelligent network card sends the switching instruction to the target storage server, starts a switching-instruction processing timer, and waits to receive the switching completion instruction. When the target storage server receives the switching instruction, it stops the locally running distributed storage service, starts the NVMe-oF target service, writes the running state of the locally deployed distributed storage service into the second designated location, and sends the switching completion instruction to the target intelligent network card. If the target intelligent network card receives the switching completion instruction before the timer expires, or has not received it after the timer expires, the target intelligent network card directly starts its local distributed storage service and uses it to process the subsequent data read-write requests that need to be processed by the target storage server.
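The handshake can be summarized by the sketch below: the recovered network card asks the host service to hand the disks back and arms a timer, and it resumes the service whether or not the completion message arrives in time. All callables are injected placeholders assumed for the sketch, not APIs defined by the patent.

    import threading

    def coordinated_switch_back(send_switch_instruction, wait_for_completion,
                                start_nic_service, timeout_s: float = 5.0) -> bool:
        """Switch-back handshake sketched from the text above."""
        send_switch_instruction()                   # ask the host to quiesce and re-export the NVMe-oF target
        finished = wait_for_completion(timeout_s)   # True if the completion instruction arrived in time
        # Whether or not the host confirmed in time, the network card resumes the service;
        # the handshake only gives the host a chance to quiesce cleanly first.
        start_nic_service()
        return finished

    if __name__ == "__main__":
        done = threading.Event()
        done.set()                                  # pretend the host answered immediately
        coordinated_switch_back(lambda: None, done.wait, lambda: None)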
Based on the same inventive concept as the above method embodiment, as an example, referring to fig. 3, which is a schematic structural diagram of a distributed storage service switching device provided in an embodiment of the present application, the device is applied to a distributed storage system, where each storage server in the distributed storage system is configured with a corresponding intelligent network card, each intelligent network card runs a distributed storage service, each intelligent network card establishes an RDMA channel with the controller for managing local storage resources on the corresponding storage server, and each storage server is deployed with a distributed storage service set to the to-be-started state; the device includes:
a receiving unit 30, configured to receive a first data read-write request sent by a client, and determine a target storage server that processes the first data read-write request;
the sending unit 31 is configured to send the first data read-write request to a target intelligent network card corresponding to a target storage server, so that the target intelligent network card performs data processing on the first data read-write request based on a distributed storage service running locally;
and the switching unit 32 is configured to, when a failure of the target intelligent network card is detected, start the distributed storage service that is deployed on the target storage server and set to the to-be-started state, so that the target storage server, based on the locally running distributed storage service, performs data processing on a second data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the step of the target intelligent network card performing data processing on the first data read-write request based on the locally operated distributed storage service includes:
the target intelligent network card sends the first data read-write request to a controller on the target storage server for managing local storage resources, wherein the controller processes the first data read-write request through a corresponding RDMA channel.
Optionally, during normal operation, the target intelligent network card writes heartbeat counting information into a first designated position in the memory of the target storage server based on a preset period;
when the target intelligent network card fault is detected, the switching unit 32 is specifically configured to:
and when the heartbeat count maintained at the first appointed position in the memory of the target storage server is detected not to be increased within the preset time length, determining that the fault of the target intelligent network card is detected.
Optionally, after the distributed storage service deployed on the target storage server and set to the to-be-started state is started, the target storage server writes heartbeat count information into a second designated position in the memory based on a preset period;
The switching unit 32 is further configured to, when it is detected that the target intelligent network card has recovered, if the state of the distributed storage service deployed on the target storage server is to-be-started and/or the heartbeat count maintained at the second designated location in the memory of the target storage server has not increased within a preset duration, have the target intelligent network card start the distributed storage service and perform data processing on a third data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the switching unit 32 is further configured to: when it is detected that the target intelligent network card has recovered, if the distributed storage service deployed on the target storage server is running normally, the target intelligent network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets the locally running distributed storage service to the to-be-started state, starts the RDMA channel between itself and the target intelligent network card, and sends a switching completion instruction to the target intelligent network card; and if the target intelligent network card has not received the switching completion instruction when the timer expires, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request that is sent by the client and needs to be processed by the target storage server.
Optionally, the apparatus further comprises:
and the loading unit is configured to load, when the distributed storage service is started on the target intelligent network card, metadata stored in the storage resources of the target storage server to a third designated position in the memory of the target storage server, wherein when a failure of the target intelligent network card is detected and the distributed storage service that is deployed on the target storage server and set to the to-be-started state is started, the target storage server performs data processing on the second data read-write request based on the metadata stored in the third designated position.
The above units may be one or more integrated circuits configured to implement the above methods, for example: one or more application-specific integrated circuits (ASIC), one or more digital signal processors (DSP), one or more field-programmable gate arrays (FPGA), or the like. For another example, when a unit is implemented in the form of a processing element invoking program code, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call program code. For another example, the units may be integrated together and implemented in the form of a system-on-a-chip (SoC).
Further, in the distributed storage service switching device provided in the embodiment of the present application, from a hardware level, a hardware architecture schematic diagram of the distributed storage service switching device may be shown in fig. 4, and the distributed storage service switching device may include: a memory 40 and a processor 41,
memory 40 is used to store program instructions; the processor 41 invokes the program instructions stored in the memory 40 to execute the above-described method embodiments in accordance with the obtained program instructions. The specific implementation manner and the technical effect are similar, and are not repeated here.
Optionally, the present application further provides a distributed storage service switching device, including at least one processing element (or chip) for performing the above-described method embodiments.
Optionally, the present application also provides a program product, such as a computer readable storage medium, storing computer executable instructions for causing the computer to perform the above-described method embodiments.
Here, a machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that may contain or store information, such as executable instructions, data, and the like. For example, a machine-readable storage medium may be: RAM (Random Access Memory), volatile memory, non-volatile memory, flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disc, a DVD, etc.), a similar storage medium, or a combination thereof.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may be in the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email device, game console, tablet computer, wearable device, or a combination of any of these devices.
For convenience of description, the above devices are described as being functionally divided into various units, respectively. Of course, the functions of each element may be implemented in one or more software and/or hardware elements when implemented in the present application.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Moreover, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the preferred embodiments of the present invention is not intended to limit the invention to the precise form disclosed, and any modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present invention are intended to be included within the scope of the present invention.

Claims (12)

1. A distributed storage service switching method, applied to a distributed storage system, wherein each storage server in the distributed storage system is configured with a corresponding intelligent network card, each intelligent network card runs a distributed storage service, each intelligent network card establishes an RDMA channel with a controller for managing local storage resources on its corresponding storage server, and each storage server is deployed with a distributed storage service set to a to-be-started state, the method comprising:
receiving a first data read-write request sent by a client, and determining a target storage server for processing the first data read-write request;
sending the first data read-write request to a target intelligent network card corresponding to the target storage server, so that the target intelligent network card performs data processing on the first data read-write request based on a locally running distributed storage service; and
if a fault of the target intelligent network card is detected, starting the distributed storage service which is deployed on the target storage server and is set to the to-be-started state, so that the target storage server performs data processing, based on the locally running distributed storage service, on a second data read-write request which is sent by the client and needs to be processed by the target storage server.
2. The method of claim 1, wherein the step of the target intelligent network card performing data processing on the first data read-write request based on a locally-running distributed storage service comprises:
the target intelligent network card sends the first data read-write request to a controller on the target storage server for managing local storage resources, wherein the controller processes the first data read-write request through a corresponding RDMA channel.
3. The method according to claim 1 or 2, wherein the target intelligent network card writes heartbeat count information to a first designated location in the memory of the target storage server based on a preset period during normal operation;
the step of detecting the fault of the target intelligent network card comprises:
when it is detected that the heartbeat count maintained at the first designated location in the memory of the target storage server has not increased within a preset duration, determining that the fault of the target intelligent network card is detected.
4. The method of claim 3, wherein after the distributed storage service which is deployed on the target storage server and is set to the to-be-started state starts, the target storage server writes heartbeat count information to a second designated location in the memory based on a preset period;
when it is detected that the target intelligent network card has recovered to normal, if the distributed storage service deployed on the target storage server is in the to-be-started state and/or the heartbeat count maintained at the second designated location in the memory of the target storage server has not increased within the preset duration, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request which is sent by the client and needs to be processed by the target storage server.
5. The method of claim 4, wherein the method further comprises:
when it is detected that the target intelligent network card has recovered to normal, if the distributed storage service deployed on the target storage server is running normally, the target intelligent network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets the locally running distributed storage service to the to-be-started state, enables the RDMA channel between the local storage service and the target intelligent network card, and sends a switching completion instruction to the target intelligent network card; and if the target intelligent network card has not received the switching completion instruction when the timer expires, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request which is sent by the client and needs to be processed by the target storage server.
6. The method of claim 1, wherein the method further comprises:
when the distributed storage service which is deployed on the target storage server and is set to the to-be-started state is started, the target storage server performs data processing on the second data read-write request based on metadata stored at a third designated location in the memory of the target storage server.
7. A distributed storage service switching apparatus, applied to a distributed storage system, wherein each storage server in the distributed storage system is configured with a corresponding intelligent network card, each intelligent network card runs a distributed storage service, each intelligent network card establishes an RDMA channel with a controller for managing local storage resources on its corresponding storage server, and each storage server is deployed with a distributed storage service set to a to-be-started state, the apparatus comprising:
a receiving unit, configured to receive a first data read-write request sent by a client, and determine a target storage server for processing the first data read-write request;
a sending unit, configured to send the first data read-write request to a target intelligent network card corresponding to the target storage server, so that the target intelligent network card performs data processing on the first data read-write request based on a locally running distributed storage service; and
a switching unit, configured to, when a fault of the target intelligent network card is detected, start the distributed storage service which is deployed on the target storage server and is set to the to-be-started state, so that the target storage server performs data processing, based on the locally running distributed storage service, on a second data read-write request which is sent by the client and needs to be processed by the target storage server.
8. The apparatus of claim 7, wherein the target intelligent network card performing data processing on the first data read-write request based on a locally running distributed storage service comprises:
the target intelligent network card sends the first data read-write request to a controller on the target storage server for managing local storage resources, wherein the controller processes the first data read-write request through a corresponding RDMA channel.
9. The apparatus of claim 7 or 8, wherein the target intelligent network card writes heartbeat count information to a first designated location in the memory of the target storage server based on a preset period during normal operation;
when the fault of the target intelligent network card is detected, the switching unit is specifically configured to:
when it is detected that the heartbeat count maintained at the first designated location in the memory of the target storage server has not increased within a preset duration, determine that the fault of the target intelligent network card is detected.
10. The apparatus of claim 9, wherein after the distributed storage service which is deployed on the target storage server and is set to the to-be-started state starts, the target storage server writes heartbeat count information to a second designated location in the memory based on a preset period;
and the switching unit is further configured to, when it is detected that the target intelligent network card has recovered to normal, if the distributed storage service deployed on the target storage server is in the to-be-started state and/or the heartbeat count maintained at the second designated location in the memory of the target storage server has not increased within the preset duration, start the distributed storage service through the target intelligent network card and perform data processing on a third data read-write request which is sent by the client and needs to be processed by the target storage server.
11. The apparatus of claim 10, wherein the switching unit is further configured to,
when it is detected that the target intelligent network card has recovered to normal, if the distributed storage service deployed on the target storage server is running normally, the target intelligent network card sends a switching instruction to the target storage server and starts a timer, so that the target storage server sets the locally running distributed storage service to the to-be-started state, enables the RDMA channel between the local storage service and the target intelligent network card, and sends a switching completion instruction to the target intelligent network card; and if the target intelligent network card has not received the switching completion instruction when the timer expires, the target intelligent network card starts the distributed storage service and performs data processing on a third data read-write request which is sent by the client and needs to be processed by the target storage server.
12. The apparatus of claim 7, wherein the apparatus further comprises:
a loading unit, configured to load metadata stored in the storage resources of the target storage server to a third designated location in the memory of the target storage server when the distributed storage service is started on the target intelligent network card, wherein, when the fault of the target intelligent network card is detected and the distributed storage service which is deployed on the target storage server and is set to the to-be-started state is started, the target storage server performs data processing on the second data read-write request based on the metadata stored at the third designated location.
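The overall switching flow of claims 1 and 7 can be illustrated with a minimal sketch. All names below (SwitchingDispatcher, the service proxies, nic_fault_detected) are hypothetical stand-ins rather than identifiers from the patent: requests are routed to the target server's intelligent network card until a fault is detected, after which the standby service deployed on the storage server itself is started and takes over subsequent requests.

class SwitchingDispatcher:
    """Illustrative sketch of the claimed switching flow (claims 1 and 7).

    `nic_service` and `host_service` stand in for the distributed storage
    service running on the intelligent network card and the copy deployed
    on the storage server in the to-be-started state, respectively.
    """

    def __init__(self, nic_service, host_service, nic_fault_detected):
        self._nic_service = nic_service
        self._host_service = host_service
        self._nic_fault_detected = nic_fault_detected  # callable returning bool
        self._switched = False

    def handle(self, read_write_request):
        # Route to the NIC-side service until a fault is observed.
        if not self._switched and self._nic_fault_detected():
            self._host_service.start()   # start the to-be-started service copy
            self._switched = True
        target = self._host_service if self._switched else self._nic_service
        return target.process(read_write_request)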
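Claims 3 and 9 detect a card fault by watching a heartbeat counter that the intelligent network card periodically writes to a designated location in the storage server's memory. The sketch below assumes the counter can be read through a caller-supplied function; the 3-second timeout and 0.5-second poll interval are illustrative values, not taken from the patent.

import time

def wait_for_nic_fault(read_heartbeat, timeout_s=3.0, poll_s=0.5):
    """Return once the heartbeat counter has stopped increasing for
    `timeout_s` seconds, i.e. the intelligent network card is presumed
    faulty (claims 3 and 9)."""
    last_value = read_heartbeat()
    last_change = time.monotonic()
    while True:
        time.sleep(poll_s)
        value = read_heartbeat()
        if value != last_value:
            last_value, last_change = value, time.monotonic()
        elif time.monotonic() - last_change >= timeout_s:
            return  # counter stalled: treat the NIC as failed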
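Claims 5 and 11 switch the service back once the network card has recovered: the card sends a switching instruction, starts a timer, and waits for a switching-completion instruction; if the timer expires without a reply, the card starts its own service copy anyway. Below is one possible reading of that step, with hypothetical callables standing in for the messaging between card and server.

import queue

def switch_back_to_nic(send_switch_instruction, completion_queue,
                       start_nic_service, timeout_s=5.0):
    """Sketch of the switch-back step in claims 5 and 11."""
    send_switch_instruction()      # ask the server to return to the to-be-started state
    try:
        completion_queue.get(timeout=timeout_s)   # wait for "switching complete"
    except queue.Empty:
        pass                        # timer expired without confirmation
    start_nic_service()             # NIC-side service resumes handling requests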
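Claims 6 and 12 keep metadata at a designated location in the storage server's memory so that, after a failover, the host-side service can use it immediately instead of reloading it from the storage resources. The sketch below models that location as a shared memory-mapped file; the path and the 4 MiB size are assumptions made only for illustration.

import mmap
import os

METADATA_REGION_SIZE = 4 * 1024 * 1024   # illustrative size, not from the patent

def map_metadata_region(path="/dev/shm/storage_metadata"):
    """Map a shared region standing in for the 'third designated location':
    filled when the NIC-side service starts, read by the host-side service
    after the network card fails (claims 6 and 12)."""
    # Create and size the backing file if it does not exist yet.
    if not os.path.exists(path) or os.path.getsize(path) < METADATA_REGION_SIZE:
        with open(path, "wb") as f:
            f.truncate(METADATA_REGION_SIZE)
    with open(path, "r+b") as f:
        return mmap.mmap(f.fileno(), METADATA_REGION_SIZE)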
CN202011344216.1A 2020-11-25 2020-11-25 Distributed storage service switching method and device Active CN112596960B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011344216.1A CN112596960B (en) 2020-11-25 2020-11-25 Distributed storage service switching method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011344216.1A CN112596960B (en) 2020-11-25 2020-11-25 Distributed storage service switching method and device

Publications (2)

Publication Number Publication Date
CN112596960A CN112596960A (en) 2021-04-02
CN112596960B (en) 2023-06-13

Family

ID=75184122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011344216.1A Active CN112596960B (en) 2020-11-25 2020-11-25 Distributed storage service switching method and device

Country Status (1)

Country Link
CN (1) CN112596960B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113253925B (en) * 2021-04-30 2022-08-30 新华三大数据技术有限公司 Method and device for optimizing read-write performance
CN113282246B (en) * 2021-06-15 2023-07-04 杭州海康威视数字技术股份有限公司 Data processing method and device
CN113824812B (en) * 2021-08-27 2023-02-28 济南浪潮数据技术有限公司 Method, device and storage medium for HDFS service to acquire service node IP
CN114338721B (en) * 2021-12-28 2024-01-02 中国电信股份有限公司 Data processing method and device, target network system and readable storage medium
CN114327903B (en) * 2021-12-30 2023-11-03 苏州浪潮智能科技有限公司 NVMe-oF management system, resource allocation method and IO read-write method
CN116560900A (en) * 2022-01-30 2023-08-08 华为技术有限公司 Method for reading data or method for writing data and related system thereof
CN114546279B (en) * 2022-02-24 2023-11-14 重庆紫光华山智安科技有限公司 IO request prediction method and device, storage node and readable storage medium
CN115022328B (en) * 2022-06-24 2023-08-08 脸萌有限公司 Server cluster, testing method and device of server cluster and electronic equipment
CN115242807A (en) * 2022-06-30 2022-10-25 深圳震有科技股份有限公司 Data access method in 5G communication system and related equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200412071A (en) * 2002-12-31 2004-07-01 Inventec Corp Method for balancing load of network interface card testing
CN101115054A (en) * 2006-07-26 2008-01-30 惠普开发有限公司 Memory-mapped buffers for network interface controllers
US7739543B1 (en) * 2003-04-23 2010-06-15 Netapp, Inc. System and method for transport-level failover for loosely coupled iSCSI target devices
CN107085503A (en) * 2017-03-27 2017-08-22 联想(北京)有限公司 storage device, storage system and information processing method
CN109327539A (en) * 2018-11-15 2019-02-12 上海天玑数据技术有限公司 A kind of distributed block storage system and its data routing method

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7457861B1 (en) * 2003-12-05 2008-11-25 Unisys Corporation Optimizing virtual interface architecture (VIA) on multiprocessor servers and physically independent consolidated NICs
US9313274B2 (en) * 2013-09-05 2016-04-12 Google Inc. Isolating clients of distributed storage systems
US10257273B2 (en) * 2015-07-31 2019-04-09 Netapp, Inc. Systems, methods and devices for RDMA read/write operations
US10713210B2 (en) * 2015-10-13 2020-07-14 Microsoft Technology Licensing, Llc Distributed self-directed lock-free RDMA-based B-tree key-value manager
US9836368B2 (en) * 2015-10-22 2017-12-05 Netapp, Inc. Implementing automatic switchover
US20180336061A1 (en) * 2017-05-16 2018-11-22 Lenovo Enterprise Solutions (Singapore) Pte. Ltd. Storing file portions in data storage space available to service processors across a plurality of endpoint devices
US10503590B2 (en) * 2017-09-21 2019-12-10 International Business Machines Corporation Storage array comprising a host-offloaded storage function
US11347678B2 (en) * 2018-08-06 2022-05-31 Oracle International Corporation One-sided reliable remote direct memory operations

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TW200412071A (en) * 2002-12-31 2004-07-01 Inventec Corp Method for balancing load of network interface card testing
US7739543B1 (en) * 2003-04-23 2010-06-15 Netapp, Inc. System and method for transport-level failover for loosely coupled iSCSI target devices
CN101115054A (en) * 2006-07-26 2008-01-30 惠普开发有限公司 Memory-mapped buffers for network interface controllers
CN107085503A (en) * 2017-03-27 2017-08-22 联想(北京)有限公司 storage device, storage system and information processing method
CN109327539A (en) * 2018-11-15 2019-02-12 上海天玑数据技术有限公司 A kind of distributed block storage system and its data routing method

Also Published As

Publication number Publication date
CN112596960A (en) 2021-04-02

Similar Documents

Publication Publication Date Title
CN112596960B (en) Distributed storage service switching method and device
US7587492B2 (en) Dynamic performance management for virtual servers
CN102355369B (en) Virtual clustered system as well as processing method and processing device thereof
US20120324071A1 (en) Managing resources in a distributed system using dynamic clusters
US20180004777A1 (en) Data distribution across nodes of a distributed database base system
WO2016200712A1 (en) Recovery in data centers
US11010190B2 (en) Methods, mediums, and systems for provisioning application services
US10467106B2 (en) Data processing method, data processing system, and non-transitory computer program product for controlling a workload delay time
US11102284B2 (en) Service processing methods and systems based on a consortium blockchain network
CN106331081B (en) Information synchronization method and device
CN112596669A (en) Data processing method and device based on distributed storage
US8650281B1 (en) Intelligent arbitration servers for network partition arbitration
US11307780B2 (en) Cluster group change preparation techniques
CN110781039B (en) Sentinel process election method and device
CN112631994A (en) Data migration method and system
CN111831408A (en) Asynchronous task processing method and device, electronic equipment and medium
WO2015139327A1 (en) Failover method, apparatus and system
CN107179998A (en) A kind of method and device for configuring peripheral hardware core buffer
US9430338B2 (en) Method and computing device for recording log entries
CN115438021A (en) Resource allocation method and device for database server
CN115202803A (en) Fault processing method and device
CN110908821A (en) Method, device, equipment and storage medium for task failure management
US20230125909A1 (en) Managing applications in a cluster
CN116089020B (en) Virtual machine operation method, capacity expansion method and capacity expansion system
CN114301927B (en) Main node selection method, device and medium in distributed system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant