CN111309524A - Distributed storage system fault recovery method, device, terminal and storage medium - Google Patents

Distributed storage system fault recovery method, device, terminal and storage medium

Info

Publication number
CN111309524A
Authority
CN
China
Prior art keywords
cluster
state
osd
group
deleting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010095163.8A
Other languages
Chinese (zh)
Inventor
任洪亮
李景要
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd filed Critical Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010095163.8A
Publication of CN111309524A
Legal status: Withdrawn (current)

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1479Generic software techniques for error detection or fault masking
    • G06F11/1482Generic software techniques for error detection or fault masking by means of middleware or OS functionality

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Retry When Errors Occur (AREA)

Abstract

The embodiment of the application provides a distributed storage system fault recovery method, device, terminal and storage medium, and the method comprises the following steps: finding the placement group in which the lost object is located; deleting the placement group; deleting the osd process data corresponding to the placement group and marking the osd process as complete; and checking the cluster state and verifying the cluster fault recovery result according to the cluster state. For a cluster whose failure has exceeded the fault domain, the invention enables the cluster to continue providing external services without damaging the existing cluster, and under certain conditions preserves the cyclic overwrite writing and the playback scenarios required for video.

Description

Distributed storage system fault recovery method, device, terminal and storage medium
Technical Field
The invention relates to the technical field of storage, and in particular to a distributed storage system fault recovery method, device, terminal and storage medium.
Background
The distributed storage system is a reliable, autonomous distributed object store that can simultaneously provide file system storage, object storage and block storage. Within the fault domain, when a node or a hard disk fails, for example because of downtime or power loss, its services can be taken over by other standby nodes, so that services continue to be provided normally, the business is not affected, and the storage remains available.
However, if a disk hardware fault in the cluster cannot be repaired, and another node suffers an unrecoverable hard disk fault while the failed disk is being replaced, the cluster exceeds its fault domain and partial data loss occurs. When the front-end service reads or modifies a lost object, the osd either becomes abnormal or returns EIO, and the front-end service is interrupted. In the video scenario, a large number of video files are created in the cluster according to service requirements and are continuously written in a cyclic overwrite manner, so video playback cannot be performed while the osd is abnormal.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a distributed storage system fault recovery method, device, terminal and storage medium, so as to solve the above technical problems.
In a first aspect, an embodiment of the present application provides a distributed storage system failure recovery method, where the method includes:
finding the placement group in which the lost object is located;
deleting the placement group;
deleting the osd process data corresponding to the placement group and marking the osd process as complete;
and checking the cluster state, and verifying the cluster fault recovery result according to the cluster state.
Further, the deleting of the placement group includes:
stopping the osd service;
calling the icfs-objectstore-tool to delete the placement group;
reloading the system configuration file;
and starting the osd service.
Further, the deleting of the osd process data corresponding to the placement group and the marking of the osd process as complete include:
stopping the osd service;
calling the icfs-objectstore-tool to delete the data of the osd process corresponding to the placement group;
marking the corresponding osd process as complete;
reloading the system configuration file;
and starting the osd service.
Further, the checking of the cluster state and the verifying of the cluster fault recovery result according to the cluster state include:
calling the icfs -s tool to check the cluster state;
judging whether the cluster state is the active+clean state:
if so, determining that the cluster fault recovery succeeded;
if not, determining that the cluster fault recovery failed.
In a second aspect, an embodiment of the present application provides a distributed storage system failure recovery apparatus, where the apparatus includes:
a group searching unit, configured to find the placement group in which the lost object is located;
a group deleting unit, configured to delete the placement group;
a process deleting unit, configured to delete the osd process data corresponding to the placement group and mark the osd process as complete;
and a state verification unit, configured to check the cluster state and verify the cluster fault recovery result according to the cluster state.
Further, the group deleting unit includes:
a first stopping module configured to stop the osd service;
the group deleting module is configured to call an icfs-objectstore-tool to delete the placement group;
the first loading module is configured to reload the system configuration file;
a first initiation module configured to initiate the osd service.
Further, the process deleting unit includes:
a second stopping module configured to stop the osd service;
the process deleting module is configured to call the icfs-objectstore-tool to delete the data in the osd process corresponding to the placement group;
a process marking module configured to mark the corresponding osd process as a completed state;
a second loading module configured to reload the system configuration file;
a second starting module configured to start the osd service.
Further, the state verifying unit includes:
a state viewing module configured to call the icfs -s tool to view the cluster state;
a state judging module configured to judge whether the cluster state is the active+clean state;
a success determining module configured to determine that the cluster fault recovery succeeded if the cluster state is the active+clean state;
and a failure determining module configured to determine that the cluster fault recovery failed if the cluster state is not the active+clean state.
In a third aspect, a terminal is provided, including:
a processor, a memory, wherein,
the memory is used for storing a computer program which,
the processor is configured to call and run the computer program from the memory, so that the terminal performs the method described above.
In a fourth aspect, a computer storage medium is provided having stored therein instructions that, when executed on a computer, cause the computer to perform the method of the above aspects.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.
The beneficial effects of the invention are as follows:
according to the distributed storage system fault recovery method, device, terminal and storage medium provided by the invention, the placement group in which the lost object is located is first located, the placement group and the osd process data of the placement group are deleted, the osd process is marked as complete, and the cluster state is then checked to verify the fault recovery result. For a cluster whose failure has exceeded the fault domain, the invention enables the cluster to continue providing external services without damaging the existing cluster, and under certain conditions preserves the cyclic overwrite writing and the playback scenarios required for video.
In addition, the invention has reliable design principle, simple structure and very wide application prospect.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It will be obvious to those skilled in the art that other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic flow chart diagram of a method of one embodiment of the present application.
FIG. 2 is a schematic block diagram of an apparatus of one embodiment of the present application.
Fig. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
In order to enable those skilled in the art to better understand the technical solutions of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings of the embodiments. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The following explains key terms appearing in the present application.
Placement Group (PG) and Placement Group for Placement purpose (PGP)
The PG count specifies the number of directories into which the objects of a storage pool are placed;
the PGP count specifies the number of OSD distribution combinations for the PGs of the storage pool.
BlueStore object storage engine
An object storage engine built directly on raw disk devices, with extensive optimization for new storage devices such as SSDs.
OSD (Object Storage Device)
The process responsible for responding to client requests and returning the requested data; each osd corresponds to one hard disk in the cluster.
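To make the PG/PGP distinction concrete, the following is a minimal shell sketch of setting both counts when creating a pool. It assumes the icfs command line mirrors the upstream ceph commands it is derived from; the pool name and the counts are hypothetical examples, not values prescribed by this application.

    # Create a pool with 128 placement groups (pg_num) and 128 placement
    # combinations (pgp_num): objects are hashed into the PGs, and the PGP
    # count controls how many OSD placement combinations the PGs are spread across.
    icfs osd pool create video_pool 128 128
    # Inspect the effective values.
    icfs osd pool get video_pool pg_num
    icfs osd pool get video_pool pgp_num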
FIG. 1 is a schematic flow chart diagram of a method of one embodiment of the present application. The method of fig. 1 may be executed by a distributed storage system fault recovery apparatus.
As shown in fig. 1, the method 100 includes:
step 110, finding the placement group in which the lost object is located;
step 120, deleting the placement group;
step 130, deleting the osd process data corresponding to the placement group and marking the osd process as complete;
step 140, checking the cluster state, and verifying the cluster fault recovery result according to the cluster state.
To facilitate understanding of the present invention, the distributed storage system fault recovery method is further described below with reference to its underlying principle and to the process of performing fault recovery on a distributed storage system in an embodiment.
Specifically, the distributed storage system fault recovery method includes:
and S1, searching the placement group of the lost object.
Calling a tool icfs health detail | grep underfound # to search a pg group with object loss, and acquiring an osd number in the group.
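A minimal sketch of this step in script form is shown below. It assumes the icfs command line behaves like the upstream ceph CLI it is based on (in particular that icfs pg <pgid> query exists and reports the acting set); the grep/awk parsing and the loop variable are illustrative assumptions rather than part of the claimed method.

    #!/bin/bash
    # List the placement groups that report unfound (lost) objects.
    icfs health detail | grep -i unfound

    # For each such pg id (second field of lines such as "pg 1.2f has 3 unfound objects"),
    # query the pg to see which osd numbers currently host it.
    for pg in $(icfs health detail | grep -i unfound | awk '/^ *pg /{print $2}'); do
        echo "pg ${pg} is hosted by:"
        icfs pg "${pg}" query | grep -A 4 '"acting"'   # acting set = osd ids used in S2/S3
    done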
S2, deleting the placement group.
The osd service is stopped first by calling the command systemctl stop icfs-osd@$i.
The icfs-objectstore-tool is then called to delete the pg found in step S1. The icfs-objectstore-tool is a tool provided with ceph for operating on objects and pgs; it can be used to view, modify and delete the data of objects and pgs on the ObjectStore, and to dump the OSD journal. The call command of the icfs-objectstore-tool is: icfs-objectstore-tool --data-path /var/lib/icfs/osd/icfs-$i --journal-path /var/lib/icfs/osd/icfs-$i/journal --type filestore --op remove --pgid ${pg_num}s$num. When the cluster fault exceeds the fault domain, the data cannot be recovered by rollback; if the cluster is still required to provide external services, the lost pg must be deleted so that the failed OSD can come back up quickly.
The systemctl daemon-reload command is called to reload the configuration files of the distributed file system, so that the information of the deleted placement group is removed from the reloaded configuration.
The systemctl start icfs-osd@$i command is called to start the osd service.
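An end-to-end sketch of this step for a single osd and pg is given below. The variable values are hypothetical examples, and the option names assume that icfs-objectstore-tool accepts the same options as the upstream ceph-objectstore-tool with a filestore backend.

    #!/bin/bash
    i=3                # osd number obtained in step S1 (example value)
    pgid="1.2fs0"      # pg id of the lost placement group, with shard suffix (example value)

    systemctl stop icfs-osd@$i          # the tool must not run against a live osd

    icfs-objectstore-tool \
        --data-path /var/lib/icfs/osd/icfs-$i \
        --journal-path /var/lib/icfs/osd/icfs-$i/journal \
        --type filestore \
        --op remove --pgid "$pgid"      # delete the lost pg from this osd's object store

    systemctl daemon-reload             # reload configuration after the change
    systemctl start icfs-osd@$i         # bring the osd back online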
S3, deleting the osd process data corresponding to the placement group and marking the osd process as complete.
First, the systemctl stop icfs-osd@$i command is called to stop the osd service.
The icfs-objectstore-tool is called to delete the data in the OSD journal corresponding to the placement group deleted in step S2. The specific call is: icfs-objectstore-tool --data-path /var/lib/icfs/osd/icfs-$i --journal-path /var/lib/icfs/osd/icfs-$i/journal --type filestore --op rm-past-intervals --pgid ${pg_num}s$num.
The osd process is then marked as complete; after the marking, BlueStore considers that the pg can be listed as authoritative data. The specific marking command is: icfs-objectstore-tool --data-path /var/lib/icfs/osd/icfs-$i --journal-path /var/lib/icfs/osd/icfs-$i/journal --type filestore --op mark-complete --pgid ${pg_num}s$num.
The systemctl daemon-reload command is called to reload the configuration files, and the process information of the deleted placement group is updated in the reloaded configuration so that the stale process data does not interfere with the distributed file system.
The systemctl start icfs-osd@$i command is called to start the osd service.
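A sketch of this step on the same osd is shown below, again assuming icfs-objectstore-tool mirrors the ceph-objectstore-tool option set; the variable values are illustrative.

    #!/bin/bash
    i=3
    pgid="1.2fs0"
    OPTS="--data-path /var/lib/icfs/osd/icfs-$i --journal-path /var/lib/icfs/osd/icfs-$i/journal --type filestore"

    systemctl stop icfs-osd@$i

    # Drop the stale past-interval records kept by this osd for the deleted pg.
    icfs-objectstore-tool $OPTS --op rm-past-intervals --pgid "$pgid"

    # Mark the pg complete on this osd so its (now empty) copy is treated as authoritative.
    icfs-objectstore-tool $OPTS --op mark-complete --pgid "$pgid"

    systemctl daemon-reload
    systemctl start icfs-osd@$i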
S4, checking the cluster state, and verifying the cluster fault recovery result according to the cluster state.
The icfs -s command is invoked to view the cluster state.
The cluster state mainly includes the following categories:
HEALTH_WARN indicates the cluster warning state; the normal state is HEALTH_OK.
Several pg states are explained below:
waiting: synchronizing state; the PG is performing synchronization.
degraded: degraded state; after peering completes, the PG finds that a PG instance contains objects that are inconsistent (and need to be synchronized or repaired), or the current Acting Set is smaller than the storage pool replica count.
recovering: recovering state; the cluster is migrating or synchronizing objects and their replicas.
recovery_wait: waiting for the reservation of recovery resources.
stuck unclean: non-clean state; the PG cannot recover from the last failure.
active: active state; the PG can normally handle read and write requests from clients.
The normal PG state is 100% active+clean, which means that all PGs are accessible and every PG has all of its replicas available. Therefore, after the cluster state is obtained, it is checked whether the state is 100% active+clean: if so, the cluster fault recovery succeeded; if not, the cluster fault recovery failed.
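The check can be scripted as in the hedged sketch below. It assumes icfs -s prints a pg summary in the same form as ceph -s (for example "256 pgs" and "256 active+clean"); the parsing is an illustrative assumption.

    #!/bin/bash
    # Recovery is considered successful only when every pg is active+clean.
    status=$(icfs -s)
    total_pgs=$(echo "$status" | grep -oE '[0-9]+ pgs' | head -n1 | awk '{print $1}')
    clean_pgs=$(echo "$status" | grep -oE '[0-9]+ active\+clean' | head -n1 | awk '{print $1}')

    if [ -n "$total_pgs" ] && [ "$total_pgs" = "$clean_pgs" ]; then
        echo "cluster fault recovery succeeded (100% active+clean)"
    else
        echo "cluster fault recovery failed: ${clean_pgs:-0}/${total_pgs:-?} pgs active+clean"
        exit 1
    fi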
As shown in fig. 2, the apparatus 200 includes:
a group search unit 210 configured to search a placement group in which the lost object is located;
a group deleting unit 220 configured to delete the placement group;
a process deleting unit 230 configured to delete the osd process data corresponding to the placement group and mark the osd process as completed;
and the state verification unit 240 is configured to check the cluster state and verify a cluster failure recovery result according to the cluster state.
Optionally, as an embodiment of the present application, the group deleting unit includes:
a first stopping module configured to stop the osd service;
the group deleting module is configured to call an icfs-objectstore-tool to delete the placement group;
the first loading module is configured to reload the system configuration file;
a first initiation module configured to initiate the osd service.
Optionally, as an embodiment of the present application, the process deleting unit includes:
a second stopping module configured to stop the osd service;
the process deleting module is configured to call the icfs-objectstore-tool to delete the data in the osd process corresponding to the placement group;
a process marking module configured to mark the corresponding osd process as a completed state;
a second loading module configured to reload the system configuration file;
a second starting module configured to start the osd service.
Optionally, as an embodiment of the present application, the state verification unit includes:
a state viewing module configured to call the icfs -s tool to view the cluster state;
a state judging module configured to judge whether the cluster state is the active+clean state;
a success determining module configured to determine that the cluster fault recovery succeeded if the cluster state is the active+clean state;
and a failure determining module configured to determine that the cluster fault recovery failed if the cluster state is not the active+clean state.
Fig. 3 is a schematic structural diagram of a terminal device 300 according to an embodiment of the present invention, where the terminal device 300 may be used to execute the method for recovering from a failure in a distributed storage system according to the embodiment of the present application.
The terminal device 300 may include a processor 310, a memory 320 and a communication unit 330. These components communicate via one or more buses. Those skilled in the art will appreciate that the structure shown in the figure does not limit the application: it may be a bus structure or a star structure, and the terminal may include more or fewer components than shown, combine certain components, or use a different arrangement of components.
The memory 320 may be used to store instructions executed by the processor 310. The memory 320 may be implemented by any type of volatile or non-volatile storage device, or a combination thereof, such as a static random access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk or an optical disk. When the executable instructions in the memory 320 are executed by the processor 310, the terminal 300 is enabled to perform some or all of the steps of the method embodiments of the present application.
The processor 310 is the control center of the storage terminal. It connects the various parts of the entire electronic terminal using various interfaces and lines, and performs the various functions of the electronic terminal and/or processes data by running or executing the software programs and/or modules stored in the memory 320 and calling the data stored in the memory. The processor may consist of integrated circuits (ICs), for example a single packaged IC, or several packaged ICs with the same or different functions connected together. For example, the processor 310 may include only a central processing unit (CPU). In the embodiments of the present application, the CPU may have a single computing core or multiple computing cores.
The communication unit 330 is configured to establish a communication channel so that the storage terminal can communicate with other terminals, receiving user data sent by other terminals or sending user data to other terminals.
The present application also provides a computer storage medium, wherein the computer storage medium may store a program, and the program may include some or all of the steps in the embodiments provided in the present application when executed. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM) or a Random Access Memory (RAM).
Therefore, by locating the placement group in which the lost object resides, deleting that placement group and its osd process data, marking the osd process as complete, and checking the cluster state, the fault recovery result is verified. For a cluster whose failure has exceeded the fault domain, the invention enables the cluster to continue providing external services without damaging the existing cluster, and under certain conditions preserves the cyclic overwrite writing and the playback scenarios required for video.
Those skilled in the art will clearly understand that the techniques in the embodiments of the present application may be implemented by software plus a necessary general hardware platform. Based on this understanding, the technical solutions in the embodiments of the present application may be embodied in the form of a software product. The computer software product is stored in a storage medium, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, and includes several instructions for enabling a computer terminal (which may be a personal computer, a server, a second terminal, a network terminal, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
The same and similar parts in the various embodiments in this specification may be referred to each other. Especially, for the terminal embodiment, since it is basically similar to the method embodiment, the description is relatively simple, and the relevant points can be referred to the description in the method embodiment.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
Although the present invention has been described in detail with reference to the drawings and the preferred embodiments, the present invention is not limited thereto. Various equivalent modifications or substitutions can be made to the embodiments of the present invention by those skilled in the art without departing from the spirit and scope of the present invention, and any change or substitution that can be easily conceived by a person skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A distributed storage system failure recovery method, the method comprising:
finding the placement group in which the lost object is located;
deleting the placement group;
deleting the osd process data corresponding to the placement group and marking the osd process as complete;
and checking the cluster state, and verifying the cluster fault recovery result according to the cluster state.
2. The method according to claim 1, wherein the deleting of the placement group comprises:
stopping the osd service;
calling the icfs-objectstore-tool to delete the placement group;
reloading the system configuration file;
and starting the osd service.
3. The method according to claim 1, wherein the deleting of the osd process data corresponding to the placement group and the marking of the osd process as complete comprise:
stopping the osd service;
calling the icfs-objectstore-tool to delete the data of the osd process corresponding to the placement group;
marking the corresponding osd process as complete;
reloading the system configuration file;
and starting the osd service.
4. The method according to claim 1, wherein the checking of the cluster state and the verifying of the cluster fault recovery result according to the cluster state comprise:
calling the icfs -s tool to check the cluster state;
judging whether the cluster state is the active+clean state:
if so, determining that the cluster fault recovery succeeded;
if not, determining that the cluster fault recovery failed.
5. A distributed storage system fault recovery system, the system comprising:
a group searching unit, configured to find the placement group in which the lost object is located;
a group deleting unit, configured to delete the placement group;
a process deleting unit, configured to delete the osd process data corresponding to the placement group and mark the osd process as complete;
and a state verification unit, configured to check the cluster state and verify the cluster fault recovery result according to the cluster state.
6. The system of claim 5, wherein the group deletion unit comprises:
a first stopping module configured to stop the osd service;
the group deleting module is configured to call an icfs-objectstore-tool to delete the placement group;
the first loading module is configured to reload the system configuration file;
a first initiation module configured to initiate the osd service.
7. The system of claim 5, wherein the process deletion unit comprises:
a second stopping module configured to stop the osd service;
the process deleting module is configured to call the icfs-objectstore-tool to delete the data in the osd process corresponding to the placement group;
a process marking module configured to mark the corresponding osd process as a completed state;
a second loading module configured to reload the system configuration file;
a second starting module configured to start the osd service.
8. The system of claim 5, wherein the state verification unit comprises:
a state viewing module configured to call the icfs -s tool to view the cluster state;
a state judging module configured to judge whether the cluster state is the active+clean state;
a success determining module configured to determine that the cluster fault recovery succeeded if the cluster state is the active+clean state;
and a failure determining module configured to determine that the cluster fault recovery failed if the cluster state is not the active+clean state.
9. A terminal, comprising:
a processor;
a memory for storing instructions for execution by the processor;
wherein the processor is configured to perform the method of any one of claims 1-4.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.
CN202010095163.8A 2020-02-14 2020-02-14 Distributed storage system fault recovery method, device, terminal and storage medium Withdrawn CN111309524A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010095163.8A CN111309524A (en) 2020-02-14 2020-02-14 Distributed storage system fault recovery method, device, terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010095163.8A CN111309524A (en) 2020-02-14 2020-02-14 Distributed storage system fault recovery method, device, terminal and storage medium

Publications (1)

Publication Number Publication Date
CN111309524A true CN111309524A (en) 2020-06-19

Family

ID=71158499

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010095163.8A Withdrawn CN111309524A (en) 2020-02-14 2020-02-14 Distributed storage system fault recovery method, device, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111309524A (en)


Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813604B (en) * 2020-07-17 2022-06-10 济南浪潮数据技术有限公司 Data recovery method, system and related device of fault storage equipment
CN111813604A (en) * 2020-07-17 2020-10-23 济南浪潮数据技术有限公司 Data recovery method, system and related device of fault storage equipment
CN111984470A (en) * 2020-08-07 2020-11-24 苏州浪潮智能科技有限公司 Storage cluster system fault recovery automatic detection method and device
CN111984470B (en) * 2020-08-07 2022-12-20 苏州浪潮智能科技有限公司 Storage cluster system fault recovery automatic detection method and device
CN112162883A (en) * 2020-09-27 2021-01-01 北京浪潮数据技术有限公司 Duplicate data recovery method and system, electronic equipment and storage medium
CN112486731A (en) * 2020-11-03 2021-03-12 苏州浪潮智能科技有限公司 Method, device, equipment and product for restoring distribution of placement groups
CN112486731B (en) * 2020-11-03 2023-01-10 苏州浪潮智能科技有限公司 Method, device, equipment and product for restoring distribution of placement groups
CN112711497A (en) * 2021-01-05 2021-04-27 浪潮云信息技术股份公司 Recovery method and system for unfolded faults of container deployment Ceph cluster object
CN113535474A (en) * 2021-06-30 2021-10-22 重庆紫光华山智安科技有限公司 Method, system, medium and terminal for automatically repairing heterogeneous cloud storage cluster fault
CN116302673A (en) * 2023-05-26 2023-06-23 四川省华存智谷科技有限责任公司 Method for improving data recovery rate of Ceph storage system
CN116302673B (en) * 2023-05-26 2023-08-22 四川省华存智谷科技有限责任公司 Method for improving data recovery rate of Ceph storage system
CN118093250A (en) * 2024-04-25 2024-05-28 苏州元脑智能科技有限公司 Fault processing method and device, electronic equipment and storage medium
CN118093250B (en) * 2024-04-25 2024-08-02 苏州元脑智能科技有限公司 Fault processing method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111309524A (en) Distributed storage system fault recovery method, device, terminal and storage medium
CN112596951B (en) NAS data disaster recovery method, device, equipment and storage medium
CN111124755A (en) Cluster node fault recovery method and device, electronic equipment and storage medium
CN111858468B (en) Method, system, terminal and storage medium for verifying metadata of distributed file system
US8612799B2 (en) Method and apparatus of backing up subversion repository
CN113626256A (en) Virtual machine disk data backup method, device, terminal and storage medium
CN107506295A (en) Method of testing, equipment and the computer-readable recording medium of virtual machine backup
CN103064759B (en) The method of data restore and device
CN108733808B (en) Big data software system switching method, system, terminal equipment and storage medium
JP2778798B2 (en) Queue structure management processing method for control data
CN112540873B (en) Disaster tolerance method and device, electronic equipment and disaster tolerance system
CN113806309A (en) Metadata deleting method, system, terminal and storage medium based on distributed lock
CN114328374A (en) Snapshot method, device, related equipment and database system
CN118377657B (en) Data recovery method and device, storage medium and electronic equipment
CN110572442A (en) Method and system for configuring file path
CN111124787B (en) Method, system and equipment for verifying stability of MCS multi-node concurrent dump
CN113467717B (en) Dual-machine volume mirror image management method, device and equipment and readable storage medium
CN118331516B (en) Data processing method and device
CN112269738B (en) CTF target range debugging method, device, electronic equipment and medium
CN110008114A (en) Configuration information maintaining method, device, equipment and readable storage medium storing program for executing
CN107707402B (en) Management system and management method for service arbitration in distributed system
CN116302696A (en) Archive log generation method of database system, storage medium and computer device
JPH08335206A (en) Automatic transaction restoration system of loosely coupled multicomputer system
CN118377657A (en) Data recovery method and device, storage medium and electronic equipment
CN114880152A (en) Data three-party disaster tolerance abnormity monitoring processing method, system, terminal and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200619