CN113791922A

CN113791922A - Exception handling method, system and device for distributed storage system

Info

Publication number: CN113791922A
Application number: CN202110874446.7A
Authority: CN
Inventors: 谢有权; 李吉龙
Original assignee: Jinan Inspur Data Technology Co Ltd
Current assignee: Jinan Inspur Data Technology Co Ltd
Priority date: 2021-07-30
Filing date: 2021-07-30
Publication date: 2021-12-14
Anticipated expiration: 2041-07-30
Also published as: CN113791922B

Abstract

The invention provides an exception handling method, system and device for a distributed storage system. The method comprises: receiving, through an object storage unit, request information sent by a client; if the object group is in a normal state, determining the corresponding object group according to the request information, encapsulating the request information into a transaction through the determined object group, and storing the transaction; if the object group is in an abnormal state, determining the corresponding object group according to the request information, encapsulating the request information together with log information into a transaction through the determined object group, and storing the transaction; and, when the object group is in an abnormal state, determining through a preset flag whether the number of log entries exceeds a preset threshold, and if so, executing a full data recovery process for the object group, otherwise executing an incremental data recovery process for the object group. The invention can effectively simplify log recording, reduce the amount of data to be recovered after an exception occurs in the distributed storage system, and reduce the load on the database.

Description

Exception handling method, system and device for distributed storage system

Technical Field

The invention relates to the technical field of computer storage, and in particular to an exception handling method, system and device for a distributed storage system.

Background

With the rapid development of cloud computing, big data and related fields, ever higher demands are being placed on the performance of distributed storage. In particular, as the performance of high-speed storage devices keeps improving and their use becomes increasingly widespread, how a distributed storage system built on all-flash devices can fully exploit the performance of those devices has become an important direction in current distributed storage research.

Currently, when I/O is processed under a distributed storage system architecture, corresponding log records are generated and stored. Specifically, under the replica redundancy policy, a client sends a write request to the primary object storage service unit of the object; the object storage unit then finds the object group in which the object is located; the object group encapsulates the request into a transaction according to the operation type, records the object group's log, and packages that log into the transaction as well; finally, the transaction is sent to the slave object storage service units of the object and to the local back-end storage, completing data redundancy and persistence to disk. The object group's log is recorded in a database in the back-end storage.

However, distributed storage systems have generally been deployed on slow storage devices or on a mix of slow and high-speed devices, so their architectural design focuses mainly on those two models, and the architecture exposes some defects when it runs entirely on high-speed devices. In particular, the data I/O path and the exception handling path overlap: every I/O operation generates log information, which is packaged together with the data and sent to the slave nodes and the local storage engine for storage. Because this log information must be committed to the database, a workload consisting of massive numbers of small data blocks increases the load on the database, raises CPU consumption and occupies device resources.

Disclosure of Invention

In view of the above problems, an object of the present invention is to provide an exception handling method, system and device for a distributed storage system that can effectively simplify log recording, reduce the amount of data to be recovered after an exception occurs in the distributed storage system, and reduce the load on the database.

To achieve this object, the invention adopts the following technical solution: an exception handling method for a distributed storage system, comprising:

receiving, through an object storage unit, request information sent by a client;

determining, by the object storage unit, the working state of the current object group through the state machine of the object group;

if the object group is in a normal state, determining the corresponding object group according to the request information, encapsulating the request information into a transaction through the determined object group, and storing the transaction;

if the object group is in an abnormal state, determining the corresponding object group according to the request information, encapsulating the request information and log information into a transaction through the determined object group, and storing the transaction;

and, when the object group is in an abnormal state, determining through a preset flag whether the number of log entries exceeds a preset threshold; if so, executing a full data recovery process for the object group, otherwise executing an incremental data recovery process for the object group.
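The state-dependent write path in the steps above can be illustrated with a minimal Python sketch. All names here (PGState, Request, Transaction, build_transaction) are hypothetical, and the fields of a log entry (sequence number, object ID, offset, length) are an assumption about what the log information might contain; the text does not specify its format.

```python
from dataclasses import dataclass, field
from enum import Enum, auto


class PGState(Enum):
    """Hypothetical working states reported by an object group's state machine."""
    NORMAL = auto()
    ABNORMAL = auto()


@dataclass
class Request:
    object_id: str
    offset: int
    data: bytes


@dataclass
class Transaction:
    request: Request
    # Assumed log-entry format; stays empty on the normal-state path.
    log_entries: list = field(default_factory=list)


def build_transaction(request: Request, state: PGState, log_seq: int) -> Transaction:
    """Encapsulate a request; attach log information only in the abnormal state."""
    txn = Transaction(request=request)
    if state is PGState.ABNORMAL:
        # Record which object and byte range changed so that incremental
        # recovery can later replay only these extents.
        txn.log_entries.append({
            "seq": log_seq,
            "object_id": request.object_id,
            "offset": request.offset,
            "length": len(request.data),
        })
    return txn


req = Request("obj-1", 0, b"hello")
normal_txn = build_transaction(req, PGState.NORMAL, log_seq=1)
abnormal_txn = build_transaction(req, PGState.ABNORMAL, log_seq=1)
```

In the normal state the transaction carries only the request, which is where the reduction in replicated data and database writes comes from.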

Further, determining the corresponding object group according to the request information comprises:

reading the requested object in the request information;

and calculating the corresponding object group from the ID of the requested object.
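The text only says the object group is calculated from the ID of the requested object. One common way to do this, assumed here purely for illustration, is a stable hash of the object ID taken modulo the number of object groups; production systems such as Ceph use their own hash functions and placement rules, so object_group_for and pg_count are hypothetical names.

```python
import zlib


def object_group_for(object_id: str, pg_count: int) -> int:
    """Map an object ID to an object-group index (illustrative scheme only).

    zlib.crc32 is deterministic across processes and machines, unlike Python's
    built-in hash() for str, which is randomized per process.
    """
    if pg_count <= 0:
        raise ValueError("pg_count must be positive")
    return zlib.crc32(object_id.encode("utf-8")) % pg_count


pg = object_group_for("obj-1", 128)
```

A deterministic hash matters because every node that handles the request must arrive at the same object group.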

Further, encapsulating the request information into a transaction through the determined object group and storing the transaction comprises:

encapsulating the request information into a transaction through the object group;

determining the corresponding redundant object storage service unit according to the requested object;

storing the transaction in the redundant object storage service unit;

parsing, by the redundant object storage service unit, the transaction to generate object data;

and writing the object data into the preset storage device.
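A minimal sketch of the normal-state storage steps, assuming each redundant object storage service unit can be modelled as an in-memory map from object ID to bytes. ReplicaStore and write_normal are hypothetical names; a real unit would persist to a storage device and acknowledge the primary.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    object_id: str
    offset: int
    data: bytes  # normal state: request data only, no log information


class ReplicaStore:
    """Stand-in for a redundant object storage service unit and its device."""

    def __init__(self) -> None:
        self.objects: dict[str, bytearray] = {}

    def apply(self, txn: Transaction) -> None:
        # Parse the transaction into object data and write it at its offset.
        buf = self.objects.setdefault(txn.object_id, bytearray())
        end = txn.offset + len(txn.data)
        if len(buf) < end:
            buf.extend(b"\x00" * (end - len(buf)))  # zero-fill any gap
        buf[txn.offset:end] = txn.data


def write_normal(object_id: str, offset: int, data: bytes,
                 replicas: list[ReplicaStore]) -> Transaction:
    """Encapsulate the request and store the same transaction on every replica."""
    txn = Transaction(object_id, offset, data)
    for replica in replicas:
        replica.apply(txn)
    return txn


replicas = [ReplicaStore(), ReplicaStore()]
write_normal("obj-1", 2, b"ab", replicas)
```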

Further, encapsulating the request information and the log information into a transaction through the determined object group and storing the transaction comprises:

encapsulating the request information into a transaction through the object group;

recording log information and packaging the log information into the transaction;

storing the transaction in a redundant object storage service unit;

parsing, by the redundant object storage service unit, the transaction to generate object data and log information;

and writing the object data into the preset storage device, and writing the log information into the database.
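For the abnormal-state path, the difference is that the redundant unit also persists the log information. The sketch below assumes an in-memory SQLite table (pg_log, created with Python's standard sqlite3 module) as a stand-in for the back-end database; the table layout and class names are hypothetical.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class LoggedTransaction:
    object_id: str
    offset: int
    data: bytes
    log_seq: int  # log information packaged with the request (abnormal state)


class ReplicaWithLog:
    """Stand-in for a redundant unit with a data store and a log database."""

    def __init__(self) -> None:
        self.objects: dict[str, bytearray] = {}
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE pg_log (seq INTEGER PRIMARY KEY, object_id TEXT,"
            " obj_offset INTEGER, obj_length INTEGER)"
        )

    def apply(self, txn: LoggedTransaction) -> None:
        # 1) Object data goes to the (simulated) storage device.
        buf = self.objects.setdefault(txn.object_id, bytearray())
        end = txn.offset + len(txn.data)
        if len(buf) < end:
            buf.extend(b"\x00" * (end - len(buf)))
        buf[txn.offset:end] = txn.data
        # 2) Log information goes to the database; the with-block commits it.
        with self.db:
            self.db.execute(
                "INSERT INTO pg_log VALUES (?, ?, ?, ?)",
                (txn.log_seq, txn.object_id, txn.offset, len(txn.data)),
            )

    def log_count(self) -> int:
        return self.db.execute("SELECT COUNT(*) FROM pg_log").fetchone()[0]


replica = ReplicaWithLog()
replica.apply(LoggedTransaction("obj-1", 0, b"abc", log_seq=1))
replica.apply(LoggedTransaction("obj-1", 3, b"de", log_seq=2))
```

Each logged write costs one extra database insert and commit, which is the per-I/O overhead the method avoids while the object group is in the normal state.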

Further, the preset flag is a flag do_backfill set according to a preset threshold log_max on the number of log entries: if the number of log entries is greater than log_max, do_backfill is set to true.

Further, determining through the preset flag whether the number of log entries exceeds the preset threshold comprises:

reading the value of the flag do_backfill; if the value is true, the number of log entries exceeds the preset threshold; otherwise, it does not.
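The flag logic can be sketched as a small counter object. The text names do_backfill and log_max but not where the counter lives or when it is reset; BackfillFlag, record_log_entry and choose_recovery are hypothetical, and keeping the flag set once raised is an assumption of this sketch.

```python
from dataclasses import dataclass


@dataclass
class BackfillFlag:
    """Hypothetical holder for the log counter, log_max and do_backfill."""
    log_max: int            # preset threshold on the number of log entries
    log_count: int = 0
    do_backfill: bool = False

    def record_log_entry(self) -> None:
        # Count one more log entry; the flag is raised once the count is
        # strictly greater than log_max and then stays raised.
        self.log_count += 1
        if self.log_count > self.log_max:
            self.do_backfill = True

    def choose_recovery(self) -> str:
        # At recovery time only the flag is read.
        return "full" if self.do_backfill else "incremental"


flag = BackfillFlag(log_max=2)
for _ in range(2):
    flag.record_log_entry()
mode_at_limit = flag.choose_recovery()
flag.record_log_entry()
mode_over_limit = flag.choose_recovery()
```

Exactly log_max entries still select incremental recovery; the switch to full recovery happens only when the count is greater than log_max, matching the condition stated above.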

Further, the full data recovery process of the object group comprises:

copying all objects from the redundant object group on the redundant object storage service unit to the object storage service unit to recover the object data.
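Full recovery ignores the log and copies whole objects. A minimal sketch, assuming both the local object group and the redundant object group can be represented as dictionaries from object ID to object bytes; full_recovery is a hypothetical name, and discarding local objects that no longer exist on the replica is an assumption.

```python
def full_recovery(local: dict[str, bytes], replica: dict[str, bytes]) -> int:
    """Copy every object of the group from the replica; return bytes copied.

    Assumption: local objects absent from the replica are discarded, so the
    recovered group ends up identical to the redundant object group.
    """
    local.clear()
    copied = 0
    for object_id, data in replica.items():
        local[object_id] = bytes(data)
        copied += len(data)
    return copied


local = {"a": b"old", "stale": b"x"}
replica = {"a": b"new!", "b": b"zz"}
copied = full_recovery(local, replica)
```

The returned byte count grows with the total size of the object group, not with the amount of data written while the group was abnormal, which is why this path is reserved for the case where the log has grown past log_max.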

Further, the incremental data recovery process of the object group comprises:

obtaining, from the redundant object group on the redundant object storage service unit, the log information generated while the object group was in the abnormal state; calculating from this log information the data range that needs to be updated for each object of the object group; and writing these data ranges into an update parameter list;

and, according to the update parameter list, finding the corresponding objects in the redundant object group on the redundant object storage service unit and recovering the data of the abnormal objects.
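The incremental path can be sketched in two steps: build the update parameter list from the log, then copy only those byte ranges from the redundant object group. This assumes each log entry records (object ID, offset, length); build_missing and incremental_recovery are hypothetical names, and merging overlapping or adjacent ranges is an added optimization not stated in the text.

```python
from collections import defaultdict


def build_missing(log_entries: list[tuple[str, int, int]]) -> dict[str, list[tuple[int, int]]]:
    """Turn (object_id, offset, length) log entries into merged [start, end) ranges."""
    spans: dict[str, list[tuple[int, int]]] = defaultdict(list)
    for object_id, offset, length in log_entries:
        spans[object_id].append((offset, offset + length))
    missing: dict[str, list[tuple[int, int]]] = {}
    for object_id, ranges in spans.items():
        merged: list[tuple[int, int]] = []
        for start, end in sorted(ranges):
            if merged and start <= merged[-1][1]:
                # Overlapping or adjacent: extend the previous range.
                merged[-1] = (merged[-1][0], max(merged[-1][1], end))
            else:
                merged.append((start, end))
        missing[object_id] = merged
    return missing


def incremental_recovery(local: dict[str, bytearray], replica: dict[str, bytes],
                         missing: dict[str, list[tuple[int, int]]]) -> int:
    """Copy only the listed ranges from the replica; return bytes copied."""
    copied = 0
    for object_id, ranges in missing.items():
        src = replica[object_id]
        dst = local.setdefault(object_id, bytearray())
        if len(dst) < len(src):
            dst.extend(b"\x00" * (len(src) - len(dst)))
        for start, end in ranges:
            chunk = src[start:end]
            dst[start:start + len(chunk)] = chunk
            copied += len(chunk)
        del dst[len(src):]  # match the replica's current object length
    return copied


log = [("a", 0, 4), ("a", 2, 4), ("a", 10, 2), ("b", 5, 1)]
missing = build_missing(log)
local = {"a": bytearray(b"x" * 12), "b": bytearray(b"yyyyyy")}
replica = {"a": b"ABCDEFGHIJKL", "b": b"012345"}
copied = incremental_recovery(local, replica, missing)
```

Here only 9 bytes are copied instead of the full 18 bytes of both objects; bytes outside the logged ranges (the "xxxx" in object "a") are left untouched because the log says they were not modified.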

Correspondingly, the invention also discloses an exception handling system for a distributed storage system, comprising: a receiving module, configured to receive, through the object storage unit, request information sent by a client;

a state judgment module, configured to determine, by means of the object storage unit, the working state of the current object group through the state machine of the object group;

a first storage module, configured to determine the corresponding object group according to the request information, encapsulate the request information into a transaction through the determined object group, and store the transaction;

a second storage module, configured to determine the corresponding object group according to the request information, encapsulate the request information and log information into a transaction through the determined object group, and store the transaction;

and a data recovery module, configured to determine through a preset flag whether the number of log entries exceeds a preset threshold, and if so, execute the full data recovery process of the object group, otherwise execute the incremental data recovery process of the object group.
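The five modules can be wired together as in the following Python sketch. It collapses each module into a method, keeps all state in memory, and uses a CRC32-modulo mapping as an assumed stand-in for the unspecified object-group calculation; ExceptionHandlingSystem and its methods are hypothetical and do not describe the patented implementation.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class ObjectGroup:
    pg_id: int
    log_max: int
    abnormal: bool = False           # as reported by the group's state machine
    log: list = field(default_factory=list)
    do_backfill: bool = False


class ExceptionHandlingSystem:
    """Hypothetical wiring of the five modules; names are illustrative only."""

    def __init__(self, pg_count: int, log_max: int) -> None:
        self.groups = [ObjectGroup(i, log_max) for i in range(pg_count)]
        self.stored: list[dict] = []  # stands in for the redundant units

    def receive(self, object_id: str, data: bytes) -> dict:
        """Receiving module: map the request to its object group."""
        pg = self.groups[zlib.crc32(object_id.encode()) % len(self.groups)]
        # State judgment module: pick the storage module from the group state.
        if pg.abnormal:
            return self.store_with_log(pg, object_id, data)
        return self.store(pg, object_id, data)

    def store(self, pg: ObjectGroup, object_id: str, data: bytes) -> dict:
        """First storage module: transaction without log information."""
        txn = {"pg": pg.pg_id, "object_id": object_id, "data": data}
        self.stored.append(txn)
        return txn

    def store_with_log(self, pg: ObjectGroup, object_id: str, data: bytes) -> dict:
        """Second storage module: transaction carrying log information."""
        entry = {"object_id": object_id, "length": len(data)}
        txn = {"pg": pg.pg_id, "object_id": object_id, "data": data, "log": entry}
        self.stored.append(txn)
        pg.log.append(entry)
        if len(pg.log) > pg.log_max:
            pg.do_backfill = True
        return txn

    def recovery_mode(self, pg_id: int) -> str:
        """Data recovery module: read do_backfill to choose the process."""
        return "full" if self.groups[pg_id].do_backfill else "incremental"


system = ExceptionHandlingSystem(pg_count=1, log_max=1)
first = system.receive("obj", b"abc")        # normal state: no log
system.groups[0].abnormal = True
system.receive("obj", b"d")
mode_after_one_log = system.recovery_mode(0)
system.receive("obj", b"e")
mode_after_two_logs = system.recovery_mode(0)
```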

Further, the first storage module is specifically configured to:

encapsulate the request information into a transaction through the object group;

store the transaction in a redundant object storage service unit;

and write the object data into the preset storage device.

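Further, the second storage module is specifically configured to:

encapsulate the request information into a transaction through the object group;

record log information and package the log information into the transaction;

store the transaction in a redundant object storage service unit, which parses the transaction to generate object data and log information;

and write the object data into the preset storage device and the log information into the database.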

Correspondingly, the invention discloses an exception handling device for a distributed storage system, comprising:

a memory, configured to store an exception handling program of the distributed storage system;

and a processor, configured to implement the steps of the exception handling method of the distributed storage system according to any one of the above when executing the exception handling program.

Accordingly, the present invention discloses a readable storage medium, on which an exception handling program of a distributed storage system is stored, wherein the exception handling program of the distributed storage system, when executed by a processor, implements the steps of the exception handling method of the distributed storage system according to any one of the above.

Compared with the prior art, the invention has the following beneficial effects: the exception handling method, system and device for a distributed storage system optimize exception handling within the distributed storage framework by deciding, according to the state of the object group, whether the object group records log information. No log is recorded while the distributed cluster is operating normally, which reduces the amount of data sent to the redundant nodes and relieves the load on the database in the back-end storage; it also reduces CPU consumption and saves system resources, further improving the performance of distributed storage in an all-flash environment.

Therefore, compared with the prior art, the invention has prominent substantive features and remarkable progress, and the beneficial effects of the implementation are also obvious.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flow chart of the method of the present invention;

FIG. 2 is a system block diagram of the present invention.

In FIG. 2, 1 is a receiving module; 2 is a state judgment module; 3 is a first storage module; 4 is a second storage module; and 5 is a data recovery module.

Detailed Description

The core of the invention is to provide an exception handling method for a distributed storage system. In the prior art, the distributed storage system architecture exposes some defects when it runs entirely on high-speed devices. In particular, the data I/O path and the exception handling path overlap: every I/O operation generates log information, which is packaged together with the data and sent to the slave nodes and the local storage engine for storage. Because this log information must be committed to the database, a workload consisting of massive numbers of small data blocks increases the load on the database, raises CPU consumption and occupies device resources.

In the exception handling method for a distributed storage system provided by the invention, whether each request records a log is decided according to the state of the object group: no log is recorded while the object group is in a normal state, and a log is recorded only when the object group's state becomes abnormal. During data recovery, the number of log entries determines whether the full data recovery process or the incremental data recovery process is used for the object group. The method can therefore effectively simplify log recording, reduce the amount of data to be recovered after an exception occurs in the distributed storage system, and reduce the load on the database.

In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example one:

As shown in FIG. 1, this embodiment provides an exception handling method for a distributed storage system, comprising the following steps:

S1: Receive, through the object storage unit, the request information sent by the client.

S2: The object storage unit determines the working state of the current object group through the state machine of the object group.

S3: If the object group is in a normal state, determine the corresponding object group according to the request information, encapsulate the request information into a transaction through the determined object group, and store the transaction.

If the object group is in a normal state, the requested object in the request information is read and the corresponding object group is calculated from the ID of the requested object. The object group then encapsulates the request information into a transaction, and the corresponding redundant object storage service unit is determined from the requested object. The transaction is then stored in that redundant object storage service unit, which parses the transaction to generate object data. Finally, the object data is written into the preset storage device.

S4: If the object group is in an abnormal state, determine the corresponding object group according to the request information, encapsulate the request information and log information into a transaction through the determined object group, and store the transaction.

If the object group is in an abnormal state, the state machine reads the requested object in the request information and calculates the corresponding object group from the ID of the requested object. The object group encapsulates the request information into a transaction, records log information, and packages the log information into the same transaction. The corresponding redundant object storage service unit is then determined from the requested object, and the transaction is stored in it. The redundant object storage service unit parses the transaction to generate object data and log information, writes the object data into the preset storage device, and writes the log information into the database.

S5: When the object group is in an abnormal state, determine through a preset flag whether the number of log entries exceeds a preset threshold; if so, execute the full data recovery process for the object group, otherwise execute the incremental data recovery process for the object group.

In this step, the preset flag is the flag do_backfill, which is set according to a preset threshold log_max on the number of log entries: if the number of log entries is greater than log_max, do_backfill is set to true.

First, the value of the flag do_backfill is read. If it is true, the number of log entries exceeds the preset threshold, and the full data recovery process of the object group is executed; otherwise, the number of log entries does not exceed the preset threshold, and the incremental data recovery process of the object group is executed.

The full data recovery process of the object group is specifically as follows: all objects are copied from the redundant object group on the redundant object storage service unit to the object storage service unit to recover the object data.

The incremental data recovery process of the object group is specifically as follows: first, the log information generated while the object group was in the abnormal state is obtained from the redundant object group on the redundant object storage service unit; the data range that needs to be updated for each object in the object group is calculated from this log information and written into the update parameter list, named missing. Then, according to the update parameter list missing, the corresponding objects are found in the redundant object group on the redundant object storage service unit, and the data of the abnormal objects is recovered.

This embodiment provides an exception handling method for a distributed storage system that optimizes exception handling within the distributed storage framework by deciding, according to the state of the object group, whether the object group records log information. No log is recorded while the distributed cluster is operating normally, which reduces the amount of data sent to the redundant nodes and relieves the load on the database in the back-end storage; it also reduces CPU consumption and saves system resources, further improving the performance of distributed storage in an all-flash environment.

Example two:

Based on Example one, and as shown in FIG. 2, the invention further discloses an exception handling system for a distributed storage system, comprising a receiving module 1, a state judgment module 2, a first storage module 3, a second storage module 4 and a data recovery module 5.

The receiving module 1 is configured to receive, through the object storage unit, the request information sent by the client.

The state judgment module 2 is configured to determine, by means of the object storage unit, the working state of the current object group through the state machine of the object group.

The first storage module 3 is configured to determine the corresponding object group according to the request information, encapsulate the request information into a transaction through the determined object group, and store the transaction. Specifically, the first storage module 3 reads the requested object in the request information and calculates the corresponding object group from the ID of the requested object; encapsulates the request information into a transaction through the object group; determines the corresponding redundant object storage service unit from the requested object; stores the transaction in that redundant object storage service unit, which parses the transaction to generate object data; and writes the object data into the preset storage device.

The second storage module 4 is configured to determine the corresponding object group according to the request information, encapsulate the request information and log information into a transaction through the determined object group, and store the transaction. Specifically, the second storage module 4 reads the requested object in the request information and calculates the corresponding object group from the ID of the requested object; encapsulates the request information into a transaction through the object group, records log information and packages the log information into the transaction; and determines the corresponding redundant object storage service unit from the requested object and stores the transaction in it. The redundant object storage service unit parses the transaction to generate object data and log information, writes the object data into the preset storage device, and writes the log information into the database.

The data recovery module 5 is configured to determine through a preset flag whether the number of log entries exceeds a preset threshold; if so, it executes the full data recovery process of the object group, otherwise the incremental data recovery process. Specifically, after the object group becomes abnormal and is re-initialized after a period of time, the data recovery module 5 checks whether the flag do_backfill is true. If it is true, the number of log entries exceeds the maximum threshold log_max and all objects in the object group must be recovered, so they are copied from the redundant object group. If do_backfill is not true, the object group obtains the log information for that period from the redundant object group, calculates from it the data range that needs to be updated for each object in the object group, and places these ranges into the missing list; the objects in the object group are then recovered according to the missing list.

This embodiment provides an exception handling system for a distributed storage system that decides, for each request, whether to record a log according to the state of the object group: no log is recorded while the object group is in a normal state, and a log is recorded only when the object group's state becomes abnormal. This simplifies the process of obtaining the authoritative log during data recovery.

Example three:

This embodiment discloses an exception handling device for a distributed storage system, comprising a processor and a memory, wherein the processor implements the following steps when executing the exception handling program of the distributed storage system stored in the memory:

1. Receive, through the object storage unit, the request information sent by the client.

2. The object storage unit determines the working state of the current object group through the state machine of the object group.

3. If the object group is in a normal state, determine the corresponding object group according to the request information, encapsulate the request information into a transaction through the determined object group, and store the transaction.

4. If the object group is in an abnormal state, determine the corresponding object group according to the request information, encapsulate the request information and log information into a transaction through the determined object group, and store the transaction.

5. When the object group is in an abnormal state, determine through a preset flag whether the number of log entries exceeds a preset threshold; if so, execute the full data recovery process for the object group, otherwise execute the incremental data recovery process for the object group.

Further, the exception handling device of the distributed storage system in this embodiment may further include:

An input interface, configured to obtain an externally imported exception handling program of the distributed storage system and store it in the memory, and also to obtain the various instructions and parameters transmitted by external terminal equipment and pass them to the processor, so that the processor performs the corresponding processing using those instructions and parameters. In this embodiment, the input interface may include, but is not limited to, a USB interface, a serial interface, a voice input interface, a fingerprint input interface, a hard disk reading interface, and the like.

An output interface, configured to output the various data generated by the processor to the terminal equipment connected to it, so that other terminal equipment connected to the output interface can obtain that data. In this embodiment, the output interface may include, but is not limited to, a USB interface, a serial interface, and the like.

A communication unit, configured to establish a remote communication connection between the exception handling device of the distributed storage system and an external server, so that the device can mount an image file onto the external server. In this embodiment, the communication unit may include, but is not limited to, a remote communication unit based on wireless or wired communication technology.

A keyboard, configured to obtain the various parameter data or instructions entered by the user through keystrokes in real time.

A display, configured to display relevant information in real time during the short-circuit location process of the power supply line of the running server.

A mouse, which can assist the user in entering data and simplify the user's operations.

This embodiment provides an exception handling device for a distributed storage system that decides, for each request, whether to record a log according to the state of the object group: no log is recorded while the object group is in a normal state, and a log is recorded only when the object group's state becomes abnormal. During data recovery, the number of log entries determines whether the full data recovery process or the incremental data recovery process is used for the object group.

Example four:

This embodiment also discloses a readable storage medium, which includes random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable hard disk, a CD-ROM, or any other form of storage medium known in the art. The readable storage medium stores an exception handling program of the distributed storage system which, when executed by a processor, implements the steps of the exception handling method of the distributed storage system described in Example one.

In conclusion, the method and the device can effectively simplify the log recording, reduce the recovery amount of data in the data recovery process after the distributed storage system is abnormal, and reduce the pressure of the database.

The embodiments are described progressively; each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments may be referred to one another. Because the system disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is relatively brief, and the relevant points can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed system, device and method can be implemented in other ways. For example, the system embodiments described above are merely illustrative: the division into units is only a logical functional division, and other divisions are possible in practice; multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in another form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist alone physically, or two or more modules are integrated into one unit.

Similarly, each processing unit in the embodiments of the present invention may be integrated into one functional module, or each processing unit may exist physically alone, or two or more processing units may be integrated into one functional module.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), internal memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

The exception handling method, system, apparatus and readable storage medium of the distributed storage system provided by the present invention are described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. An exception handling method for a distributed storage system, comprising:

receiving request information sent by a client through an object storage unit;

2. The exception handling method for a distributed storage system according to claim 1, wherein determining the corresponding object group according to the request information comprises:

reading a request object in the request information;

3. The exception handling method for a distributed storage system according to claim 2, wherein encapsulating the request information into a transaction through the determined object group and storing the transaction comprises:

packaging the request information into a transaction through the object group;

storing the transaction in a redundant object storage service unit;

and writing the object data into the preset storage device.

4. The exception handling method for a distributed storage system according to claim 2, wherein encapsulating the request information and the log information into a transaction through the determined object group and storing the transaction comprises:

packaging the request information into a transaction through the object group;

recording log information and packaging the log information into a transaction;

storing the transaction in a redundant object storage service unit;

5. The exception handling method for a distributed storage system according to claim 4, wherein the preset judgment flag is a judgment flag do_back that is set according to a preset threshold log_max on the amount of log information, and if the amount of log information is greater than log_max, the value of do_back is set to true.

6. The exception handling method for a distributed storage system according to claim 5, wherein judging, by means of the preset judgment flag, whether the amount of log information exceeds the preset threshold comprises:

7. The exception handling method for a distributed storage system according to claim 4, wherein the data full recovery process of the object group comprises:

8. The exception handling method for a distributed storage system according to claim 4, wherein the data incremental recovery process of the object group comprises:

9. An exception handling system for a distributed storage system, comprising:

the receiving module is used for receiving request information sent by the client through the object storage unit;

10. An exception handling apparatus for a distributed storage system, comprising:

a memory for storing an exception handler of the distributed storage system;

a processor for implementing the steps of the exception handling method of the distributed storage system according to any one of claims 1 to 8 when executing the exception handler of the distributed storage system.
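The claimed flow (route a request to an object group, package it as a transaction, additionally log it while the group is abnormal, set do_back once the log exceeds log_max, then choose full or incremental recovery) can be illustrated with a minimal Python sketch. Only the names do_back and log_max come from claim 5; the two-store model (`primary`, `replica`), the CRC32 mapping in `pick_group`, and logging the object name as the log entry are hypothetical choices made for illustration, not the implementation described in this patent.

```python
import zlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Transaction:
    """Request information, plus an optional log entry, packaged as one unit."""
    obj: str
    data: bytes
    log_entry: Optional[str] = None


@dataclass
class ObjectGroup:
    gid: int
    log_max: int                        # preset threshold on the number of log entries
    normal: bool = True                 # True = normal state, False = abnormal state
    do_back: bool = False               # judgment flag: True once the log exceeds log_max
    log: List[str] = field(default_factory=list)
    primary: Dict[str, bytes] = field(default_factory=dict)  # hypothetical healthy storage unit
    replica: Dict[str, bytes] = field(default_factory=dict)  # hypothetical unit that may fall behind

    def handle(self, obj: str, data: bytes) -> Transaction:
        if self.normal:
            # Normal state: package the request alone; the replica also receives the write.
            txn = Transaction(obj, data)
            self.replica[obj] = data
        else:
            # Abnormal state: package request + log entry; the replica misses this write.
            self.log.append(obj)
            if len(self.log) > self.log_max:
                self.do_back = True
            txn = Transaction(obj, data, log_entry=obj)
        self.primary[obj] = data
        return txn

    def recover(self) -> str:
        if self.do_back:
            # Full recovery: copy every object, ignoring the log.
            self.replica = dict(self.primary)
            mode = "full"
        else:
            # Incremental recovery: copy only the objects named in the log.
            for obj in self.log:
                self.replica[obj] = self.primary[obj]
            mode = "incremental"
        self.log.clear()
        self.do_back = False
        self.normal = True
        return mode


def pick_group(obj: str, groups: List[ObjectGroup]) -> ObjectGroup:
    """Hypothetical mapping: hash the request object's name onto an object group."""
    return groups[zlib.crc32(obj.encode()) % len(groups)]
```

In this sketch a short outage is repaired by copying only the logged objects, while an outage long enough to push the log past log_max falls back to a full copy, so the amount of log kept per group stays bounded.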