CN116841459A

CN116841459A - Single-copy high-availability method of distributed file system

Info

Publication number: CN116841459A
Application number: CN202310537898.5A
Authority: CN
Inventors: 杨兴博
Original assignee: Beijing Yanrong Technology Co ltd
Current assignee: Beijing Yanrong Technology Co ltd
Priority date: 2023-05-15
Filing date: 2023-05-15
Publication date: 2023-10-03

Abstract

The invention discloses a single-copy high-availability method for a distributed file system and relates to the technical field of file systems. The method comprises the following steps: S1, a user creates 10 files, file1, file2, ..., file10, at a client; S2, the client sends the data of file1-file10 to a storage server; S3, when the service on node 1 is normal, the storage server distributes the data files evenly across disks 1-10 through node 1, which mounts the disks over iSCSI. The method reduces storage overhead and is better suited to cloud-native scenarios. Most current cloud storage products already have their own replica mechanisms, so the new scheme makes effective use of cloud storage space and avoids extra redundancy. It also reduces the performance loss caused by replica writes and separates service from data, so that service reliability no longer affects data availability. In addition, after separation the storage devices can be adjusted flexibly without adjusting the service devices.

Description

Single-copy high-availability method of distributed file system

Technical Field

The invention relates to the technical field of file systems, in particular to a single-copy high-availability method of a distributed file system.

Background

Distributed file systems have been an important part of storage system development in recent years, and high service availability is particularly important for storage services. The traditional single-copy mode cannot guarantee high availability: when the service goes down, storage access may be interrupted until the service recovers. Currently, most storage systems on the market use data redundancy, such as multiple copies or erasure coding (EC), to ensure service availability.

However, with the rise of cloud storage, the data redundancy of the file system is stacked on top of the data redundancy of the cloud storage itself, which sharply increases storage cost.

An IP-SAN storage-based approach is provided herein to ensure service availability of a file system in a single copy scenario.

Disclosure of Invention

To address the defects of the prior art, the invention provides a single-copy high-availability method for a distributed file system. It addresses the shortcomings of traditional storage schemes: the single-copy mode does not support high availability, and the multi-copy mode wastes storage space.

In order to achieve the above purpose, the invention is realized by the following technical scheme: a single-copy high-availability method for a distributed file system, comprising the following steps:

S1, a user creates 10 files, file1, file2, ..., file10, at a client;

S2, the client sends the data of file1-file10 to a storage server;

S3, when the service on node 1 is normal, the storage server distributes the data files evenly across disks 1-10 through node 1, which mounts the disks over iSCSI;

S4, when the service on node 1 fails, the storage server distributes the data files evenly across disks 1-10 through node 2, which mounts the disks over iSCSI;

S5, when the user reads data, for example file1, at the client, the client sends a read request to the storage server;

S6, when node 1 is normal, the storage server reads the content of file1 from disk 1 through node 1 and returns it to the client;

S7, when node 1 fails, the storage server reads the content of file1 from disk 1 through node 2 and returns it to the client.

Preferably, in S3, for example, file1 is stored on disk 1, file2 is stored on disk 2, and so on, up to file10 on disk 10.

Preferably, in S4, for example, file1 is stored on disk 1, file2 is stored on disk 2, and so on, up to file10 on disk 10.

Advantageous effects

The invention provides a single-copy high availability method of a distributed file system. Compared with the prior art, the method has the following beneficial effects:

(1) The single-copy high-availability method of the distributed file system reduces storage overhead and is better suited to cloud-native scenarios. Most current cloud storage products already have their own replica mechanisms, so the new scheme makes effective use of cloud storage space and avoids extra redundancy.

(2) The single-copy high-availability method of the distributed file system reduces the performance loss caused by replica writes and separates service from data, which prevents service reliability from affecting data availability. In addition, after separation the storage devices can be adjusted flexibly without adjusting the service devices.

Drawings

FIG. 1 is an architecture diagram of the single-copy mode under a conventional scheme;

FIG. 2 is an architecture diagram of the multi-copy mode under a conventional scheme;

FIG. 3 is a schematic diagram of an embodiment of the present invention.

Detailed Description

The embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort are intended to be within the scope of the invention.

Referring to FIGS. 1-3, the present invention provides a technical solution: a single-copy high-availability method for a distributed file system. Conventional storage schemes typically offer a single-copy mode and a multi-copy mode. In the single-copy mode, the storage server holds only one copy of the user data. In the multi-copy mode, multiple copies of the user data are retained. As a concrete scenario, suppose ten 100 GB hard disks are available for storing user data; the single-copy and multi-copy modes then work as follows:

The architecture diagram of the single-copy mode is shown in FIG. 1. In the single-copy mode, the user data access flow is as follows:

1: a user creates 10 files, file1, file2, ..., file10, at a client;

2: the client sends the data of file1-file10 to a storage server;

3: the storage server distributes the data files evenly across the disks, for example, file1 is stored on disk 1, file2 is stored on disk 2, and so on, up to file10 on disk 10;

4: when a user reads data (such as file1) at a client, the client sends a read request to the storage server;

5: the storage server reads the file content from disk 1 and returns it to the client.

In the single-copy mode, all ten 100 GB data disks can be used for storing user data, so the space utilization reaches 100%. However, this scheme has a serious drawback: it does not support high availability. If node 1 or node 2 fails (e.g., a network outage or power loss), the files on disks 1-5 or disks 6-10 may become inaccessible.
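The availability gap described above can be illustrated with a minimal sketch. It assumes, as the multi-copy description of FIG. 2 suggests, that disks 1-5 are local to node 1 and disks 6-10 are local to node 2; the names `DISK_OWNER`, `place` and `readable_files` are hypothetical and serve only this illustration.

```python
from __future__ import annotations

# Assumed layout for the conventional single-copy mode:
# disks 1-5 are local to node 1, disks 6-10 are local to node 2.
DISK_OWNER = {disk: (1 if disk <= 5 else 2) for disk in range(1, 11)}


def place(file_index: int) -> int:
    """Map fileN to disk N, following the example in the text."""
    return file_index


def readable_files(failed_nodes: set[int]) -> list[str]:
    """Return the files that stay readable when the given nodes are down."""
    return [
        f"file{i}"
        for i in range(1, 11)
        if DISK_OWNER[place(i)] not in failed_nodes
    ]


print(readable_files(set()))  # every file is readable while both nodes are up
print(readable_files({1}))    # files on node 1's local disks drop out
```

With a single copy on local disks, a node failure removes every file stored on that node's disks, which is the drawback the text points out.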

The architecture diagram of the multi-copy mode is shown in FIG. 2 (taking two copies as an example).

In FIG. 2, node 1 is the master node and node 2 is the standby node. Disks 1-5 on node 1 and disks 6-10 on node 2 mirror each other, that is, when data is written to disks 1-5, the same data is written to disks 6-10. In the double-copy mode, the user data access flow is as follows:

1: a user creates 5 files, file1, file2, ..., file5, at a client;

2: the client sends the data of file1-file5 to a storage server;

3: the storage server distributes the data files evenly across the disks of node 1, for example, file1 is stored on disk 1, file2 is stored on disk 2, and so on, up to file5 on disk 5. The data of file1-file5 is then synchronously copied to disks 6-10 of node 2;

4: when a user reads data, such as file1, at a client, the client sends a read request to the storage server;

5: when node 1 is normal, the storage server reads the content of file1 from disk 1 of node 1 and returns it to the client;

6: when node 1 fails, the storage server reads the content of file1 from disk 6 of node 2 and returns it to the client.
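The double-copy flow above can be sketched as a toy in-memory model. This is an illustration under stated assumptions, not the behavior of any particular product: the class `MirroredStore` and its attributes are hypothetical, and the mirror of disk N is assumed to be disk N + 5, matching the disk 1 / disk 6 pairing in the text.

```python
class MirroredStore:
    """Toy two-copy store: disks 1-5 (node 1) mirrored to disks 6-10 (node 2)."""

    def __init__(self) -> None:
        self.disks = {d: {} for d in range(1, 11)}
        self.node1_up = True

    def write(self, file_index: int, data: bytes) -> None:
        if not 1 <= file_index <= 5:
            raise ValueError("only five primary disks hold user data")
        name = f"file{file_index}"
        self.disks[file_index][name] = data      # primary copy on node 1
        self.disks[file_index + 5][name] = data  # synchronous mirror on node 2

    def read(self, file_index: int) -> bytes:
        name = f"file{file_index}"
        # Read from node 1 when it is healthy, otherwise from the mirror.
        disk = file_index if self.node1_up else file_index + 5
        return self.disks[disk][name]


store = MirroredStore()
store.write(1, b"hello")
store.node1_up = False        # simulate a failure of node 1
print(store.read(1))          # served from disk 6 on node 2
```

The read survives the failure only because every byte was written twice, which is the cost the following paragraph quantifies.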

This scheme solves the high-availability problem of the single-copy mode, but introduces another problem: low space utilization. In the double-copy mode, only half of the space can store user data and the other half is needed for the backup copy, so the space utilization is 50%. The more copies there are, the more space is wasted. In short, multiple copies solve the single-copy availability problem at the cost of wasted space. This patent therefore proposes a way to achieve high availability in single-copy mode based on IP-SAN storage.
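The utilization figures can be checked with a short calculation. The sketch below assumes full replication, where every file is stored the same number of times; the function names are hypothetical, and the disk count and size come from the ten 100 GB disks used in the example.

```python
def usable_capacity_gb(disk_count: int, disk_size_gb: int, copies: int) -> float:
    """Capacity left for unique user data when every file is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return disk_count * disk_size_gb / copies


def space_utilization(copies: int) -> float:
    """Fraction of raw capacity that holds unique user data."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 / copies


for copies in (1, 2, 3):
    usable = usable_capacity_gb(10, 100, copies)
    print(f"{copies} copies: {usable:.0f} GB usable, {space_utilization(copies):.0%}")
```

One copy leaves the full 1000 GB usable, two copies leave 500 GB, and each further copy lowers the fraction again.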

As shown in FIG. 3, in this embodiment node 1 is the master node and node 2 is the standby node. Both nodes mount the same disk group through the iSCSI protocol, so the same disk is accessible from both node 1 and node 2. The specific user data access flow is as follows:

1: a user creates 10 files, file1, file2, ..., file10, at a client;

2: the client sends the data of file1-file10 to a storage server;

3: when the service on node 1 is normal, the storage server distributes the data files evenly across disks 1-10 through node 1, which mounts the disks over iSCSI, for example, file1 is stored on disk 1, file2 is stored on disk 2, and so on, up to file10 on disk 10;

4: when the service on node 1 fails, the storage server distributes the data files evenly across disks 1-10 through node 2, which mounts the disks over iSCSI, for example, file1 is stored on disk 1, file2 is stored on disk 2, and so on, up to file10 on disk 10;

5: when a user reads data, such as file1, at a client, the client sends a read request to the storage server;

6: when node 1 is normal, the storage server reads the content of file1 from disk 1 through node 1 and returns it to the client;

7: when node 1 fails, the storage server reads the content of file1 from disk 1 through node 2 and returns it to the client.
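The flow above can be sketched as a toy model in which the disk group lives outside both nodes and only the access path changes on failure. The class `SharedDiskCluster` and its methods are hypothetical names for this illustration; no real iSCSI mounting is performed, and the shared disk group is simulated with in-memory dictionaries.

```python
from __future__ import annotations


class SharedDiskCluster:
    """Toy model: one shared disk group reachable from node 1 and node 2."""

    def __init__(self) -> None:
        self.disks = {d: {} for d in range(1, 11)}  # exactly one copy per file
        self.node_up = {1: True, 2: True}

    def active_node(self) -> int:
        """Prefer node 1 (master); fall back to node 2 (standby)."""
        if self.node_up[1]:
            return 1
        if self.node_up[2]:
            return 2
        raise RuntimeError("no storage node is available")

    def write(self, file_index: int, data: bytes) -> int:
        node = self.active_node()
        self.disks[file_index][f"file{file_index}"] = data  # fileN -> disk N
        return node

    def read(self, file_index: int) -> tuple[int, bytes]:
        node = self.active_node()
        return node, self.disks[file_index][f"file{file_index}"]


cluster = SharedDiskCluster()
print(cluster.write(1, b"hello"))  # written through node 1
cluster.node_up[1] = False         # simulate a failure of node 1
print(cluster.read(1))             # same disk 1, now reached through node 2
```

Unlike the mirrored layout, the data is never copied: the failover changes which node serves the request, not which disk holds the file.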

The main characteristics of the scheme are as follows:

1: decoupling service availability from data availability

In the new scheme, we use IP-SAN to provide storage instead of local disks, which separates data from services. After separation, the availability of the service no longer affects the availability of the data. The service that provides the data supports dynamic switching, which ensures that data availability is no longer affected by service availability.

2: data access uniqueness

When multiple services mount the same data disk, differences in service state, network state and other conditions create a risk that the disk is mounted by more than one service at the same time and that dirty data is read or written. In our scheme, the storage service uses the Multiple Mount Protection (MMP) feature when mounting data, which ensures that the data can be read and written by only one service at a time, so no conflict arises.
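The exclusivity guarantee described above can be sketched as a simple guard. This is a simplification for illustration only: real MMP implementations detect a concurrent mount through on-disk state, whereas the sketch assumes a single in-memory owner field. The names `SharedDisk` and `MountConflict` are hypothetical.

```python
class MountConflict(Exception):
    """Raised when a service touches a disk that another service has mounted."""


class SharedDisk:
    """Toy shared disk that admits a single mounting service at a time."""

    def __init__(self) -> None:
        self.mounted_by = None  # name of the service holding the mount, if any
        self.blocks = {}

    def mount(self, service: str) -> None:
        if self.mounted_by not in (None, service):
            raise MountConflict(f"disk already mounted by {self.mounted_by}")
        self.mounted_by = service

    def unmount(self, service: str) -> None:
        if self.mounted_by == service:
            self.mounted_by = None

    def write(self, service: str, name: str, data: bytes) -> None:
        if self.mounted_by != service:
            raise MountConflict(f"{service} does not hold the mount")
        self.blocks[name] = data


disk = SharedDisk()
disk.mount("node1")
disk.write("node1", "file1", b"hello")
try:
    disk.mount("node2")  # rejected while node1 holds the mount
except MountConflict as exc:
    print("refused:", exc)
disk.unmount("node1")
disk.mount("node2")      # succeeds once node1 has released the disk
```

Rejecting the second mount outright, rather than allowing it read-only, keeps the sketch aligned with the stated goal that only one service reads and writes the data at a time.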

3: dynamic handoff of services

In addition to the storage service, there is a management service responsible for switching data services. When the primary data service becomes inaccessible because of a fault, downtime, a network abnormality or another abnormal condition, the management service marks the primary service as abnormal and performs a master-standby switchover. After the standby service is switched to primary, it remounts the storage and provides service externally as normal. After the original primary service returns to normal, the management service re-marks its state and changes it to a standby service that waits on standby.
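The switchover logic above can be sketched as a small state tracker. This is a minimal illustration under assumptions: the class `ManagementService`, the role labels and the node names are hypothetical, failure detection is reduced to explicit `report_failure` calls, and the remount step is represented only by a comment.

```python
class ManagementService:
    """Toy controller that tracks roles and performs master-standby switchover."""

    def __init__(self) -> None:
        self.roles = {"node1": "primary", "node2": "standby"}

    def primary(self) -> str:
        for node, role in self.roles.items():
            if role == "primary":
                return node
        raise RuntimeError("no primary service is available")

    def report_failure(self, node: str) -> None:
        was_primary = self.roles[node] == "primary"
        self.roles[node] = "abnormal"  # mark the failed service
        if was_primary:
            for candidate, role in self.roles.items():
                if role == "standby":
                    # The promoted service would remount the storage here.
                    self.roles[candidate] = "primary"
                    break

    def report_recovery(self, node: str) -> None:
        # A recovered service rejoins as standby instead of reclaiming primary.
        if self.roles[node] == "abnormal":
            self.roles[node] = "standby"


mgmt = ManagementService()
mgmt.report_failure("node1")
print(mgmt.primary())   # node2 has taken over
mgmt.report_recovery("node1")
print(mgmt.roles)       # node1 is back as standby, node2 stays primary
```

Bringing the recovered service back as standby avoids a second switchover, which matches the description that the original primary waits on standby after it recovers.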

4: no data redundancy required

In this scheme, data availability is no longer affected by service availability. Redundancy modes such as double copies and erasure coding are therefore not needed to provide availability, which saves a large amount of disk cost.

It is noted that relational terms such as first and second, and the like, are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims

1. A single copy high availability method for a distributed file system, characterized by: the method specifically comprises the following steps:

S1, a user creates 10 files, file1, file2, ..., file10, at a client;

S2, the client sends the data of file1-file10 to a storage server;

2. A single-copy high-availability method of a distributed file system as claimed in claim 1, wherein: in S3, for example, file1 is stored on disk 1, file2 is stored on disk 2, and so on, up to file10 on disk 10.

3. A single-copy high-availability method of a distributed file system as claimed in claim 1, wherein: in S4, for example, file1 is stored on disk 1, file2 is stored on disk 2, and so on, up to file10 on disk 10.