CN116841459A - Single-copy high-availability method of distributed file system - Google Patents
- Publication number
- CN116841459A
- Authority
- CN
- China
- Prior art keywords
- node
- data
- disk
- file1
- service
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0628—Interfaces specially adapted for storage systems making use of a particular technique
- G06F3/0638—Organizing or formatting or addressing of data
- G06F3/0643—Management of files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0602—Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
- G06F3/061—Improving I/O performance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/06—Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
- G06F3/0601—Interfaces specially adapted for storage systems
- G06F3/0668—Interfaces specially adapted for storage systems adopting a particular infrastructure
- G06F3/0671—In-line storage system
- G06F3/0673—Single storage device
- G06F3/0674—Disk device
- G06F3/0676—Magnetic disk device
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a single-copy high-availability method for a distributed file system, comprising the following steps: S1, a user creates 10 files, file1, file2, ..., file10, at a client; S2, the client sends the data of file1-file10 to a storage server; S3, when the service on node 1 is normal, the storage server writes the data files evenly across disks 1-10 through node 1, which mounts the disks over iSCSI. The invention relates to the technical field of file systems. The method reduces storage overhead and is better suited to cloud-native scenarios. Most cloud storage products already have their own replication mechanisms, so the scheme makes effective use of cloud storage space and avoids additional redundancy. It also avoids the performance cost of replica writes and separates services from data, so that service reliability does not affect data availability. In addition, after separation the storage devices can be adjusted flexibly without adjusting the service devices.
Description
Technical Field
The invention relates to the technical field of file systems, in particular to a single-copy high-availability method of a distributed file system.
Background
Distributed file systems have been an important part of storage-system development in recent years, and high availability is particularly important for storage services. The traditional single-copy mode cannot guarantee high availability: when the service goes down, storage access is interrupted until the service recovers. At present, most storage systems on the market rely on data redundancy, such as multiple copies or erasure coding (EC), to ensure service availability.
However, cloud storage is now on the rise, and the data redundancy of the file system stacks on top of the data redundancy of the cloud storage itself, so storage costs increase sharply.
This document provides an IP-SAN-based approach that ensures the service availability of a file system in a single-copy scenario.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention provides a single-copy high-availability method for a distributed file system. It addresses the limitations of conventional storage schemes, in which single-copy mode does not provide high availability and multi-copy mode wastes storage space.
To achieve the above purpose, the invention is realized by the following technical scheme: a single-copy high-availability method for a distributed file system, comprising the following steps:
S1, a user creates 10 files, file1, file2, ..., file10, at a client;
S2, the client sends the data of file1-file10 to a storage server;
S3, when the service on node 1 is normal, the storage server writes the data files evenly across disks 1-10 through node 1, which mounts the disks over iSCSI;
S4, when the service on node 1 fails, the storage server writes the data files evenly across disks 1-10 through node 2, which mounts the disks over iSCSI;
S5, when the user reads data, for example file1, at the client, the client sends a read request to the storage server;
S6, when node 1 is normal, the storage server reads the content of file1 from disk 1 through node 1 and returns it to the client;
S7, when node 1 fails, the storage server reads the content of file1 from disk 1 through node 2 and returns it to the client.
Preferably, in S3, file1 is stored on disk 1, file2 on disk 2, and so on, up to file10 on disk 10.
Preferably, in S4, file1 is stored on disk 1, file2 on disk 2, and so on, up to file10 on disk 10.
Advantageous effects
The invention provides a single-copy high availability method of a distributed file system. Compared with the prior art, the method has the following beneficial effects:
(1) The single-copy high-availability method reduces storage overhead and is better suited to cloud-native scenarios. Most current cloud storage products have their own replication mechanisms, so the new scheme makes effective use of cloud storage space and avoids additional redundancy.
(2) The single-copy high-availability method avoids the performance cost of replica writes and separates services from data, so that service reliability does not affect data availability. In addition, after separation the storage devices can be adjusted flexibly without adjusting the service devices.
Drawings
FIG. 1 is an architecture diagram of the single-copy mode in a conventional scheme;
FIG. 2 is an architecture diagram of the multi-copy mode in a conventional scheme;
FIG. 3 is a schematic diagram of an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to FIGS. 1-3, the present invention provides a technical solution: a single-copy high-availability method for a distributed file system. Conventional storage schemes typically offer a single-copy mode and a multi-copy mode. In single-copy mode, the storage server keeps only one copy of the user data. In multi-copy mode, it keeps multiple copies. As a concrete scenario, suppose there are ten 100 GB hard disks for storing user data; the single-copy and multi-copy modes then work as follows:
The architecture of the single-copy mode is shown in FIG. 1. In single-copy mode, the user data access flow is as follows:
1: a user creates 10 files, file1, file2, ..., file10, at a client;
2: the client sends the data of file1-file10 to a storage server;
3: the storage server distributes the data files evenly across the disks: file1 is stored on disk 1, file2 on disk 2, and so on, up to file10 on disk 10;
4: when the user reads data (for example file1) at the client, the client sends a read request to the storage server;
5: the storage server reads the file content from disk 1 and returns it to the client.
In single-copy mode, all ten 100 GB data disks can be used for user data, so space utilization reaches 100%. However, this scheme has a serious drawback: it does not support high availability. If node 1 or node 2 fails (for example, through a network outage or power loss), the files on disks 1-5 or disks 6-10, respectively, become inaccessible.
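The availability gap described above can be illustrated with a minimal sketch. It assumes the layout implied by the text, with disks 1-5 attached locally to node 1 and disks 6-10 to node 2; the names `DISK_OWNER`, `place_files` and `unreachable_files` are hypothetical and introduced only for illustration.

```python
from typing import Dict, List

# Assumed layout: disks 1-5 are local to node 1, disks 6-10 are local to node 2.
DISK_OWNER: Dict[int, int] = {disk: (1 if disk <= 5 else 2) for disk in range(1, 11)}


def place_files(file_count: int = 10) -> Dict[str, int]:
    """Map file<i> to disk <i>, mirroring the one-file-per-disk example."""
    return {f"file{i}": i for i in range(1, file_count + 1)}


def unreachable_files(placement: Dict[str, int], failed_node: int) -> List[str]:
    """Return the files whose only copy sits on a disk owned by the failed node."""
    lost = [name for name, disk in placement.items() if DISK_OWNER[disk] == failed_node]
    # Sort by numeric suffix so that file10 comes after file9.
    return sorted(lost, key=lambda name: int(name[len("file"):]))
```

With this layout, a failure of either node makes half of the files unreachable even though no data has been lost, which is the drawback the remaining schemes try to remove.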
The architecture of the multi-copy mode is shown in FIG. 2 (taking two copies as an example).
In FIG. 2, node 1 is the primary node and node 2 is the standby node. Disks 1-5 on node 1 and disks 6-10 on node 2 mirror each other: whenever data is written to disks 1-5, the same data is written to disks 6-10. In double-copy mode, the user data access flow is as follows:
1: a user creates 5 files, file1, file2, ..., file5, at a client;
2: the client sends the data of file1-file5 to a storage server;
3: the storage server distributes the data files evenly across the disks of node 1: file1 is stored on disk 1, file2 on disk 2, and so on, up to file5 on disk 5. The data of file1-file5 is then synchronously copied to disks 6-10 of node 2;
4: when the user reads data, for example file1, at the client, the client sends a read request to the storage server;
5: when node 1 is normal, the storage server reads the content of file1 from disk 1 of node 1 and returns it to the client;
6: when node 1 fails, the storage server reads the content of file1 from disk 6 of node 2 and returns it to the client.
This scheme solves the high-availability problem of single-copy mode, but it introduces another problem: low space utilization. In double-copy mode, only half of the space stores user data and the other half is needed for the backup, so space utilization is 50%. The more copies there are, the more space is wasted. In short, multiple copies fix the lack of high availability in single-copy mode at the cost of wasted space. This patent therefore proposes a way to achieve high availability in single-copy mode based on IP-SAN storage.
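The utilization figures quoted for the single-copy and double-copy modes follow from dividing raw capacity by the number of copies. The short sketch below reproduces that arithmetic for the ten 100 GB disks of the example; the function names are hypothetical, and erasure coding is deliberately not modeled.

```python
def usable_capacity_gb(disk_count: int, disk_size_gb: float, copies: int) -> float:
    """Capacity left for user data when every file is stored `copies` times."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return disk_count * disk_size_gb / copies


def space_utilization(copies: int) -> float:
    """Fraction of raw capacity that holds unique user data."""
    if copies < 1:
        raise ValueError("copies must be at least 1")
    return 1 / copies
```

For the example hardware, one copy leaves 1000 GB usable and two copies leave 500 GB, matching the 100% and 50% figures in the text.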
As shown in FIG. 3, in this embodiment node 1 is the primary node and node 2 is the standby node. Both nodes mount the same disk group through the iSCSI protocol, so the same disk can be accessed from node 1 and from node 2. The specific user data access flow is as follows:
1: a user creates 10 files, file1, file2, ..., file10, at a client;
2: the client sends the data of file1-file10 to a storage server;
3: when the service on node 1 is normal, the storage server writes the data files evenly across disks 1-10 through node 1, which mounts the disks over iSCSI: file1 is stored on disk 1, file2 on disk 2, and so on, up to file10 on disk 10;
4: when the service on node 1 fails, the storage server writes the data files evenly across disks 1-10 through node 2, which mounts the disks over iSCSI: file1 is stored on disk 1, file2 on disk 2, and so on, up to file10 on disk 10;
5: when the user reads data, for example file1, at the client, the client sends a read request to the storage server;
6: when node 1 is normal, the storage server reads the content of file1 from disk 1 through node 1 and returns it to the client;
7: when node 1 fails, the storage server reads the content of file1 from disk 1 through node 2 and returns it to the client.
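The access flow above can be sketched as a small in-memory model. This is only an illustration under stated assumptions: the shared disk group is represented by a dictionary rather than real iSCSI targets, and the class `StorageServer` with its methods `active_node`, `disk_for`, `write` and `read` is hypothetical, not part of the patent.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


def _empty_disks() -> Dict[int, Dict[str, bytes]]:
    # Disks 1-10 of the shared group; both nodes see the same disks.
    return {disk: {} for disk in range(1, 11)}


@dataclass
class StorageServer:
    disks: Dict[int, Dict[str, bytes]] = field(default_factory=_empty_disks)
    node_up: Dict[int, bool] = field(default_factory=lambda: {1: True, 2: True})

    def active_node(self) -> int:
        """Prefer node 1 (primary); fall back to node 2 (standby)."""
        for node in (1, 2):
            if self.node_up[node]:
                return node
        raise RuntimeError("neither node can serve requests")

    @staticmethod
    def disk_for(name: str) -> int:
        """file<i> lives on disk <i>, as in the example placement."""
        return int(name[len("file"):])

    def write(self, name: str, data: bytes) -> int:
        """Store the single copy and return the node that served the write."""
        node = self.active_node()
        self.disks[self.disk_for(name)][name] = data
        return node

    def read(self, name: str) -> Tuple[int, bytes]:
        """Return the serving node and the file content."""
        node = self.active_node()
        return node, self.disks[self.disk_for(name)][name]
```

Because the single copy lives on shared storage rather than on a disk local to one node, marking node 1 as down changes only which node serves the request; the disk that is read stays the same.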
The main characteristics of the scheme are as follows:
1: decoupling service availability from data availability
In the new scheme, storage is provided by an IP-SAN instead of local disks, which separates data from services. After separation, the availability of the service no longer affects the availability of the data. The service that serves the data supports dynamic switching, which ensures that data availability is no longer affected by service availability.
2: data access uniqueness
When multiple services mount the same data disk, differences in service state, network state and similar factors create a risk that the disk is mounted by more than one service at once and that dirty data is read or written. In this scheme, the storage service uses the Multiple Mount Protection (MMP) feature when mounting the data, so that the data can be read and written by only one service at a time and no conflicts arise.
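The idea behind this kind of mount protection can be sketched as a heartbeat check on a shared record: a node that wants to mount watches a sequence number for one interval and backs off if another node is still updating it. The sketch below is a simplified, in-memory illustration of that idea only. It is not the on-disk format or algorithm of any particular file system's MMP feature, and `MmpBlock`, `heartbeat`, `try_mount` and `MountRefused` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class MmpBlock:
    """Hypothetical protection record stored on the shared disk."""
    seq: int = 0
    owner: Optional[str] = None


class MountRefused(Exception):
    """Raised when another node still appears to hold the mount."""


def heartbeat(block: MmpBlock, node: str) -> None:
    """Called periodically by the node that currently holds the mount."""
    if block.owner != node:
        raise MountRefused(f"{node} does not own the mount")
    block.seq += 1


def try_mount(block: MmpBlock, node: str, wait: Callable[[], None]) -> None:
    """Mount only if the sequence number stays unchanged across one interval.

    `wait` stands in for sleeping one check interval; in this sketch it is a
    callback so that the behaviour can be exercised deterministically.
    """
    before = block.seq
    wait()
    if block.owner is not None and block.seq != before:
        raise MountRefused(f"disk is in use by {block.owner}")
    block.owner = node
    block.seq += 1
```

In this model a standby node can take over the disk only after the previous owner has stopped sending heartbeats, which is what keeps the two services from writing to the same disk concurrently.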
3: dynamic handoff of services
In addition to the storage service, there is a management service responsible for switching the data service. When the primary data service becomes inaccessible because of a fault, downtime, a network abnormality or another abnormal condition, the management service marks the primary service as abnormal and performs a primary/standby switchover. Once the standby service has been promoted to primary, it remounts the storage and provides service to the outside as normal. After the former primary service recovers, the management service updates its state again and sets it to standby, where it waits as the standby service.
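The switching behaviour described above can be sketched as a small state holder. The class `ManagementService` and its methods are hypothetical; failure detection, remounting of the storage and client redirection are assumed to happen elsewhere and are not modeled.

```python
from typing import Dict


class ManagementService:
    """Tracks which data service is primary and which is standby."""

    def __init__(self, primary: str, standby: str) -> None:
        self.primary = primary
        self.standby = standby
        self.healthy: Dict[str, bool] = {primary: True, standby: True}

    def _switch(self) -> None:
        # Promote the standby; the old primary takes the standby role.
        self.primary, self.standby = self.standby, self.primary

    def report_failure(self, service: str) -> None:
        """Mark a service abnormal and switch over if it was the primary."""
        self.healthy[service] = False
        if service == self.primary and self.healthy[self.standby]:
            self._switch()

    def report_recovery(self, service: str) -> None:
        """Mark a service healthy again.

        A recovered service normally rejoins as standby. It is promoted only
        if the current primary is itself marked abnormal.
        """
        self.healthy[service] = True
        if service == self.standby and not self.healthy[self.primary]:
            self._switch()
```

Keeping the recovered service in the standby role, rather than switching back immediately, matches the text and avoids a second interruption for clients.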
4: No data redundancy
In this scheme, data availability is no longer affected by service availability. Availability is therefore provided without redundancy schemes such as double copies or erasure coding, which saves a large amount of disk cost.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (3)
1. A single-copy high-availability method for a distributed file system, characterized in that the method comprises the following steps:
S1, a user creates 10 files, file1, file2, ..., file10, at a client;
S2, the client sends the data of file1-file10 to a storage server;
S3, when the service on node 1 is normal, the storage server writes the data files evenly across disks 1-10 through node 1, which mounts the disks over iSCSI;
S4, when the service on node 1 fails, the storage server writes the data files evenly across disks 1-10 through node 2, which mounts the disks over iSCSI;
S5, when the user reads data, for example file1, at the client, the client sends a read request to the storage server;
S6, when node 1 is normal, the storage server reads the content of file1 from disk 1 through node 1 and returns it to the client;
S7, when node 1 fails, the storage server reads the content of file1 from disk 1 through node 2 and returns it to the client.
2. The single-copy high-availability method for a distributed file system as claimed in claim 1, wherein in S3, file1 is stored on disk 1, file2 on disk 2, and so on, up to file10 on disk 10.
3. The single-copy high-availability method for a distributed file system as claimed in claim 1, wherein in S4, file1 is stored on disk 1, file2 on disk 2, and so on, up to file10 on disk 10.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310537898.5A CN116841459A (en) | 2023-05-15 | 2023-05-15 | Single-copy high-availability method of distributed file system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310537898.5A CN116841459A (en) | 2023-05-15 | 2023-05-15 | Single-copy high-availability method of distributed file system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116841459A (en) | 2023-10-03 |
Family
ID=88169444
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310537898.5A Pending CN116841459A (en) | 2023-05-15 | 2023-05-15 | Single-copy high-availability method of distributed file system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116841459A (en) |
-
2023
- 2023-05-15 CN CN202310537898.5A patent/CN116841459A/en active Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||