CN111488127B - Data parallel storage method and device based on disk cluster and data reading method - Google Patents


Info

Publication number
CN111488127B
Authority
CN
China
Prior art keywords
disk
data
metadata
cluster
stored
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010300208.0A
Other languages
Chinese (zh)
Other versions
CN111488127A (en)
Inventor
邸忠辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Inspur Intelligent Technology Co Ltd
Original Assignee
Suzhou Inspur Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Inspur Intelligent Technology Co Ltd
Priority to CN202010300208.0A
Publication of CN111488127A
Application granted
Publication of CN111488127B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0653 Monitoring storage devices or systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0683 Plurality of storage devices
    • G06F3/0689 Disk arrays, e.g. RAID, JBOD

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a device and equipment for parallel storage of data based on disk clusters and a computer readable storage medium, wherein the method comprises the following steps: generating metadata of user data to be stored, performing random hash calculation according to a metadata serial number, and selecting a target disk for storing the user data in a disk cluster according to a hash value; judging whether the residual storage capacity of the target disk is greater than or equal to a preset capacity threshold value or not; if the residual storage capacity of the target disk is larger than or equal to the preset capacity threshold, storing the user data into a data area of the target disk, and storing the metadata of the user data into a disk member metadata area of the target disk; and updating the data volume and the residual storage capacity of the target disk stored in the cluster metadata area of each disk in the disk cluster. The method, the device, the equipment and the computer readable storage medium provided by the invention improve the data storage performance. The invention also discloses a data reading method based on the disk cluster, which can improve the data reading efficiency.

Description

Data parallel storage method and device based on disk cluster and data reading method
Technical Field
The invention relates to the field of cloud computing data centers, and in particular to a disk cluster-based data parallel storage method, device and equipment and a computer-readable storage medium; it further relates to a disk cluster-based data reading method.
Background
In cloud computing data centers, data storage is a core concern for users. For data performance and security, a redundant array of independent disks (RAID) is often used to store the same data in different locations on multiple hard disks. Data verification provides fault tolerance, and storing and reading data on several disks simultaneously greatly improves the throughput of the storage system. However, while redundant check data improves data security, it also increases write amplification, reduces the usable storage capacity of the storage system, and lowers its performance.
Therefore, RAID 0 may be used for non-critical data to improve performance and system capacity. However, because RAID 0 stripes data across different disks, the failure of any one hard disk damages the whole system, so reliability is very low; RAID itself is also costly.
In recent years, JBOD (Just a Bunch Of Disks) has become popular. JBOD is a storage device with multiple disk drives, also called "Span", that logically connects several physical disks in series, one after another, to provide one large logical disk. Data on a Span is simply stored starting from the first disk; when the storage space of the first disk is used up, data is stored on the following disks in turn. Span access performance is equivalent to that of a single disk, and Span provides no data security. It is simply a way of using disk space, and the storage capacity of a Span equals the sum of the capacities of all the disks that make it up. JBOD has the advantages of low cost and of not losing all data if a single disk fails; its disadvantage is lower performance, because the disks are filled sequentially.
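The sequential fill behaviour of a Span can be illustrated with a minimal Python sketch. The function name `span_place` is hypothetical, and the sketch makes a simplifying assumption that a data item never straddles two disks; it only shows why consecutive writes tend to land on the same disk.

```python
def span_place(sizes, capacities):
    """Return the disk index each item lands on under Span-style sequential fill.

    Assumption for illustration: an item is never split across two disks.
    """
    placement = []
    disk, used = 0, 0
    for size in sizes:
        # Move on to the next disk only when the current one cannot hold the item.
        while disk < len(capacities) and used + size > capacities[disk]:
            disk, used = disk + 1, 0
        if disk == len(capacities):
            raise OSError("span is full")
        placement.append(disk)
        used += size
    return placement


if __name__ == "__main__":
    # The first two items share disk 0; the third spills over to disk 1.
    print(span_place([40, 40, 40], [100, 100]))
```

Because every write goes to the one disk that is currently being filled, the other disks sit idle, which is the performance limitation the background describes.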
From the above, it can be seen that how to improve the data storage performance is a problem to be solved at present.
Disclosure of Invention
The invention aims to provide a disk cluster-based data parallel storage method, a disk cluster-based data parallel storage device, a disk cluster-based data parallel storage equipment and a computer readable storage medium, which solve the problem of low data storage performance caused by sequential storage of a plurality of disks in the prior art. The invention also provides a data reading method based on the disk cluster.
In order to solve the above technical problem, the present invention provides a data parallel storage method based on disk clusters, which includes:
generating metadata of user data to be stored, and performing random hash calculation according to the serial number of the metadata to obtain a hash value;
according to the hash value, selecting a target disk for storing the user data to be stored from a plurality of disks of a disk cluster; each disk in the disk cluster comprises a data area for storing storage data and a metadata area for storing metadata describing the storage data; the metadata area of each disk comprises a disk member metadata area for storing metadata of data stored on the disk, a disk cluster metadata area for storing the number of disks of the whole disk cluster, and data capacity, data volume and residual storage capacity of each disk;
judging whether the residual storage capacity of the target disk is greater than or equal to a preset capacity threshold value or not; if the residual storage capacity of the target disk is larger than or equal to the preset capacity threshold, storing the user data to be stored into a data area of the target disk, and storing the metadata of the user data to be stored into a disk member metadata area of the target disk;
and updating the data volume and the residual storage capacity of the target disk stored in the cluster metadata area of each disk in the disk cluster.
Preferably, after determining whether the remaining storage capacity of the target disk is greater than or equal to the preset capacity threshold, the method further includes:
if the residual storage capacity of the target disk is smaller than the preset capacity threshold, judging whether a disk with the residual storage capacity larger than or equal to the preset capacity threshold exists in the disk cluster;
if the disk with the residual storage capacity larger than or equal to the preset capacity threshold exists in the disk cluster, storing the user data to be stored into an alternative disk with the maximum current residual storage capacity in the disk cluster;
storing the metadata of the user data to be stored into a disk cluster metadata area of a first disk in the disk cluster;
and updating the metadata in the cluster metadata areas of other disks of the disk cluster according to the metadata in the cluster metadata area of the first disk at preset time intervals, so that the same cluster metadata copy is stored in the cluster metadata area of each disk of the disk cluster.
Preferably, after determining whether there is a disk with a remaining storage capacity greater than or equal to the preset capacity threshold in the disk cluster, the method further includes:
and if the disk with the residual storage capacity larger than or equal to the preset capacity threshold does not exist in the disk cluster, returning a message that the storage space of the disk cluster is insufficient.
Preferably, generating the metadata of the user data to be stored and performing random hash calculation according to the serial number of the metadata to obtain a hash value includes:
generating metadata of the file to be stored according to system characteristic data of the file to be stored; the system characteristic data of the file to be stored comprises a file name, a file size, the total number of data blocks used by the file, the size of the data blocks, the storage position of the data blocks, a file type, an access authority, and creation and access time;
and carrying out random hash calculation according to the file number of the file to be stored to generate a hash value.
The invention also provides a data reading method based on the disk cluster, which comprises the following steps:
acquiring metadata of user data to be read, and performing random hash calculation according to a serial number of the metadata of the user data to be read to obtain a hash value;
and determining the disk of the data to be read in the disk cluster according to the hash value, and acquiring the disk position of the user data to be read according to the metadata of the user data to be read so as to read the data to be read.
Preferably, the determining, according to the hash value, the disk and disk position where the data to be read is located in the disk cluster so as to read the data to be read includes:
determining a target disk of the user data to be read in the disk cluster according to the hash value;
judging whether the user data to be read can be found in the data area of the target disk;
and if the user data to be read can be found in the data area of the target disk, reading the user data to be read.
Preferably, after determining whether the user data to be read can be found in the data area of the target disk, the method further includes:
if the user data to be read is not found in the data area of the target disk, determining the actual disk where the user data to be read is located according to metadata in the disk cluster metadata area within the metadata area of the target disk;
and reading the user data to be read in the data area of the actual disk.
The invention also provides a data parallel storage device based on the disk cluster, which comprises:
the hash calculation module is used for generating metadata of user data to be stored and carrying out random hash calculation according to the serial number of the metadata to obtain a hash value;
the selecting module is used for selecting a target disk for storing the user data to be stored from a plurality of disks of the disk cluster according to the hash value; each disk in the disk cluster comprises a data area for storing storage data and a metadata area for storing metadata describing the storage data; the metadata area of each disk comprises a disk member metadata area for storing metadata of data stored on the disk, a disk cluster metadata area for storing the number of disks of the whole disk cluster, and data capacity, data volume and residual storage capacity of each disk;
the judging module is used for judging whether the residual storage capacity of the target disk is greater than or equal to a preset capacity threshold value or not;
the storage module is used for storing the user data to be stored into a data area of the target disk and storing the metadata of the user data to be stored into a disk member metadata area of the target disk if the residual storage capacity of the target disk is greater than or equal to the preset capacity threshold;
and the updating module is used for updating the data volume and the residual storage capacity of the target disk stored in the disk cluster metadata area of each disk in the disk cluster.
The invention also provides data parallel storage equipment based on the disk cluster, which comprises:
a memory for storing a computer program; and the processor is used for realizing the steps of the data parallel storage method based on the disk cluster when executing the computer program.
The invention further provides a computer readable storage medium, which stores a computer program, and the computer program realizes the steps of the data parallel storage method based on the disk cluster when being executed by a processor.
The data parallel storage method based on the disk cluster can store different user data to different disks in the disk cluster in parallel. The invention divides the disks into groups, and a specified group of disks forms the members of the current disk cluster. Each disk in the disk cluster is divided into a data area, which stores the storage data, and a metadata area, which stores metadata describing the storage data. The metadata area is further divided into a disk member metadata area, which stores metadata of the data stored on that disk, and a disk cluster metadata area, which stores the number of disks in the whole disk cluster and the data capacity, data volume and remaining storage capacity of each disk. The cluster metadata area of each disk in the cluster stores the same cluster metadata copy. When user data needs to be stored, a metadata structure for the user data is first allocated and generated, random hash calculation is performed on the metadata serial number to obtain a hash value, and the target disk for the user data is determined among the disks of the disk cluster according to the hash value. Next, it is judged whether the remaining storage capacity of the target disk is greater than or equal to a preset capacity threshold. If it is, the target disk still has sufficient storage space, so the user data is stored in the data area of the target disk and its metadata is stored in the disk member metadata area of the target disk.
After the user data has been stored, the data volume and remaining storage capacity of the target disk are updated in the disk cluster metadata area of each disk in the disk cluster. Because the method performs random hash calculation on the metadata serial number of each piece of user data, different user data are spread randomly across different disks and can be stored in parallel. A single disk failure does not cause all data to be lost, the performance of non-critical data storage is improved, and its cost is reduced.
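The per-disk layout described above (a data area, a disk member metadata area and a disk cluster metadata area) can be modelled with a few Python dataclasses. This is only an illustrative sketch: the class and field names (`DiskUsage`, `ClusterMetadata`, `Disk`, `make_cluster`) are hypothetical, and keying the areas by metadata serial number is an assumption rather than something the description specifies.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DiskUsage:
    """One per-disk entry in the disk cluster metadata area."""
    capacity: int      # data capacity of the disk, in bytes
    used: int = 0      # data volume currently stored, in bytes

    @property
    def remaining(self) -> int:
        return self.capacity - self.used


@dataclass
class ClusterMetadata:
    """Disk cluster metadata area; every disk holds an identical copy."""
    usage: list

    @property
    def disk_count(self) -> int:
        return len(self.usage)


@dataclass
class Disk:
    data_area: dict = field(default_factory=dict)        # serial -> user data
    member_metadata: dict = field(default_factory=dict)  # serial -> metadata of data on this disk
    cluster_metadata: Optional[ClusterMetadata] = None


def make_cluster(capacities):
    """Create one Disk per capacity, each with its own copy of the cluster metadata."""
    return [
        Disk(cluster_metadata=ClusterMetadata([DiskUsage(c) for c in capacities]))
        for _ in capacities
    ]
```

Each disk gets a separate `ClusterMetadata` object with equal contents, mirroring the statement that every disk stores the same cluster metadata copy.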
Accordingly, the invention also provides a data parallel storage device, equipment and computer readable storage medium based on disk clusters, which have the technical effects.
The data reading method based on the disk cluster can improve the data reading efficiency.
Drawings
In order to illustrate the embodiments or technical solutions of the present invention more clearly, the drawings used in describing them are briefly introduced below. The drawings in the following description show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a first embodiment of a method for parallel storage of data based on disk clusters according to the present invention;
FIG. 2 is a schematic diagram of a structure of a disk cluster provided by the present invention;
FIG. 3 is a schematic diagram of a data area and a metadata area of each disk in a cluster provided by the present invention;
FIG. 4 is a flowchart illustrating a second embodiment of a method for parallel storage of data based on disk clusters according to the present invention;
FIG. 5 is a flowchart of a first embodiment of a method for reading data from a disk cluster according to the present invention;
FIG. 6 is a flowchart illustrating a second embodiment of a data reading method based on disk clusters according to the present invention;
FIG. 7 is a block diagram of a data parallel storage device based on a disk cluster according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a data parallel storage method, a device, equipment and a computer readable storage medium based on a disk cluster, which improve the data storage performance and reduce the data storage cost. The invention also provides a data reading method based on the disk cluster, and the data reading efficiency is improved.
In order that those skilled in the art will better understand the disclosure, reference will now be made in detail to the embodiments of the disclosure as illustrated in the accompanying drawings. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be obtained by a person skilled in the art without making any creative effort based on the embodiments in the present invention, belong to the protection scope of the present invention.
Referring to fig. 1, fig. 1 is a flowchart illustrating a first embodiment of a method for parallel storing data based on disk clusters according to the present invention; the specific operation steps are as follows:
Step S101: generating metadata of user data to be stored, and performing random hash calculation according to the serial number of the metadata to obtain a hash value;
When user data needs to be stored, a metadata structure for the user data is first allocated and generated; random hash calculation is then performed on the metadata serial number of the user data, and the user data is stored on the corresponding disk according to the hash value.
It should be noted that, in this embodiment, the user data to be stored may be a file to be stored or a data block to be stored. When a file to be stored needs to be stored, firstly generating metadata of the file to be stored according to system characteristic data of the file to be stored, and then performing random hash calculation according to a file number of the file to be stored to generate a hash value; the system characteristic data of the file to be stored comprises a file name, a file size, the total number of data blocks used by the file, a data block size, a data block storage position, a file type, an access authority, creation and access time and the like.
Step S102: according to the hash value, selecting a target disk for storing the user data to be stored from a plurality of disks of a disk cluster; each disk in the disk cluster comprises a data area for storing storage data and a metadata area for storing metadata describing the storage data; the metadata area of each disk comprises a disk member metadata area for storing metadata of data stored on the disk, a disk cluster metadata area for storing the number of disks of the whole disk cluster, and data capacity, data volume and residual storage capacity of each disk;
The invention divides the disks into groups, and a specified group of disks forms the members of the disk cluster; the structure of the disk cluster is shown in fig. 2. It should be noted that, in the present invention, the number of disk members in a disk cluster is not limited. As shown in fig. 3, each disk in the disk cluster is divided into a metadata area and a data area, and the metadata area is further divided into a disk member metadata area and a disk cluster metadata area. The data area stores the storage data, and the metadata area stores metadata describing the storage data. The disk member metadata area stores metadata of the data stored on that disk, and the disk cluster metadata area stores metadata such as the number of disks in the whole disk cluster and the data capacity, data volume and remaining storage capacity of each disk.
The storage capacity of each disk in the disk cluster may be the same or different.
Step S103: judging whether the residual storage capacity of the target disk is greater than or equal to a preset capacity threshold value or not;
Step S104: if the residual storage capacity of the target disk is larger than or equal to the preset capacity threshold, storing the user data to be stored into a data area of the target disk, and storing the metadata of the user data to be stored into a disk member metadata area of the target disk;
Step S105: and updating the data volume and the residual storage capacity of the target disk stored in the cluster metadata area of each disk in the disk cluster.
When updating the metadata in the cluster metadata area of each disk in the cluster, the metadata in the cluster metadata area of the first disk (by disk member serial number) can be updated first, and the metadata in the cluster metadata areas of the other disks can then be updated at preset time intervals. Alternatively, after the metadata in the cluster metadata area of the first disk is updated, the metadata in the cluster metadata areas of the other disks can be updated in real time, so as to ensure that the cluster metadata areas of all disks in the cluster store the same metadata copy.
It should be noted that, in this embodiment, the cluster metadata of other disks in the cluster, such as the second disk and the third disk, may also be updated first. The disk for updating the cluster metadata first may also be set according to other performances of each disk member.
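The two update orders described above (update one disk's copy first, then propagate either at intervals or immediately) can be sketched as follows. The function names and the list-of-dicts representation of the cluster metadata area are hypothetical choices made for illustration; the `primary` parameter reflects the note that the disk updated first need not be the first disk.

```python
import copy


def new_cluster_areas(capacities):
    """One cluster metadata copy per disk; each copy has one entry per disk."""
    return [
        [{"capacity": c, "used": 0, "remaining": c} for c in capacities]
        for _ in capacities
    ]


def update_primary(cluster_areas, disk_index, stored_bytes, primary=0):
    """Record a completed write in the copy held by the disk that is updated first."""
    entry = cluster_areas[primary][disk_index]
    entry["used"] += stored_bytes
    entry["remaining"] -= stored_bytes


def sync_replicas(cluster_areas, primary=0):
    """Overwrite every other disk's copy with the primary's copy."""
    for i in range(len(cluster_areas)):
        if i != primary:
            cluster_areas[i] = copy.deepcopy(cluster_areas[primary])


if __name__ == "__main__":
    areas = new_cluster_areas([100, 100, 100])
    update_primary(areas, disk_index=2, stored_bytes=30)
    sync_replicas(areas)  # real-time mode: call right after each update
    print(areas[1][2])
```

In the interval-based variant, `sync_replicas` would instead be called from a timer, so the other copies may briefly lag behind the primary between synchronisations.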
The data parallel storage method provided by the embodiment can respectively perform random hash calculation on the metadata serial numbers of different user data when storing different user data, so that different user data can be stored on different disks in parallel, thereby greatly improving the performance of a non-critical data storage system and reducing the cost of the non-critical data storage system.
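Steps S101 to S105 can be sketched end to end in Python. The description does not name a specific hash function, so the use of SHA-256 reduced modulo the number of disks is an assumption, as are the names `pick_disk`, `store` and `new_cluster` and the dictionary layout of each disk. The sketch also assumes that the capacity threshold is at least as large as any single item.

```python
import hashlib


def pick_disk(serial, disk_count):
    """Map a metadata serial number to a disk index (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(str(serial).encode()).digest()
    return int.from_bytes(digest[:8], "big") % disk_count


def new_cluster(capacities):
    """Build disks; each holds its own copy of the cluster metadata."""
    return [
        {
            "data": {},         # data area: serial -> user data
            "member_meta": {},  # disk member metadata area
            "cluster": [{"used": 0, "remaining": c} for c in capacities],
        }
        for _ in capacities
    ]


def store(disks, serial, data, threshold):
    """Run S101-S105 for one item; return the target disk index, or None if it is too full."""
    target = pick_disk(serial, len(disks))                      # S101, S102
    if disks[target]["cluster"][target]["remaining"] < threshold:  # S103
        return None  # this case is handled by the second embodiment
    disks[target]["data"][serial] = data                        # S104
    disks[target]["member_meta"][serial] = {"size": len(data)}
    for disk in disks:                                          # S105
        entry = disk["cluster"][target]
        entry["used"] += len(data)
        entry["remaining"] -= len(data)
    return target
```

Because `pick_disk` is deterministic, the same serial number always maps to the same disk, which is what later allows a reader to locate the data by recomputing the hash.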
Based on the above embodiment, in this embodiment, when the remaining storage capacity of the target disk selected according to the hash value is smaller than the preset capacity threshold, an alternative disk is selected from the other disk members of the disk cluster to store the user data.
Referring to fig. 4, fig. 4 is a flowchart illustrating a second embodiment of a method for parallel storing data based on disk clusters according to the present invention; the specific operation steps are as follows:
Step S401: generating metadata of user data to be stored, and performing random hash calculation according to the serial number of the metadata to obtain a hash value;
Step S402: according to the hash value, selecting a target disk for storing the user data to be stored from a plurality of disks of a disk cluster;
Step S403: judging whether the residual storage capacity of the target disk is greater than or equal to a preset capacity threshold value or not;
Step S404: if the residual storage capacity of the target disk is smaller than the preset capacity threshold, judging whether a disk with the residual storage capacity larger than or equal to the preset capacity threshold exists in the disk cluster;
Step S405: if the disk with the residual storage capacity larger than or equal to the preset capacity threshold does not exist in the disk cluster, returning a message that the storage space of the disk cluster is insufficient;
Step S406: if the disk with the residual storage capacity larger than or equal to the preset capacity threshold exists in the disk cluster, storing the user data to be stored into an alternative disk with the maximum current residual storage capacity in the disk cluster;
Step S407: storing the metadata of the user data to be stored into a disk cluster metadata area of a first disk in the disk cluster;
Step S408: and updating the metadata in the cluster metadata areas of other disks of the disk cluster according to the metadata in the cluster metadata area of the first disk at preset time intervals, so that the same cluster metadata copy is stored in the cluster metadata area of each disk of the disk cluster.
In this embodiment, when user data needs to be stored, a metadata structure of the user data is first allocated and generated, then random hash calculation is performed according to a metadata serial number of the user data, and the user data is randomly stored in a corresponding target disk according to a hash value. And if the data storage capacity of the target disk is exhausted, storing the user data to a disk with the largest residual capacity in the member disks of the disk cluster, and storing the metadata information of the user data to a disk cluster metadata area of a first disk of the disk cluster. And if all the member disks of the disk cluster have no residual capacity capable of storing the user data, returning a message that the storage space of the disk cluster is insufficient.
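The fallback path of steps S401 to S407 can be sketched as below. As before, SHA-256 is an assumed hash, and the names `store_with_fallback`, `new_cluster` and the `"redirected"` map (standing in for the metadata written to the first disk's cluster metadata area) are hypothetical. The sketch reads capacities from the first disk's copy and leaves out the periodic propagation of step S408.

```python
import hashlib


def pick_disk(serial, disk_count):
    """Map a metadata serial number to a disk index (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(str(serial).encode()).digest()
    return int.from_bytes(digest[:8], "big") % disk_count


def new_cluster(capacities):
    return [
        {
            "data": {},
            "member_meta": {},
            # cluster metadata copy: remaining capacity per disk, plus redirect records
            "cluster": {"remaining": list(capacities), "redirected": {}},
        }
        for _ in capacities
    ]


def store_with_fallback(disks, serial, data, threshold):
    """Return the index of the disk that received the data (S401-S407)."""
    first = disks[0]["cluster"]  # copy held by the first disk
    remaining = first["remaining"]
    target = pick_disk(serial, len(disks))                  # S401, S402
    if remaining[target] >= threshold:                      # S403
        chosen = target
        disks[chosen]["member_meta"][serial] = {"size": len(data)}
    else:
        # S404/S406: the disk with the most remaining capacity is the candidate.
        chosen = max(range(len(disks)), key=lambda i: remaining[i])
        if remaining[chosen] < threshold:                   # S405
            raise OSError("disk cluster storage space is insufficient")
        first["redirected"][serial] = chosen                # S407
    disks[chosen]["data"][serial] = data
    remaining[chosen] -= len(data)
    return chosen
```

Checking only the disk with the largest remaining capacity is enough for step S404: if that disk is below the threshold, no disk in the cluster can meet it.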
Referring to fig. 5, fig. 5 is a flowchart illustrating a data reading method based on disk clusters according to a first embodiment of the present invention; the specific operation steps are as follows:
Step S501: acquiring metadata of user data to be read, and performing random hash calculation according to a serial number of the metadata of the user data to be read to obtain a hash value;
Step S502: and determining the disk of the data to be read in the disk cluster according to the hash value, and acquiring the disk position of the user data to be read according to the metadata of the user data to be read so as to read the data to be read.
The metadata records the disk position of the user data, so the disk position of the user data to be read can be obtained from the metadata.
When the user data in the disk cluster provided by the invention is read, the metadata of the user data to be read is firstly obtained, random hash calculation is carried out according to the metadata serial number, the corresponding disk and the disk position are found according to the hash value obtained by calculation, and the data of the corresponding position are read.
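A minimal sketch of this read path follows. It assumes the same SHA-256-based mapping used for illustration on the write side, models the data area as a `bytearray`, and assumes the metadata records the position as an `offset` and `size`; all of these names and representations are hypothetical.

```python
import hashlib


def pick_disk(serial, disk_count):
    """Map a metadata serial number to a disk index (SHA-256 is an assumed choice)."""
    digest = hashlib.sha256(str(serial).encode()).digest()
    return int.from_bytes(digest[:8], "big") % disk_count


def read(disks, serial):
    """S501-S502: recompute the hash, then use the metadata to locate the bytes."""
    target = pick_disk(serial, len(disks))
    meta = disks[target]["member_meta"][serial]  # metadata records the disk position
    start, size = meta["offset"], meta["size"]
    return bytes(disks[target]["data"][start:start + size])


if __name__ == "__main__":
    disks = [{"data": bytearray(64), "member_meta": {}} for _ in range(3)]
    t = pick_disk(5, 3)
    disks[t]["data"][8:13] = b"hello"
    disks[t]["member_meta"][5] = {"offset": 8, "size": 5}
    print(read(disks, 5))
```

The read needs no search across disks: the hash gives the disk and the metadata gives the position within it.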
Based on the above embodiment, in this embodiment, when reading user data in a disk cluster, the metadata of the user data is obtained and the target disk is determined from the hash value of the metadata serial number. The user data is first searched for on the target disk; if it is not found there, the actual disk and disk position where the user data is stored are looked up in the disk cluster metadata area of the target disk, and the user data is then read.
Referring to fig. 6, fig. 6 is a flowchart illustrating a data reading method based on disk clusters according to a second embodiment of the present invention; the specific operation steps are as follows:
Step S601: acquiring metadata of user data to be read, and performing random hash calculation according to a serial number of the metadata of the user data to be read to obtain a hash value;
Step S602: determining a target disk where the user data to be read is located in the disk cluster according to the hash value;
Step S603: judging whether the user data to be read can be found in the data area of the target disk;
Whether user data (or a file) is stored on a disk is determined by querying whether the metadata of that user data (or file) is on the disk.
Step S604: if the user data to be read can be found in the data area of the target disk, reading the user data to be read;
step S605: if the user data to be read is not found in the data area of the target disk, determining an actual disk where the user data to be read is located according to metadata in a disk cluster metadata area of the target disk metadata area;
it should be noted that, in other embodiments of the present invention, if the user data to be read is not found in the data area of the target disk, the actual disk where the user data to be read is located may be found in the metadata of the disk cluster metadata area of any one disk member of the disk cluster.
Step S606: and reading the user data to be read in the data area of the actual disk.
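With the method provided by this embodiment, the target disk is computed directly from the hash value, and the disk cluster metadata area is consulted only when the data is not found on the target disk, so user data in the disk cluster can be read efficiently.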
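The lookup-with-fallback flow of steps S601 to S606 can be sketched as follows. This is an illustrative sketch, not the patented implementation: the dictionary-based metadata areas, the SHA-256-modulo hash, and all names (`Disk`, `member_metadata`, `cluster_metadata`, `read_user_data`) are assumptions.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Disk:
    data_area: bytearray = field(default_factory=lambda: bytearray(64))
    # Disk member metadata area: serial number -> (position, length) on this disk.
    member_metadata: Dict[int, Tuple[int, int]] = field(default_factory=dict)
    # Disk cluster metadata area: serial number -> (actual disk index, position, length).
    cluster_metadata: Dict[int, Tuple[int, int, int]] = field(default_factory=dict)


def disk_index(serial_number: int, disk_count: int) -> int:
    """Assumed hash: SHA-256 of the serial number, reduced modulo the disk count."""
    digest = hashlib.sha256(str(serial_number).encode()).digest()
    return int.from_bytes(digest[:8], "big") % disk_count


def read_user_data(cluster: List[Disk], serial_number: int) -> Optional[bytes]:
    # S601-S602: hash the serial number to find the target disk.
    target = cluster[disk_index(serial_number, len(cluster))]
    # S603-S604: the data is on the target disk if its metadata is there.
    entry = target.member_metadata.get(serial_number)
    if entry is not None:
        position, length = entry
        return bytes(target.data_area[position:position + length])
    # S605: otherwise consult the disk cluster metadata area of the target disk.
    relocated = target.cluster_metadata.get(serial_number)
    if relocated is None:
        return None  # the data is not recorded anywhere in this sketch
    actual_index, position, length = relocated
    # S606: read from the data area of the actual disk.
    actual = cluster[actual_index]
    return bytes(actual.data_area[position:position + length])
```

In the common case the read touches only the target disk; the second lookup happens only for data that was placed on a disk other than the one its hash value selects.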
Referring to FIG. 7, FIG. 7 is a structural block diagram of a disk-cluster-based data parallel storage device provided by an embodiment of the present invention. The device may specifically include:
a hash calculation module 100, configured to generate metadata of user data to be stored, and to perform a random hash calculation on the serial number of the metadata to obtain a hash value;
a selecting module 200, configured to select, according to the hash value, a target disk for storing the user data to be stored from the plurality of disks of a disk cluster; each disk in the disk cluster comprises a data area for storing stored data and a metadata area for storing metadata describing the stored data; the metadata area of each disk comprises a disk member metadata area for storing the metadata of the data stored on that disk, and a disk cluster metadata area for storing the number of disks in the whole disk cluster and the data capacity, data volume and remaining storage capacity of each disk;
a judging module 300, configured to judge whether the remaining storage capacity of the target disk is greater than or equal to a preset capacity threshold;
a storage module 400, configured to store the user data to be stored in the data area of the target disk and to store the metadata of the user data to be stored in the disk member metadata area of the target disk if the remaining storage capacity of the target disk is greater than or equal to the preset capacity threshold;
an updating module 500, configured to update the data volume and the remaining storage capacity of the target disk stored in the disk cluster metadata area of each disk in the disk cluster.
The disk-cluster-based data parallel storage device of this embodiment is used to implement the foregoing disk-cluster-based data parallel storage method, so its specific implementation can be found in the foregoing embodiments of the method. For example, the hash calculation module 100, the selecting module 200, the judging module 300, the storage module 400, and the updating module 500 implement steps S101, S102, S103, S104, and S105 of the method, respectively. Their specific implementation may therefore refer to the descriptions of the corresponding embodiments and is not repeated here.
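How the five modules cooperate on a write can be sketched in Python as a single function whose comments mark the module each step corresponds to. The hash function (SHA-256 modulo the disk count), the in-memory data structures, the names `Disk`, `make_cluster` and `store`, and the extra check that the data fits are all assumptions for illustration; the handling of a target disk below the threshold is omitted here.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Disk:
    capacity: int  # data capacity of this disk in bytes
    data_area: bytearray = field(default_factory=bytearray)
    # Disk member metadata area: serial number -> (position, length).
    member_metadata: Dict[int, Tuple[int, int]] = field(default_factory=dict)
    # Disk cluster metadata area: disk index -> [data volume, remaining capacity].
    cluster_metadata: Dict[int, List[int]] = field(default_factory=dict)


def make_cluster(capacities: List[int]) -> List[Disk]:
    cluster = [Disk(capacity=c) for c in capacities]
    for disk in cluster:
        disk.cluster_metadata = {i: [0, c] for i, c in enumerate(capacities)}
    return cluster


def store(cluster: List[Disk], serial_number: int, data: bytes, threshold: int) -> bool:
    # Module 100: hash the metadata serial number (assumed hash function).
    digest = hashlib.sha256(str(serial_number).encode()).digest()
    hash_value = int.from_bytes(digest[:8], "big")
    # Module 200: select the target disk from the hash value.
    index = hash_value % len(cluster)
    target = cluster[index]
    # Module 300: compare the remaining capacity with the preset threshold.
    remaining = target.capacity - len(target.data_area)
    if remaining < threshold or remaining < len(data):  # second test is an added guard
        return False  # overflow handling is not part of this sketch
    # Module 400: write the data and its metadata to the target disk.
    position = len(target.data_area)
    target.data_area.extend(data)
    target.member_metadata[serial_number] = (position, len(data))
    # Module 500: refresh the target disk's entry in every disk's cluster metadata.
    volume = len(target.data_area)
    for disk in cluster:
        disk.cluster_metadata[index] = [volume, target.capacity - volume]
    return True
```

Keeping a copy of the per-disk volume and remaining capacity on every member means any disk can answer capacity queries for the whole cluster, at the cost of one metadata update per member on each write.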
An embodiment of the present invention further provides a disk-cluster-based data parallel storage device, including: a memory for storing a computer program; and a processor for implementing the steps of the above disk-cluster-based data parallel storage method when executing the computer program.
An embodiment of the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the above disk-cluster-based data parallel storage method are implemented.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read-Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The disk-cluster-based data parallel storage method, apparatus, device and computer-readable storage medium, and the disk-cluster-based data reading method, provided by the present invention have been described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that those skilled in the art may make various improvements and modifications to the present invention without departing from its principle, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims (8)

1. A data parallel storage method based on disk clusters is characterized by comprising the following steps:
generating metadata of user data to be stored, and performing random hash calculation according to the serial number of the metadata to obtain a hash value;
according to the hash value, selecting a target disk for storing the user data to be stored from a plurality of disks of a disk cluster; each disk in the disk cluster comprises a data area for storing storage data and a metadata area for storing metadata describing the storage data; the metadata area of each disk comprises a disk member metadata area for storing metadata of data stored on the disk, a disk cluster metadata area for storing the number of disks of the whole disk cluster, and data capacity, data volume and residual storage capacity of each disk;
judging whether the residual storage capacity of the target disk is greater than or equal to a preset capacity threshold value or not;
if the residual storage capacity of the target disk is larger than or equal to the preset capacity threshold, storing the user data to be stored into a data area of the target disk, and storing the metadata of the user data to be stored into a disk member metadata area of the target disk;
and updating the data volume and the residual storage capacity of the target disk stored in the cluster metadata area of each disk in the disk cluster.
2. The method of claim 1, wherein the determining whether the remaining storage capacity of the target disk is greater than or equal to a preset capacity threshold comprises:
if the residual storage capacity of the target disk is smaller than the preset capacity threshold, judging whether a disk with the residual storage capacity larger than or equal to the preset capacity threshold exists in the disk cluster;
if the disk with the residual storage capacity larger than or equal to the preset capacity threshold exists in the disk cluster, storing the user data to be stored into an alternative disk with the maximum current residual storage capacity in the disk cluster;
storing the metadata of the user data to be stored into a disk cluster metadata area of a first disk in the disk cluster;
and updating the metadata in the cluster metadata areas of other disks of the disk cluster according to the metadata in the cluster metadata area of the first disk at preset time intervals, so that the same cluster metadata copy is stored in the cluster metadata area of each disk of the disk cluster.
3. The method of claim 2, wherein the determining whether there are disks in the disk cluster with a remaining storage capacity greater than or equal to the preset capacity threshold comprises:
and if the disk with the residual storage capacity larger than or equal to the preset capacity threshold does not exist in the disk cluster, returning a message that the storage space of the disk cluster is insufficient.
4. The method of claim 1, wherein the user data to be stored is a file to be stored, the generating metadata of the user data to be stored, and performing random hash calculation according to a sequence number of the metadata to obtain a hash value comprises:
generating metadata of the file to be stored according to the system characteristic data of the file to be stored; the system characteristic data of the file to be stored comprises a file name, a file size, the total number of data blocks used by the file, the size of the data blocks, the storage position of the data blocks, a file type, an access authority, and creation and access time;
and carrying out random hash calculation according to the file number of the file to be stored to generate a hash value.
5. A data reading method based on disk clusters is characterized by comprising the following steps:
acquiring metadata of user data to be read, and performing random hash calculation according to a serial number of the metadata of the user data to be read to obtain a hash value;
determining a disk of the data to be read in a disk cluster according to the hash value, and acquiring a disk position of the user data to be read according to metadata of the user data to be read so as to read the data to be read; each disk in the disk cluster comprises a data area for storing storage data and a metadata area for storing metadata describing the storage data; the metadata area of each disk comprises a disk member metadata area for storing metadata of data stored on the disk, a disk cluster metadata area for storing the number of disks of the whole disk cluster, and data capacity, data volume and residual storage capacity of each disk;
determining the disk and disk position of the data to be read in the disk cluster according to the hash value so as to read the data to be read, including:
determining a target disk where the user data to be read is located in the disk cluster according to the hash value;
judging whether the user data to be read can be found in the data area of the target disk;
if the user data to be read can be found in the data area of the target disk, reading the user data to be read;
if the user data to be read is not found in the data area of the target disk, determining an actual disk where the user data to be read is located according to metadata in the disk cluster metadata area within the metadata area of the target disk;
and reading the user data to be read in the data area of the actual disk.
6. A cluster-based data parallel storage device, comprising:
the hash calculation module is used for generating metadata of user data to be stored and carrying out random hash calculation according to the serial number of the metadata to obtain a hash value;
the selecting module is used for selecting a target disk for storing the user data to be stored from a plurality of disks of the disk cluster according to the hash value; each disk in the disk cluster comprises a data area for storing storage data and a metadata area for storing metadata describing the storage data; the metadata area of each disk comprises a disk member metadata area for storing metadata of data stored on the disk, a disk cluster metadata area for storing the number of disks of the whole disk cluster, and data capacity, data volume and residual storage capacity of each disk;
the judging module is used for judging whether the residual storage capacity of the target disk is greater than or equal to a preset capacity threshold value or not;
the storage module is used for storing the user data to be stored into a data area of the target disk and storing the metadata of the user data to be stored into a disk member metadata area of the target disk if the residual storage capacity of the target disk is greater than or equal to the preset capacity threshold;
and the updating module is used for updating the data volume and the residual storage capacity of the target disk stored in the disk cluster metadata area of each disk in the disk cluster.
7. A cluster-based data parallel storage device, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the method for cluster-based parallel storage of data according to any one of claims 1 to 4 when executing the computer program.
8. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of a method for cluster-based parallel storage of data according to any one of claims 1 to 4.
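
For illustration only, the overflow handling recited in claims 2 and 3 can be sketched in Python. Reading "a first disk" as the disk at index 0, reducing the cluster metadata to a mapping from serial number to disk index, and the names `Disk`, `place_overflow` and `sync_cluster_metadata` are all assumptions; the sketch does not limit or interpret the claims.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Disk:
    remaining: int  # remaining storage capacity
    # Disk cluster metadata area: serial number -> index of the disk holding the data.
    cluster_metadata: Dict[int, int] = field(default_factory=dict)


def place_overflow(cluster: List[Disk], serial_number: int,
                   size: int, threshold: int) -> Optional[int]:
    """Pick an alternative disk when the hashed target is below the threshold."""
    candidates = [i for i, d in enumerate(cluster) if d.remaining >= threshold]
    if not candidates:
        return None  # claim 3: the caller reports insufficient storage space
    # Claim 2: use the disk with the largest current remaining capacity.
    best = max(candidates, key=lambda i: cluster[i].remaining)
    cluster[best].remaining -= size
    # Record the relocation in the cluster metadata area of the first disk.
    cluster[0].cluster_metadata[serial_number] = best
    return best


def sync_cluster_metadata(cluster: List[Disk]) -> None:
    """Copy the first disk's cluster metadata to the other disks.

    Intended to be called at a preset time interval by a scheduler.
    """
    for disk in cluster[1:]:
        disk.cluster_metadata = dict(cluster[0].cluster_metadata)
```

Between two synchronization runs only the first disk holds the newest relocation records, which is why the copies on the other disks are refreshed periodically.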

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010300208.0A CN111488127B (en) 2020-04-16 2020-04-16 Data parallel storage method and device based on disk cluster and data reading method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010300208.0A CN111488127B (en) 2020-04-16 2020-04-16 Data parallel storage method and device based on disk cluster and data reading method

Publications (2)

Publication Number Publication Date
CN111488127A CN111488127A (en) 2020-08-04
CN111488127B true CN111488127B (en) 2023-01-10

Family

ID=71794975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010300208.0A Active CN111488127B (en) 2020-04-16 2020-04-16 Data parallel storage method and device based on disk cluster and data reading method

Country Status (1)

Country Link
CN (1) CN111488127B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112685420A (en) * 2020-12-31 2021-04-20 北京存金所贵金属有限公司 Method, device, scheduling controller and system for expanding block chain data
CN114301931B (en) * 2022-03-11 2022-07-08 上海凯翔信息科技有限公司 Data synchronization system based on cloud NAS
CN115390752B (en) * 2022-08-10 2023-04-18 中科豪联(杭州)技术有限公司 Multi-disk cache file management method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103530066A (en) * 2013-09-16 2014-01-22 华为技术有限公司 Data storage method, device and system
CN106527960A (en) * 2015-09-14 2017-03-22 中兴通讯股份有限公司 Management method for multi-memory disk loads, device, document system and memory network system
CN109445712A (en) * 2018-11-09 2019-03-08 浪潮电子信息产业股份有限公司 A kind of command processing method, system, equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN111488127A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
CN111488127B (en) Data parallel storage method and device based on disk cluster and data reading method
CN107807794B (en) Data storage method and device
US7886111B2 (en) System and method for raid management, reallocation, and restriping
US20080201335A1 (en) Method and Apparatus for Storing Data in a Peer to Peer Network
US20060156059A1 (en) Method and apparatus for reconstructing data in object-based storage arrays
CN103608784B (en) Method for creating network volumes, data storage method, storage device and storage system
CN109582213B (en) Data reconstruction method and device and data storage system
CN106293492B (en) Storage management method and distributed file system
WO2016137402A1 (en) Data stripping, allocation and reconstruction
US10310752B1 (en) Extent selection with mapped raid
CN108632029A (en) key value solid state drive
CN113986149B (en) System fault processing method, device, equipment and storage medium
CN107729536A (en) A kind of date storage method and device
CN108121497B (en) Storage method and storage system
CN104216664A (en) Network volume creating method, data storage method, storage equipment and storage system
KR101963629B1 (en) Memory management system and method thereof
CN116339644B (en) Method, device, equipment and medium for creating redundant array of independent disk
CN104598171B (en) Array method for reconstructing and device based on metadata
CN110658994A (en) Data processing method and device based on HDD (hard disk drive) and SSD (solid State disk) hybrid disk array
CN107526692B (en) Analysis system for managing information storage table and control method thereof
US11860746B2 (en) Resilient data storage system with efficient space management
CN113157715B (en) Erasure code data center rack collaborative updating method
US20220066658A1 (en) Raid member distribution for granular disk array growth
CN107247564B (en) Data processing method and system
CN113805811A (en) Method, system, equipment and storage medium for optimizing read-write access file

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant