CN106527960B - Multi-storage-disk load management method and device, file system and storage network system - Google Patents

Multi-storage-disk load management method and device, file system and storage network system

Info

Publication number
CN106527960B
Authority
CN
China
Prior art keywords
storage
disk
disks
file
file access
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510582124.XA
Other languages
Chinese (zh)
Other versions
CN106527960A (en
Inventor
张斌
陈颖川
张宇
王井贵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ZTE Corp
Original Assignee
ZTE Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ZTE Corp
Priority to CN201510582124.XA (patent CN106527960B)
Priority to PCT/CN2016/098071 (patent WO2017045545A1)
Publication of CN106527960A
Application granted
Publication of CN106527960B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Abstract

The invention discloses a multi-storage-disk load management method and device, a file system, and a storage network system. The invention uses a hash algorithm to implement a multi-disk load balancing mechanism that distributes massive numbers of files evenly across multiple disks without metadata. The system structure is simple and efficient and places low demands on hardware (mainly memory). Because there is no metadata, there is no single point of failure caused by metadata damage, which improves the safety of system storage.

Description

Multi-storage-disk load management method and device, file system and storage network system
Technical Field
The invention relates to the field of communication, in particular to a multi-storage-disk load management method, a multi-storage-disk load management device, a multi-storage-disk load management file system and a storage network system.
Background
With improvements in hardware design and manufacturing processes, existing server products can accommodate more storage disks (mechanical hard disks or solid state disks, hereinafter collectively referred to as "multi-disks"). Many designs and implementations have addressed how to use these disks efficiently to build a storage service system that offers multi-disk load balancing, high concurrency, and high throughput. From the perspective of multi-disk load balancing, most conventional methods provide a metadata area and balance file access across the disks through it: the location of every file on the disks is mapped in the metadata area, every file path lookup must pass through the metadata area, and the actual access operation is performed only after the physical location of the file has been found there. Maintaining the metadata requires an additional metadata controller, which consumes a large amount of CPU resources when the storage system is busy (raising CPU performance requirements and cost). Meanwhile, as the number of files grows sharply, the metadata area consumes a large amount of valuable physical memory (requiring memory expansion and raising cost); even with the most compact and efficient data structure, the memory overhead of the metadata area cannot be ignored. Furthermore, if the metadata area is damaged or the metadata controller crashes, the whole system goes down.
Therefore, the existing approach of achieving multi-disk load balancing through a metadata area suffers from high cost and from system breakdown when the metadata area fails.
Disclosure of Invention
The invention provides a multi-storage-disk load management method and device, and aims to solve the problems that the existing approach of achieving multi-disk load balancing through a metadata area is costly and that the system breaks down when the metadata area fails.
In order to solve the above technical problem, the present invention provides a multi-storage disk load management method, including:
acquiring a storage disk list, wherein the storage disk list comprises identification marks of storage disks;
receiving a file access request, and acquiring file full-path information in the file access request;
and selecting one storage disk from the storage disks as a target storage disk accessed by the file access request by adopting a hash algorithm according to the file full path information and the identification marks of the storage disks.
In an embodiment of the present invention, selecting one of the storage disks as a target storage disk accessed by the file access request by using a hash algorithm according to the file full path information and the identification of each storage disk includes:
processing the identification marks of the storage disks through a hash algorithm to obtain storage medium factors of the storage disks;
processing the file full path information through a hash algorithm to obtain a file full path factor;
integrating the file full path factors and the storage medium factors of the storage disks to obtain integration factors corresponding to the storage disks;
and selecting one storage disk from the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk.
In an embodiment of the present invention, selecting one of the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk includes:
processing the integration factors corresponding to the storage disks through a hash algorithm to obtain selection factors corresponding to the storage disks;
and selecting the storage disk corresponding to the selection factor with the maximum value as the target storage disk.
In an embodiment of the present invention, integrating the file full path factor and the storage medium factor of each storage disk includes: performing exclusive OR processing on the file full path factor and the storage medium factor of each storage disk respectively to obtain an integration factor corresponding to each storage disk.
In an embodiment of the present invention, the method further comprises: monitoring the working state of each storage disk, and replacing any storage disk in an abnormal state according to the monitoring result.
In an embodiment of the invention, the identification mark of each storage disk is its physical location identifier.
In an embodiment of the present invention, the physical location identifier includes a frame number of a frame in which the storage disk is located and a slot number of a slot in which the storage disk is located.
In order to solve the above problem, the present invention further provides a multi-storage disk load management apparatus, including:
the multi-disk position management module is used for acquiring a storage disk list, and the storage disk list comprises identification marks of the storage disks;
the request receiving module is used for receiving a file access request containing file full path information;
and the multi-disk load storage management module is used for selecting one storage disk from the storage disks as a target storage disk accessed by the file access request by adopting a hash algorithm according to the file full path information in the file access request and the identification marks of the storage disks.
In an embodiment of the present invention, the multi-disk load storage management module includes a computation submodule, an integration submodule, and a selection submodule;
the computing submodule is used for processing the identification marks of the storage disks through a hash algorithm to obtain storage medium factors of the storage disks; the file full path information is processed through a hash algorithm to obtain a file full path factor;
the integration sub-module is used for integrating the file full path factor and the storage medium factor of each storage disk to obtain an integration factor corresponding to each storage disk;
and the selection submodule is used for selecting one storage disk from the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk.
In an embodiment of the present invention, the selecting sub-module selects one of the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk, including:
processing the integration factors corresponding to the storage disks through a hash algorithm to obtain selection factors corresponding to the storage disks;
and selecting the storage disk corresponding to the selection factor with the maximum value as the target storage disk.
In an embodiment of the present invention, the apparatus further includes a state monitoring module configured to monitor the working state of each storage disk.
In an embodiment of the invention, the identification mark of each storage disk is its physical location identifier.
In order to solve the above problem, the present invention further provides a distributed file system, which includes a file access client, a file access interface, a plurality of storage disks, and the multi-storage disk load management apparatus as described above;
the file access client sends a file access request to the multi-storage-disk load management device through the file access interface;
the multi-storage disk load management device receives the file access request and selects one storage disk from the plurality of storage disks as a target storage disk accessed by the file access request.
In order to solve the above problem, the present invention further provides a distributed storage network system, which includes a file access client, a file access interface, a plurality of storage nodes, and the multi-storage disk load management apparatus as described above; the storage node comprises a plurality of storage disks;
the file access client sends a file access request to the multi-storage-disk load management device through the file access interface;
and the multi-storage-disk load management device receives the file access request, selects one of the storage nodes as a target storage node according to the file access request, and selects one of the storage disks of the target storage node as a target storage disk accessed by the file access request.
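The two-level selection described for the distributed storage network system can be sketched as follows. The text states only that a target storage node is chosen according to the file access request and that a disk is then chosen within that node; applying the same hash-based selection at the node level is an assumption made for illustration, as are the node names, the helper names, and the use of SHA-256.

```python
import hashlib
from typing import Mapping, Sequence, Tuple


def _h(data: bytes) -> int:
    """Map bytes to a 64-bit non-negative integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def select(full_path: str, ids: Sequence[str]) -> str:
    """Pick the identifier whose hashed XOR with the path factor is largest."""
    path_factor = _h(full_path.encode("utf-8"))
    return max(ids, key=lambda i: _h((path_factor ^ _h(i.encode("utf-8"))).to_bytes(8, "big")))


def select_node_and_disk(full_path: str,
                         nodes: Mapping[str, Sequence[str]]) -> Tuple[str, str]:
    """First choose a storage node, then a storage disk inside that node."""
    node = select(full_path, sorted(nodes))   # node-level choice (assumed to reuse the hash scheme)
    disk = select(full_path, nodes[node])     # disk-level choice within the target node
    return node, disk


# Hypothetical topology: node name -> disk identifiers on that node.
topology = {
    "node-a": ["00_00", "00_01", "00_02"],
    "node-b": ["00_00", "00_01"],
}
print(select_node_and_disk("/video/2015/09/clip.mp4", topology))
```

Because both levels are pure functions of the path and the identifier lists, any client that holds the same topology resolves a given file to the same node and disk without consulting shared metadata.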
The invention has the beneficial effects that:
The invention provides a multi-storage-disk load management method and device, a file system, and a storage network system. The invention uses a hash algorithm to implement a multi-disk load balancing mechanism that distributes massive numbers of files evenly across multiple disks without metadata. The system structure is simple and efficient and places low demands on hardware (mainly memory). Because there is no metadata, there is no single point of failure caused by metadata damage, which improves the safety of system storage.
In addition, the invention can monitor the state of each storage disk and replace a bad storage disk, which guarantees normal storage of files. For elastic expansion, only storage disks need to be added to increase the capacity and throughput of the whole system.
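The even-distribution claim can be checked empirically with a small experiment. The sketch below is illustrative only: SHA-256 truncated to 64 bits stands in for the unspecified hash algorithm, and the file names and disk identifiers are made up.

```python
import hashlib
from collections import Counter
from typing import Sequence


def _h(data: bytes) -> int:
    """Map bytes to a 64-bit non-negative integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def select_disk(full_path: str, disk_ids: Sequence[str]) -> str:
    """Hash path and disk identifiers, XOR them, hash again, take the maximum."""
    path_factor = _h(full_path.encode("utf-8"))
    return max(disk_ids,
               key=lambda d: _h((path_factor ^ _h(d.encode("utf-8"))).to_bytes(8, "big")))


def spread(n_files: int, disk_ids: Sequence[str]) -> Counter:
    """Count how many synthetic file paths land on each disk."""
    return Counter(select_disk(f"/data/file_{i}.bin", disk_ids) for i in range(n_files))


disks = [f"00_{slot:02d}" for slot in range(8)]   # hypothetical identifiers
counts = spread(8000, disks)
for disk_id in disks:
    print(disk_id, counts[disk_id])
```

With a well-mixed hash, each of the eight disks should receive roughly one eighth of the files; the exact counts depend on the hash function chosen.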
Drawings
Fig. 1 is a flowchart of a multi-storage-disk load management method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a process for selecting a target storage disk by using a hash algorithm according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of a process of selecting a target storage disk according to an integration factor according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a multi-storage-disk load management apparatus according to a second embodiment of the present invention;
Fig. 5 is a second schematic structural diagram of a multi-storage-disk load management apparatus according to a second embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a multi-storage-disk load management apparatus according to a second embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a distributed file system according to a third embodiment of the present invention;
Fig. 8 shows a mapping relationship between storage disks and mount points according to a third embodiment of the present invention;
Fig. 9 is a flowchart of a multi-storage-disk load management method according to a third embodiment of the present invention;
Fig. 10 is a schematic structural diagram of a distributed storage network system according to a fourth embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following detailed description and accompanying drawings.
Example one:
This embodiment manages the load of multiple storage disks with a hash algorithm. The system architecture is very simple: no additional metadata area is needed, and a massive file access service can be provided with only a server and storage disks (mechanical hard disks and/or solid state disks), which makes deployment and implementation very convenient. Access performance is high because the original metadata lookup is replaced by hash computation: whether there are hundreds of millions or billions of files, the physical storage location of a file is obtained with one quick round of hash operations. The state of each storage disk can be monitored and a bad storage disk can be replaced, which guarantees normal storage of files. The system is also easy to expand: adding storage disks (mechanical hard disks or solid state disks) increases system capacity and throughput linearly. The invention is explained in further detail below with specific examples:
Taking a file storage process as an example, and referring to Fig. 1, the multi-storage-disk load management method includes:
step 101: acquiring a storage disk list, wherein the storage disk list comprises identification marks of storage disks;
step 102: receiving a file access request, wherein the file access request contains file full path information;
step 103: selecting one storage disk from the storage disks as a target storage disk accessed by the file access request by adopting a hash algorithm according to the file full path information and the identification marks of the storage disks;
step 104: performing the corresponding file access operation on the target storage disk. The file access request in this embodiment may be a file storage request or a file reading request; for a file storage request, the corresponding file is written to the target storage disk, and for a file reading request, the corresponding file is read from the target storage disk.
In step 103, selecting one of the storage disks as the target storage disk accessed by the file access request by using a hash algorithm according to the file full path information and the identification mark of each storage disk includes, as shown in Fig. 2:
step 201: processing the identification marks of the storage disks through a hash algorithm to obtain storage medium factors of the storage disks; the identification mark may be mapped to a positive integer by the hash algorithm, although mapping to other forms is not excluded, as long as the uniform distribution property of the hash algorithm is exploited;
step 202: processing the file full path information through a hash algorithm to obtain a file full path factor; the file full path information may likewise be mapped to a positive integer by the hash algorithm, although mapping to other forms is not excluded, as long as the uniform distribution property of the hash algorithm is exploited; the specific hash algorithm may be chosen flexibly as long as this purpose is achieved;
step 203: integrating the obtained file full path factor with the storage medium factor of each storage disk to obtain an integration factor corresponding to each storage disk; that is, there are as many integration factors as there are storage disks;
step 204: and selecting one storage disk from the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk.
The same algorithm may be used for the hashing algorithms in step 201 and step 202.
The integration processing in step 203 may specifically be performing exclusive or processing on the obtained file full path factor and the storage medium factor of each storage disk, respectively, to obtain an integration factor corresponding to each storage disk.
Referring to fig. 3, the specific process of step 204 includes:
step 301: processing the integration factors corresponding to the storage disks through a hash algorithm to obtain selection factors corresponding to the storage disks; the algorithm used in this step is the same as that in step 201 and step 202;
step 302: and selecting the storage disk corresponding to the selection factor with the maximum value as a target storage disk.
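Steps 201 to 204 and 301 to 302 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the description leaves the hash algorithm open, so SHA-256 truncated to 64 bits is an arbitrary stand-in, and the function names and sample identifiers are hypothetical.

```python
import hashlib
from typing import Sequence


def hash_to_int(data: bytes) -> int:
    """Map bytes to a 64-bit non-negative integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def select_target_disk(full_path: str, disk_ids: Sequence[str]) -> str:
    """Return the disk identifier with the largest selection factor."""
    if not disk_ids:
        raise ValueError("storage disk list is empty")
    path_factor = hash_to_int(full_path.encode("utf-8"))           # step 202
    best_id, best_selection = disk_ids[0], -1
    for disk_id in disk_ids:
        medium_factor = hash_to_int(disk_id.encode("utf-8"))       # step 201
        integration = path_factor ^ medium_factor                  # step 203 (exclusive OR)
        selection = hash_to_int(integration.to_bytes(8, "big"))    # step 301
        if selection > best_selection:                             # step 302
            best_id, best_selection = disk_id, selection
    return best_id


disks = ["00_00", "00_01", "01_00", "01_01"]   # hypothetical frame_slot identifiers
target = select_target_disk("/video/2015/09/clip.mp4", disks)
print(target)
```

The result depends only on the path and the set of identifiers, so the order of the disk list does not matter and no lookup table has to be stored.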
In this embodiment, the identification mark in the storage disk list is the physical location identifier of each storage disk. The storage device may specifically include a storage server and/or a disk cluster (JBOD, Just a Bunch Of Disks), each of which includes a plurality of storage disks; the storage disks may be solid state disks or mechanical hard disks. In this embodiment, an application program for file access, that is, a file access client, is further provided on the storage device.
In this embodiment, the storage server and the JBODs may be numbered; for example, the storage server is numbered as frame number 0, the first disk cluster as frame number 1, the second disk cluster as frame number 2, and so on, with the Nth disk cluster numbered as frame number N;
further, the distributed file system daemon process numbers the slots of each storage disk (mechanical hard disk or solid state disk) in the storage server and the slots of each storage disk in the disk clusters;
in this embodiment, each storage disk on the storage device is given a uniform and unique physical location number, namely "frame number + slot number", which is called the physical location identifier of the storage disk. At startup, all storage disks on the storage server and the disk clusters are obtained and sorted first by frame number and then by slot number within the frame, forming a one-dimensional list of storage disk physical location identifiers, that is, the storage disk list:
Frame number 0_Slot number 0
Frame number 0_Slot number 1
Frame number 0_Slot number 2
Frame number 0_Slot number N'
Frame number 1_Slot number 0
Frame number 1_Slot number 1
Frame number 1_Slot number 2
Frame number 1_Slot number N"
Frame number 2_Slot number 0
Frame number 2_Slot number 1
Frame number 2_Slot number 2
Frame number 2_Slot number N'
……
Frame number N_Slot number 0
Frame number N_Slot number 1
Frame number N_Slot number 2
Frame number N_Slot number N'
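Building this list amounts to a two-key sort. The sketch below assumes the discovered disks are available as (frame, slot) integer pairs and formats each identifier as a zero-padded "frame_slot" string, mirroring the mount point names used in this embodiment; the function name and the input format are hypothetical.

```python
from typing import Iterable, List, Tuple


def build_disk_list(slots: Iterable[Tuple[int, int]]) -> List[str]:
    """Sort (frame, slot) pairs by frame, then by slot, and format each as an identifier."""
    ordered = sorted(set(slots))   # tuple ordering compares frame first, then slot
    return [f"{frame:02d}_{slot:02d}" for frame, slot in ordered]


# Disks discovered in arbitrary order: frame 0 is the storage server, frame 1 a JBOD.
discovered = [(1, 2), (0, 1), (1, 0), (0, 0)]
print(build_disk_list(discovered))
```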
The physical location identifier of each storage disk is then hashed and mapped into a set of discrete, uniformly distributed positive integers called "storage medium factors". Because this embodiment uses the physical location identifier of each storage disk, the storage medium factor calculated from the physical location string is the same no matter what storage medium is inserted at that physical location; that is, the storage medium factor depends only on the physical location and not on the storage disk itself, which further improves reliability. In this embodiment, a storage disk number may also be added: each storage disk is assigned a unique number when the disks are numbered, for example disk0001, disk0002, disk0003, ……, disk000N.
The physical location identifier is then the frame number + slot number + storage disk number.
After the storage medium factor corresponding to each storage disk is obtained, the disk identifier of the storage disk (that is, its block device file on Linux or another Unix-like system, such as /dev/sda) may further be mounted at a mount point named after the physical location identifier of the storage disk, for example:
/dev/sda /mnt/mydisks/01_00
/dev/sdb /mnt/mydisks/01_01
/dev/sdc /mnt/mydisks/01_02
/dev/sdd /mnt/mydisks/01_03
/dev/sde /mnt/mydisks/01_04
/dev/sdf /mnt/mydisks/01_05
/dev/sdg /mnt/mydisks/01_06
/dev/sdh /mnt/mydisks/01_07
/dev/sdi /mnt/mydisks/01_08
/dev/sdj /mnt/mydisks/01_09
/dev/sdk /mnt/mydisks/01_10
/dev/sdl /mnt/mydisks/01_11
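Given this mount layout, the on-disk location of a file follows directly from the physical location identifier of the target disk. The sketch below assumes the `/mnt/mydisks` root from the listing and assumes that the file full path is simply appended under the mount point; neither the helper names nor that layout rule is stated in the description.

```python
import posixpath

MOUNT_ROOT = "/mnt/mydisks"   # root directory taken from the example listing


def mount_point(frame: int, slot: int) -> str:
    """Mount point named after the physical location identifier frame_slot."""
    return posixpath.join(MOUNT_ROOT, f"{frame:02d}_{slot:02d}")


def physical_path(frame: int, slot: int, file_full_path: str) -> str:
    """Assumed layout: the file full path is placed beneath the disk's mount point."""
    return posixpath.join(mount_point(frame, slot), file_full_path.lstrip("/"))


print(mount_point(1, 0))
print(physical_path(1, 11, "/video/2015/clip.mp4"))
```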
The file full path information in this embodiment may include file type information, a plurality of storage directory paths, and a file name; the file full path information can be mapped to a positive integer by a hash algorithm.
After the balanced hash algorithm of this embodiment has selected the target storage disk for storage, when a user needs to read the file, the storage disk with the maximum selection factor is found by the same method, and it is always the same target storage disk that was selected during storage.
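The store-then-read behaviour can be illustrated with temporary directories standing in for mounted disks. Everything here is a sketch under assumptions: SHA-256 replaces the unspecified hash algorithm, the `store` and `read` helpers are hypothetical, and each "disk" is just a subdirectory named after its identifier.

```python
import hashlib
import os
import tempfile
from typing import Sequence


def _h(data: bytes) -> int:
    """Map bytes to a 64-bit non-negative integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def select_disk(full_path: str, disk_ids: Sequence[str]) -> str:
    """Return the disk whose selection factor is largest for this path."""
    path_factor = _h(full_path.encode("utf-8"))
    return max(disk_ids,
               key=lambda d: _h((path_factor ^ _h(d.encode("utf-8"))).to_bytes(8, "big")))


def store(root: str, disk_ids: Sequence[str], full_path: str, data: bytes) -> str:
    """Write the file under the directory of the selected disk; return that disk."""
    target = select_disk(full_path, disk_ids)
    dest = os.path.join(root, target, full_path.lstrip("/"))
    os.makedirs(os.path.dirname(dest), exist_ok=True)
    with open(dest, "wb") as handle:
        handle.write(data)
    return target


def read(root: str, disk_ids: Sequence[str], full_path: str) -> bytes:
    """Recompute the target disk from the path alone and read the file back."""
    target = select_disk(full_path, disk_ids)
    with open(os.path.join(root, target, full_path.lstrip("/")), "rb") as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as root:
    ids = ["01_00", "01_01", "01_02"]
    chosen = store(root, ids, "/docs/report.txt", b"hello")
    print(chosen, read(root, ids, "/docs/report.txt"))
```

The read path never consults any record of where the file was written; it recomputes the same target from the file full path and the disk list.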
In this embodiment, the working state of each storage disk may be monitored during the above process, and a storage disk in an abnormal state is removed according to the monitoring result and then replaced. When the disk is removed, the files on it may be distributed evenly to the other storage disks, or transferred in full to the new storage disk after replacement.
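One consequence of picking the maximum selection factor is that removing a disk changes the target only for files that were on that disk; every other file keeps its location. The sketch below illustrates this property under the same assumptions as before (SHA-256 as a stand-in hash, hypothetical helper names and identifiers).

```python
import hashlib
from typing import Dict, Sequence


def _h(data: bytes) -> int:
    """Map bytes to a 64-bit non-negative integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def select_disk(full_path: str, disk_ids: Sequence[str]) -> str:
    """Return the disk whose selection factor is largest for this path."""
    path_factor = _h(full_path.encode("utf-8"))
    return max(disk_ids,
               key=lambda d: _h((path_factor ^ _h(d.encode("utf-8"))).to_bytes(8, "big")))


def remapped_after_removal(paths: Sequence[str], disk_ids: Sequence[str],
                           removed: str) -> Dict[str, str]:
    """Return {path: new_disk} for every path whose target changes when a disk is removed."""
    remaining = [d for d in disk_ids if d != removed]
    moved = {}
    for path in paths:
        new_target = select_disk(path, remaining)
        if select_disk(path, disk_ids) != new_target:
            moved[path] = new_target
    return moved


disks = ["01_00", "01_01", "01_02", "01_03"]
paths = [f"/data/file_{i}.bin" for i in range(500)]
moved = remapped_after_removal(paths, disks, "01_02")
print(len(moved), "of", len(paths), "files need to move")
```

Only the files that resided on the removed disk appear in `moved`, and they are spread over the remaining disks by the same selection rule.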
Solid state disks (SSDs) are becoming increasingly mainstream in the storage industry. In this embodiment, SSDs and conventional mechanical hard disks may be grouped separately: the SSDs form a solid state disk storage sublist, which contains the identification marks of the solid state disks, such as SSD_0001, SSD_0002 … SSD_000N;
the conventional mechanical hard disks form a mechanical hard disk storage sublist, such as disk_0001, disk_0002 … disk_000N.
When the state monitoring is carried out, the two sub-lists can be respectively monitored in real time.
During load management, if, based on the user's request behavior, frequently accessed files ("hot" files) are to be stored on SSDs, hash calculation is performed only on the identification marks of the solid state disks in the solid state disk storage sublist, so that the hot files are mapped into the solid state disk storage sublist.
If rarely accessed files ("cold" files) are to be stored on conventional mechanical hard disks, hash calculation is performed only on the identification marks of the mechanical hard disks in the mechanical hard disk storage sublist, so that the cold files are mapped into the mechanical hard disk storage sublist. This can further improve the user experience.
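Restricting the hash calculation to one sublist is a small change to the selection call. In the sketch below the sublist contents follow the SSD_000N and disk_000N naming from the text, while the `hot` flag, the function names, and the SHA-256 stand-in hash are assumptions; how a file is classified as hot or cold is not specified in the description.

```python
import hashlib
from typing import Sequence

SSD_SUBLIST = ["SSD_0001", "SSD_0002", "SSD_0003"]
HDD_SUBLIST = ["disk_0001", "disk_0002", "disk_0003", "disk_0004"]


def _h(data: bytes) -> int:
    """Map bytes to a 64-bit non-negative integer (SHA-256 is an assumed choice)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def select_disk(full_path: str, disk_ids: Sequence[str]) -> str:
    """Return the disk whose selection factor is largest for this path."""
    path_factor = _h(full_path.encode("utf-8"))
    return max(disk_ids,
               key=lambda d: _h((path_factor ^ _h(d.encode("utf-8"))).to_bytes(8, "big")))


def select_tiered(full_path: str, hot: bool) -> str:
    """Hash only over the SSD sublist for hot files, only over the HDD sublist for cold ones."""
    return select_disk(full_path, SSD_SUBLIST if hot else HDD_SUBLIST)


print(select_tiered("/video/popular.mp4", hot=True))
print(select_tiered("/archive/2009/log.tar", hot=False))
```

A reader must apply the same hot or cold classification as the writer, otherwise the lookup runs over the wrong sublist.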
Example two:
This embodiment provides a multi-storage-disk load management apparatus which, referring to Fig. 4, includes:
the multi-disk position management module 1 is used for acquiring a storage disk list, wherein the storage disk list comprises identification marks of storage disks;
the request receiving module 2 is used for receiving a file access request containing file full path information;
and the multi-disk load storage management module 3 is used for selecting one storage disk from the storage disks as a target storage disk accessed by the file access request by adopting a hash algorithm according to the file full path information in the file access request and the identification marks of the storage disks.
The multi-disk load storage management module 3 in this embodiment includes a calculation submodule 31, an integration submodule 32, and a selection submodule 33;
the calculation submodule 31 is configured to process the identification mark of each storage disk through a hash algorithm to obtain the storage medium factor of each storage disk, and to process the file full path information through a hash algorithm to obtain a file full path factor; the calculation submodule 31 may specifically map the identification marks and the file full path information to positive integers through the hash algorithm, although mapping to other forms is not excluded, as long as the uniform distribution property of the hash algorithm is exploited.
The integration sub-module 32 is configured to integrate the file full path factor with the storage medium factor of each storage disk to obtain an integration factor corresponding to each storage disk;
the selecting submodule 33 is configured to select one storage disk from the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk, and includes:
processing the integration factors corresponding to the storage disks through a hash algorithm to obtain selection factors corresponding to the storage disks;
and selecting the storage disk corresponding to the selection factor with the maximum value as the target storage disk.
The same algorithm may be used for all three hash computations in this embodiment.
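The module structure of this embodiment can be mirrored by a few small classes. This is a sketch of one possible arrangement, not the apparatus itself: the class and method names are hypothetical, SHA-256 stands in for the unspecified hash algorithm, and the identifier format is assumed to be a zero-padded "frame_slot" string.

```python
import hashlib
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class FileAccessRequest:
    """What the request receiving module 2 hands over: operation and file full path."""
    operation: str        # "store" or "read"
    full_path: str


class DiskPositionManager:
    """Multi-disk position management module 1: provides the storage disk list."""
    def __init__(self, slots: Sequence[Tuple[int, int]]):
        self._slots = sorted(slots)

    def disk_list(self) -> List[str]:
        return [f"{frame:02d}_{slot:02d}" for frame, slot in self._slots]


class LoadStorageManager:
    """Multi-disk load storage management module 3 with its three submodules."""
    @staticmethod
    def compute(data: bytes) -> int:                          # calculation submodule 31
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    @staticmethod
    def integrate(path_factor: int, medium_factor: int) -> int:   # integration submodule 32
        return path_factor ^ medium_factor

    def select(self, path_factor: int, disk_ids: Sequence[str]) -> str:   # selection submodule 33
        def selection_factor(disk_id: str) -> int:
            medium_factor = self.compute(disk_id.encode("utf-8"))
            return self.compute(self.integrate(path_factor, medium_factor).to_bytes(8, "big"))
        return max(disk_ids, key=selection_factor)

    def target_disk(self, request: FileAccessRequest, disk_ids: Sequence[str]) -> str:
        return self.select(self.compute(request.full_path.encode("utf-8")), disk_ids)


positions = DiskPositionManager([(0, 0), (0, 1), (1, 0), (1, 1)])
manager = LoadStorageManager()
request = FileAccessRequest("store", "/video/2015/09/clip.mp4")
print(manager.target_disk(request, positions.disk_list()))
```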
In this embodiment, the identification mark in the storage disk list is the physical location identifier of each storage disk. The storage device may specifically include a storage server and/or a disk cluster (JBOD, Just a Bunch Of Disks), each of which includes a plurality of storage disks; the storage disks may be solid state disks or mechanical hard disks. In this embodiment, an application program for file access, that is, a file access client, is further provided on the storage device.
In this embodiment, the storage server and the JBODs may be numbered; for example, the storage server is numbered as frame number 0, the first disk cluster as frame number 1, the second disk cluster as frame number 2, and so on, with the Nth disk cluster numbered as frame number N;
further, the distributed file system daemon process numbers the slots of each storage disk (mechanical hard disk or solid state disk) in the storage server and the slots of each storage disk in the disk clusters;
in this embodiment, each storage disk on the storage device is given a uniform and unique physical location number, namely "frame number + slot number", which is called the physical location identifier of the storage disk. At startup, all storage disks on the storage server and the disk clusters are obtained and sorted first by frame number and then by slot number within the frame, forming a one-dimensional list of storage disk physical location identifiers, that is, the storage disk list. The calculation submodule 31 then hashes the physical location identifier of each storage disk and maps the identifiers into a set of discrete, uniformly distributed positive integers called "storage medium factors". Because this embodiment uses the physical location identifier of each storage disk, the storage medium factor calculated from the physical location string is the same no matter what storage medium is inserted at that physical location; that is, the storage medium factor depends only on the physical location and not on the storage disk itself, which further improves reliability. In this embodiment, a storage disk number may also be added: each storage disk is assigned a unique number when the disks are numbered, and the physical location identifier is then the frame number + slot number + storage disk number.
After the storage medium factor corresponding to each storage disk is obtained, the disk identifier of the storage disk (that is, its block device file on Linux or another Unix-like system, such as /dev/sda) may further be mounted at a mount point named after the physical location identifier of the storage disk.
The file full path information in this embodiment may include file type information, a plurality of storage directory paths, and a file name; the calculation submodule 31 may map the file full path information to a positive integer by using a hash algorithm.
After the balanced hash algorithm of this embodiment has selected the target storage disk for storage, when a user needs to read the file, the storage disk with the maximum selection factor is found by the same method, and it is always the same target storage disk that was selected during storage.
In this embodiment, the working state of each storage disk may be monitored during the above process, and a storage disk in an abnormal state is removed according to the monitoring result and then replaced. When the disk is removed, the files on it may be distributed evenly to the other storage disks, or transferred in full to the new storage disk after replacement.
Referring to Fig. 5, the multi-storage-disk load management apparatus in this embodiment may further include a state monitoring module 4 configured to monitor the working state of each storage disk. A storage disk in an abnormal state can then be removed according to the monitoring result and replaced. When the disk is removed, the files on it may be distributed evenly to the other storage disks, or transferred in full to the new storage disk after replacement.
Referring to fig. 6, the multi-storage-disk load management apparatus in this embodiment further includes a classification management module 5, configured to place the SSDs and the conventional mechanical hard disks in separate groups. The SSDs form a solid state disk storage sub-list containing the identifiers of the solid state disks, such as SSD_0001, SSD_0002 … SSD_000N; the conventional mechanical hard disks form a mechanical hard disk storage sub-list, such as disk_0001, disk_0002 … disk_000N.
When the state monitoring is carried out, the two sub-lists can be respectively monitored in real time.
During load management, if, for example, the user's request indicates that frequently accessed files ("hot" files) should be stored in the solid state disk storage sub-list corresponding to the SSDs, the hash calculation is performed only on the identifiers of the solid state disks in that sub-list, and the "hot" files are mapped to the solid state disk storage sub-list.
If the user wants files that are rarely accessed ("cold" files) to be stored in the mechanical hard disk storage sub-list corresponding to the conventional mechanical hard disks, the hash calculation is performed only on the identifiers of the mechanical hard disks in that sub-list, and the "cold" files are mapped to the mechanical hard disk storage sub-list. This can further improve the user experience.
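A minimal sketch of the hot/cold mapping: the same selection rule is applied, but only over the sub-list chosen for the request. The `is_hot` flag, the function names, and the truncated SHA-256 hash are assumptions; how a file is classified as hot or cold is not specified here.

```python
import hashlib

def h32(data):
    """Assumed hash: SHA-256 truncated to 32 bits."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def select_disk(path, disks):
    """Pick the disk with the largest selection factor for this path."""
    file_factor = h32(path.encode("utf-8"))
    return max(
        disks,
        key=lambda d: h32((file_factor ^ h32(d.encode("utf-8"))).to_bytes(4, "big")),
    )

ssd_sublist = ["SSD_0001", "SSD_0002", "SSD_0003"]
hdd_sublist = ["disk_0001", "disk_0002", "disk_0003"]

def select_in_sublist(path, is_hot):
    """Hash only over the SSD sub-list for hot files, the HDD sub-list otherwise."""
    sublist = ssd_sublist if is_hot else hdd_sublist
    return select_disk(path, sublist)

hot_target = select_in_sublist("video/live/stream.ts", is_hot=True)
cold_target = select_in_sublist("backup/2014/archive.tar", is_hot=False)
```

A reader must apply the same hot/cold classification as the writer, otherwise the lookup would search the wrong sub-list.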
The hash algorithm adopted in this embodiment supports hot plugging, with the storage disk list updated in real time.
Example three:
This embodiment provides a distributed file system. Referring to fig. 7, it includes a file access client 71, a file access interface 72, a plurality of storage disks 73, and the multi-storage-disk load management apparatus 74 described in the second embodiment; the file access client 71 may be implemented by various user programs, and the file access interface 72 may be implemented by a dynamic link library of a general purpose interface.
Fig. 8 shows the mapping relationship between the plurality of storage disks 73 and the mount points in the "distributed file system" of this embodiment. A storage server and a plurality of JBODs are involved; the storage server has a plurality of storage disks, each JBOD also has a plurality of storage disks, and the storage server and the JBODs are connected by SAS (Serial Attached SCSI) cables. Each storage disk has a unique physical location identifier, namely a "frame number-slot number" identifier, and the physical location identifier of the storage disk is used as its mount directory in the operating system. Fig. 8 shows the one-to-one mapping of all storage disks to mount points in the operating system; each storage disk also has a unique "storage medium factor", calculated as described in example two.
The file access client 71 sends a file access request to the multi-storage-disk load management device 74 through the file access interface 72; the multi-storage-disk load management device 74 receives the file access request and selects one of the plurality of storage disks as the target storage disk accessed by the file access request. A specific example of file storage follows. Referring to fig. 9, it includes:
step 901: the file access client 71 calls the file access interface 72 to initiate a file access request and provides a "full path name of the file";
step 902: the multi-storage-disk load management device 74 maps the "full path name of the file" to a positive integer, which is called the "file full path factor";
step 903: the multi-storage-disk load management device 74 provides the physical locations and the list of available storage disks, and obtains the "storage medium factor" of each storage disk;
step 904: the multi-storage-disk load management device 74 merges each "storage medium factor" with the "file full path factor" into an "integration factor" (there are as many "integration factors" as there are "storage medium factors");
step 905: the multi-storage-disk load management device 74 performs a hash calculation on each "integration factor" to obtain a plurality of "selection factors" (there are as many "selection factors" as there are "integration factors", and therefore as many as there are "storage medium factors");
step 906: the multi-storage-disk load management device 74 selects the "selection factor" with the maximum value, and finally maps the file to the storage disk with the maximum value of the "selection factor";
step 907: the multi-storage-disk load management device 74 completes the reading and writing operation of the file on the selected storage disk.
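Steps 902 to 906 can be sketched as a single function. The XOR merge in step 904 follows claim 3; the truncated SHA-256 hash, the disk identifiers, and the function names are assumptions for illustration, not the patented implementation.

```python
import hashlib

def h32(data):
    """Assumed hash: SHA-256 truncated to 32 bits."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def locate(full_path, disk_ids):
    # Step 902: map the full path name to the "file full path factor".
    file_factor = h32(full_path.encode("utf-8"))
    # Step 903: obtain the "storage medium factor" of each available disk.
    medium = {d: h32(d.encode("utf-8")) for d in disk_ids}
    # Step 904: merge into one "integration factor" per disk (XOR, per claim 3).
    integration = {d: file_factor ^ m for d, m in medium.items()}
    # Step 905: hash each integration factor into a "selection factor".
    selection = {d: h32(v.to_bytes(4, "big")) for d, v in integration.items()}
    # Step 906: the disk with the maximum selection factor is the target.
    target = max(selection, key=selection.get)
    return target, selection

disk_ids = ["1_0", "1_1", "2_0", "2_1"]
target, selection = locate("image/2015/09/photo.jpg", disk_ids)

# A later read repeats the same calculation and arrives at the same disk.
read_target, _ = locate("image/2015/09/photo.jpg", disk_ids)
```

No per-file metadata is stored: the location is recomputed from the path and the disk list on every access, which is why the read path and the write path share one function.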
Example four:
This embodiment provides a distributed storage network system. Referring to fig. 10, it includes a file access client 01, a file access interface 02, a plurality of storage nodes 03, and the multi-storage-disk load management apparatus 04 described in the second embodiment; each storage node 03 comprises a plurality of storage disks. That is, the plurality of storage disks in the third embodiment serve as one storage node, and a plurality of storage nodes together constitute the storage network system. In this embodiment, each storage node in the storage network may be numbered, for example as node1, node2, …, nodeN; the storage disks within each storage node are numbered and managed as in the above embodiments. The specific control process is as follows:
the file access client 01 sends a file access request to the multi-storage-disk load management device 04 through the file access interface 02;
the multi-storage-disk load management device 04 receives the file access request and selects one of the storage nodes as the target storage node nodeX according to the file access request; the node may be selected in the same way as a target storage disk is selected in the above embodiments, or by another selection method. It then selects one of the plurality of storage disks of the target storage node nodeX as the target storage disk accessed by the file access request.
In this embodiment, the multi-storage-disk load management apparatus 04 first selects a storage node in the storage network and then selects a storage disk inside that storage node. The embodiment supports flexible expansion: a large-scale storage network can be built by adding storage nodes, the storage load of the whole storage network is shared evenly among the storage nodes, and within each storage node the load is shared evenly among its storage disks.
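The two-level selection can be sketched by applying the same rule twice, first over node numbers and then over the disks of the chosen node. Reusing the disk-selection rule for nodes is one of the options the description allows; the node layout, the truncated SHA-256 hash, and the function names are assumptions for illustration.

```python
import hashlib

def h32(data):
    """Assumed hash: SHA-256 truncated to 32 bits."""
    return int.from_bytes(hashlib.sha256(data).digest()[:4], "big")

def pick(path, candidates):
    """Return the candidate (node or disk) with the largest selection factor."""
    file_factor = h32(path.encode("utf-8"))
    return max(
        candidates,
        key=lambda c: h32((file_factor ^ h32(c.encode("utf-8"))).to_bytes(4, "big")),
    )

# Hypothetical storage network: node number -> physical location identifiers.
nodes = {
    "node1": ["1_0", "1_1", "1_2"],
    "node2": ["1_0", "1_1", "1_2"],
    "node3": ["1_0", "1_1"],
}

def locate(path):
    node = pick(path, sorted(nodes))   # level 1: target storage node
    disk = pick(path, nodes[node])     # level 2: target disk inside that node
    return node, disk

node, disk = locate("log/2015/09/14/access.log")
```

Adding a node extends the first-level candidate list only, so the disks inside existing nodes keep their numbering and factors.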
Compared with the prior art, the invention at least has the following advantages:
(1) Simple system architecture: no additional metadata controller is needed, massive file access services can be provided with only a server and storage media (mechanical hard disks or solid state disks), and deployment and implementation are very convenient.
(2) High performance: the original metadata retrieval operation is replaced by a hash calculation, and the physical storage location of a file can be obtained by a single fast three-level calculation even when there are billions of files.
(3) Easy expansion: system capacity and throughput can be increased linearly simply by adding storage media (mechanical hard disks or solid state disks).
The foregoing is a more detailed description of the present invention that is presented in conjunction with specific embodiments, and the practice of the invention is not to be considered limited to those descriptions. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (12)

1. A multi-storage disk load management method, comprising:
acquiring a storage disk list, wherein the storage disk list comprises identification marks of storage disks;
receiving a file access request, and acquiring file full-path information in the file access request;
processing the identification marks of the storage disks through a hash algorithm to obtain storage medium factors of the storage disks;
processing the file full path information through a hash algorithm to obtain a file full path factor;
integrating the file full path factors and the storage medium factors of the storage disks to obtain integration factors corresponding to the storage disks;
and selecting one storage disk from the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk.
2. The method for load management of multiple storage disks according to claim 1, wherein selecting one of the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk comprises:
processing the integration factors corresponding to the storage disks through a hash algorithm to obtain selection factors corresponding to the storage disks;
and selecting the storage disk corresponding to the selection factor with the maximum value as the target storage disk.
3. The multi-storage-disk load management method according to claim 2, wherein integrating the file full path factor with the storage medium factor of each storage disk comprises: performing exclusive OR processing on the file full path factor and the storage medium factor of each storage disk respectively to obtain an integration factor corresponding to each storage disk.
4. A multi-storage disk load management method according to any one of claims 1 to 3, further comprising: monitoring the working state of each storage disk, and replacing the storage disk with an abnormal state according to the monitoring result.
5. A multi-disk load management method according to any one of claims 1 to 3, wherein said identification mark is a physical location identification mark of each disk.
6. The method for load management of multiple storage disks according to claim 5, wherein the physical location identifier comprises a frame number of a frame in which the storage disk is located and a slot number of a slot in which the storage disk is located.
7. A multi-storage disk load management apparatus, comprising:
the multi-disk position management module is used for acquiring a storage disk list, and the storage disk list comprises identification marks of the storage disks;
the request receiving module is used for receiving a file access request containing file full path information;
the multi-disk load storage management module is used for selecting one of the storage disks as a target storage disk accessed by the file access request by adopting a hash algorithm according to the file full path information in the file access request and the identification marks of the storage disks;
the multi-disk load storage management module comprises a calculation submodule, an integration submodule and a selection submodule;
the computing submodule is used for processing the identification marks of the storage disks through a hash algorithm to obtain storage medium factors of the storage disks; the file full path information is processed through a hash algorithm to obtain a file full path factor;
the integration sub-module is used for integrating the file full path factor and the storage medium factor of each storage disk to obtain an integration factor corresponding to each storage disk;
and the selection submodule is used for selecting one storage disk from the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each storage disk.
8. The multi-disk load management apparatus according to claim 7, wherein the selection submodule selecting one of the storage disks as a target storage disk accessed by the file access request according to the integration factor corresponding to each of the storage disks comprises:
processing the integration factors corresponding to the storage disks through a hash algorithm to obtain selection factors corresponding to the storage disks;
and selecting the storage disk corresponding to the selection factor with the maximum value as the target storage disk.
9. The multi-disk load management apparatus according to claim 7 or 8, further comprising a status monitoring module for monitoring an operating status of each of the disks.
10. The multi-disk load management apparatus according to claim 7 or 8, wherein the identifier is a physical location identifier of each disk.
11. A distributed file system comprising a file access client, a file access interface, a plurality of storage disks and a multi-storage disk load management apparatus according to any of claims 7 to 10;
the file access client sends a file access request to the multi-storage-disk load management device through the file access interface;
the multi-storage disk load management device receives the file access request and selects one storage disk from the plurality of storage disks as a target storage disk accessed by the file access request.
12. A distributed storage network system comprising a file access client, a file access interface, a plurality of storage nodes, and a multi-disk load management apparatus according to any one of claims 7 to 10; the storage node comprises a plurality of storage disks;
the file access client sends a file access request to the multi-storage-disk load management device through the file access interface;
and the multi-storage-disk load management device receives the file access request, selects one of the storage nodes as a target storage node according to the file access request, and selects one of the storage disks of the target storage node as a target storage disk accessed by the file access request.
CN201510582124.XA 2015-09-14 2015-09-14 Multi-storage-disk load management method and device, file system and storage network system Active CN106527960B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510582124.XA CN106527960B (en) 2015-09-14 2015-09-14 Multi-storage-disk load management method and device, file system and storage network system
PCT/CN2016/098071 WO2017045545A1 (en) 2015-09-14 2016-09-05 Method and apparatus for managing loads of multiple storage disks, file system, and storage network system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510582124.XA CN106527960B (en) 2015-09-14 2015-09-14 Multi-storage-disk load management method and device, file system and storage network system

Publications (2)

Publication Number Publication Date
CN106527960A CN106527960A (en) 2017-03-22
CN106527960B CN106527960B (en) 2021-04-02

Family

ID=58288162

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510582124.XA Active CN106527960B (en) 2015-09-14 2015-09-14 Multi-storage-disk load management method and device, file system and storage network system

Country Status (2)

Country Link
CN (1) CN106527960B (en)
WO (1) WO2017045545A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111488127B (en) * 2020-04-16 2023-01-10 苏州浪潮智能科技有限公司 Data parallel storage method and device based on disk cluster and data reading method
CN112988065B (en) * 2021-02-08 2023-11-17 北京星网锐捷网络技术有限公司 Data migration method, device, equipment and storage medium

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1641610A (en) * 2004-01-08 2005-07-20 英业达股份有限公司 Hard disk replacement control and management method for network storage system
CN104660643A (en) * 2013-11-25 2015-05-27 南京中兴新软件有限责任公司 Request response method and device and distributed file system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8095509B2 (en) * 2007-08-11 2012-01-10 Novell, Inc. Techniques for retaining security restrictions with file versioning
US9043334B2 (en) * 2012-12-26 2015-05-26 Industrial Technology Research Institute Method and system for accessing files on a storage system
CN104375781B (en) * 2013-08-16 2019-07-23 深圳市腾讯计算机系统有限公司 Data access method and device
CN104123359B (en) * 2014-07-17 2017-03-22 江苏省邮电规划设计院有限责任公司 Resource management method of distributed object storage system

Also Published As

Publication number Publication date
WO2017045545A1 (en) 2017-03-23
CN106527960A (en) 2017-03-22

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant