CN112667160A - Rapid equalization method and device for mass storage system - Google Patents

Publication number
CN112667160A
Authority
CN
China
Legal status
Pending
Application number
CN202011566146.4A
Other languages
Chinese (zh)
Inventor
杨飞
Current Assignee
Shenzhen Innovation Technology Co ltd
Original Assignee
Shenzhen Innovation Technology Co ltd
Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for rapidly balancing a mass storage system. The method comprises: dividing a first preset number of logical nodes based on an initial first preset number of physical nodes, wherein an initial second preset number of physical disks are arranged in the physical nodes; during balancing after capacity expansion or capacity reduction, balancing data among the logical nodes first; and, after the data among the logical nodes is balanced, performing intra-node data balancing on any logical node whose internal data is unbalanced. The invention reduces the load placed on the storage network and the storage nodes while the storage system is being balanced.

Description

Rapid equalization method and device for mass storage system
Technical Field
The invention relates to the technical field of storage, in particular to a method and a device for quickly balancing a mass storage system.
Background
In recent years, as the internet has continued to grow, emerging applications such as edge computing, the Internet of Things, big data analysis, and real-time analysis need to store massive amounts of data every day. Data storage has become a central concern in software design, and the volume of stored data calls for high-performance storage. Existing storage systems are generally distributed storage systems, and keeping data evenly distributed across such a system is an effective way for the storage to deliver high performance. However, the amount of data usually cannot be estimated well when the software is first designed, so storage capacity generally has to be expanded later and, in some cases, reduced.
Disclosure of Invention
The embodiment of the specification provides a method and a device for quickly balancing a mass storage system.
In one aspect, an embodiment of the present specification provides a method for rapidly balancing a mass storage system, including: dividing a first preset number of logical nodes based on an initial first preset number of physical nodes, wherein an initial second preset number of physical disks are arranged in the physical nodes; during balancing after capacity expansion or capacity reduction, balancing data among the logical nodes first; and, after the data among the logical nodes is balanced, performing intra-node data balancing on any logical node whose internal data is unbalanced.
In another aspect, an embodiment of the present specification provides a device for rapidly balancing a mass storage system, including: a logical node dividing module, configured to divide a first preset number of logical nodes based on an initial first preset number of physical nodes, wherein the physical nodes are provided with an initial second preset number of physical disks; an inter-logical-node data balancing module, configured to balance data among the logical nodes first during balancing after capacity expansion or capacity reduction; and an intra-logical-node data balancing module, configured to perform intra-node data balancing on logical nodes with data imbalance.
The embodiment of the invention reduces the load placed on the storage network and the storage nodes while the storage system is being rapidly balanced.
Drawings
FIG. 1 is a flow diagram of a method for fast equalization of a mass storage system according to some embodiments of the present description.
FIG. 2 is a block diagram of a fast equalization apparatus for a mass storage system according to some embodiments of the present description.
FIG. 3 is a distribution diagram of an initial storage system of some embodiments of the present description.
FIG. 4 is a distribution diagram of the initial storage system of FIG. 3 after three physical nodes have been expanded.
FIG. 5 is a distribution diagram of the storage system of FIG. 4 after partial disk migration.
FIG. 6 is a distribution diagram of the storage system of FIG. 5 with physical disks added.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
As shown in FIG. 1, some embodiments of the present specification provide a method for fast equalization of a mass storage system, including: dividing a first preset number of logical nodes based on an initial first preset number of physical nodes, wherein an initial second preset number of physical disks are arranged in the physical nodes; during balancing after capacity expansion or capacity reduction, balancing data among the logical nodes first; and, after the data among the logical nodes is balanced, performing intra-node data balancing on any logical node whose internal data is unbalanced.
Further, in some embodiments of the present specification, the step of balancing data among the logical nodes specifically comprises obtaining the disk usage rate of each logical node and, when the disk usage rates of the logical nodes differ, migrating data from at least one logical node with a higher disk usage rate to at least one logical node with a lower disk usage rate until the disk usage rates of the logical nodes are the same.
Further, in some embodiments of the present specification, in a logical node with a higher disk usage rate, at least one physical disk with a higher usage rate is selected as a data migration source disk; and in a logical node with a lower disk usage rate, at least one physical disk with a lower usage rate is selected as a data migration destination disk.
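The inter-logical-node step described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the `Disk` class, the `step`, `tolerance`, and `max_rounds` parameters, and the rule of always picking the single fullest source disk and the single emptiest destination disk are assumptions introduced here, and usage is assumed to be used space divided by capacity.

```python
from dataclasses import dataclass


@dataclass
class Disk:
    global_id: int
    logical_node: int
    capacity: float  # arbitrary units (hypothetical)
    used: float

    @property
    def usage(self) -> float:
        return self.used / self.capacity


def node_usage(disks, logical_node):
    """Usage rate of one logical node: total used space over total capacity."""
    members = [d for d in disks if d.logical_node == logical_node]
    return sum(d.used for d in members) / sum(d.capacity for d in members)


def balance_between_logical_nodes(disks, step=10.0, tolerance=0.01, max_rounds=10_000):
    """Move up to `step` units per round from the fullest logical node to the
    emptiest one until their usage rates differ by at most `tolerance`.
    Returns the total amount of data moved."""
    nodes = sorted({d.logical_node for d in disks})
    moved = 0.0
    for _ in range(max_rounds):  # guard against non-terminating input
        usages = {n: node_usage(disks, n) for n in nodes}
        src_node = max(nodes, key=usages.get)
        dst_node = min(nodes, key=usages.get)
        if usages[src_node] - usages[dst_node] <= tolerance:
            break
        # Source: fullest disk in the fuller node; destination: emptiest disk
        # in the emptier node.
        src = max((d for d in disks if d.logical_node == src_node), key=lambda d: d.usage)
        dst = min((d for d in disks if d.logical_node == dst_node), key=lambda d: d.usage)
        amount = min(step, src.used, dst.capacity - dst.used)
        if amount <= 0:
            break
        src.used -= amount
        dst.used += amount
        moved += amount
    return moved
```

For instance, with two hypothetical logical nodes of two equal-capacity disks each, one node at 80% and the other empty, the loop alternates between the disks of each node and stops once both nodes sit at the same usage rate.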
Further, in some embodiments of the present specification, the step of performing intra-node data balancing on the logical nodes with data imbalance specifically comprises obtaining the usage rate of each physical disk of each physical node in each logical node and, within the same physical node, taking a physical disk with a higher usage rate as the data migration source disk and migrating its data to a physical disk with a lower usage rate until the usage rates of the physical disks in the same physical node are the same.
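A minimal sketch of the intra-node step, under stated assumptions: disks are represented as plain dictionaries with the hypothetical keys `global_id`, `physical_node`, `capacity`, and `used`, and the function only produces a migration plan (source disk, destination disk, amount) rather than moving data. The surplus/deficit pairing is one possible strategy and is not taken from the source.

```python
from collections import defaultdict


def plan_intra_node_moves(disks, eps=1e-9):
    """For each physical node, pair over-full disks with under-full disks so
    that every disk in the node ends at the node's average usage rate.
    Returns a list of (source_global_id, destination_global_id, amount)."""
    by_node = defaultdict(list)
    for d in disks:
        by_node[d["physical_node"]].append(d)

    moves = []
    for node_id in sorted(by_node):
        members = by_node[node_id]
        total_used = sum(d["used"] for d in members)
        total_capacity = sum(d["capacity"] for d in members)
        surplus, deficit = [], []
        for d in members:
            # Amount of data this disk should hold once the node is balanced.
            target_used = d["capacity"] * total_used / total_capacity
            diff = d["used"] - target_used
            if diff > eps:
                surplus.append([d["global_id"], diff])
            elif diff < -eps:
                deficit.append([d["global_id"], -diff])
        i = j = 0
        while i < len(surplus) and j < len(deficit):
            amount = min(surplus[i][1], deficit[j][1])
            moves.append((surplus[i][0], deficit[j][0], amount))
            surplus[i][1] -= amount
            deficit[j][1] -= amount
            if surplus[i][1] <= eps:
                i += 1
            if deficit[j][1] <= eps:
                j += 1
    return moves
```

Because every planned move stays inside one physical node, a plan of this kind would not need to touch the storage network.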
In some embodiments of the present description, the physical disks are each provided with a physical node ID, a logical node ID, and a global ID.
With reference to FIG. 2, an embodiment of the present invention further provides a fast equalization apparatus for a mass storage system, including: a logical node dividing module, configured to divide a first preset number of logical nodes based on an initial first preset number of physical nodes, wherein the physical nodes are provided with an initial second preset number of physical disks; an inter-logical-node data balancing module, configured to balance data among the logical nodes first during balancing after capacity expansion or capacity reduction; and an intra-logical-node data balancing module, configured to perform intra-node data balancing on logical nodes with data imbalance.
In some embodiments of the present specification, the inter-logical-node data balancing module is specifically configured to obtain the disk usage rate of each logical node and, when the disk usage rates of the logical nodes differ, migrate data from at least one logical node with a higher disk usage rate to at least one logical node with a lower disk usage rate until the disk usage rates of the logical nodes are the same.
In some embodiments of the present specification, the inter-logical-node data balancing module is further configured to select, in a logical node with a higher disk usage rate, at least one physical disk with a higher usage rate as a data migration source disk, and to select, in a logical node with a lower disk usage rate, at least one physical disk with a lower usage rate as a data migration destination disk.
In some embodiments of the present description, the intra-logical-node data balancing module is specifically configured to obtain the usage rate of each physical disk of each physical node in each logical node and, within the same physical node, take a physical disk with a higher usage rate as the data migration source disk and migrate its data to a physical disk with a lower usage rate until the usage rates of the physical disks in the same physical node are the same.
In some embodiments of the present specification, the physical disks are each provided with a physical node ID, a logical node ID, and a global ID.
Some embodiments of the present specification also provide an electronic device and a computer-readable storage medium. The electronic device comprises a memory for storing a computer software program and a processor that implements the steps of the fast equalization method for a mass storage system when running the computer software program. The computer-readable storage medium stores a computer software program that, when executed, implements the steps of the fast equalization method for a mass storage system.
The following describes the expansion of the storage system and the equalization after the expansion in detail with reference to fig. 3 to 6.
As shown in FIG. 3, the storage system initially has three physical nodes (physical node 1, physical node 2, and physical node 3), and is divided into three logical nodes (logical node 1, logical node 2, and logical node 3) according to the three physical nodes. Each physical node includes two disks. As can be seen from FIG. 3, each disk carries three identifiers: a physical node ID, a logical node ID, and a global ID. For example, for the two disks in physical node 1, the physical node ID and the logical node ID are both 1, and the global IDs are 1 and 2, respectively. Since the initial data of the storage system is evenly distributed, the current usage of each disk can be assumed to be 80%.
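The three identifiers carried by each disk can be modelled with a small record type. The sketch below is illustrative only: the class name `DiskId`, the function `initial_layout`, and the sequential numbering of global IDs for physical nodes 2 and 3 (3 and 4, then 5 and 6) are assumptions, since the text states the IDs explicitly only for physical node 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DiskId:
    physical_node_id: int
    logical_node_id: int
    global_id: int


def initial_layout(physical_nodes=3, disks_per_node=2):
    """Build an initial layout in which each physical node forms its own
    logical node and global IDs are assigned sequentially (assumed)."""
    layout = []
    global_id = 1
    for node in range(1, physical_nodes + 1):
        for _ in range(disks_per_node):
            # Initially the logical node ID equals the physical node ID.
            layout.append(DiskId(node, node, global_id))
            global_id += 1
    return layout
```

Keeping the logical node ID separate from the physical node ID is what lets a disk stay in the same logical node when it is later moved into a newly added physical node.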
As shown in FIG. 4, the initial storage system is expanded: three physical nodes (physical node 4, physical node 5, and physical node 6) are added to logical node 1, logical node 2, and logical node 3, respectively. FIG. 4 shows the state in which no disks have yet been inserted into the three newly added physical nodes.
As shown in FIG. 5, some of the disks of the original physical nodes (physical node 1, physical node 2, and physical node 3) are moved into the three newly added physical nodes (physical node 4, physical node 5, and physical node 6). At this point the logical nodes and the physical nodes all still have the same usage rate, and the actual capacity has not increased.
As shown in FIG. 6, new disks (the disks with global IDs 7-12) are inserted at the positions marked by the dashed boxes in FIG. 5. At this point the logical nodes have the same usage rate, but the disk usage within each physical node is unbalanced, so intra-node balancing can be started based on the scheme of the embodiment of the present invention. After a period of time, balancing completes and every disk has a usage rate of 40%.
The equalization process is explained in detail below.
With reference to FIGS. 3 to 6, the balancing process is a data migration process: part of the data on the original physical disks (the disks with global IDs 1-6, at 80% usage) is migrated to the newly added physical disks (the disks with global IDs 7-12, at 0% usage). The usage rate of each logical node is calculated first, and balancing among the logical nodes takes priority. For example, the disk usage rate of logical node 1 is (80% + 80% + 0% + 0%)/4 = 40%; the disk usage rates of logical node 2 and logical node 3 are likewise 40%, so the three logical nodes have the same usage rate. If, however, the disk usage rates of the logical nodes turn out to be unbalanced, disks in the logical node with the higher disk usage rate are selected as migration source disks and disks in the logical node with the lower disk usage rate are selected as migration destination disks (there may be more than one destination disk), and data is migrated until every logical node has the same disk usage rate.
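The averaging in this example can be written in a general form. The notation is introduced here for illustration and does not appear in the source: $u_d$ and $c_d$ denote the used space and the capacity of disk $d$, and $U_N$ the usage rate of logical node $N$, assuming usage is measured as used space over capacity.

```latex
U_N = \frac{\sum_{d \in N} u_d}{\sum_{d \in N} c_d},
\qquad
U_1 = \frac{80\% + 80\% + 0\% + 0\%}{4} = 40\%
\quad \text{(four disks of equal capacity)}
```

When all disks have the same capacity, $U_N$ reduces to the plain mean of the per-disk usage rates, which is the calculation used in the example.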
After the data balancing among the logical nodes is completed, the balance within each physical node is checked. For example, the disk usage rate of physical node 1 is (80% + 0%)/2 = 40%, so the physical disk with global ID 1, at 80% usage, needs to migrate half of its data (40% of a disk's capacity) to the physical disk with global ID 7, at 0% usage, leaving both at 40%. The other physical nodes (physical nodes 2 to 6) perform the corresponding operations.
It should be noted that the foregoing describes only one of many possible expansion scenarios. If capacity is expanded by an integer multiple, the logical nodes do not need to be balanced against each other, and only balancing within the physical nodes is performed.
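Whether the inter-logical-node phase can be skipped could be decided by comparing usage rates before any data is moved. The following is a hedged sketch, not part of the described embodiment: the function name `phases_needed`, its input shapes, and the `tolerance` threshold are assumptions made for illustration.

```python
def phases_needed(logical_usage, disk_usage_by_physical_node, tolerance=0.01):
    """Decide which balancing phases are required.

    logical_usage: {logical_node_id: usage rate of that logical node}
    disk_usage_by_physical_node: {physical_node_id: [usage rate of each disk]}
    Returns (inter_needed, physical_nodes_needing_intra_balancing).
    """
    # Inter-logical-node balancing is needed only if logical nodes differ.
    inter_needed = max(logical_usage.values()) - min(logical_usage.values()) > tolerance
    # A physical node needs intra-node balancing if its disks differ.
    intra_nodes = sorted(
        node
        for node, usages in disk_usage_by_physical_node.items()
        if max(usages) - min(usages) > tolerance
    )
    return inter_needed, intra_nodes
```

Reading FIG. 6 as three logical nodes at 40% and six physical nodes that each hold one disk at 80% and one empty disk, this check would report that no inter-logical-node migration is needed and that all six physical nodes require intra-node balancing.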
In summary, the embodiment of the present invention balances data among the logical nodes first and then balances data within the nodes, so that the disk data in the storage system ends up balanced. This keeps use of the storage network to a minimum and, in the best case, avoids the storage network altogether by performing only balancing within physical nodes.
While the process flows described above include operations that occur in a particular order, it should be appreciated that the processes may include more or fewer operations, which may be performed sequentially or in parallel (e.g., using parallel processors or a multi-threaded environment). The present invention is described with reference to flowchart illustrations and/or block diagrams of methods according to embodiments of the invention.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method or device comprising the element.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the method embodiment, since it is substantially similar to the apparatus embodiment, the description is simple, and the relevant points can be referred to the partial description of the apparatus embodiment. The above description is only an example of the present specification, and is not intended to limit the present specification. Various modifications and alterations to this description will become apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the scope of the claims of the present specification.

Claims (12)

1. A method for fast equalization of a mass storage system, the method comprising:
dividing a first preset number of logical nodes based on an initial first preset number of physical nodes, wherein an initial second preset number of physical disks are arranged in the physical nodes;
in the balancing process after capacity expansion or capacity reduction, preferentially balancing data among the logical nodes;
and after the data balancing among the logical nodes is finished, if the data within a logical node is unbalanced, performing intra-node data balancing on the logical node with the data imbalance.
2. The method for fast equalization of a mass storage system according to claim 1, wherein
the step of balancing data among the logical nodes specifically comprises:
acquiring the disk usage rate of each logical node and, when the disk usage rates of the logical nodes differ, migrating data from at least one logical node with a higher disk usage rate to at least one logical node with a lower disk usage rate until the disk usage rates of the logical nodes are the same.
3. The method for fast equalization of a mass storage system according to claim 2, wherein
in a logical node with a higher disk usage rate, at least one physical disk with a higher usage rate is selected as a data migration source disk;
and in a logical node with a lower disk usage rate, at least one physical disk with a lower usage rate is selected as a data migration destination disk.
4. The method for fast equalization of a mass storage system according to claim 1, wherein
the step of performing intra-node data balancing on the logical nodes with data imbalance specifically comprises:
acquiring the usage rate of each physical disk of each physical node in each logical node and, within the same physical node, taking a physical disk with a higher usage rate as a data migration source disk and migrating its data to a physical disk with a lower usage rate until the usage rates of the physical disks in the same physical node are the same.
5. The method for fast equalization of a mass storage system according to claim 1, wherein
the physical disks are each provided with a physical node ID, a logical node ID, and a global ID.
6. An apparatus for fast equalization of a mass storage system, comprising:
a logical node dividing module, configured to divide a first preset number of logical nodes based on an initial first preset number of physical nodes, wherein the physical nodes are provided with an initial second preset number of physical disks;
an inter-logical-node data balancing module, configured to preferentially balance data among the logical nodes in the balancing process after capacity expansion or capacity reduction;
and an intra-logical-node data balancing module, configured to perform intra-node data balancing on the logical nodes with data imbalance.
7. The apparatus for fast equalization of a mass storage system according to claim 6, wherein
the inter-logical-node data balancing module is specifically configured to obtain the disk usage rate of each logical node and, when the disk usage rates of the logical nodes differ, migrate data from at least one logical node with a higher disk usage rate to at least one logical node with a lower disk usage rate until the disk usage rates of the logical nodes are the same.
8. The apparatus for fast equalization of a mass storage system according to claim 7, wherein
the inter-logical-node data balancing module is further configured to select, in a logical node with a higher disk usage rate, at least one physical disk with a higher usage rate as a data migration source disk, and to select, in a logical node with a lower disk usage rate, at least one physical disk with a lower usage rate as a data migration destination disk.
9. The apparatus for fast equalization of a mass storage system according to claim 6, wherein
the intra-logical-node data balancing module is specifically configured to obtain the usage rate of each physical disk of each physical node in each logical node and, within the same physical node, take a physical disk with a higher usage rate as a data migration source disk and migrate its data to a physical disk with a lower usage rate until the usage rates of the physical disks in the same physical node are the same.
10. The apparatus for fast equalization of a mass storage system according to claim 6, wherein
the physical disks are each provided with a physical node ID, a logical node ID, and a global ID.
11. An electronic device, comprising:
a memory for storing a computer software program;
a processor for implementing the steps of the method for fast equalization of a mass storage system according to any one of claims 1 to 5 when running said computer software program.
12. A computer-readable storage medium, characterized in that
the computer-readable storage medium has stored thereon a computer software program which, when executed, performs the steps of the method for fast equalization of a mass storage system according to any one of claims 1 to 5.
CN202011566146.4A 2020-12-25 2020-12-25 Rapid equalization method and device for mass storage system Pending CN112667160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011566146.4A CN112667160A (en) 2020-12-25 2020-12-25 Rapid equalization method and device for mass storage system

Publications (1)

Publication Number Publication Date
CN112667160A true CN112667160A (en) 2021-04-16

Family

ID=75409437

Country Status (1)

Country Link
CN (1) CN112667160A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102523251A (en) * 2011-11-25 2012-06-27 北京开拓天际科技有限公司 Cloud storage architecture for processing mass data and cloud storage platform using the same
CN103327094A (en) * 2013-06-19 2013-09-25 成都市欧冠信息技术有限责任公司 Data distributed type memory method and data distributed type memory system
CN103761059A (en) * 2014-01-24 2014-04-30 中国科学院信息工程研究所 Multi-disk storage method and system for mass data management
CN104702691A (en) * 2015-03-13 2015-06-10 华为技术有限公司 Distributed load balancing method and device
CN104917784A (en) * 2014-03-10 2015-09-16 华为技术有限公司 Data migration method and device, and computer system
CN109788006A (en) * 2017-11-10 2019-05-21 阿里巴巴集团控股有限公司 Data balancing method, device and computer equipment
CN110515947A (en) * 2019-08-23 2019-11-29 苏州浪潮智能科技有限公司 A kind of storage system
CN111913670A (en) * 2020-08-07 2020-11-10 北京百度网讯科技有限公司 Load balancing processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination