CN109828718B - Disk storage load balancing method and device - Google Patents

Disk storage load balancing method and device

Info

Publication number: CN109828718B
Authority: CN (China)
Legal status: Active (Google notes that the listed status is an assumption, not a legal conclusion)
Application number: CN201811496766.8A
Other languages: Chinese (zh)
Other versions: CN109828718A (en)
Inventor: 余澈 (Yu Che)
Current assignee: China United Network Communications Group Co Ltd; Unicom Big Data Co Ltd (Google notes that the listed assignees may be inaccurate)
Original assignee: China United Network Communications Group Co Ltd; Unicom Big Data Co Ltd
Application filed by China United Network Communications Group Co Ltd and Unicom Big Data Co Ltd
Priority to CN201811496766.8A
Publication of CN109828718A
Application granted
Publication of CN109828718B
Legal status: Active; anticipated expiration listed


Abstract

The application discloses a disk storage load balancing method and device. The method comprises: calculating the average instantaneous storage utilization of the disks; calculating the historical storage utilization average of each disk; generating a disk storage utilization list according to the average instantaneous storage utilization; sorting the disks in the list according to each disk's average instantaneous storage utilization and historical storage utilization average, and determining the disk to be migrated and the target disk from the sorted order; and migrating data from the disk to be migrated to the target disk. This identifies the disk whose data should be migrated first and the target disk that should receive data first, balances the data on each disk, avoids wasting the CPU and disk IO of cluster nodes, evens out cluster storage occupancy, and substantially improves cluster storage utilization.

Description

Disk storage load balancing method and device
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a disk storage load balancing method and device.
Background
Middleware is a class of software that provides services to applications beyond those provided by the operating system. As a component in the "middle layer", it bridges upper-layer applications and lower-layer services, and also connects applications with one another (e.g., distributed service components). Distributed message middleware is the hardware or software infrastructure that supports sending and receiving messages in a distributed system; that is, distributed message middleware is itself a distributed system.
Currently, high-throughput distributed message middleware relies on the operating system's file cache rather than on its own memory cache. It writes data to each disk and uses data offsets as the basis for recording reads and writes. Data is partitioned by topic, topics differ in volume, and new data is placed on a disk according to the number of topics already on that disk.
As read/write concurrency and throughput increase, storage becomes unbalanced between nodes and between disks, and uneven disk IO (input/output) load has become a major pain point for large-scale distributed message middleware. In practice, although each disk in a large distributed message middleware cluster holds roughly the same number of topics, the amount of data stored on each disk is severely unbalanced: some disks receive large volumes of topic data with high read/write throughput, while the remaining disks receive little. This uneven disk load in turn causes uneven storage and uneven read/write efficiency across nodes. In addition, in big-data scenarios the largest distributed message middleware cluster requires more than 100 nodes, the wasted storage nodes amount to half of the cluster's nodes, and the read/write concurrency of the cluster is low.
Disclosure of Invention
The application provides a disk storage load balancing method and device to address two shortcomings of the prior art: data storage in distributed message middleware is unevenly distributed among nodes and disks, and in big-data scenarios individual disks are prone to read/write peaks, which leaves some storage resources wasted.
The application provides a method for balancing disk storage load, which comprises the following steps:
calculating the average value of the instantaneous storage utilization rate of each disk;
calculating the average value of the historical storage utilization rate of each disk;
generating a disk storage utilization rate list according to the average value of the instantaneous storage utilization rates of the disks;
sorting the disks in the disk storage utilization list according to the average instantaneous storage utilization and the historical storage utilization average of each disk in the list, and determining the disk to be migrated and the target disk from the sorted order;
and migrating the data from the disk to be migrated to the target disk.
Optionally, the disk storage utilization list includes a first list and a second list, and the step of generating the disk storage utilization list according to the average instantaneous storage utilization of the disks specifically includes:
respectively calculating the lower limit value and the upper limit value of the disk storage utilization rate of each disk according to the average value of the instantaneous storage utilization rate of each disk and a preset threshold value of a fluctuation parameter;
determining a disk with a disk storage utilization rate larger than an upper limit value of the disk storage utilization rate to generate the first list;
and determining the disk with the disk storage utilization rate smaller than the lower limit value of the disk storage utilization rate to generate the second list.
Optionally, the step of sorting the disks in the disk storage utilization list according to the average instantaneous storage utilization and the historical storage utilization average of each disk in the list, and determining the disk to be migrated and the target disk from the sorted order, specifically includes:
calculating, for each disk in the first list, a first absolute difference between the average instantaneous storage utilization and the historical storage utilization average, and calculating, for each disk in the second list, a second absolute difference between the same two values;
sorting the disks in the first list in descending order of the first absolute difference to generate a first sequence, and sorting the disks in the second list in descending order of the second absolute difference to generate a second sequence;
and determining the disk to be migrated by the priority of the first sequence and the target disk by the priority of the second sequence, where the disk ranked first in each sequence has the highest priority.
Optionally, the method further includes:
judging whether the historical storage utilization average of a disk in the disk storage utilization list is greater than the lower limit and less than the upper limit of disk storage utilization;
and if so, deleting that disk from the disk storage utilization list.
Optionally, the step of calculating an average value of instantaneous storage utilization of each disk specifically includes:
by the formula
$$BU = \frac{1}{N}\sum_{i=0}^{n}\sum_{j=0}^{m} X_{ij}$$
calculating the average instantaneous storage utilization of the disks;
where i denotes the i-th node (i ranges from 0 to n), j denotes the j-th disk of the i-th node (j ranges from 0 to m), X_ij is the storage utilization of the j-th disk of the i-th node, and N is the total number of disks in the sum.
Optionally, the step of calculating the average value of the historical storage utilization rates of the disks specifically includes:
by the formula
$$X_{ijGn} = \frac{1}{T}\sum_{k=1}^{n} t_k X_{ij}^{(k)}, \qquad \sum_{k=1}^{n} t_k = T$$
calculating the historical storage utilization average of each disk;
where k is a disk storage utilization label, n is the number of distinct disk storage utilization labels, t_k is the length of time during which the disk was at label k, X_ij^(k) is the disk storage utilization in state k, and X_ijGn is the historical storage utilization average of the j-th disk of the i-th node over period T.
Optionally, the step of migrating the data from the disk to be migrated to the target disk specifically includes:
by the formula
[Formula image not reproduced]
calculating the maximum migratable basic data of the disk to be migrated;
where BD_jp denotes basic unit data, j being the disk label information and p the basic unit data label information, and [symbol image not reproduced] denotes the union of the basic unit data that meet the migration criterion;
and migrating the maximum migratable basic data from the disk to be migrated to the target disk.
The present application further provides a device for balancing disk storage load, including:
the first calculation module is used for calculating the average value of the instantaneous storage utilization rate of each disk;
the second calculation module is used for calculating the average value of the historical storage utilization rate of each disk;
the list generation module is used for generating a disk storage utilization list according to the average value of the instantaneous storage utilization of each disk;
the determining module is configured to sort the disks in the disk storage utilization list according to the average instantaneous storage utilization and the historical storage utilization average of each disk in the list, and to determine the disk to be migrated and the target disk from the sorted order;
and the migration module is used for migrating the data from the disk to be migrated to the target disk.
Optionally, the disk storage utilization list includes a first list and a second list, and the list generation module specifically includes:
the first calculation submodule is used for respectively calculating the lower limit value and the upper limit value of the disk storage utilization rate of each disk according to the average value of the instantaneous storage utilization rate of each disk and a preset threshold value of a fluctuation parameter;
the first list generation submodule is used for determining a disk with a disk storage utilization rate larger than an upper limit value of the disk storage utilization rate so as to generate the first list;
and the second list generation submodule is used for determining the disk with the disk storage utilization rate smaller than the lower limit value of the disk storage utilization rate so as to generate the second list.
Optionally, the determining module specifically includes:
the second calculation submodule is used for calculating a first difference absolute value of the average value of the instantaneous storage utilization rate and the average value of the historical storage utilization rate of each disk in the first list and calculating a second difference absolute value of the average value of the instantaneous storage utilization rate and the average value of the historical storage utilization rate of each disk in the second list;
the sorting submodule is used for sorting the disks in the first list according to the descending order of the first difference absolute values to generate a first sequence, and sorting the disks in the second list according to the descending order of the second difference absolute values to generate a second sequence;
the determining submodule is configured to determine the disk to be migrated by the priority of the first sequence and the target disk by the priority of the second sequence, where the disk ranked first in each sequence has the highest priority.
Optionally, the apparatus further includes:
the judging module, configured to judge whether the historical storage utilization average of a disk in the disk storage utilization list is greater than the lower limit and less than the upper limit of disk storage utilization,
and, if so, to delete that disk from the disk storage utilization list.
Optionally, the first computing module specifically includes:
a fifth calculation submodule, configured to calculate the average instantaneous storage utilization of the disks by the formula
$$BU = \frac{1}{N}\sum_{i=0}^{n}\sum_{j=0}^{m} X_{ij}$$
where i denotes the i-th node (i ranges from 0 to n), j denotes the j-th disk of the i-th node (j ranges from 0 to m), X_ij is the storage utilization of the j-th disk of the i-th node, and N is the total number of disks in the sum.
Optionally, the second calculating module specifically includes:
a sixth calculation submodule, configured to calculate the historical storage utilization average of each disk by the formula
$$X_{ijGn} = \frac{1}{T}\sum_{k=1}^{n} t_k X_{ij}^{(k)}, \qquad \sum_{k=1}^{n} t_k = T$$
where k is a disk storage utilization label, n is the number of distinct disk storage utilization labels, t_k is the length of time during which the disk was at label k, X_ij^(k) is the disk storage utilization in state k, and X_ijGn is the historical storage utilization average of the j-th disk of the i-th node over period T.
Optionally, the migration module specifically includes:
a seventh calculation submodule, configured to calculate the maximum migratable basic data of the disk to be migrated by the formula
[Formula image not reproduced]
where BD_jp denotes basic unit data, j being the disk label information and p the basic unit data label information, and [symbol image not reproduced] denotes the union of the basic unit data that meet the migration criterion;
and a migration submodule, configured to migrate the maximum migratable basic data from the disk to be migrated to the target disk.
In the method and device, the disk storage utilization list is generated from the average instantaneous storage utilization of the disks, and the disk to be migrated and the target disk are then determined from the order obtained by sorting the disks in the list according to their average instantaneous storage utilization and historical storage utilization average, so that data is migrated from the disk to be migrated to the target disk. The method is flexible, automated and intelligent, and is embedded in the message middleware. It identifies the disk whose data should be migrated first and the target disk that should receive data first, tracks and controls disk storage utilization over the long term, and effectively avoids frequent disk data migration. While balancing the data on each disk, it avoids wasting the CPU and disk IO (input/output) of cluster nodes, balances cluster storage occupancy, and substantially improves cluster storage utilization.
Drawings
Fig. 1 is a flowchart of a disk storage load balancing method according to a first embodiment of the present application;
Fig. 2 is a flowchart of a disk storage load balancing method according to a second embodiment of the present application;
Fig. 3 is a schematic structural diagram of a disk storage load balancing apparatus according to a third embodiment of the present application;
Fig. 4 is a schematic structural diagram of a disk storage load balancing apparatus according to a fourth embodiment of the present application.
Detailed Description
In order to make the technical solutions of the present invention better understood, the present invention will be described in further detail with reference to the accompanying drawings and specific embodiments.
The application provides a disk storage load balancing method and device. The following detailed description is made with reference to the drawings of the embodiments provided in the present application, respectively.
A method for balancing disk storage load provided in a first embodiment of the present application is as follows:
In this embodiment, the method is executed by a node coordinator. Fig. 1 shows a flowchart of the disk storage load balancing method provided by this embodiment, which includes the following steps.
Step S101, calculating the average instantaneous storage utilization of the disks.
Step S102, calculating the historical storage utilization average of each disk.
Step S103, generating a disk storage utilization list according to the average instantaneous storage utilization of the disks.
Step S104, sorting the disks in the disk storage utilization list according to each disk's average instantaneous storage utilization and historical storage utilization average, and determining the disk to be migrated and the target disk from the sorted order.
Step S105, migrating data from the disk to be migrated to the target disk.
In this embodiment, the disk storage utilization list is generated from the average instantaneous storage utilization, the disk to be migrated and the target disk are determined from the sorted list, and data is migrated between them, which achieves the balancing effects described in the Disclosure of Invention above.
A method for balancing disk storage load provided in a second embodiment of the present application is as follows:
In this embodiment, the method is executed by a node coordinator. Fig. 2 shows a flowchart of the disk storage load balancing method provided by this embodiment, which includes the following steps.
Step S201, calculating an average value of the instantaneous storage utilization of each disk.
In this embodiment, a Coordinator acts as the node coordinator of the distributed message middleware and periodically collects multidimensional metrics from each node. Each node of the distributed cluster reports its own state data and metadata to the node coordinator, which collects the metrics of every disk in the cluster, caches them in memory, and stores them in a distributed database or a time-series database.
The multidimensional metrics are basic stateless information collected from the cluster nodes, such as node information (host name, IP address, the label of each disk, the storage utilization of each disk, etc.) and the basic unit data under each disk (the label of each basic unit of data, its storage utilization, etc.). The collected metrics can also be stored in a time-series database; computing over this historical data prevents data from being moved in response to sudden spikes or drops, which avoids wasting node CPU and IO.
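The multidimensional metrics can be pictured as a small nested record per node. The Python sketch below is a hypothetical model, not the patent's data format: the class names `NodeMetric`, `DiskMetric` and `BasicUnit` and the `flatten` helper are illustrative assumptions, and utilizations are assumed to be percentages.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BasicUnit:
    """One basic unit of data on a disk (hypothetical model)."""
    label: str          # basic unit data label, p in BD_jp
    utilization: float  # storage utilization contributed by this unit, percent


@dataclass
class DiskMetric:
    label: str          # disk label, e.g. "/mnt/sata01"
    utilization: float  # instantaneous storage utilization X_ij, percent
    units: List[BasicUnit] = field(default_factory=list)


@dataclass
class NodeMetric:
    host_name: str
    ip: str
    disks: List[DiskMetric] = field(default_factory=list)


def flatten(nodes: List[NodeMetric]) -> Dict[str, float]:
    """Map '<host>:<disk label>' to X_ij for every disk reported by every node."""
    return {f"{n.host_name}:{d.label}": d.utilization for n in nodes for d in n.disks}
```

Keying disks by host name plus disk label keeps identically named mount points on different nodes distinct.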
Preferably, the step of calculating the average instantaneous storage utilization of the disks specifically includes calculating it by the formula
$$BU = \frac{1}{N}\sum_{i=0}^{n}\sum_{j=0}^{m} X_{ij}$$
where i denotes the i-th node (i ranges from 0 to n), j denotes the j-th disk of the i-th node (j ranges from 0 to m), X_ij is the storage utilization of the j-th disk of the i-th node, and N is the total number of disks in the sum.
In this step, the node coordinator calculates the average instantaneous storage utilization BU from the node information and the per-disk storage utilization in the multidimensional metrics collected from each node. BU is the average storage utilization per disk across the nodes, i.e., the average instantaneous storage utilization.
For example, let X_ij be the storage utilization of the j-th disk of the i-th node, where node i ranges from 0 to n and disk j on each node ranges from 0 to m. The average instantaneous storage utilization BU of the disks is then:
$$BU = \frac{1}{N}\sum_{i=0}^{n}\sum_{j=0}^{m} X_{ij}$$
where N is the total number of disks in the sum.
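Because the original formula image is not available, the following Python sketch assumes BU is the plain arithmetic mean of X_ij over all disks in the cluster, with every disk weighted equally; the function name `instantaneous_average` is hypothetical.

```python
from typing import List


def instantaneous_average(x: List[List[float]]) -> float:
    """Return BU, the mean of X_ij over every disk j of every node i.

    x[i][j] is the storage utilization (percent) of disk j on node i.
    Nodes may hold different numbers of disks; each disk counts once.
    """
    values = [xij for node in x for xij in node]
    if not values:
        raise ValueError("no disks reported")
    return sum(values) / len(values)
```

For instance, two disks at 80% and 40% on one node and a single disk at 60% on another give BU = 180 / 3 = 60%.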
Step S202, calculating the historical storage utilization average of each disk.
In this step, the node coordinator uses the node information and per-disk storage utilization from each node's multidimensional metrics, as stored in the time-series database, to calculate the historical storage utilization average X_ijGn of each disk, i.e., the historical average storage utilization of the j-th disk of the i-th node over period T.
Preferably, the step of calculating the historical storage utilization average of each disk specifically includes calculating it by the formula
$$X_{ijGn} = \frac{1}{T}\sum_{k=1}^{n} t_k X_{ij}^{(k)}, \qquad \sum_{k=1}^{n} t_k = T$$
where k is a disk storage utilization label, n is the number of distinct disk storage utilization labels, t_k is the length of time during which the disk was at label k, X_ij^(k) is the disk storage utilization in state k, and X_ijGn is the historical storage utilization average of the j-th disk of the i-th node over period T.
In this step, the period T is configurable, with a default of 24 hours. The historical storage utilization average X_ijGn of a disk over period T is calculated as:
$$X_{ijGn} = \frac{1}{T}\sum_{k=1}^{n} t_k X_{ij}^{(k)}$$
where k denotes a disk storage utilization label, n denotes the number of distinct labels, t_k denotes the length of time the disk spent at label k, and X_ij^(k) denotes the disk storage utilization in that state.
For example, suppose the disk /mnt/sata01 appears in the list of disks whose storage utilization exceeds the upper limit, and its history shows 50% utilization for 22 hours, 60% for 1 hour and 70% for 1 hour, with T = 24 hours. By the formula above:
$$X_{ijGn} = \frac{22 \times 50\% + 1 \times 60\% + 1 \times 70\%}{24} = 51.25\%$$
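The historical average is a time-weighted mean: each utilization label k contributes in proportion to the time t_k the disk spent at it. A minimal Python sketch, assuming durations are given in hours and must add up to T (the helper name `historical_average` is hypothetical), reproduces the /mnt/sata01 example.

```python
from typing import List, Tuple


def historical_average(history: List[Tuple[float, float]], period_hours: float = 24.0) -> float:
    """Time-weighted historical average X_ijGn over period T.

    history holds (t_k, X_ij^(k)) pairs: the hours the disk spent at each
    distinct utilization label k and that label's utilization in percent.
    """
    total = sum(t for t, _ in history)
    if abs(total - period_hours) > 1e-9:
        raise ValueError("durations t_k must add up to the period T")
    return sum(t * x for t, x in history) / period_hours


# The /mnt/sata01 example: 22 h at 50 %, 1 h at 60 %, 1 h at 70 %, T = 24 h.
print(historical_average([(22, 50.0), (1, 60.0), (1, 70.0)]))  # 51.25
```

Although the disk's latest readings are 60% and 70%, its 24-hour average stays close to 50%, which is what lets the later filtering step treat it as a short burst.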
Step S203, generating a disk storage utilization rate list according to the average value of the instantaneous storage utilization rates of the disks.
In this step, the disk storage utilization list consists of a first list and a second list: the first list holds disks whose storage utilization is too high, and the second list holds disks whose storage utilization is too low. The upper and lower thresholds of disk storage utilization are determined by setting a threshold for the effective fluctuation parameter.
Preferably, the disk storage utilization list includes a first list and a second list, and the step of generating the disk storage utilization list according to the average instantaneous storage utilization specifically includes: calculating the lower and upper limits of disk storage utilization from the average instantaneous storage utilization and a preset fluctuation parameter threshold; adding disks whose storage utilization is greater than the upper limit to the first list; and adding disks whose storage utilization is less than the lower limit to the second list.
In this step, the fluctuation parameter threshold is denoted VPT and can be set flexibly as needed. The lower limit of disk storage utilization is the minimum disk storage utilization, calculated as BU_min = BU - VPT.
The upper limit of disk storage utilization is the maximum disk storage utilization, calculated as BU_max = BU + VPT.
For each disk, if its storage utilization X_ij > BU_max, the node coordinator adds it to the first list; if X_ij < BU_min, the node coordinator adds it to the second list. Entries in both lists are stored in the format (disk name, disk storage utilization).
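The list-building rule can be sketched in a few lines of Python. This is an illustrative sketch, assuming per-disk utilizations arrive as a dict keyed by disk name; the function name `build_lists` is hypothetical, while BU, VPT and the (disk name, utilization) entry format follow the text.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, float]  # (disk name, disk storage utilization)


def build_lists(util: Dict[str, float], bu: float, vpt: float) -> Tuple[List[Entry], List[Entry]]:
    """Split disks into the first (too high) and second (too low) lists.

    util maps disk name -> instantaneous utilization X_ij in percent;
    bu is the average BU and vpt the fluctuation threshold VPT.
    """
    bu_min, bu_max = bu - vpt, bu + vpt
    first = [(d, x) for d, x in util.items() if x > bu_max]
    second = [(d, x) for d, x in util.items() if x < bu_min]
    return first, second
```

Disks inside the band [BU_min, BU_max] appear in neither list, so they are never touched by migration.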
Step S204, judging whether the historical storage utilization average of a disk in the disk storage utilization list is greater than the lower limit and less than the upper limit of disk storage utilization; if so, executing step S205; if not, the process ends.
Step S205, deleting that disk from the disk storage utilization list.
In the above steps, once the disk storage utilization list has been obtained, it is filtered further.
The filtering scheme is as follows: the list is updated based on the precomputed historical storage utilization average of each disk in it. If a disk's historical average X_ijGn lies within (BU_min, BU_max), the disk is removed from the first or second list. If X_ijGn lies outside (BU_min, BU_max), the disk is left in the list and passes on to the subsequent processing.
The historical storage utilization average X_ijGn, calculated from the historical data stored in the time-series database, serves as a critical suppression value. Its advantage is that it derives a statistically stable value from historical data, which can serve as a reference for the disk's storage utilization over the period. Using X_ijGn, disks whose storage has only suddenly increased can be filtered out of the disk storage utilization list, and a disk's data is migrated only after its X_ijGn itself meets the criterion. This effectively avoids the CPU, memory and disk IO (input/output) waste caused by frequent disk data migration, and avoids disturbing the data services of the message middleware.
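The filtering in steps S204 and S205 keeps only disks whose historical average X_ijGn also lies outside (BU_min, BU_max), so a disk that is out of range only because of a short burst is not migrated. A minimal Python sketch, assuming list entries are (disk name, utilization) pairs and historical averages sit in a dict keyed by disk name; the function name is hypothetical and the BU_min/BU_max values in the example are illustrative.

```python
from typing import Dict, List, Tuple

Entry = Tuple[str, float]  # (disk name, instantaneous utilization X_ij)


def filter_by_history(entries: List[Entry], hist_avg: Dict[str, float],
                      bu_min: float, bu_max: float) -> List[Entry]:
    """Drop disks whose historical average lies strictly inside (BU_min, BU_max)."""
    return [(d, x) for d, x in entries if not (bu_min < hist_avg[d] < bu_max)]


# /mnt/sata01 reads 75 % now but averaged 51.25 % over T, so it is treated as a burst.
kept = filter_by_history([("/mnt/sata01", 75.0), ("/mnt/sata02", 80.0)],
                         {"/mnt/sata01": 51.25, "/mnt/sata02": 72.0}, 40.0, 60.0)
```

Here only /mnt/sata02, which has stayed high across the whole period, remains a migration candidate.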
Step S206, sorting the disks in the disk storage utilization list according to each disk's average instantaneous storage utilization and historical storage utilization average, and determining the disk to be migrated and the target disk from the sorted order.
In this step, the disk whose data is migrated first and the target disk that receives data first are determined from the disk storage utilization list as updated in steps S204 and S205.
Preferably, the step of sorting the disks in the disk storage utilization list and determining the disk to be migrated and the target disk from the sorted order specifically includes: calculating, for each disk in the first list, a first absolute difference between the average instantaneous storage utilization and the historical storage utilization average, and, for each disk in the second list, a second absolute difference between the same two values; sorting the disks in the first list in descending order of the first absolute difference to generate a first sequence, and sorting the disks in the second list in descending order of the second absolute difference to generate a second sequence; and determining the disk to be migrated by the priority of the first sequence and the target disk by the priority of the second sequence, where the disk ranked first in each sequence has the highest priority.
In this step, the absolute difference $|\Delta X_{ijG}| = |\bar{X}_{ijG} - BU|$ is first calculated for each disk in the updated disk storage utilization list, where $\bar{X}_{ijG}$ is the disk's historical storage utilization average and BU is the instantaneous storage utilization average. For disks in the first list this gives the first absolute difference; for disks in the second list it gives the second absolute difference.
Then the disks in the first list are sorted by the first absolute difference $|\Delta X_{ijG}|$ from largest to smallest to obtain the first sequence, and the disks in the second list are sorted by the second absolute difference in the same way to obtain the second sequence. In other words, the list of disks with excessively high storage utilization yields one sequence ordered by the first absolute difference, and the list of disks with excessively low storage utilization yields another sequence ordered by the second absolute difference.
Finally, the disk to be migrated is determined according to the priority of the first sequence: disks with a larger $|\Delta X_{ijG}|$, i.e. those ranked earlier, have their data migrated first, so their load is balanced first. The target disk is determined according to the priority of the second sequence: the disk with a larger $|\Delta X_{ijG}|$, i.e. the one ranked earlier, receives migrated data first.
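The ranking rule above can be illustrated with a short Python sketch. It assumes the first and second lists have already been built and that each disk's historical storage utilization average and the cluster value BU are known. The disk identifiers and the function name `prioritize` are hypothetical; only the ordering rule (descending absolute difference, head of each sequence wins) comes from the text.

```python
from typing import Dict, List, Optional, Tuple


def prioritize(
    first: List[str],
    second: List[str],
    hist_avg: Dict[str, float],
    bu: float,
) -> Tuple[List[str], List[str], Optional[str], Optional[str]]:
    """Rank both lists by |historical average - BU| and pick the heads.

    Returns the first sequence, the second sequence, the disk to be migrated
    (head of the first sequence) and the target disk (head of the second).
    """

    def deviation(disk: str) -> float:
        # Absolute difference between the disk's historical average and BU.
        return abs(hist_avg[disk] - bu)

    first_seq = sorted(first, key=deviation, reverse=True)
    second_seq = sorted(second, key=deviation, reverse=True)
    source = first_seq[0] if first_seq else None
    target = second_seq[0] if second_seq else None
    return first_seq, second_seq, source, target


hist = {"a": 0.70, "b": 0.95, "c": 0.30, "d": 0.05}
first_seq, second_seq, source, target = prioritize(["a", "b"], ["c", "d"], hist, bu=0.5)
print(first_seq, second_seq)  # ['b', 'a'] ['d', 'c']
print(source, target)         # b d
```

Returning the full sequences, not just their heads, lets a caller fall back to the next-ranked disk when the top candidate is unavailable.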
For example, the node coordinator collects disk storage data from each data node. It first stores the data in memory and calculates BU, $BU_{\min}$, and $BU_{\max}$. At the same time, the real-time data are written to a time-series database such as the mainstream LevelDB or InfluxDB; LevelDB is the more suitable choice for this scenario. Historical data read back from the time-series database are used to calculate the historical storage utilization average $\bar{X}_{ijG}$.
Based on $\bar{X}_{ijG}$, it is determined whether the disk storage utilization list contains any disk whose data has suddenly surged or dropped; any such disk is removed from the list. The filtered list is then sorted by magnitude, and the disk whose data is migrated first and the target disk that receives data first are selected according to the sorted first and second sequences.
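The burst filter in this example, which claim 2 formalizes, can be sketched as follows. The sketch assumes the limits $BU_{\min}$ and $BU_{\max}$ are already known and that a disk counts as a short-term burst when its historical average lies strictly between them. The function name `drop_bursts` and the disk identifiers are hypothetical.

```python
from typing import Dict, List


def drop_bursts(
    disk_list: List[str],
    hist_avg: Dict[str, float],
    bu_min: float,
    bu_max: float,
) -> List[str]:
    """Remove disks whose historical average lies strictly inside (BU_min, BU_max).

    Such a disk is out of range right now but was normal over period T, so its
    current reading is treated as a burst and the disk is not migrated.
    """
    return [d for d in disk_list if not (bu_min < hist_avg[d] < bu_max)]


hist = {"a": 0.55, "b": 0.85, "c": 0.10}
first = ["a", "b"]   # currently above BU_max
second = ["c"]       # currently below BU_min
print(drop_bursts(first, hist, 0.4, 0.6))   # ['b']
print(drop_bursts(second, hist, 0.4, 0.6))  # ['c']
```

Disk "a" is over the upper limit right now, but its history sits inside the band, so it is dropped instead of triggering a migration.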
Step S207, migrating the data from the disk to be migrated to the target disk.
In this step, data is migrated from the highest-priority disk to be migrated to the highest-priority target disk.
Preferably, the step of migrating the data from the disk to be migrated to the target disk specifically includes: calculating the maximum migratable basic unit data of the disk to be migrated by the formula [formula image not reproduced], where $BD_{jp}$ is the storage utilization of the basic unit data, j is the disk label information, p is the basic unit data label information, and [formula image not reproduced] is the union of basic unit data meeting the migration criterion; and migrating the maximum migratable basic unit data in the disk to be migrated to the target disk.
In this step, the subject data to be migrated is first determined for the acquired disk to be migrated. Based on the information about the priority disk to be migrated obtained by the node coordinator (node and disk name) and on the basic unit data under each disk (basic unit data label information, basic unit data storage utilization, and so on), the subject data to be migrated from that disk is calculated by the formula [formula image not reproduced].
Here, $BD_{jp}$ is the storage utilization of the basic unit data, j is the disk label information with the same meaning as j in the formula for the instantaneous storage utilization average, and p is the basic unit data label information. Data is migrated from a disk whose storage utilization exceeds $BU_{\max}$ to a disk whose storage utilization is below $BU_{\min}$. The expression [formula image not reproduced] computes VPT, and the expression [formula image not reproduced] computes the union of basic unit data meeting the migration criterion, ensuring that after migration $BD_{jp}$ approaches BU as closely as possible and the disk load stays balanced. The max function [formula image not reproduced] then takes the maximum value, which determines the maximum migratable basic unit data $BD_{\max}$ of the disk to be migrated; this is the subject data that the disk will migrate.
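Because the formula images for the maximum migratable basic unit data are not reproduced, the Python sketch below is only an illustrative stand-in, not the patent's formula. It assumes the source and target disks have equal capacity, so a unit's storage utilization $BD_{jp}$ transfers one-to-one. It also assumes the migration criterion is "the source must not drop below BU and the target must not rise above BU". Within that budget it picks units largest-first. This is a greedy heuristic that may miss the true maximum, which would need a subset-sum search. The function name `select_migratable` is hypothetical.

```python
from typing import Dict, List


def select_migratable(
    units: Dict[str, float], source_util: float, target_util: float, bu: float
) -> List[str]:
    """Greedily choose basic unit data to move from an over- to an under-utilized disk.

    units maps a basic-unit label p to BD_jp, its share of disk capacity.
    Assumed criterion: the moved total may not push the source below BU or
    the target above BU (equal disk capacities assumed).
    """
    budget = min(source_util - bu, bu - target_util)
    chosen: List[str] = []
    moved = 0.0
    # Largest units first; skip any unit that would overshoot the budget.
    for label, share in sorted(units.items(), key=lambda kv: kv[1], reverse=True):
        if moved + share <= budget + 1e-12:  # small tolerance for float error
            chosen.append(label)
            moved += share
    return chosen


units = {"p1": 0.1875, "p2": 0.125, "p3": 0.0625}
print(select_migratable(units, source_util=0.875, target_util=0.25, bu=0.5))
# ['p1', 'p3']
```

Here the budget is min(0.875 - 0.5, 0.5 - 0.25) = 0.25. After p1 is taken, p2 would overshoot, so p3 fills the remaining 0.0625 exactly.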
Once the subject data to be migrated from the disk has been determined, data migration can be performed. The specific migration process is as follows:
The node coordinator instructs the disk to be migrated to first copy the basic unit data $BD_{\max}$ obtained in the previous step to the target disk. During copying, the node coordinator monitors disk storage utilization in real time, and the data node performing the migration writes data-update records to the metadata database as the data changes. The metadata database compares these records to confirm that the nodes' data storage state is normal. Once copying has completed normally, the node coordinator can then instruct the node holding the heavily used source disk to delete the copied data.
For example, suppose the node coordinator calculates with the above algorithm that part of the unit data on disk 1 must be migrated to disk 2. The data is first copied to disk 2, and the copied portion is then removed from disk 1. Throughout the process, the node coordinator handles management work such as coordination and scheduling, while the metadata database handles metadata work such as recording changes.
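The copy-then-delete order described above can be modeled in memory with a small Python sketch. Everything here is hypothetical: `Disk`, `MetadataDB`, and `migrate` stand in for real nodes, the metadata database, and coordinator commands. The check that the copy matches the source also stands in for the metadata database's record comparison. The point it shows is that nothing is deleted from the source until the copy is confirmed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Disk:
    name: str
    units: Dict[str, bytes] = field(default_factory=dict)  # label p -> unit data


@dataclass
class MetadataDB:
    records: List[str] = field(default_factory=list)  # append-only change log

    def record(self, entry: str) -> None:
        self.records.append(entry)


def migrate(source: Disk, target: Disk, labels: List[str], meta: MetadataDB) -> None:
    """Copy units to the target, confirm the copy, then delete them from the source."""
    # 1. Copy each selected unit and log the change.
    for p in labels:
        target.units[p] = source.units[p]
        meta.record(f"copy {p}: {source.name} -> {target.name}")
    # 2. Confirm every copy matches before anything is deleted.
    if any(target.units.get(p) != source.units[p] for p in labels):
        raise RuntimeError("copy not confirmed; source data left intact")
    # 3. Only after a confirmed copy, remove the units from the source disk.
    for p in labels:
        del source.units[p]
        meta.record(f"delete {p} from {source.name}")


disk1 = Disk("disk1", {"p1": b"aaaa", "p2": b"bb"})
disk2 = Disk("disk2")
meta = MetadataDB()
migrate(disk1, disk2, ["p1"], meta)
print(sorted(disk1.units), sorted(disk2.units))  # ['p2'] ['p1']
print(meta.records)  # ['copy p1: disk1 -> disk2', 'delete p1 from disk1']
```

In a real cluster, the copy and the confirmation would be network operations that can fail independently. This is why the source copy is kept until confirmation succeeds.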
The embodiment of the application has the following beneficial effects:
1. Updating the disk storage utilization list according to each disk's historical storage utilization average over the period T allows disks with excessively high or excessively low storage utilization to be tracked and controlled over the long term. This effectively avoids frequent disk data migration and the resulting waste of CPU, memory, and disk IO, and avoids affecting the data services of the message middleware.
2. Sorting the disks in the disk storage utilization list by the absolute difference between the historical storage utilization average and the instantaneous storage utilization average identifies the disk whose data is migrated first and the target disk that receives data first. This makes it possible to judge whether basic unit data meets the migration criterion, how urgent the migration is, and its priority, so that cluster-level load balance is preserved as far as possible.
3. The fluctuation-parameter threshold can be set flexibly, and the disk-level instantaneous storage utilization average can be calculated quickly, accurately, and efficiently. This provides a reliable basis for the later confirmation, screening, and sorting of the disk storage utilization list.
A disk storage load balancing apparatus provided in a third embodiment of the present application is as follows:
in the foregoing embodiment, a disk storage load balancing method is provided, and correspondingly, the present application also provides a disk storage load balancing apparatus.
Fig. 3 is a schematic structural diagram illustrating a disk storage load balancing apparatus according to an embodiment of the present application, and includes the following modules.
The first calculation module 11 is configured to calculate an instantaneous storage utilization average of each disk;
the second calculation module 12 is configured to calculate the historical storage utilization average of each disk;
the list generating module 13 is configured to generate a disk storage utilization list according to the average value of the instantaneous storage utilization of each disk;
a determining module 14, configured to sort the disks in the disk storage utilization list according to the instantaneous storage utilization average and the historical storage utilization average of each disk in the list, and to determine the disk to be migrated and the target disk according to the sorted sequence;
and the migration module 15 is configured to migrate data from the disk to be migrated to the target disk.
A disk storage load balancing apparatus provided in a fourth embodiment of the present application is as follows:
Optionally, Fig. 4 is a schematic structural diagram of a disk storage load balancing apparatus provided in an embodiment of the present application. As shown in Fig. 4, on the basis of the third embodiment, the disk storage utilization list includes a first list and a second list, and the list generating module 13 specifically includes (not shown in the figure):
the first calculation submodule is used for respectively calculating the lower limit value and the upper limit value of the disk storage utilization rate of each disk according to the average value of the instantaneous storage utilization rate of each disk and a preset threshold value of a fluctuation parameter;
the first list generation submodule is used for determining a disk with a disk storage utilization rate larger than an upper limit value of the disk storage utilization rate so as to generate the first list;
and the second list generation submodule is used for determining the disk with the disk storage utilization rate smaller than the lower limit value of the disk storage utilization rate so as to generate the second list.
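The three list-generation submodules can be sketched in Python as below. The text does not say how the preset fluctuation-parameter threshold turns into limits. This sketch therefore assumes $BU_{\min} = BU - \delta$ and $BU_{\max} = BU + \delta$, and it uses one pair of limits for all disks instead of per-disk limits. The function name `build_lists` and the disk identifiers are hypothetical.

```python
from typing import Dict, List, Tuple


def build_lists(
    utilization: Dict[str, float], bu: float, delta: float
) -> Tuple[List[str], List[str], float, float]:
    """Split disks into the first (over-utilized) and second (under-utilized) lists.

    Assumption: BU_min = BU - delta and BU_max = BU + delta, where delta is the
    preset fluctuation-parameter threshold.
    """
    bu_min, bu_max = bu - delta, bu + delta
    first = [disk for disk, x in utilization.items() if x > bu_max]
    second = [disk for disk, x in utilization.items() if x < bu_min]
    return first, second, bu_min, bu_max


# Hypothetical disk ids of the form "node/disk" mapped to their utilization X_ij.
disks = {"n0/d0": 0.90, "n0/d1": 0.50, "n1/d0": 0.10, "n1/d1": 0.55}
first, second, bu_min, bu_max = build_lists(disks, bu=0.5, delta=0.1)
print(first, second)  # ['n0/d0'] ['n1/d0']
```

Disks whose utilization falls inside the band (here "n0/d1" and "n1/d1") appear in neither list and are left alone.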
Optionally, as shown in fig. 4, the determining module 14 specifically includes (not shown in the figure):
the second calculation submodule is used for calculating a first difference absolute value of the average value of the instantaneous storage utilization rate and the average value of the historical storage utilization rate of each disk in the first list and calculating a second difference absolute value of the average value of the instantaneous storage utilization rate and the average value of the historical storage utilization rate of each disk in the second list;
the sorting submodule is used for sorting the disks in the first list according to the descending order of the first difference absolute values to generate a first sequence, and sorting the disks in the second list according to the descending order of the second difference absolute values to generate a second sequence;
the determining submodule is used for determining the disk to be migrated according to the priority of the first sequence and determining the target disk according to the priority of the second sequence, where the disk ranked first in each of the first and second sequences has the highest priority.
Optionally, as shown in Fig. 4, the apparatus further includes:
a judging module 16, configured to judge whether the historical storage utilization average is greater than the lower limit of the disk storage utilization and less than the upper limit of the disk storage utilization, and whether the corresponding disk exists in the disk storage utilization list;
and a deleting module 17, configured to delete the disk corresponding to the historical storage utilization average from the disk storage utilization list if both conditions are met.
Optionally, as shown in fig. 4, the first calculating module 11 specifically includes (not shown in the figure):
a fifth calculation submodule, configured to calculate the average value of the instantaneous storage utilization rate of each disk by the formula [formula image not reproduced];
where i denotes the ith node, j denotes the jth disk of the ith node, $X_{ij}$ is the storage utilization of the jth disk of the ith node, the node index i ranges from 0 to n, and the disk index j on a node ranges from 0 to m.
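As a rough illustration of this submodule, the Python sketch below computes BU as the plain arithmetic mean of $X_{ij}$ over every disk of every node. Because the formula image is not reproduced, this averaging rule is an assumption. So are the data layout (a dictionary from a hypothetical node name to a list of per-disk utilizations) and the function name `instantaneous_average`.

```python
from typing import Dict, List


def instantaneous_average(utilization: Dict[str, List[float]]) -> float:
    """Return BU, the mean storage utilization X_ij over all disks in the cluster.

    Assumption: BU is the plain arithmetic mean over every disk j of every
    node i; this is a sketch, not the patent's exact formula.
    """
    values = [x for disks in utilization.values() for x in disks]
    if not values:
        raise ValueError("no disk utilization reported")
    return sum(values) / len(values)


# Hypothetical cluster: two nodes with two disks each (utilization as a fraction).
cluster = {"node0": [0.80, 0.40], "node1": [0.60, 0.20]}
print(round(instantaneous_average(cluster), 2))  # 0.5
```

Flattening the per-node lists means nodes with more disks contribute proportionally more to BU. Averaging per-node means first would give a different value when disk counts differ.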
Optionally, as shown in fig. 4, the second calculating module 12 specifically includes:
a sixth calculation submodule, configured to calculate the historical storage utilization average of each disk by the formula [formula image not reproduced];
where k is the index of a disk storage utilization state, there are n distinct states in total, $t_k$ is the duration on the time axis for which the disk is in state k, [symbol image not reproduced] is the disk storage utilization in that state, and $\bar{X}_{ijG}$ is the historical storage utilization average of the jth disk of the ith node over the period T.
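One plausible reading of these definitions is a time-weighted average. Each state's utilization is weighted by the time $t_k$ the disk spent in that state, and the sum is divided by the period T, taken here as the sum of all $t_k$. The formula image is not reproduced, so this reading is an assumption. The function name `historical_average` is hypothetical.

```python
from typing import Sequence, Tuple


def historical_average(states: Sequence[Tuple[float, float]]) -> float:
    """Time-weighted historical utilization of one disk over a period T.

    states holds one (utilization, t_k) pair per utilization state k.
    Assumption: T is the sum of all t_k, and each state's utilization is
    weighted by the time the disk spent in it.
    """
    period = sum(t_k for _, t_k in states)
    if period <= 0:
        raise ValueError("period T must be positive")
    return sum(x_k * t_k for x_k, t_k in states) / period


# 2 h at 75 %, 1 h at 50 %, 1 h at 25 % within T = 4 h.
history = [(0.75, 2.0), (0.50, 1.0), (0.25, 1.0)]
print(historical_average(history))  # 0.5625
```

Weighting by duration keeps a brief spike from dominating the average, which is what allows the burst filter to tell transient changes from sustained imbalance.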
Optionally, as shown in fig. 4, the migration module 15 specifically includes (not shown in the figure):
a seventh calculation submodule, configured to calculate the maximum migratable basic unit data of the disk to be migrated by the formula [formula image not reproduced];
where $BD_{jp}$ is the storage utilization of the basic unit data, j is the disk label information, p is the basic unit data label information, and [formula image not reproduced] is the union of basic unit data meeting the migration criterion;
and the migration submodule is used for migrating the maximum migratable basic data in the disk to be migrated to the target disk.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (8)

1. A method for balancing disk storage load, comprising:
calculating the average value of the instantaneous storage utilization rate of each disk;
calculating the average value of the historical storage utilization rate of each disk;
generating a disk storage utilization rate list according to the average value of the instantaneous storage utilization rates of the disks;
sorting the disks in the disk storage utilization ratio list according to the average value of the instantaneous storage utilization ratio and the average value of the historical storage utilization ratio of each disk in the disk storage utilization ratio list, and determining a disk to be migrated and a target disk according to the sorted sequence;
migrating data from the disk to be migrated to the target disk;
the step of calculating the historical storage utilization average value of each disk specifically includes:
calculating the historical storage utilization average of each disk by the formula [formula image not reproduced];
wherein k is the index of a disk storage utilization state, there are n distinct states in total, $t_k$ is the duration on the time axis for which the disk is in state k, [symbol image not reproduced] is the disk storage utilization in that state, and $\bar{X}_{ijG}$ is the historical storage utilization average of the jth disk of the ith node over the period T;
the disk storage utilization list comprises a first list and a second list, and the step of generating a disk storage utilization list according to the average value of the instantaneous storage utilization of each disk specifically includes:
respectively calculating the lower limit value and the upper limit value of the disk storage utilization rate of each disk according to the average value of the instantaneous storage utilization rate of each disk and a preset threshold value of a fluctuation parameter;
determining a disk with a disk storage utilization rate larger than an upper limit value of the disk storage utilization rate to generate the first list;
determining a disk with a disk storage utilization rate smaller than a lower limit value of the disk storage utilization rate to generate the second list;
the step of sorting the disks in the disk storage utilization ratio list according to the average value of the instantaneous storage utilization ratio and the average value of the historical storage utilization ratio of each disk in the disk storage utilization ratio list, and determining the disk to be migrated and the target disk according to the sorted sequence specifically comprises the steps of:
calculating a first difference absolute value of the average value of the instantaneous storage utilization rate and the average value of the historical storage utilization rate of each disk in the first list, and calculating a second difference absolute value of the average value of the instantaneous storage utilization rate and the average value of the historical storage utilization rate of each disk in the second list;
sorting the disks in the first list in descending order of the first absolute differences to generate a first sequence, and sorting the disks in the second list in descending order of the second absolute differences to generate a second sequence;
and determining the disk to be migrated according to the priority of the first sequence and the target disk according to the priority of the second sequence, wherein the disk ranked first in each of the first and second sequences has the highest priority.
2. The method for load balancing of disk storage according to claim 1, further comprising:
judging whether the historical storage utilization average is greater than the lower limit of the disk storage utilization and less than the upper limit of the disk storage utilization, and whether the corresponding disk exists in the disk storage utilization list;
and if so, deleting the disk corresponding to the historical storage utilization average from the disk storage utilization list.
3. The method for load balancing of disk storage according to any one of claims 1 to 2, wherein the step of calculating the average value of the instantaneous storage utilization of each disk specifically includes:
calculating the average value of the instantaneous storage utilization rate of each disk by the formula [formula image not reproduced];
wherein i denotes the ith node, j denotes the jth disk of the ith node, $X_{ij}$ is the storage utilization of the jth disk of the ith node, the node index i ranges from 0 to n, and the disk index j on a node ranges from 0 to m.
4. The method for load balancing of disk storage according to claim 1, wherein the step of migrating data from the disk to be migrated to the target disk specifically includes:
calculating the maximum migratable basic unit data of the disk to be migrated by the formula [formula image not reproduced];
wherein $BD_{jp}$ is the storage utilization of the basic unit data, j is the disk label information, p is the basic unit data label information, and [formula image not reproduced] is the union of basic unit data meeting the migration criterion;
and migrating the maximum migratable basic unit data in the disk to be migrated to the target disk.
5. An apparatus for load balancing disk storage, comprising:
the first calculation module is used for calculating the average value of the instantaneous storage utilization rate of each disk;
the second calculation module is used for calculating the average value of the historical storage utilization rate of each disk;
the list generation module is used for generating a disk storage utilization list according to the average value of the instantaneous storage utilization of each disk;
the determining module is used for sorting the disks in the disk storage utilization list according to the instantaneous storage utilization average and the historical storage utilization average of each disk in the list, and determining the disk to be migrated and the target disk according to the sorted sequence;
the migration module is used for migrating data from the disk to be migrated to the target disk;
the second calculation module specifically includes:
a sixth calculation submodule, configured to calculate the historical storage utilization average of each disk by the formula [formula image not reproduced];
wherein k is the index of a disk storage utilization state, there are n distinct states in total, $t_k$ is the duration on the time axis for which the disk is in state k, [symbol image not reproduced] is the disk storage utilization in that state, and $\bar{X}_{ijG}$ is the historical storage utilization average of the jth disk of the ith node over the period T;
the disk storage utilization list comprises a first list and a second list, and the list generation module specifically comprises:
the first calculation submodule is used for respectively calculating the lower limit value and the upper limit value of the disk storage utilization rate of each disk according to the average value of the instantaneous storage utilization rate of each disk and a preset threshold value of a fluctuation parameter;
the first list generation submodule is used for determining a disk with a disk storage utilization rate larger than an upper limit value of the disk storage utilization rate so as to generate the first list;
the second list generation submodule is used for determining the disk with the disk storage utilization rate smaller than the lower limit value of the disk storage utilization rate so as to generate the second list;
the determining module specifically includes:
the second calculation submodule is used for calculating a first difference absolute value of the average value of the instantaneous storage utilization rate and the average value of the historical storage utilization rate of each disk in the first list and calculating a second difference absolute value of the average value of the instantaneous storage utilization rate and the average value of the historical storage utilization rate of each disk in the second list;
the sorting submodule is used for sorting the disks in the first list according to the descending order of the first difference absolute values to generate a first sequence, and sorting the disks in the second list according to the descending order of the second difference absolute values to generate a second sequence;
and the determining submodule is used for determining the disk to be migrated according to the priority of the first sequence and determining the target disk according to the priority of the second sequence, wherein the priority of the disk which is sequenced at the front in the first sequence and the second sequence is the highest.
6. The apparatus for disk storage load balancing according to claim 5, further comprising:
the judging module is used for judging whether the historical storage utilization average is greater than the lower limit of the disk storage utilization and less than the upper limit of the disk storage utilization, and whether the corresponding disk exists in the disk storage utilization list;
and if so, deleting the disk corresponding to the average value of the historical storage utilization rate in the disk storage utilization rate list.
7. The apparatus for load balancing of disk storage according to any one of claims 5 to 6, wherein the first calculation module specifically includes:
a fifth calculation submodule, configured to calculate the average value of the instantaneous storage utilization rate of each disk by the formula [formula image not reproduced];
wherein i denotes the ith node, j denotes the jth disk of the ith node, $X_{ij}$ is the storage utilization of the jth disk of the ith node, the node index i ranges from 0 to n, and the disk index j on a node ranges from 0 to m.
8. The apparatus for load balancing of disk storage according to claim 5, wherein the migration module specifically includes:
a seventh calculation submodule, configured to calculate the maximum migratable basic unit data of the disk to be migrated by the formula [formula image not reproduced];
wherein $BD_{jp}$ is the storage utilization of the basic unit data, j is the disk label information, p is the basic unit data label information, and [formula image not reproduced] is the union of basic unit data meeting the migration criterion;
and the migration submodule is used for migrating the maximum migratable basic data in the disk to be migrated to the target disk.
CN201811496766.8A 2018-12-07 2018-12-07 Disk storage load balancing method and device Active CN109828718B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811496766.8A CN109828718B (en) 2018-12-07 2018-12-07 Disk storage load balancing method and device

Publications (2)

Publication Number Publication Date
CN109828718A (en) 2019-05-31
CN109828718B (en) 2022-03-18

Family

ID=66859544

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811496766.8A Active CN109828718B (en) 2018-12-07 2018-12-07 Disk storage load balancing method and device

Country Status (1)

Country Link
CN (1) CN109828718B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113268203B (en) * 2021-05-18 2022-11-04 天津中科曙光存储科技有限公司 Capacity balancing method and device of storage system, computer equipment and storage medium

Citations (8)

Publication number Priority date Publication date Assignee Title
CN104035542A (en) * 2014-05-28 2014-09-10 中国科学院计算技术研究所 Virtual machine migration method and system for balancing calculation energy consumption and refrigeration energy consumption
CN104536909A (en) * 2014-12-09 2015-04-22 华为技术有限公司 Memory management method, memory management device and memory device
CN105187512A (en) * 2015-08-13 2015-12-23 航天恒星科技有限公司 Method and system for load balancing of virtual machine clusters
CN105592156A (en) * 2015-12-25 2016-05-18 中国人民解放军信息工程大学 Network function distributed elastic control method
CN106534359A (en) * 2016-12-13 2017-03-22 中科院成都信息技术股份有限公司 Storage load balancing method based on storage entropy
CN106873919A (en) * 2017-03-20 2017-06-20 郑州云海信息技术有限公司 A kind of date storage method and device based on cloud storage system
CN108009260A (en) * 2017-12-11 2018-05-08 西安交通大学 A kind of big data storage is lower with reference to node load and the Replica placement method of distance
US10089035B1 (en) * 2013-10-29 2018-10-02 EMC IP Holding Company LLC Block storage transparent platform migration

Non-Patent Citations (2)

Title
Juan Luis J. Laredo et al., "Trading Off Resource Utilization and Task Migrations in Dynamic Load-balancing", GECCO Companion '15: Proceedings of the Companion Publication of the 2015 Annual Conference on Genetic and Evolutionary Computation, 2015-07-31, pp. 1409–1410 *
Tang Linlin, "Research on Integrity Measurement of Virtual Machine Migration Based on Trusted Computing" (基于可信计算的虚拟机迁移完整性度量研究), Wanfang Dissertation Platform, 2018-04-24, pp. 14–24 *

Also Published As

Publication number Publication date
CN109828718A (en) 2019-05-31

Similar Documents

Publication Publication Date Title
CN107807796B (en) Data layering method, terminal and system based on super-fusion storage system
CN107229518B (en) Distributed cluster training method and device
CN110289994B (en) Cluster capacity adjusting method and device
US10356150B1 (en) Automated repartitioning of streaming data
CN107122126B (en) Data migration method, device and system
CN106339386B (en) Database flexible scheduling method and device
TWI738721B (en) Task scheduling method and device
US8904144B1 (en) Methods and systems for determining at risk index for storage capacity
CN111381928B (en) Virtual machine migration method, cloud computing management platform and storage medium
CN109739627B (en) Task scheduling method, electronic device and medium
JP2018515844A (en) Data processing method and system
CN106445677A (en) Load balancing method and device
CN105955662A (en) Method and system for expansion of K-DB data table space
WO2020172852A1 (en) Computing resource scheduling method, scheduler, internet of things system, and computer readable medium
CN110737717B (en) Database migration method and device
CN109828718B (en) Disk storage load balancing method and device
CN108021484B (en) Method and system for prolonging expected life value of disk in cloud service system
US20200293543A1 (en) Method and apparatus for transmitting data
WO2013190649A1 (en) Information processing method and device related to virtual-disk migration
CN116244085A (en) Kubernetes cluster container group scheduling method, device and medium
TWI718252B (en) Task scheduling method and device
CN107436812B (en) A kind of method and device of linux system performance optimization
CN111506425B (en) Method and device for processing quality of service data
CN114625570A (en) Database backup scheduling method and device
CN115617469A (en) Data processing method in cluster, electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant