CN115203177B - Distributed data storage system and storage method - Google Patents

Distributed data storage system and storage method

Info

Publication number
CN115203177B
Authority
CN
China
Prior art keywords
occupancy
node
storage
resource utilization
rate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211125471.6A
Other languages
Chinese (zh)
Other versions
CN115203177A (en)
Inventor
王云
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiyue Network Technology Co ltd
Original Assignee
Beijing Zhiyue Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiyue Network Technology Co ltd
Priority to CN202211125471.6A
Publication of CN115203177A
Application granted
Publication of CN115203177B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/214 Database migration support
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a distributed data storage system and a storage method. The distributed data storage system comprises storage nodes, each provided with a processor and a memory and interconnected through a network; a monitoring module that monitors and records the capacity occupancy rate and the resource utilization rate of each storage node; a calculation module that calculates the migration time period of each storage node based on its historical resource utilization rate; an evaluation module that screens the storage nodes according to the data provided by the calculation module and the capacity occupancy rate, and determines the storage nodes that data are to be migrated out of and into; and a migration module that migrates the storage data. By monitoring the amount of data stored on each storage node and migrating data accordingly, the invention automatically adjusts the capacity occupancy rate and the resource utilization rate of each storage node, thereby balancing the load of the storage system.

Description

Distributed data storage system and storage method
Technical Field
The present invention relates to the field of distributed data storage technologies, and in particular, to a distributed data storage system and a storage method.
Background
A distributed data storage system comprises a plurality of storage nodes and a management node. The storage nodes store files and serve reads and writes; the management node distributes tasks to the data nodes for execution to meet application requirements. Because data are distributed across different storage nodes, a node that holds more data is read more frequently, and if the same storage node receives multiple access requests at the same time, its read speed tends to drop. To reduce the resource utilization rate of such storage nodes, the data on them need to be migrated; however, data migration currently relies mainly on manual operation by operation and maintenance personnel, which is inefficient.
Disclosure of Invention
In order to solve the above problems, the present invention provides a distributed data storage system and a storage method, so as to solve the problem that in the prior art, data migration in the distributed data storage system mainly depends on operation and maintenance personnel to perform operations, and efficiency is low.
To achieve the above object, the present invention adopts the following technical solution. A distributed data storage method includes:
step S1: acquiring the capacity occupancy rate of each storage node, defining a storage node whose capacity occupancy rate exceeds a first threshold as a high-occupancy node and a storage node whose capacity occupancy rate is lower than a second threshold as a low-occupancy node, wherein the second threshold is smaller than the first threshold;
step S2: predicting, based on historical resource utilization data, the idle time periods in which the resource utilization rate of each high-occupancy node is lower than a preset resource utilization rate; if the idle time periods of a plurality of high-occupancy nodes fall in the same time period, executing step S3, otherwise executing step S4;
step S3: calculating the pressure value of each high-occupancy node through a first formula, and selecting the high-occupancy node with the largest pressure value for data migration, the first formula being:
[First formula: image not reproduced]
where the variables of the formula are the capacity occupancy rate, the read frequency of the high-occupancy node within 24 hours, the amount of data that needs to be migrated from the high-occupancy node, and the respective weighting coefficients;
step S4: determining the volume of data to be migrated from the high-occupancy node;
step S5: screening out the low-occupancy nodes whose remaining storage capacity satisfies a second formula, the second formula being:
[Second formula: image not reproduced]
where the variables of the formula are the second threshold, the amount of data currently stored on the low-occupancy node, the amount of data that needs to be migrated from the high-occupancy node, and the total capacity of the low-occupancy node;
step S6: selecting, from the low-occupancy nodes satisfying the second formula, the storage node best suited to the high-occupancy node, and migrating the storage data of the high-occupancy node to that low-occupancy node;
step S7: repeating steps S2 to S6 until no high-occupancy node remains in the storage system or none of the low-occupancy nodes is suitable for receiving new storage data any more.
Further, in step S6, selecting a storage node most suitable for the high-occupancy node includes the following steps:
step S61: adding the historical resource utilization rates of the high-occupancy node and the low-occupancy node at corresponding time points to obtain the predicted resource utilization rate of the low-occupancy node at each time point after it receives the migrated storage data, as given by a formula [image not reproduced] in which the two summed terms are the resource utilization rates of the high-occupancy node and the low-occupancy node, respectively, at the i-th time point of the j-th past day;
step S62: obtaining the average predicted resource utilization rate based on a third formula, the third formula being:
[Third formula: image not reproduced]
where m is the total number of days and n is the number of time points per day;
step S63: setting a resource utilization rate threshold; establishing a rectangular coordinate system with time as the X axis and the resource utilization rate as the Y axis; plotting the resource utilization rate threshold and the predicted resource utilization rates in this coordinate system; fitting the coordinate points of the predicted resource utilization rate with a curve fitting method to obtain a curve function f(x); and calculating, based on a fourth formula, the area S by which the area enclosed by the curve function and the X axis exceeds the area enclosed by the resource utilization rate threshold and the X axis, the fourth formula being:
[Fourth formula: image not reproduced]
where the formula uses the intersection points of the curve function with the resource utilization rate threshold, the resource utilization rate threshold itself, and a function that returns the larger of its two arguments;
step S64: calculating the collision score of each low-occupancy node based on a fifth formula, the low-occupancy node with the lowest collision score being the best-fit storage node, the fifth formula being:
[Fifth formula: image not reproduced]
where the coefficients in the formula are weighting coefficients.
Further, before performing the step S61, the method further includes the following steps:
step S061: predicting the migration speed of the stored data based on the current network state, the size of the stored data, the hardware configuration of the storage node and the resource utilization rate of the storage node, and eliminating the low-occupancy node with the migration speed lower than the preset migration speed.
Further, after the step S61, the method further includes the following steps:
step S611: if the predicted resource utilization rate exceeds the upper limit of the resource utilization rate of the low-occupancy node, rejecting the low-occupancy node.
Further, in the storage data migration process, if the resource utilization rates of the high-occupancy-rate node and the low-occupancy-rate node are greater than the preset resource utilization rate threshold, the migration rate of the storage data is reduced.
Further, when data migration is not performed, the resource utilization rates of the high-occupancy nodes and the low-occupancy nodes are obtained at intervals of a first time, and when data migration is performed, the resource utilization rates of the high-occupancy nodes and the low-occupancy nodes are obtained at intervals of a second time, wherein the second time is less than the first time.
Further, a migration value upper limit is set, and storage data whose data volume exceeds the migration value upper limit are not migrated.
Further, the curve fitting method is a least square method.
In another aspect, the invention further provides a distributed data storage system for implementing the distributed data storage method of the above technical solution, comprising:
storage nodes, each internally provided with a processor and a memory, the storage nodes being interconnected through a network;
a monitoring module, which monitors and records the capacity occupancy rate and the resource utilization rate of each storage node;
a computing module, which computes the idle time period of each storage node based on its historical resource utilization rate;
an evaluation module, which screens the storage nodes according to the data provided by the computing module and the capacity occupancy rate, and determines the storage nodes that data are to be migrated out of and into; and
a migration module, which migrates the storage data.
Compared with the prior art, the invention has the following beneficial effects:
1. Each storage node is first classified by capacity occupancy rate into high-occupancy and low-occupancy nodes, which identifies the targets of data migration. Historical resource utilization data are then used to predict the future resource utilization rate of the high-occupancy nodes, which avoids migrating data while a node is busy serving reads and thereby slowing the migration. By monitoring the amount of data stored on each storage node and migrating data accordingly, the invention automatically adjusts the capacity occupancy rate and the resource utilization rate of each storage node, thereby balancing the load of the storage system.
2. When several high-occupancy nodes require data migration, they are ranked before migrating; the ranking evaluates each node's capacity occupancy rate, read frequency and volume of data to be migrated, so that the high-occupancy node most in need of data migration is identified.
Drawings
FIG. 1 is a flow chart of a distributed data storage method of the present invention;
FIG. 2 is a schematic diagram of the predicted resource utilization of the low-occupancy node of the present invention;
FIG. 3 is a curve fitting graph of the predicted resource utilization of the low-occupancy node of the present invention.
In the figures: 1. high-occupancy node; 2. low-occupancy node.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
It will be understood that, as used herein, the terms "first," "second," and the like may be used herein to describe various elements, but these elements are not limited by these terms unless otherwise specified. These terms are only used to distinguish one element from another. For example, a first xx script may be referred to as a second xx script, and similarly, a second xx script may be referred to as a first xx script, without departing from the scope of the present application.
As shown in fig. 1, a distributed data storage method includes:
step S1: acquiring the capacity occupancy rate of each storage node, defining the storage nodes with the capacity occupancy rates exceeding a first threshold as high-occupancy rate nodes, defining the storage nodes with the capacity occupancy rates lower than a second threshold as low-occupancy rate nodes, and setting the second threshold smaller than the first threshold;
step S2: predicting, based on the historical resource utilization data, the idle time periods in which the resource utilization rate of each high-occupancy node is lower than the preset resource utilization rate; if the idle time periods of a plurality of high-occupancy nodes fall in the same time period, executing step S3, otherwise executing step S4;
step S3: calculating the pressure value of each high-occupancy node through a first formula, and selecting the high-occupancy node with the largest pressure value for data migration, the first formula being:
[First formula: image not reproduced]
where the variables of the formula are the capacity occupancy rate, the read frequency of the high-occupancy node within 24 hours, the amount of data that needs to be migrated from the high-occupancy node, and the respective weighting coefficients;
step S4: determining the volume of data to be migrated from the high-occupancy node;
step S5: screening out the low-occupancy nodes whose remaining storage capacity satisfies a second formula, the second formula being:
[Second formula: image not reproduced]
where the variables of the formula are the second threshold, the amount of data currently stored on the low-occupancy node, the amount of data that needs to be migrated from the high-occupancy node, and the total capacity of the low-occupancy node;
step S6: selecting a storage node most suitable for the high-occupancy node from the low-occupancy nodes meeting the second formula, and transferring the storage data of the high-occupancy node to the low-occupancy node;
step S7: repeating steps S2 to S6 until no high-occupancy node remains in the storage system or none of the low-occupancy nodes is suitable for receiving new storage data any more.
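Steps S1 to S7 can be illustrated with a minimal, capacity-only sketch in Python. It rests on several assumptions: node sizes are integers in a hypothetical unit (e.g. GB), thresholds are integer percentages, and the second formula (whose image is not reproduced) is assumed to be (Q + V) / C ≤ T2, which uses exactly the variables listed in step S5 and keeps the target a low-occupancy node. `Node`, `rebalance` and the helper functions are hypothetical names; the S3 pressure ranking and the S6 best-fit selection are replaced by simple occupancy orderings, and the S2 idle-period check is omitted.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int  # total capacity (hypothetical unit, e.g. GB)
    used: int      # data currently stored, same unit


def is_high(n, t1):  # thresholds are integer percentages, e.g. t1 = 80
    return n.used * 100 > t1 * n.capacity


def is_low(n, t2):
    return n.used * 100 < t2 * n.capacity


def fits_second_formula(n, amount, t2):
    # Assumed second formula: (Q + V) / C <= T2, i.e. the target stays low-occupancy
    return (n.used + amount) * 100 <= t2 * n.capacity


def rebalance(nodes, t1, t2):
    """Capacity-only sketch of steps S1 to S7; returns (source, target, amount) moves."""
    if not t2 < t1:
        raise ValueError("the second threshold must be smaller than the first")
    moves = []
    while True:
        high = [n for n in nodes if is_high(n, t1)]   # step S1
        low = [n for n in nodes if is_low(n, t2)]
        if not high:
            break                                     # step S7: no high-occupancy node left
        # Stand-in for the S3 pressure ranking: most occupied node first
        high.sort(key=lambda n: n.used / n.capacity, reverse=True)
        for src in high:
            amount = src.used - t1 * src.capacity // 100  # step S4: excess over t1
            fits = [n for n in low if fits_second_formula(n, amount, t2)]  # step S5
            if fits:
                break
        else:
            break                                     # step S7: no suitable target left
        dst = min(fits, key=lambda n: n.used / n.capacity)  # stand-in for step S6
        src.used -= amount
        dst.used += amount
        moves.append((src.name, dst.name, amount))
    return moves
```

For example, with nodes A (capacity 100, used 95), B (100, 20) and C (200, 60) and thresholds of 80 % and 50 %, `rebalance` moves 15 units from A to B and then stops. Integer arithmetic guarantees termination: each move brings one source down to at most the first threshold, and the S5 check keeps every target at or below the second threshold, so the set of high-occupancy nodes shrinks on every pass.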
First, the storage nodes are classified by capacity occupancy rate into high-occupancy and low-occupancy nodes, which identifies the targets of data migration. Historical resource utilization data are then used to predict the future resource utilization rate of the high-occupancy nodes, so that migration is not carried out while a node is busy serving reads, which would slow the migration down. When several high-occupancy nodes need to migrate data, the migration order must be decided: if the system migrated data from several storage nodes at the same time, it would be likely to overload the CPU and stall the whole data storage system.
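Step S2 does not specify a prediction model, so the sketch below assumes the simplest one: a time point is predicted to be idle when its mean utilization over the past days is below the preset value. `idle_hours` and `next_step` are hypothetical names, and "idle periods in the same time period" is interpreted as two high-occupancy nodes sharing at least one idle time point.

```python
from itertools import combinations
from statistics import mean


def idle_hours(history, preset):
    """history[j][i]: resource utilization (0-1) on past day j at time point i.

    Hypothetical predictor: a time point is idle if its mean over the days is below `preset`.
    """
    return {i for i, column in enumerate(zip(*history)) if mean(column) < preset}


def next_step(idle_by_node):
    """Step S2 branch: 'S3' if the idle periods of any two high-occupancy nodes overlap, else 'S4'."""
    for a, b in combinations(idle_by_node.values(), 2):
        if a & b:  # at least one shared idle time point
            return "S3"
    return "S4"
```

A real deployment would more likely use a seasonal forecast, but any predictor that returns a set of idle time points per node plugs into `next_step` unchanged.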
When the nodes are ranked, each node's capacity occupancy rate, read frequency and volume of data to be migrated are evaluated. Capacity occupancy rate: the data in a storage node are stored on disk, and in practice a disk's read speed drops as it approaches saturation. Read frequency: a high read frequency means the data are accessed often, so migrating them noticeably lowers the resource utilization rate of the original node. Data volume: migrating larger storage data first reduces the node's capacity occupancy rate quickly. Step S5 ensures that a low-occupancy node does not turn into a high-occupancy node after receiving the migrated data. By monitoring the amount of data stored on each storage node and migrating data accordingly, the invention automatically adjusts the capacity occupancy rate and the resource utilization rate of each storage node, thereby balancing the load of the storage system.
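The first formula is not reproduced in this text. The sketch below assumes it is a weighted sum of the three quantities named in step S3, with read frequency and migration volume normalised by their maximum across the candidates so that the terms are comparable. Both the linear form and the normalisation are assumptions, and the default weights are placeholders.

```python
def pressure_values(nodes, a1, a2, a3):
    """nodes: {name: (occupancy 0-1, reads in the last 24 h, volume to migrate)}.

    Assumed first formula: P = a1*C + a2*F/F_max + a3*V/V_max.
    """
    f_max = max(f for _, f, _ in nodes.values()) or 1  # guard against division by zero
    v_max = max(v for _, _, v in nodes.values()) or 1
    return {name: a1 * c + a2 * f / f_max + a3 * v / v_max
            for name, (c, f, v) in nodes.items()}


def pick_node(nodes, a1=0.4, a2=0.3, a3=0.3):
    """Step S3: migrate from the high-occupancy node with the largest pressure value."""
    p = pressure_values(nodes, a1, a2, a3)
    return max(p, key=p.get)
```

With A = (0.95, 1000, 50) and B = (0.85, 4000, 20), the default weights favour B because its much higher read frequency outweighs its lower occupancy, whereas weights (1, 0, 0) rank purely by occupancy and pick A.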
In step S6, selecting a storage node most suitable for the high-occupancy node includes the following steps:
step S61: adding the historical resource utilization rates of the high-occupancy node and the low-occupancy node at corresponding time points to obtain the predicted resource utilization rate of the low-occupancy node at each time point after it receives the migrated storage data, as given by a formula [image not reproduced] in which the two summed terms are the resource utilization rates of the high-occupancy node and the low-occupancy node, respectively, at the i-th time point of the j-th past day;
step S62: obtaining the average predicted resource utilization rate based on a third formula, the third formula being:
[Third formula: image not reproduced]
where m is the total number of days and n is the number of time points per day;
step S63: setting a resource utilization rate threshold; establishing a rectangular coordinate system with time as the X axis and the resource utilization rate as the Y axis; plotting the resource utilization rate threshold and the predicted resource utilization rates in this coordinate system; fitting the coordinate points of the predicted resource utilization rate with a curve fitting method, specifically the least squares method, to obtain a curve function f(x); and calculating, based on a fourth formula, the area S by which the area enclosed by the curve function and the X axis exceeds the area enclosed by the resource utilization rate threshold and the X axis, the fourth formula being:
[Fourth formula: image not reproduced]
where the formula uses the intersection points of the curve function with the resource utilization rate threshold, the resource utilization rate threshold itself, and a function that returns the larger of its two arguments;
step S64: calculating the collision score of each low-occupancy node based on a fifth formula, the low-occupancy node with the lowest collision score being the best-fit storage node, the fifth formula being:
[Fifth formula: image not reproduced]
where the coefficients in the formula are weighting coefficients.
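Step S64 names a fifth formula with weighting coefficients, but its image is not reproduced. Assuming it is a weighted sum of the average predicted utilization from step S62 and the area S from step S63 (the two quantities the embodiment says the score reflects), choosing the best-fit node reduces to an arg-min. The weights below are placeholders, and in practice the two terms would need comparable scales.

```python
def conflict_scores(candidates, b1, b2):
    """candidates: {node: (average predicted utilization, area above threshold)}.

    Assumed fifth formula: score = b1 * average + b2 * S (weights are hypothetical).
    """
    return {name: b1 * avg + b2 * area for name, (avg, area) in candidates.items()}


def best_fit_node(candidates, b1=0.6, b2=0.4):
    """Step S64: the low-occupancy node with the lowest conflict score is the best fit."""
    scores = conflict_scores(candidates, b1, b2)
    return min(scores, key=scores.get)
```

A node with a slightly higher average but no time above the threshold can beat a node whose peaks coincide with the migrated data's peaks, which is the behaviour the embodiment describes.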
As shown in FIG. 2, in step S61 the historical resource utilization rates of the high-occupancy node 1 and the low-occupancy node 2 at corresponding time points are added, giving the predicted resource utilization rate of the low-occupancy node 2 in each period after the storage data of the high-occupancy node 1 have been migrated to it. In this embodiment, for example, historical resource utilization data are collected at 24 time points per day over the last 3 days, and the values at corresponding times are added to obtain the predicted resource utilization data shown in FIG. 2.
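Steps S61 and S62 for this 3-day, 24-point embodiment can be sketched as follows. The element-wise sum follows step S61 directly; taking the mean over all m × n values is an assumed reading of the third formula, and `per_time_point` (averaging each time point across the days to produce the points for the curve fit) is a hypothetical helper.

```python
from statistics import mean


def predicted_utilization(high_hist, low_hist):
    """Step S61: P[j][i] = H[j][i] + L[j][i] for past day j and time point i."""
    return [[h + l for h, l in zip(h_day, l_day)]
            for h_day, l_day in zip(high_hist, low_hist)]


def average_predicted(pred):
    """Step S62 (assumed third formula): mean of all m * n predicted values."""
    m, n = len(pred), len(pred[0])
    return sum(sum(day) for day in pred) / (m * n)


def per_time_point(pred):
    """Mean over the m days for each time point (hypothetical input for the S63 fit)."""
    return [mean(column) for column in zip(*pred)]
```

With m = 3 and n = 24, `average_predicted` divides the summed values by 72, matching the division by 72 in step S62 of this embodiment.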
In step S62, all the predicted resource utilization values are summed and divided by 72 (3 days × 24 time points) to obtain the average predicted resource utilization rate. In step S63, the predicted resource utilization values are plotted in the plane coordinate system, and a curve fitting method yields the curve function that best follows the trend of the points; as shown in FIG. 3, the resource utilization rate threshold is drawn in the same coordinate system to obtain the intersection points of the curve function with the threshold. For each interval between intersection points, the area enclosed by the curve function and the X axis is obtained by a definite integral, and the area enclosed by the threshold and the X axis over the same interval is subtracted; the function returning the larger of its two arguments then discards the intervals where this difference is below 0, i.e. where the curve lies below the resource utilization rate threshold. Finally, step S64 computes the conflict score of each low-occupancy node 2 to find the low-occupancy node 2 that best matches the high-occupancy node 1, i.e. the one with the lowest average resource utilization rate and with the least overlap between the read peaks of the migrated storage data and those of the data already stored on it, which improves the stability of the whole storage system.
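Step S63 can be sketched with a pure-Python least-squares polynomial fit and an exact polynomial integral. The polynomial degree is an assumption (the text only specifies least squares), intersections with the threshold are located numerically by a sign-change scan refined with bisection, and `area_above_threshold` implements an assumed reading of the fourth formula: for every interval between consecutive intersection points (and the ends of the time range), take max(∫f dx − T·width, 0), then sum.

```python
def polyfit(xs, ys, deg):
    """Least-squares polynomial fit via the normal equations.

    Returns ascending coefficients: f(x) = c[0] + c[1]*x + ... + c[deg]*x**deg.
    """
    k = deg + 1
    # Normal equations: sum_s (sum_x x^(r+s)) * c_s = sum_x y * x^r
    mat = [[sum(x ** (r + s) for x in xs) for s in range(k)] for r in range(k)]
    rhs = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(k)]
    for col in range(k):  # Gaussian elimination with partial pivoting
        piv = max(range(col, k), key=lambda r: abs(mat[r][col]))
        mat[col], mat[piv] = mat[piv], mat[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for r in range(col + 1, k):
            f = mat[r][col] / mat[col][col]
            for s in range(col, k):
                mat[r][s] -= f * mat[col][s]
            rhs[r] -= f * rhs[col]
    coef = [0.0] * k
    for r in range(k - 1, -1, -1):  # back substitution
        coef[r] = (rhs[r] - sum(mat[r][s] * coef[s] for s in range(r + 1, k))) / mat[r][r]
    return coef


def poly(coef, x):
    return sum(c * x ** i for i, c in enumerate(coef))


def integral(coef, a, b):
    """Exact definite integral of the polynomial from a to b."""
    def antiderivative(x):
        return sum(c * x ** (i + 1) / (i + 1) for i, c in enumerate(coef))
    return antiderivative(b) - antiderivative(a)


def crossings(coef, t, lo, hi, steps=1000):
    """x values where f(x) = t: sign-change scan on a grid, refined by bisection."""
    def g(x):
        return poly(coef, x) - t
    grid = [lo + (hi - lo) * k / steps for k in range(steps + 1)]
    roots = []
    for a, b in zip(grid, grid[1:]):
        if g(a) == 0:
            roots.append(a)
        elif g(a) * g(b) < 0:
            for _ in range(60):
                mid = (a + b) / 2
                if g(a) * g(mid) <= 0:
                    b = mid
                else:
                    a = mid
            roots.append((a + b) / 2)
    return roots


def area_above_threshold(coef, t, lo, hi):
    """Assumed fourth formula: S = sum over intervals of max(integral of f - t * width, 0)."""
    pts = [lo] + crossings(coef, t, lo, hi) + [hi]
    return sum(max(integral(coef, a, b) - t * (b - a), 0.0)
               for a, b in zip(pts, pts[1:]))
```

For one candidate, `coef = polyfit(list(range(24)), hourly_values, 3)` followed by `area_above_threshold(coef, 0.8, 0, 23)` yields its S value; `hourly_values`, the degree 3 and the threshold 0.8 are illustrative. The normal equations become ill-conditioned as the degree grows, so a low degree (or rescaling x to [0, 1]) is advisable.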
Before step S61, the method further includes the following steps:
step S061: predicting the migration speed of the stored data based on the current network state, the size of the stored data, the hardware configuration of the storage node and the resource utilization rate of the storage node, and eliminating the low-occupancy-rate nodes with the migration speed lower than the preset migration speed.
This step avoids migration speeds so low that the data migration takes too long, during which the migration task would occupy the node's resources for an extended period and slow down the system.
After step S61, the method further includes the following steps:
step S611: if the predicted resource utilization rate exceeds the upper limit of the resource utilization rate of the low-occupancy node, rejecting the low-occupancy node.
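The two rejection rules (step S061 before S61 and step S611 after it) can be combined into one candidate filter. The text lists the inputs to the migration-speed prediction (network state, data size, hardware configuration, utilization) but gives no model, so `estimated_speed` uses a hypothetical one: the bottleneck of network and disk bandwidth, scaled by the spare capacity of the busier node; data size is left out of this simple model. The `Candidate` fields are illustrative, and S611 is read as rejecting a node if its predicted utilization exceeds its upper limit at any time point.

```python
from dataclasses import dataclass, field


@dataclass
class Candidate:
    name: str
    net_mbps: float     # current network bandwidth to the source (hypothetical input)
    disk_mbps: float    # disk write throughput of the node (hypothetical input)
    utilization: float  # current resource utilization, 0-1
    upper_limit: float  # node's resource utilization upper limit, 0-1
    predicted: list = field(default_factory=list)  # S61 predicted utilization per time point


def estimated_speed(cand, src_utilization):
    """Hypothetical S061 model: bottleneck bandwidth times the busier node's spare capacity."""
    busiest = max(cand.utilization, src_utilization)
    return min(cand.net_mbps, cand.disk_mbps) * (1.0 - busiest)


def filter_candidates(cands, src_utilization, min_speed):
    # Step S061: drop nodes whose predicted migration speed is below the preset speed
    fast = [c for c in cands if estimated_speed(c, src_utilization) >= min_speed]
    # Step S611: drop nodes whose predicted utilization exceeds their upper limit anywhere
    return [c for c in fast if not c.predicted or max(c.predicted) <= c.upper_limit]
```

Running the cheap speed check first mirrors the ordering in the text: S061 prunes candidates before the per-time-point prediction of S61 is computed.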
During migration of the stored data, if the resource utilization rates of the high-occupancy node and the low-occupancy node exceed a preset resource utilization rate threshold, the migration rate of the stored data is reduced.
Lowering the migration rate lowers the nodes' resource utilization, which prevents the migration task from consuming too many resources and excessively slowing reads of the other data on the nodes.
When no data migration is in progress, the resource utilization rates of the high-occupancy nodes and the low-occupancy nodes are sampled at a first time interval; during data migration, they are sampled at a second time interval, the second time interval being shorter than the first.
Shortening the monitoring interval for the high-occupancy and low-occupancy nodes during data migration makes it possible to detect more promptly whether the resource utilization rate exceeds the preset resource utilization rate threshold.
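The two runtime rules, adaptive sampling and throttling, can be sketched as below. The interval values, the halving factor and the minimum rate are hypothetical; the text does not state whether throttling triggers when either node or both nodes exceed the threshold, so this sketch throttles when either does.

```python
def sampling_interval(migrating, first_interval=60.0, second_interval=5.0):
    """Seconds between utilization samples: the shorter second interval during migration."""
    if not second_interval < first_interval:
        raise ValueError("the second interval must be shorter than the first")
    return second_interval if migrating else first_interval


def adjust_rate(rate, src_util, dst_util, threshold, factor=0.5, min_rate=1.0):
    """Halve the migration rate (never below min_rate) while either node exceeds the threshold."""
    if src_util > threshold or dst_util > threshold:
        return max(min_rate, rate * factor)
    return rate
```

A monitoring loop would call `sampling_interval(True)` between polls while a migration runs and pass each new pair of utilization samples to `adjust_rate`; restoring the rate once utilization falls again is a design choice the text leaves open.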
A migration value upper limit is set, and storage data whose data volume exceeds it are not migrated. In general, the resource utilization associated with a piece of storage data grows in proportion to the capacity it occupies, so migrating an overly large piece of storage data to another node would readily drive up the resource utilization rate of the destination node, and the migration itself would be costly; overly large storage data are therefore unsuitable for migration.
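Step S4 (determining the volume of data to migrate) and the migration value upper limit can be sketched together. The greedy largest-first order follows the earlier remark that migrating larger storage data first reduces the capacity occupancy rate fastest; the function name and the dict-of-sizes input are hypothetical.

```python
def select_items(items, excess, cap):
    """Choose storage data to move off a high-occupancy node.

    items:  {item_id: size} stored on the node (hypothetical representation)
    excess: volume that must leave the node (step S4)
    cap:    migration value upper limit; larger items are never migrated
    Hypothetical policy: largest eligible items first until `excess` is covered.
    """
    chosen, total = [], 0
    for item, size in sorted(items.items(), key=lambda kv: kv[1], reverse=True):
        if total >= excess:
            break
        if size > cap:
            continue  # exceeds the migration value upper limit: not migrated
        chosen.append(item)
        total += size
    return chosen, total
```

If the eligible items cannot cover `excess`, the returned total is smaller than requested, and the caller can decide whether a partial migration is still worthwhile.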
A distributed data storage system, used to implement the above distributed data storage method, comprises: storage nodes, each internally provided with a processor and a memory and interconnected through a network; a monitoring module that monitors and records the capacity occupancy rate and the resource utilization rate of each storage node; a computing module that computes the idle time period of each storage node based on its historical resource utilization rate; an evaluation module that screens the storage nodes according to the data provided by the computing module and the capacity occupancy rate, and determines the storage nodes that data are to be migrated out of and into; and a migration module that migrates the storage data.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the above embodiments are not described, but should be considered as the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present invention, and the description thereof is more specific and detailed, but not construed as limiting the scope of the present invention. It should be noted that various changes and modifications can be made by those skilled in the art without departing from the spirit of the invention, and these changes and modifications are all within the scope of the invention. Therefore, the protection scope of the present patent shall be subject to the appended claims.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, which is intended to cover any modifications, equivalents, improvements, etc. within the spirit and scope of the present invention.

Claims (8)

1. A distributed data storage method, comprising:
step S1: acquiring the capacity occupancy rate of each storage node, defining a storage node whose capacity occupancy rate exceeds a first threshold as a high-occupancy node and a storage node whose capacity occupancy rate is below a second threshold as a low-occupancy node, wherein the second threshold is smaller than the first threshold;
step S2: predicting, based on historical resource utilization data, the idle time periods during which the resource utilization rate of each high-occupancy node is lower than a preset resource utilization rate; if the idle time periods of a plurality of high-occupancy nodes fall within the same time period, executing step S3, and otherwise executing step S4;
step S3: calculating the pressure value of each high-occupancy node through a first formula and selecting the high-occupancy node with the largest pressure value for data migration, wherein the first formula is:
[First formula: image not reproduced in this text]
where the variables of the formula are the capacity occupancy rate, the read frequency of the high-occupancy node over the past 24 hours, and the amount of data that needs to be migrated from the high-occupancy node, and its remaining symbols are the respective weighting coefficients;
step S4: determining the amount of data to be migrated from the high-occupancy node;
step S5: screening out the low-occupancy nodes whose remaining storage capacity satisfies a second formula, the second formula being:
[Second formula: image not reproduced in this text]
where the variables of the formula are the second threshold, the amount of data currently stored on the low-occupancy node, the amount of data that needs to be migrated from the high-occupancy node, and the total capacity of the low-occupancy node;
step S6: selecting, from the low-occupancy nodes satisfying the second formula, the storage node best suited to the high-occupancy node, and migrating the stored data of the high-occupancy node to that low-occupancy node;
step S7: repeating steps S2 to S6 until no high-occupancy node remains in the storage system or none of the low-occupancy nodes is suitable for receiving further migrated data;
in step S6, selecting the storage node best suited to the high-occupancy node comprises the following steps:
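Steps S1 to S5 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation. The first and second formulas survive only as images, so `pressure` assumes a weighted linear sum and `has_room` assumes the receiving node must stay at or below the second threshold after the transfer. The idle-period rule in `idle_slots` (mean utilization per time point below the preset rate) is also an assumption, as is normalising the read frequency to 0..1. All names (`Node`, `classify`, `idle_slots`, `idle_periods_overlap`, `pressure`, `has_room`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass
class Node:
    """Hypothetical view of one storage node."""
    name: str
    stored: float                # data currently stored (e.g. GB)
    capacity: float              # total capacity, same unit as `stored`
    history: List[List[float]]   # history[j][i]: utilization at time point i of past day j
    read_freq: float = 0.0       # reads in the past 24 h, normalised to 0..1 (assumption)

    @property
    def occupancy(self) -> float:
        return self.stored / self.capacity


def classify(nodes: List[Node], first: float, second: float) -> Tuple[List[Node], List[Node]]:
    """Step S1: split nodes into high- and low-occupancy groups."""
    if not second < first:
        raise ValueError("the second threshold must be smaller than the first")
    high = [n for n in nodes if n.occupancy > first]
    low = [n for n in nodes if n.occupancy < second]
    return high, low


def idle_slots(node: Node, preset_rate: float) -> Set[int]:
    """Step S2, ASSUMED rule: a time point is predicted idle when its mean
    utilization over the recorded days is below preset_rate."""
    days = len(node.history)
    return {
        i
        for i in range(len(node.history[0]))
        if sum(day[i] for day in node.history) / days < preset_rate
    }


def idle_periods_overlap(high: List[Node], preset_rate: float) -> bool:
    """True (go to step S3) when two or more high-occupancy nodes share an idle time point."""
    seen: Set[int] = set()
    for node in high:
        slots = idle_slots(node, preset_rate)
        if seen & slots:
            return True
        seen |= slots
    return False


def pressure(node: Node, migrate_amount: float, w: Tuple[float, float, float]) -> float:
    """Step S3, ASSUMED weighted linear sum (the first formula is not reproduced).
    The migration amount is divided by capacity so all three terms lie in 0..1."""
    return w[0] * node.occupancy + w[1] * node.read_freq + w[2] * migrate_amount / node.capacity


def has_room(low: Node, migrate_amount: float, second: float) -> bool:
    """Step S5, ASSUMED form of the second formula: the receiving node must stay
    at or below the second threshold after taking the migrated data."""
    return (low.stored + migrate_amount) / low.capacity <= second


if __name__ == "__main__":
    nodes = [
        Node("a", 920, 1000, [[0.9, 0.2], [0.8, 0.3]], read_freq=0.2),
        Node("b", 900, 1000, [[0.9, 0.3], [0.9, 0.2]], read_freq=0.8),
        Node("c", 100, 1000, [[0.1, 0.1], [0.2, 0.1]]),
    ]
    high, low = classify(nodes, first=0.85, second=0.30)
    if idle_periods_overlap(high, preset_rate=0.4):
        source = max(high, key=lambda n: pressure(n, 150, (0.5, 0.3, 0.2)))
        print("migrate from", source.name)
    print([n.name for n in low if has_room(n, 150, 0.30)])
```

In this toy data, nodes `a` and `b` are both predicted idle at time point 1, so step S3 applies; with the assumed weights, the more frequently read node `b` gets the larger pressure value, and `c` can take 150 GB while staying under the second threshold.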
step S61: adding the historical resource utilization rates of the high-occupancy node and the low-occupancy node at corresponding time points to obtain the predicted resource utilization rate of the low-occupancy node at each time point after it receives the migrated data:
$P_{ij} = H_{ij} + L_{ij}$
where $H_{ij}$ and $L_{ij}$ are the resource utilization rates of the high-occupancy node and the low-occupancy node, respectively, at the i-th time point of the j-th past day;
step S62: obtaining the average of the predicted resource utilization rate based on a third formula, the third formula being:
[Third formula: image not reproduced in this text]
where m is the number of past days over which data were collected and n is the number of time points collected per day;
step S63: setting a resource utilization threshold; establishing a rectangular coordinate system with time on the X axis and resource utilization rate on the Y axis; plotting the resource utilization threshold and the predicted resource utilization rate in this coordinate system; fitting the coordinate points of the predicted resource utilization rate with a curve fitting method to obtain a curve function f(x); and calculating, based on a fourth formula, the area S enclosed between the curve function and the X axis in excess of the area enclosed between the resource utilization threshold and the X axis, that is, the area of the region in which the curve lies above the threshold, the fourth formula being:
[Fourth formula: image not reproduced in this text]
where the integration limits are the intersection points of the curve function and the resource utilization threshold, one symbol denotes the resource utilization threshold, and a maximum function returns the larger of f(x) and the resource utilization threshold;
step S64: calculating a collision score for each low-occupancy node based on a fifth formula, the low-occupancy node with the lowest collision score being the best-suited storage node, wherein the fifth formula is:
[Fifth formula: image not reproduced in this text]
in which the coefficients are the respective weighting coefficients.
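Steps S61 to S64 can be sketched as follows, again only as an illustration. `predicted_utilization` follows step S61 directly. `mean_per_time_point` assumes the third formula averages each time point over the m days, which yields the n points plotted in step S63; an overall mean over all m × n samples is another possible reading. `least_squares_poly` implements ordinary least squares (the fitting method named in claim 7) without external libraries, and `area_above` integrates max(f(x), T) - T numerically over the whole window, which covers every region between intersection points where the curve lies above the threshold. The fifth formula is not reproduced, so `best_fit_node` assumes the collision score is a weighted sum of the area S and the mean predicted utilization. All function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Matrix = List[List[float]]  # matrix[j][i]: utilization at time point i of past day j


def predicted_utilization(high: Matrix, low: Matrix) -> Matrix:
    """Step S61: add the two nodes' historical utilization point by point."""
    if len(high) != len(low) or any(len(h) != len(l) for h, l in zip(high, low)):
        raise ValueError("histories must cover the same days and time points")
    return [[h + l for h, l in zip(h_day, l_day)] for h_day, l_day in zip(high, low)]


def mean_per_time_point(pred: Matrix) -> List[float]:
    """Step S62, ASSUMED reading of the third formula: average each of the
    n time points over the m recorded days."""
    m = len(pred)
    return [sum(day[i] for day in pred) / m for i in range(len(pred[0]))]


def least_squares_poly(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit (claim 7) via the normal equations,
    solved by Gaussian elimination with partial pivoting. Returns c0..cd."""
    size = degree + 1
    a = [[float(sum(x ** (r + k) for x in xs)) for k in range(size)] for r in range(size)]
    b = [float(sum(y * x ** r for x, y in zip(xs, ys))) for r in range(size)]
    for col in range(size):
        piv = max(range(col, size), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for k in range(col, size):
                a[r][k] -= factor * a[col][k]
            b[r] -= factor * b[col]
    coeffs = [0.0] * size
    for r in reversed(range(size)):
        tail = sum(a[r][k] * coeffs[k] for k in range(r + 1, size))
        coeffs[r] = (b[r] - tail) / a[r][r]
    return coeffs


def polynomial(coeffs: Sequence[float]) -> Callable[[float], float]:
    """Turn coefficients c0..cd into the curve function f(x)."""
    return lambda x: sum(c * x ** p for p, c in enumerate(coeffs))


def area_above(f: Callable[[float], float], threshold: float,
               x0: float, x1: float, steps: int = 10_000) -> float:
    """Step S63: trapezoidal approximation of the integral of max(f(x), T) - T,
    i.e. the area in which the fitted curve lies above the threshold T."""
    def excess(x: float) -> float:
        return max(f(x), threshold) - threshold
    h = (x1 - x0) / steps
    inner = sum(excess(x0 + k * h) for k in range(1, steps))
    return h * (0.5 * (excess(x0) + excess(x1)) + inner)


def best_fit_node(metrics: Dict[str, Tuple[float, float]], w1: float, w2: float) -> str:
    """Step S64, ASSUMED form of the fifth formula: collision score =
    w1 * area S + w2 * mean predicted utilization; the lowest score wins."""
    scores = {name: w1 * s + w2 * mean for name, (s, mean) in metrics.items()}
    return min(scores, key=scores.get)


if __name__ == "__main__":
    pred = predicted_utilization([[0.1, 0.2, 0.3, 0.4, 0.5]], [[0.1, 0.1, 0.1, 0.1, 0.1]])
    avg = mean_per_time_point(pred)                   # ≈ [0.2, 0.3, 0.4, 0.5, 0.6]
    f = polynomial(least_squares_poly(range(5), avg, degree=1))
    s = area_above(f, threshold=0.4, x0=0, x1=4)      # ≈ 0.2
    print(best_fit_node({"x": (s, sum(avg) / len(avg)), "y": (0.0, 0.75)}, 0.6, 0.4))
```

A polynomial of low degree is used here only because it keeps the least-squares step self-contained; the claims do not fix the form of f(x).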
2. The distributed data storage method according to claim 1, further comprising, before step S61, the following step:
step S061: predicting the migration speed of the stored data based on the current network state, the size of the stored data, the hardware configuration of the storage node, and the resource utilization rate of the storage node, and eliminating any low-occupancy node whose predicted migration speed is lower than a preset migration speed.
3. The distributed data storage method according to claim 2, further comprising, after step S61, the following step:
step S611: if the predicted resource utilization rate exceeds the upper limit of the resource utilization rate of the low-occupancy node, rejecting the low-occupancy node.
4. The distributed data storage method according to claim 1, wherein, during migration of the stored data, if the resource utilization rates of the high-occupancy node and the low-occupancy node are greater than the preset resource utilization rate threshold, the migration rate of the stored data is reduced.
5. The distributed data storage method according to claim 1, wherein the resource utilization rates of the high-occupancy nodes and the low-occupancy nodes are acquired at a first time interval when no data migration is being performed and at a second time interval when data migration is being performed, the second time interval being shorter than the first time interval.
6. The distributed data storage method according to claim 1, wherein an upper migration limit is set, and migration of stored data whose data volume exceeds the upper migration limit is prohibited.
7. A distributed data storage method as claimed in claim 1, wherein said curve fitting method is a least squares method.
8. A distributed data storage system for implementing the distributed data storage method according to any one of claims 1 to 7, comprising:
storage nodes, each internally provided with a processor and a memory, the storage nodes being connected to one another through a network;
a monitoring module, which monitors and records the capacity occupancy rate and the resource utilization rate of each storage node;
a computing module, which computes the idle time period of each storage node based on the historical resource utilization rate;
an evaluation module, which screens the storage nodes according to the data provided by the computing module and the capacity occupancy rates, and determines the storage nodes that need to migrate data out and in; and
a migration module, which migrates the stored data.
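The dependent claims add operational policies. The sketch below, offered only as an illustration, attaches them to the modules named in claim 8: the sampling interval of claim 5 to the monitoring module, the speed filter of claim 2 and the upper-limit check of claim 3 to the evaluation module, and the throttle of claim 4 and the migration cap of claim 6 to the migration module. This placement is an editorial assumption. The speed model, the default intervals, the throttle factor, the rate floor, and all class and method names are hypothetical, since the claims do not specify them. Claim 4 is read as requiring both nodes to exceed the threshold, and claim 3 as rejecting a node if any predicted time point exceeds its limit.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MonitoringModule:
    """Claim 5: sample more often while a migration is running."""
    first_interval_s: float = 300.0   # hypothetical default
    second_interval_s: float = 30.0   # hypothetical default, must be shorter

    def interval(self, migrating: bool) -> float:
        if not self.second_interval_s < self.first_interval_s:
            raise ValueError("the second interval must be shorter than the first")
        return self.second_interval_s if migrating else self.first_interval_s


@dataclass
class Candidate:
    name: str
    network_mbps: float   # currently available network bandwidth, MB/s
    disk_mbps: float      # write throughput from the hardware configuration, MB/s
    utilization: float    # current resource utilization, 0..1


class EvaluationModule:
    @staticmethod
    def predicted_speed(c: Candidate, size_mb: float, overhead_s: float = 2.0) -> float:
        """Claim 2, HYPOTHETICAL model: the raw rate is limited by the slower of
        network and disk and scaled down by current utilization; a fixed setup
        overhead means larger transfers reach a higher average speed."""
        raw = min(c.network_mbps, c.disk_mbps) * (1.0 - c.utilization)
        if raw <= 0:
            return 0.0
        return size_mb / (overhead_s + size_mb / raw)

    def filter_by_speed(self, cands: List[Candidate], size_mb: float,
                        min_speed: float) -> List[str]:
        """Keep only candidates whose predicted speed reaches min_speed."""
        return [c.name for c in cands if self.predicted_speed(c, size_mb) >= min_speed]

    @staticmethod
    def within_upper_limit(predicted: List[List[float]], upper_limit: float) -> bool:
        """Claim 3: False (reject the node) if any predicted point exceeds the limit."""
        return all(v <= upper_limit for day in predicted for v in day)


@dataclass
class MigrationModule:
    upper_limit_gb: float        # claim 6: larger items may not be migrated
    slow_factor: float = 0.5     # hypothetical throttle factor for claim 4
    min_rate_mbps: float = 1.0   # hypothetical floor so migration never stalls

    def allowed(self, items: Dict[str, float]) -> List[str]:
        """Claim 6: items whose size exceeds the upper limit are excluded."""
        return [name for name, size in items.items() if size <= self.upper_limit_gb]

    def adjust_rate(self, rate: float, high_util: float, low_util: float,
                    threshold: float) -> float:
        """Claim 4: slow down while both nodes are above the threshold."""
        if high_util > threshold and low_util > threshold:
            return max(rate * self.slow_factor, self.min_rate_mbps)
        return rate
```

For example, with 1000 MB to move, a candidate with 100 MB/s of network, 200 MB/s of disk, and 50% utilization is predicted at about 45 MB/s under this model, while one with an 80 MB/s disk at 90% utilization drops below 8 MB/s and would be eliminated by a 20 MB/s floor.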

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211125471.6A CN115203177B (en) 2022-09-16 2022-09-16 Distributed data storage system and storage method

Publications (2)

Publication Number Publication Date
CN115203177A CN115203177A (en) 2022-10-18
CN115203177B true CN115203177B (en) 2022-12-06

Family

ID=83571890

Country Status (1)

Country Link
CN (1) CN115203177B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117453149B (en) * 2023-12-22 2024-04-09 柏科数据技术(深圳)股份有限公司 Data balancing method, device, terminal and storage medium of distributed storage system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227645A (en) * 2015-09-15 2016-01-06 齐鲁工业大学 A kind of cloud data migration method
CN110377430A (en) * 2019-07-24 2019-10-25 中南民族大学 Data migration method, equipment, storage medium and device
WO2021073083A1 (en) * 2019-10-15 2021-04-22 南京莱斯网信技术研究院有限公司 Node load-based dynamic data partitioning system
WO2021180056A1 (en) * 2020-03-09 2021-09-16 华为技术有限公司 Method for resource migration, system and device
CN113821340A (en) * 2021-08-27 2021-12-21 济南浪潮数据技术有限公司 Dynamic balancing method, system, terminal and storage medium of distributed system

Similar Documents

Publication Publication Date Title
US5481702A (en) Allocation optimization with different block-sized allocation maps
CN107656807B (en) Automatic elastic expansion method and device for virtual resources
CN112689007B (en) Resource allocation method, device, computer equipment and storage medium
CN115203177B (en) Distributed data storage system and storage method
CN107957848B (en) Deduplication processing method and storage device
CN109815004B (en) Request load control method, device, storage medium and computer equipment
CN111857597A (en) Hot spot data caching method, system and related device
CN105740077B (en) Task allocation method suitable for cloud computing
CN116346740A (en) Load balancing method and device
CN116627356B (en) Distribution control method and system for large-capacity storage data
CN114168318A (en) Training method of storage release model, storage release method and equipment
CN115951832A (en) Method and system for merging intelligent small files aiming at object storage
CN111190737A (en) Memory allocation method for embedded system
CN115994029A (en) Container resource scheduling method and device
CN116302383A (en) Distributed heterogeneous data acquisition method, system, computer equipment and storage medium
CN113918341A (en) Equipment scheduling method, device, equipment and storage medium
CN112559191B (en) Method and device for dynamically deploying GPU resources and computer equipment
CN114546652A (en) Parameter estimation method and device and electronic equipment
CN109828718B (en) Disk storage load balancing method and device
CN117519913B (en) Method and system for elastically telescoping scheduling of container memory resources
CN113741810B (en) Data migration method and device
CN116383290B (en) Data generalization and analysis method
CN114826951B (en) Service automatic degradation method, device, computer equipment and storage medium
CN112783440B (en) Data storage method and device for user node of block chain
CN112202860A (en) Container flow adjusting method, device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant