CN112231137A - Rebalancing method and system for distributed storage data - Google Patents

Rebalancing method and system for distributed storage data

Info

Publication number
CN112231137A
Authority
CN
China
Prior art keywords
current time
data
ceph cluster
rebalancing
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011462529.7A
Other languages
Chinese (zh)
Other versions
CN112231137B (en)
Inventor
刘杰
史伟
闵宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Eflycloud Computing Co Ltd
Original Assignee
Guangdong Eflycloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Eflycloud Computing Co Ltd
Priority to CN202011462529.7A
Publication of CN112231137A
Application granted
Publication of CN112231137B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0727: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a storage system, e.g. in a DASD or network based storage system
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/07: Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703: Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793: Remedial or corrective actions
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a rebalancing method and a system for distributed storage data, wherein the rebalancing method comprises the following steps: splitting historical IO data of the CEPH cluster into training data; training the training data to obtain a training result model; recording the current time point when the CEPH cluster fails or recording the current time point when the CEPH cluster expands; taking the current time point as the current time, and inputting the current time into the training result model; the training result model makes a decision on the current time, and is used for judging whether the current time is suitable for executing rebalancing on the storage data of the CEPH cluster. The method can automatically decide the time for rebalancing the storage data of the CEPH cluster through the training result model without manual intervention, greatly reduces the difficulty of distributed storage operation, and improves the efficiency of data rebalancing.

Description

Rebalancing method and system for distributed storage data
Technical Field
The invention relates to the technical field of distributed storage data, in particular to a rebalancing method and a rebalancing system for distributed storage data.
Background
CEPH is a widely used distributed storage engine with good scalability and fault tolerance; when a storage unit (OSD) fails, the engine can automatically rebalance the data affected by the failure onto other storage units that are in good condition.
FANN (Fast Artificial Neural Network) is a popular artificial intelligence algorithm framework; by training on known data it generates a corresponding rule model, which can then make decisions about new, unseen data.
Data rebalancing is a core problem for operators of distributed storage. If rebalancing is postponed for too long, the data is exposed to the risk of a second failure, after which it may be lost and unrecoverable; if rebalancing is performed immediately, it often has a large impact on running services, the quality of service of the storage cannot be guaranteed, and business is lost. Faced with this problem, operators usually assign staff to watch the cluster and rely on manual experience to decide whether to perform data rebalancing, which is inefficient and error-prone.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a rebalancing method and a rebalancing system for distributed storage data, in which historical IO data are used for training to form a training result model, and the training result model decides by itself, without manual intervention, when to rebalance the stored data of a CEPH cluster. This greatly reduces the difficulty of operating distributed storage, improves the efficiency of data rebalancing, and thereby improves the service quality of distributed storage and reduces its fluctuation.
In order to solve the technical problems, the invention provides the following technical scheme: a method of rebalancing distributed storage data, comprising the steps of:
step S1, splitting historical IO data of the CEPH cluster into training data;
step S2, training the training data to obtain a training result model;
step S3, when the CEPH cluster fails or needs to be expanded, recording the current time point when the CEPH cluster fails or recording the current time point when the CEPH cluster is expanded;
step S4, taking the current time point as the current time and inputting the current time into the training result model;
step S5, the training result model makes a decision on the current time, and is used for judging whether the current time is suitable for executing rebalancing on the storage data of the CEPH cluster; if the current time is suitable for executing rebalancing on the storage data of the CEPH cluster, executing rebalancing on the storage data of the CEPH cluster; if the current time is not suitable for rebalancing the storage data of the CEPH cluster, rebalancing the storage data of the CEPH cluster does not need to be performed at the current time.
Further, the step S1 is preceded by the step S0 of obtaining historical IO data of the CEPH cluster.
Further, in step S2, training data is trained in a FANN manner.
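As an illustration only, the splitting in step S1 could group raw IO records into hourly samples, each keyed by the day of the week and the hour of the day and carrying the three quantities the model is later asked to predict. The record layout, the hourly bucket size and the function name `split_into_training_data` are assumptions made for this sketch; the source does not specify how the data is split.

```python
from collections import defaultdict
from datetime import datetime


def split_into_training_data(records):
    """Group raw IO records into hourly training samples.

    records: iterable of (timestamp, io_count, io_mb, peak_mb_per_s) tuples.
             This layout is hypothetical; the source does not define one.
    Returns a list of ((weekday, hour), (total_io_count, total_io_mb, peak)),
    where weekday is 1 (Monday) to 7 (Sunday).
    """
    # One bucket per calendar date and hour: [IO count, IO volume, peak BPS].
    buckets = defaultdict(lambda: [0, 0.0, 0.0])
    for ts, io_count, io_mb, peak in records:
        bucket = buckets[(ts.date(), ts.hour)]
        bucket[0] += io_count            # total IO count in this hour
        bucket[1] += io_mb               # total IO data volume in this hour
        bucket[2] = max(bucket[2], peak) # peak throughput seen in this hour

    samples = []
    for (day, hour), (count, mb, peak) in sorted(buckets.items()):
        samples.append(((day.isoweekday(), hour), (count, mb, peak)))
    return samples
```

The resulting samples would then be handed to whichever training procedure is used in step S2.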
Further, the training result model in step S5 makes a decision on the current time, which specifically includes:
the training result model predicts, at the current time, the total IO count, the total IO data volume and the peak data BPS that may occur in the CEPH cluster over the next several hours, and then constructs an equation: letting the total IO count be x, the total IO data volume be y, the peak data BPS be z, and the duration of the next several hours be t, the constructed equation is:
[Equation image not reproduced in this text: the result, weight, is computed from x, y, z and t using the coefficients a, b and c]
wherein a, b and c in the equation are fixed values selected in practical application;
when the calculation result of the equation, weight, is less than a certain threshold value, it is determined that the current time is suitable for rebalancing the stored data of the CEPH cluster;
when the calculation result of the equation, weight, is greater than or equal to the threshold value, it is determined that the current time is not suitable for rebalancing the stored data of the CEPH cluster.
Further, the duration t of the next several hours is evaluated as follows:
evaluating on the basis of a fixed proportion of the total storage capacity of the CEPH cluster and the network transmission bandwidth, specifically: let the total storage capacity of the CEPH cluster be x_1 and the maximum network transmission bandwidth of the CEPH cluster be y_1; the evaluation equation for the duration t is then t = (d · x_1) / (f · y_1), where d and f are both fixed values configured for the actual deployment scenario;
or evaluating on the basis of the average used capacity of the cluster hard disks in the CEPH cluster and a reduced network transmission bandwidth, specifically: let the average used capacity of the cluster hard disks in the CEPH cluster be x_2 and the maximum network transmission bandwidth of the CEPH cluster be y_2; the evaluation equation for the duration t is then t = (g · x_2) / (h · y_2), where g and h are both fixed values configured for the actual deployment scenario.
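The two evaluation equations can be sketched as follows. This is a minimal sketch, assuming capacities in MB and bandwidth in MB/s so that t comes out in seconds; the function names are hypothetical, and the default coefficients are the reference values given in Example 1 (d = 0.3, f = 0.5, g = 1, h = 0.5).

```python
def duration_from_total_capacity(total_capacity_mb, max_bandwidth_mb_s,
                                 d=0.3, f=0.5):
    """First method: t = (d * x_1) / (f * y_1), in seconds."""
    if max_bandwidth_mb_s <= 0 or f <= 0:
        raise ValueError("bandwidth and f must be positive")
    return (d * total_capacity_mb) / (f * max_bandwidth_mb_s)


def duration_from_avg_disk_usage(avg_used_mb, max_bandwidth_mb_s,
                                 g=1.0, h=0.5):
    """Second method: t = (g * x_2) / (h * y_2), in seconds."""
    if max_bandwidth_mb_s <= 0 or h <= 0:
        raise ValueError("bandwidth and h must be positive")
    return (g * avg_used_mb) / (h * max_bandwidth_mb_s)
```

In both functions the denominator is the usable share of the bandwidth, so a smaller f or h lengthens the estimated window.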
Further, the step S5 further includes: if the current time is not suitable for rebalancing the stored data of the CEPH cluster, waiting for a period of time, setting the time point after the period of time as the current time point, and returning to step S4.
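The wait-and-retry behaviour of steps S4 and S5 can be pictured with the loop below. It is a sketch under stated assumptions: `predict_weight` stands in for the training result model and is hypothetical, the threshold of 20 and the one-hour wait are the preferred values from Example 1, and the `max_attempts` cap is an addition for safety that the source does not describe.

```python
import time


def wait_for_rebalance_window(start_ts, predict_weight, threshold=20.0,
                              retry_interval_s=3600.0, max_attempts=168,
                              sleep=time.sleep):
    """Return the time point judged suitable for rebalancing, or None.

    start_ts:       time point of the failure or expansion (seconds).
    predict_weight: hypothetical callable mapping a time point to the
                    result of the decision equation (weight).
    max_attempts:   assumed upper bound on retries; not part of the source.
    """
    current = start_ts
    for _ in range(max_attempts):
        # Steps S4/S5: feed the current time to the model and compare.
        if predict_weight(current) < threshold:
            return current
        # Not suitable: wait, then treat the later time point as current.
        sleep(retry_interval_s)
        current += retry_interval_s
    return None
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a caller that receives `None` would need its own fallback policy.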
The invention also aims to provide a rebalance system for distributed storage data, which comprises a data acquisition module, a data training module, a time recording module, a training result model module and a rebalance module;
the data acquisition module is used for acquiring historical IO data of the CEPH cluster and splitting the historical IO data of the CEPH cluster into training data;
the data training module is used for training data, obtaining a training result model and placing the training result model in the training result model module;
the time recording module is used for: when the CEPH cluster fails or needs to be expanded, recording the current time point when the CEPH cluster fails or recording the current time point when the CEPH cluster is expanded, taking the current time point as the current time and inputting the current time into the training result model module;
the training result model module is used for making a decision on the current time, deciding whether the current time is suitable for executing rebalancing on the stored data of the CEPH cluster, and sending a decision result to the rebalancing module;
the rebalancing module is configured to: according to the decision result of the training result model module, if the current time is suitable for executing rebalancing on the storage data of the CEPH cluster, the rebalancing module executes rebalancing on the storage data of the CEPH cluster; if the current time is not suitable for rebalancing the storage data of the CEPH cluster, the rebalancing module does not need to rebalance the storage data of the CEPH cluster at the current time.
Further, the training result model module is used for making a decision on the current time, and specifically includes:
the training result model module predicts, at the current time, the total IO count, the total IO data volume and the peak data BPS that may occur in the CEPH cluster over the next several hours, and then constructs an equation: letting the total IO count be x, the total IO data volume be y, the peak data BPS be z, and the duration of the next several hours be t, the constructed equation is:
[Equation image not reproduced in this text: the result, weight, is computed from x, y, z and t using the coefficients a, b and c]
wherein a, b and c in the equation are fixed values selected in practical application;
when the calculation result of the equation, weight, is less than a certain threshold value, the training result model module determines that the current time is suitable for rebalancing the stored data of the CEPH cluster;
when the calculation result of the equation, weight, is greater than or equal to the threshold value, the training result model module determines that the current time is not suitable for rebalancing the stored data of the CEPH cluster.
Further, the training result model module comprises a duration evaluation unit;
the duration evaluation unit is used for evaluating the specific duration t of the next several hours that the training result model module uses when making a decision on the current time, and the duration t is evaluated as follows:
evaluating on the basis of a fixed proportion of the total storage capacity of the CEPH cluster and the network transmission bandwidth, specifically: let the total storage capacity of the CEPH cluster be x_1 and the maximum network transmission bandwidth of the CEPH cluster be y_1; the evaluation equation for the duration t is then t = (d · x_1) / (f · y_1), where d and f are both fixed values configured for the actual deployment scenario;
or evaluating on the basis of the average used capacity of the cluster hard disks in the CEPH cluster and a reduced network transmission bandwidth, specifically: let the average used capacity of the cluster hard disks in the CEPH cluster be x_2 and the maximum network transmission bandwidth of the CEPH cluster be y_2; the evaluation equation for the duration t is then t = (g · x_2) / (h · y_2), where g and h are both fixed values configured for the actual deployment scenario.
Further, the time recording module is further configured to: and if the current time is not suitable for rebalancing the stored data of the CEPH cluster, the time recording module waits for a period of time, sets the time point after the period of time as the current time, and inputs the current time into the training result model module.
After the technical scheme is adopted, the invention has at least the following beneficial effects: historical IO data are trained in the FANN (Fast Artificial Neural Network) manner to form a training result model, and afterwards it is only necessary to let the model make a decision on new, unseen data in order to judge whether the current time is suitable for rebalancing the stored data of the CEPH cluster.
Drawings
FIG. 1 is a flowchart illustrating steps of a method for rebalancing distributed storage data according to the present invention.
FIG. 2 is a block diagram of a distributed data storage rebalancing system according to the present invention.
Detailed Description
It should be noted that, in the present application, the embodiments and features of the embodiments may be combined with each other without conflict, and the present application is further described in detail with reference to the drawings and specific embodiments.
Example 1
As shown in fig. 1, the present embodiment provides a rebalancing method for distributed storage data, which includes the following specific steps:
s0, acquiring historical IO data of the CEPH cluster;
forming a historical data set by collecting related information of a CEPH cluster, wherein the data set comprises historical IO data, and the historical IO data comprises the total IO count, the total IO data volume and the peak data BPS (bytes per second);
step S1, splitting historical IO data of the CEPH cluster into training data;
step S2, training the training data to obtain a training result model; preferably, the training data is trained in the FANN manner, FANN (Fast Artificial Neural Network) being a popular artificial intelligence algorithm framework that generates a corresponding rule model by training on known data, so that decisions can be made about new, unseen data;
step S3, when the CEPH cluster fails or needs to be expanded, recording the current time point when the CEPH cluster fails or recording the current time point when the CEPH cluster is expanded;
step S4, taking the current time point as the current time and inputting the current time into the training result model;
specifically: in actual cluster operation, data rebalancing generally lasts for several hours. If rebalancing is delayed by more than one week, the probability of new data risk is high; if the period considered is shorter than one week, the IO characteristics cannot fully reflect periodicity (periodicity measured in days often fluctuates strongly and matches poorly). A more general approach is therefore adopted, with the week as the main period and the day as the auxiliary period, so that the "time" can be split into two features: the position within the week and the position within the day. For example, the feature vector (1, 3) denotes 3 a.m. on Monday;
step S5, the training result model makes a decision on the current time, and is used for judging whether the current time is suitable for executing rebalancing on the storage data of the CEPH cluster; if the current time is suitable for executing rebalancing on the storage data of the CEPH cluster, executing rebalancing on the storage data of the CEPH cluster; if the current time is not suitable for rebalancing the storage data of the CEPH cluster, rebalancing the storage data of the CEPH cluster is not required to be performed at the current time; the embodiment can automatically decide the time of rebalancing without manual intervention, greatly reduces the difficulty of distributed storage operation, improves the efficiency and further reduces the fluctuation of the service quality of distributed storage;
the training result model in step S5 makes a decision on the current time, which specifically includes:
the training result model predicts, at the current time, the total IO count, the total IO data volume and the peak data BPS that may occur in the CEPH cluster over the next several hours, and then constructs an equation: letting the total IO count be x, the total IO data volume be y (in MB), the peak data BPS be z (in MB), and the duration of the next several hours be t (in seconds), the constructed equation is:
[Equation image not reproduced in this text: the result, weight, is computed from x, y, z and t using the coefficients a, b and c]
wherein a, b and c in the equation are all fixed values selected in practical application;
when the calculation result of the equation, weight, is less than a certain threshold value, it is determined that the current time is suitable for rebalancing the stored data of the CEPH cluster;
when the calculation result of the equation, weight, is greater than or equal to the threshold value, it is determined that the current time is not suitable for rebalancing the stored data of the CEPH cluster;
the above equation weighs whether data rebalancing is suitable in the coming period by balancing how frequent data IO will be over the next several hours (the total IO count), the total IO data volume, and the ratio of the peak data BPS to the total IO data volume. The values of a, b and c can be chosen reasonably in practical application; preferably, a set of reference values is given here: a = 1, b = 10, c = 10. In addition, in practice the threshold is preferably set to 20, that is, when weight is less than 20 the current time is judged suitable for data rebalancing, and otherwise it is judged unsuitable. With these values, whether the current time is suitable for data rebalancing can be judged from the three model outputs (total IO count, total IO data volume and peak data BPS);
in addition, the duration t of the next several hours needs to be calculated from the amount of data that CEPH has to rebalance; it corresponds to roughly 6-10 hours/TB for mechanical hard disks and roughly 1-2 hours/TB for solid state disks on a 10 Gbps network. The specific duration t is evaluated by the following method:
evaluating on the basis of a fixed proportion of the total storage capacity of the CEPH cluster and the network transmission bandwidth, specifically: let the total storage capacity of the CEPH cluster be x_1 (in MB) and the maximum network transmission bandwidth of the CEPH cluster be y_1 (in MB/s); the evaluation equation for the duration t is then t = (d · x_1) / (f · y_1), where d and f are both fixed values configured for the actual deployment scenario; preferably, reference values are given here: d = 0.3, f = 0.5;
alternatively, the duration t may also be evaluated as follows:
evaluating on the basis of the average used capacity of the cluster hard disks in the CEPH cluster and a reduced network transmission bandwidth, specifically: let the average used capacity of the cluster hard disks in the CEPH cluster be x_2 (in MB) and the maximum network transmission bandwidth of the CEPH cluster be y_2 (in MB/s); the evaluation equation for the duration t is then t = (g · x_2) / (h · y_2), where g and h are both fixed values configured for the actual deployment scenario; preferably, reference values are given here: g = 1, h = 0.5;
the step S5 further includes: if the current time is not suitable for rebalancing the stored data of the CEPH cluster, waiting for a period of time, setting the time point after the period of time as the current time point, and returning to the step S4; preferably, the period of time is set to one hour.
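The two-feature encoding of the "time" described under step S4 can be sketched in a few lines. The function name is hypothetical, and numbering Sunday as 7 is an assumption; the source only gives Monday as 1.

```python
from datetime import datetime
from typing import Tuple


def opportunity_features(moment: datetime) -> Tuple[int, int]:
    """Encode a time point as (position in week, position in day).

    Position in week: 1 = Monday ... 7 = Sunday (Sunday = 7 is assumed).
    Position in day:  hour of the day, 0 to 23.
    """
    return (moment.isoweekday(), moment.hour)


# 14 December 2020 was a Monday, so 3 a.m. on that day maps to (1, 3).
print(opportunity_features(datetime(2020, 12, 14, 3, 0)))
```

A vector of this kind is what would be fed to the training result model in step S4.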
Example 2
As shown in fig. 2, the present embodiment discloses a rebalancing system for distributed storage data based on the embodiment method, which can implement the rebalancing method for distributed storage data in embodiment 1, and the rebalancing system for distributed storage data includes a data acquisition module, a data training module, a time recording module, a training result model module, and a rebalancing module;
the data acquisition module is used for acquiring historical IO data of the CEPH cluster and splitting the historical IO data of the CEPH cluster into training data;
the data training module is used for training data, obtaining a training result model and placing the training result model in the training result model module;
the time recording module is used for: when the CEPH cluster fails or needs to be expanded, recording the current time point when the CEPH cluster fails or recording the current time point when the CEPH cluster is expanded, taking the current time point as the current time and inputting the current time into the training result model module;
the training result model module is used for making a decision on the current time, deciding whether the current time is suitable for executing rebalancing on the stored data of the CEPH cluster, and sending a decision result to the rebalancing module;
the rebalancing module is configured to: according to the decision result of the training result model module, if the current time is suitable for executing rebalancing on the storage data of the CEPH cluster, the rebalancing module executes rebalancing on the storage data of the CEPH cluster; if the current time is not suitable for rebalancing the storage data of the CEPH cluster, the rebalancing module does not need to rebalance the storage data of the CEPH cluster at the current time.
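The data flow between the time recording, training result model and rebalancing modules can be pictured as below. This is only a structural sketch: the class and method names are hypothetical, the two callables stand in for the real modules, and the data acquisition and data training modules are omitted because they run before any cluster event occurs.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Tuple

Features = Tuple[int, int]  # (day of week, hour of day)


@dataclass
class RebalanceSystem:
    # Stand-in for the training result model module: True means "suitable".
    decide: Callable[[Features], bool]
    # Stand-in for the rebalancing module: triggers the actual rebalancing.
    rebalance: Callable[[], None]

    def on_cluster_event(self, moment: datetime) -> bool:
        """Time recording module: record when the failure or expansion
        happened, pass it to the model module, and rebalance only if the
        model judges the time suitable."""
        features = (moment.isoweekday(), moment.hour)
        suitable = self.decide(features)
        if suitable:
            self.rebalance()
        return suitable
```

Keeping the decision and the rebalancing action behind separate callables mirrors the module split in the text, where the model module only sends a decision result and the rebalancing module acts on it.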
Further, the training result model module is used for making a decision on the current time, and specifically includes:
the training result model module predicts, at the current time, the total IO count, the total IO data volume and the peak data BPS (bytes per second) that may occur in the CEPH cluster over the next several hours, and then constructs an equation: letting the total IO count be x, the total IO data volume be y (in MB), the peak data BPS be z (in MB), and the duration of the next several hours be t (in seconds), the constructed equation is:
[Equation image not reproduced in this text: the result, weight, is computed from x, y, z and t using the coefficients a, b and c]
wherein a, b and c in the equation are fixed values selected in practical application;
when the calculation result of the equation, weight, is less than a certain threshold value, the training result model module determines that the current time is suitable for rebalancing the stored data of the CEPH cluster;
when the calculation result of the equation, weight, is greater than or equal to the threshold value, the training result model module determines that the current time is not suitable for rebalancing the stored data of the CEPH cluster.
Further, the training result model module comprises a duration evaluation unit;
the duration evaluation unit is used for evaluating the specific duration t of the next several hours that the training result model module uses when making a decision on the current time, and the duration t is evaluated as follows:
evaluating on the basis of a fixed proportion of the total storage capacity of the CEPH cluster and the network transmission bandwidth, specifically: let the total storage capacity of the CEPH cluster be x_1 (in MB) and the maximum network transmission bandwidth of the CEPH cluster be y_1 (in MB/s); the evaluation equation for the duration t is then t = (d · x_1) / (f · y_1), where d and f are both fixed values configured for the actual deployment scenario;
or evaluating on the basis of the average used capacity of the cluster hard disks in the CEPH cluster and a reduced network transmission bandwidth, specifically: let the average used capacity of the cluster hard disks in the CEPH cluster be x_2 (in MB) and the maximum network transmission bandwidth of the CEPH cluster be y_2 (in MB/s); the evaluation equation for the duration t is then t = (g · x_2) / (h · y_2), where g and h are both fixed values configured for the actual deployment scenario.
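As a worked illustration of the first evaluation equation, assume a cluster with a total capacity of 100 TB (taken here as 10^8 MB) and a 10 Gbps network (about 1250 MB/s). Both figures are assumptions chosen for the example; d = 0.3 and f = 0.5 are the reference values given in Example 1.

```latex
t = \frac{d \, x_1}{f \, y_1}
  = \frac{0.3 \times 10^{8}\ \text{MB}}{0.5 \times 1250\ \text{MB/s}}
  = 48\,000\ \text{s} \approx 13.3\ \text{h}
```

Under these assumed figures, the duration evaluation unit would ask the model to look roughly 13 hours ahead.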
Further, the time recording module is further configured to: and if the current time is not suitable for rebalancing the stored data of the CEPH cluster, the time recording module waits for a period of time, sets the time point after the period of time as the current time, and inputs the current time into the training result model module.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that various equivalent changes, modifications, substitutions and alterations can be made herein without departing from the principles and spirit of the invention, the scope of which is defined by the appended claims and their equivalents.

Claims (10)

1. A rebalancing method for distributed storage data, comprising the steps of:
step S1, splitting historical IO data of the CEPH cluster into training data;
step S2, training the training data to obtain a training result model;
step S3, when the CEPH cluster fails or needs to be expanded, recording the current time point when the CEPH cluster fails or recording the current time point when the CEPH cluster is expanded;
step S4, taking the current time point as the current time and inputting the current time into the training result model;
step S5, the training result model makes a decision on the current time, and is used for judging whether the current time is suitable for executing rebalancing on the storage data of the CEPH cluster; if the current time is suitable for executing rebalancing on the storage data of the CEPH cluster, executing rebalancing on the storage data of the CEPH cluster; if the current time is not suitable for rebalancing the storage data of the CEPH cluster, rebalancing the storage data of the CEPH cluster does not need to be performed at the current time.
2. The method of claim 1, wherein the step S1 is preceded by the step S0 of obtaining historical IO data of a CEPH cluster.
3. The method according to claim 1, wherein in step S2, the training data is trained in a FANN manner.
4. The method according to claim 1, wherein the training result model in step S5 makes a decision on the current time, which specifically includes:
the training result model pre-judges the total IO times, the total IO data volume and the peak data BPS which may occur in a plurality of hours in the future of the CEPH cluster at the current time, and then constructs an equation: setting the total IO frequency as x, the total IO data quantity as y, the peak data BPS as z, and the time length of several hours in the future as t, the constructed equation is:
[Equation image not reproduced in the text; it relates x, y, z and t through the constants a, b and c]
wherein a, b and c in the equation are fixed values selected in the practical application process;
when the calculation result of the equation is less than a certain threshold value, judging that the current time is suitable for executing rebalancing on the storage data of the CEPH cluster;
and when the calculation result of the equation is greater than or equal to the threshold value, judging that the current time is not suitable for executing rebalancing on the storage data of the CEPH cluster.
5. The method of claim 4, wherein the time duration t of the next several hours is estimated by:
evaluating according to a percentage of the total storage capacity of the CEPH cluster divided by the network transmission bandwidth, specifically: let the total storage capacity of the CEPH cluster be $x_1$ and the maximum value of the network transmission bandwidth of the CEPH cluster be $y_1$; then the evaluation equation for the duration t is: $t = d x_1 / (f y_1)$, where d and f are both fixed values configured in the actual deployment scenario;
or evaluating according to the average used capacity of the cluster hard disks in the CEPH cluster divided by the network transmission bandwidth, specifically: let the average used capacity of the cluster hard disks in the CEPH cluster be $x_2$ and the maximum value of the network transmission bandwidth of the CEPH cluster be $y_2$; then the evaluation equation for the duration t is: $t = g x_2 / (h y_2)$, where g and h are both fixed values configured in the actual deployment scenario.
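As a worked illustration of the duration estimate, the sketch below reads the evaluation equation as a capacity term divided by a bandwidth term, that is, t = (coefficient × capacity) / (coefficient × bandwidth). That grouping is an assumption made here on dimensional grounds (capacity over bandwidth yields a time). The function name, parameter names and units are hypothetical. With a coefficient of 0.01 on a 100 TB cluster and a 1.25 GB/s link, the estimate is about 800 seconds.

```python
def estimate_duration(capacity_bytes, max_bandwidth_bytes_per_s,
                      capacity_coeff, bandwidth_coeff):
    """Estimate the look-ahead duration t, in seconds.

    Assumed reading of the claim:
        t = (capacity_coeff * capacity) / (bandwidth_coeff * bandwidth)
    The same function covers both variants: pass the total storage capacity
    with (d, f), or the average used disk capacity with (g, h).
    """
    if max_bandwidth_bytes_per_s <= 0 or bandwidth_coeff <= 0:
        raise ValueError("bandwidth and its coefficient must be positive")
    return (capacity_coeff * capacity_bytes) / (
        bandwidth_coeff * max_bandwidth_bytes_per_s
    )


# Example: 1% of 100 TB moved over a 10 Gbit/s (1.25 GB/s) link.
seconds = estimate_duration(100e12, 1.25e9, capacity_coeff=0.01, bandwidth_coeff=1.0)
hours = seconds / 3600.0
```

Whether t is ultimately expressed in seconds or hours depends on the units chosen for the capacity, the bandwidth and the fixed coefficients, which the claim leaves to the deployment.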
6. The method for rebalancing distributed storage data according to claim 4 or 5, wherein said step S5 further comprises: if the current time is not suitable for rebalancing the stored data of the CEPH cluster, waiting for a period of time, setting the time point after the period of time as the current time point, and returning to step S4.
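The wait-and-retry behaviour of this claim can be sketched as a simple loop. This is a hedged illustration: `is_suitable` stands in for the decision of the training result model, and `max_attempts` is an assumption added here so that the loop terminates; the claim itself states no upper bound on the number of retries.

```python
import time
from datetime import datetime, timedelta


def wait_until_suitable(is_suitable, start, wait, max_attempts, sleep=time.sleep):
    """Re-evaluate the decision after each waiting period.

    Returns the first time point judged suitable, or None if no attempt
    within max_attempts (a hypothetical bound) was judged suitable.
    """
    current = start
    for _ in range(max_attempts):
        if is_suitable(current):
            return current
        # Not suitable: wait, then treat the later time point as the current time.
        sleep(wait.total_seconds())
        current = current + wait
    return None
```

Passing `sleep` as a parameter lets a caller substitute a no-op in tests or a scheduler hook in production, instead of blocking the process.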
7. A rebalancing system for distributed storage data, characterized by comprising a data acquisition module, a data training module, a time recording module, a training result model module and a rebalancing module;
the data acquisition module is used for acquiring historical IO data of the CEPH cluster and splitting the historical IO data of the CEPH cluster into training data;
the data training module is used for training data, obtaining a training result model and placing the training result model in the training result model module;
the time recording module is used for: when the CEPH cluster fails or needs to be expanded, recording the current time point when the CEPH cluster fails or recording the current time point when the CEPH cluster is expanded, taking the current time point as the current time and inputting the current time into the training result model module;
the training result model module is used for making a decision on the current time, deciding whether the current time is suitable for executing rebalancing on the stored data of the CEPH cluster, and sending a decision result to the rebalancing module;
the rebalancing module is configured to: according to the decision result of the training result model module, if the current time is suitable for executing rebalancing on the storage data of the CEPH cluster, the rebalancing module executes rebalancing on the storage data of the CEPH cluster; if the current time is not suitable for rebalancing the storage data of the CEPH cluster, the rebalancing module does not need to rebalance the storage data of the CEPH cluster at the current time.
8. The rebalancing system for distributed storage of data according to claim 7, wherein the training result model module is configured to make a decision on a current time, and specifically comprises:
the training result model module predicts, at the current time, the total IO count, the total IO data volume and the peak data BPS that may occur in the CEPH cluster over the next several hours, and then constructs an equation: letting the total IO count be x, the total IO data volume be y, the peak data BPS be z, and the duration of the next several hours be t, the constructed equation is:
[Equation image not reproduced in the text; it relates x, y, z and t through the constants a, b and c]
wherein a, b and c in the equation are fixed values selected in the practical application process;
when the calculation result of the equation is less than a certain threshold value, the training result model module judges that the current time is suitable for executing rebalancing on the stored data of the CEPH cluster;
and when the calculation result of the equation is greater than or equal to the threshold value, the training result model module judges that the current time is not suitable for executing rebalancing on the stored data of the CEPH cluster.
9. The system of claim 8, wherein the training result model module comprises a duration evaluation unit;
the duration evaluation unit is used for evaluating the specific duration t of the next several hours adopted by the training result model module when making a decision on the current time, and the duration t is evaluated as follows:
evaluating according to a percentage of the total storage capacity of the CEPH cluster divided by the network transmission bandwidth, specifically: let the total storage capacity of the CEPH cluster be $x_1$ and the maximum value of the network transmission bandwidth of the CEPH cluster be $y_1$; then the evaluation equation for the duration t is: $t = d x_1 / (f y_1)$, where d and f are both fixed values configured in the actual deployment scenario;
or evaluating according to the average used capacity of the cluster hard disks in the CEPH cluster divided by the network transmission bandwidth, specifically: let the average used capacity of the cluster hard disks in the CEPH cluster be $x_2$ and the maximum value of the network transmission bandwidth of the CEPH cluster be $y_2$; then the evaluation equation for the duration t is: $t = g x_2 / (h y_2)$, where g and h are both fixed values configured in the actual deployment scenario.
10. The system of claim 9, wherein the time recording module is further configured to: if the current time is not suitable for rebalancing the stored data of the CEPH cluster, wait for a period of time, set the time point after the period of time as the current time, and input the current time into the training result model module.
CN202011462529.7A 2020-12-14 2020-12-14 Rebalancing method and system for distributed storage data Active CN112231137B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011462529.7A CN112231137B (en) 2020-12-14 2020-12-14 Rebalancing method and system for distributed storage data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011462529.7A CN112231137B (en) 2020-12-14 2020-12-14 Rebalancing method and system for distributed storage data

Publications (2)

Publication Number Publication Date
CN112231137A true CN112231137A (en) 2021-01-15
CN112231137B CN112231137B (en) 2021-03-30

Family

ID=74124511

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011462529.7A Active CN112231137B (en) 2020-12-14 2020-12-14 Rebalancing method and system for distributed storage data

Country Status (1)

Country Link
CN (1) CN112231137B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281506A (en) * 2014-07-10 2015-01-14 中国科学院计算技术研究所 Data maintenance method and system for file system
US20200183590A1 (en) * 2017-04-12 2020-06-11 Barcelona Supercomputing Center - Centro Nacional De Supercomputación Distributed data structures for sliding window aggregation or similar applications
CN110389940A (en) * 2019-07-19 2019-10-29 苏州浪潮智能科技有限公司 A kind of data balancing method, device and computer readable storage medium
CN110417677A (en) * 2019-07-29 2019-11-05 北京易捷思达科技发展有限公司 A kind of QoS control method based on Ceph distributed storage Osd end data Recovery
CN111397902A (en) * 2020-03-22 2020-07-10 华南理工大学 Rolling bearing fault diagnosis method based on feature alignment convolutional neural network
CN111736772A (en) * 2020-06-15 2020-10-02 中国工商银行股份有限公司 Storage space data processing method and device of distributed file system
CN111917823A (en) * 2020-06-17 2020-11-10 烽火通信科技股份有限公司 Data reconstruction method and device based on distributed storage Ceph
CN111880747A (en) * 2020-08-01 2020-11-03 广西大学 Automatic balanced storage method of Ceph storage system based on hierarchical mapping

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
贺昱洁: "负载均衡的大数据分布存储方法研究与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Also Published As

Publication number Publication date
CN112231137B (en) 2021-03-30

Similar Documents

Publication Publication Date Title
CN112202672B (en) Network route forwarding method and system based on service quality requirement
US8866443B2 (en) Lead acid storage battery and lead acid storage battery system for natural energy utilization system
CN111191918A (en) Service route planning method and device for smart power grid communication network
CN113438315B (en) Internet of things information freshness optimization method based on double-network deep reinforcement learning
CN101651709A (en) Method for calibrating integrity of P2P download files
CN111242171A (en) Model training, diagnosis and prediction method and device for network fault and electronic equipment
CN113469425B (en) Deep traffic jam prediction method
CN114465945B (en) SDN-based identification analysis network construction method
CN108023759A (en) Adaptive resource regulating method and device
CN112073535A (en) Bitmap-based data packet fragment transmission method
CN112231137B (en) Rebalancing method and system for distributed storage data
CN115190027B (en) Natural fault survivability evaluation method based on network digital twin
CN115189908B (en) Random attack survivability evaluation method based on network digital twin
CN111628932A (en) Electric power path optimization exploration method based on ant colony algorithm
CN114925313A (en) Self-adaptive method and system based on distributed link tracking dynamic sampling rate
CN103096380A (en) Wireless access point load balancing load balancing
CN107316056B (en) Automatic evaluation system and automatic evaluation method for network security level
CN117251276B (en) Flexible scheduling method and device for collaborative learning platform
CN110166368A (en) A kind of cloud storage network bandwidth control system and method
CN111382196B (en) Distributed accounting processing method and system
CN108664580A (en) Fine-grained load-balancing method and system in a kind of MongoDB databases
CN114881229B (en) Personalized collaborative learning method and device based on parameter gradual freezing
CN112394885B (en) Travel data storage system
CN112541594A (en) Method for constructing machine learning model allowing random missing of partial data sources
CN117149408A (en) Server cluster management method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant