CN113704220A

CN113704220A - Ceph parameter tuning method based on LSTM and genetic algorithm

Info

Publication number: CN113704220A
Application number: CN202111021786.1A
Authority: CN
Inventors: 李雷孝; 牛铁铭; 李�杰; 李少旭; 林浩; 马志强; 万剑雄
Original assignee: Inner Mongolia University of Technology
Current assignee: Inner Mongolia University of Technology
Priority date: 2021-09-01
Filing date: 2021-09-01
Publication date: 2021-11-26

Abstract

The invention belongs to the technical field of parameter tuning, and particularly relates to a Ceph parameter tuning method based on LSTM and a genetic algorithm, which comprises the following steps: S1, collecting a data set; S2, proving the non-linear relationship of the data set; S3, constructing a performance prediction model by using LSTM; S4, optimizing by using EGA. The method for collecting the data set comprises the following steps: randomly taking values of 8 configuration parameters of Ceph within their adjustable ranges, where the i-th parameter conf_i has the value range [lb_i, ub_i] and conf_i = random(lb_i, ub_i), i = 1, 2, …, 8; updating the parameter combination config = {conf₁, conf₂, conf₃, conf₄, conf₅, conf₆, conf₇, conf₈} into the Ceph system, and testing the read-write performance of the corresponding Ceph block storage system; composing each parameter combination config_i and its corresponding iops_i into a data item (config_i, iops_i), and taking all collected data items as the data set for constructing the Ceph performance prediction model. According to the invention, an accurate and reliable Ceph performance prediction model is constructed by using LSTM, the predicted value of the performance prediction model is used as the fitness of population individuals, and the optimal parameter configuration is found through EGA, so that the system performance is optimal.

Description

Ceph parameter tuning method based on LSTM and genetic algorithm

Technical Field

The invention belongs to the technical field of parameter tuning, and particularly relates to a Ceph parameter tuning method based on LSTM and a genetic algorithm.

Background

The performance optimization work of domestic and foreign researchers on the Ceph system falls mainly into three areas: specific hardware environment optimization, application-oriented scenario optimization, and internal mechanism optimization. In terms of specific hardware environment optimization, with the advent of NVDIMM (Non-Volatile Dual In-line Memory Module) products, byte-addressable non-volatile memory will provide IO performance similar to that of memory. In simulations that use NVDIMMs as the underlying medium of a Ceph system and map all content to NVDIMMs, single-node throughput can be improved by more than 100%.

In application-oriented scenario optimization, Ceph is not the most suitable storage system in the field of high-performance computing. Files accessed by data-intensive applications in high-performance computing are classified as read-intensive, write-intensive or read-write-intensive. The read-write characteristics of these files are used to make file placement decisions, balancing the high-performance computing workload. In the field of cloud computing, the data logs of data objects are operated on while write atomicity and reliability are maintained, and experimental results show that the capacity provided by a new storage engine is more than 3 times the original capacity.

There has also been some progress in research on performance optimization of the internal mechanisms of Ceph storage systems. An existing divide-and-conquer strategy based on MapReduce uses a mixed integer linear programming algorithm to solve the optimal data placement strategy for Ceph in a heterogeneous environment. The experimental results show that, compared with the original strategy implemented in Ceph, the algorithm can improve the read-write performance of the system by 25.6%. An existing multi-attribute decision-making method for Ceph storage selection collects the IO performance of each OSD (Object Storage Device) and combines these measurements effectively, and it distinguishes application scenarios by marking application priority, so that the overall read-write performance is improved by 13.7%. Other prior art describes in detail the parameters that need to be adjusted in an all-flash environment, including the kernel, file system, disk cache and RBD, but gives no performance comparison before and after adjustment. An existing black-box optimization technique applied to storage systems selects the next parameter configuration to try according to the previous results, but the method needs a large data set for support and is difficult to implement in a practical environment. An existing automatic tuning method for Ceph configuration parameters based on random forest (RF) and genetic algorithm (GA) uses RF to construct a performance prediction model; compared with the black-box optimization technique, it can predict Ceph system performance more quickly and saves a large amount of time and system resources. However, the amount of data used in that work is too small, so RF may not produce good regression results; RF cannot make predictions beyond the range of the training data; and RF may overfit when modeling data with certain kinds of noise.

Although the specific hardware environment optimization and application-oriented scenario optimization methods have made some progress in improving performance, they do not consider the general environment and ignore the room for performance improvement offered by tuning the internal parameters of Ceph. The internal mechanism optimization methods described above are generally applicable to performance improvement, but they cannot fully account for the nonlinear relationships among parameters.

Disclosure of Invention

To address the technical problems that the default parameters of Ceph cannot fully exploit the read-write performance of the system, that manual parameter adjustment is inefficient, and that a large amount of system resources is wasted, the invention provides a Ceph parameter tuning method based on LSTM and a genetic algorithm that is widely applicable, yields a large performance improvement, and is efficient.

In order to solve the technical problems, the invention adopts the technical scheme that:

A Ceph parameter tuning method based on LSTM and a genetic algorithm comprises the following steps:

S1, collecting a data set;

S2, proving the nonlinear relationship of the data set;

S3, constructing a performance prediction model by using LSTM;

S4, optimizing by using EGA to obtain a set of optimal parameters.

The method for collecting the data set in S1 includes:

S1.1, randomly taking values of 8 configuration parameters of Ceph within their adjustable ranges, where the i-th parameter conf_i has the value range [lb_i, ub_i] and conf_i = random(lb_i, ub_i), i = 1, 2, …, 8;

S1.2, updating the parameter combination config = {conf₁, conf₂, conf₃, conf₄, conf₅, conf₆, conf₇, conf₈} into the Ceph system, and testing the read-write performance of the corresponding Ceph block storage system;

S1.3, composing each parameter combination config_i and its corresponding iops_i into a data item (config_i, iops_i), and taking all collected data items as the data set for constructing the Ceph performance prediction model.

The 8 parameters of Ceph are bluestore_cache_size_ssd, bluestore_cache_size_hdd, bluestore_cache_meta_ratio, bluestore_cache_kv_ratio, osd_max_write_size, osd_map_cache_size, rbd_cache_size and rbd_cache_max_dirty, respectively; the type of bluestore_cache_size_ssd and bluestore_cache_size_hdd is integer, the type of bluestore_cache_meta_ratio and bluestore_cache_kv_ratio is float, and the type of osd_max_write_size, osd_map_cache_size, rbd_cache_size and rbd_cache_max_dirty is integer.

The method for proving the non-linear relationship of the data sets in the step S2 is as follows:

The nonlinear relationship of the data set is proved by building a multiple linear regression model, that is, a function that predicts the output as a linear combination of the parameters. The multiple linear regression model is as follows:

f(config)＝ω₁conf₁+ω₂conf₂+...+ω₈conf₈+b

where b is a constant and ω₁ to ω₈ are coefficients. If the variables have a linear relationship, there must exist a set of coefficients and a constant such that the true values are constrained within the range of the predicted values obtained from the formula.

The method for constructing the performance prediction model by using the LSTM in the S3 comprises the following steps:

defining an error formula, Error, that reflects the difference between the true value and the predicted value,

where Actual_i is the true value measured on the Ceph block storage system, Forecast_i is the value predicted by the LSTM model, and n is the number of samples.
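The Error formula itself is not reproduced in this text. Because the Error values reported in the experimental section are percentages, one plausible reading is a mean absolute percentage error over the n samples. The sketch below is written under that assumption only and should not be taken as the exact definition used by the invention; the function name and the sample values are hypothetical.

```python
def mean_absolute_percentage_error(actual, forecast):
    """Return (100 / n) * sum(|Actual_i - Forecast_i| / |Actual_i|).

    Assumed form of Error; the source does not show the formula.
    """
    if len(actual) != len(forecast) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up IOPS values for illustration only, not measurements.
actual = [4000.0, 5000.0, 6000.0]
forecast = [4040.0, 4950.0, 6000.0]
error_percent = mean_absolute_percentage_error(actual, forecast)
print(error_percent)
```

A relative measure of this kind makes errors comparable across configurations whose IOPS differ widely, which is one reason it is a common choice for performance models.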

The method for optimizing by using EGA in S4 comprises the following steps: setting the population size as M and the maximum number of iterations as T, and taking a parameter combination config = {conf₁, conf₂, conf₃, conf₄, conf₅, conf₆, conf₇, conf₈} as one individual in the population, where each parameter represents a gene of the individual and P(t) represents the population of the t-th generation; using the EGA algorithm, finding the individual elitist with the maximum fitness in the population before the genetic operations and storing its information, and, after the genetic operations, replacing the individual with the minimum fitness in the new population with elitist, so that elitist is retained in the next-generation population.

Compared with the prior art, the invention has the following beneficial effects:

According to the invention, an accurate and reliable Ceph performance prediction model is constructed by using LSTM, the predicted value of the performance prediction model is used as the fitness of population individuals, and the optimal parameter configuration is found through EGA, so that the system performance is optimal.

Drawings

FIG. 1 is a schematic diagram of the overall framework of Ceph parameter tuning according to the present invention;

FIG. 2 is a graph showing the effect of the batch size on model accuracy according to the present invention;

FIG. 3 is a graph comparing the predicted effects of the LSTM and RF models of the present invention;

FIG. 4 is a comparison of the effect of the method of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In the description of the present invention, it is to be noted that, unless otherwise explicitly specified or limited, the terms "connected" and "connected" are to be interpreted broadly, e.g., as being fixed or detachable or integrally connected; can be mechanically or electrically connected; they may be connected directly or indirectly through intervening media, or they may be interconnected between two elements. The specific meanings of the above terms in the present invention can be understood in specific cases to those skilled in the art.

The Ceph parameter tuning method based on LSTM and a genetic algorithm comprises the following steps:

S1, collecting a data set;

S2, proving the nonlinear relationship of the data set;

S3, constructing a performance prediction model by using LSTM;

S4, optimizing by using EGA to obtain a set of optimal parameters.

Further, the method for collecting the data set in S1 is as follows:

Further, the 8 parameters of Ceph are bluestore_cache_size_ssd, bluestore_cache_size_hdd, bluestore_cache_meta_ratio, bluestore_cache_kv_ratio, osd_max_write_size, osd_map_cache_size, rbd_cache_size and rbd_cache_max_dirty, respectively; the type of bluestore_cache_size_ssd and bluestore_cache_size_hdd is integer, the type of bluestore_cache_meta_ratio and bluestore_cache_kv_ratio is float, and the type of osd_max_write_size, osd_map_cache_size, rbd_cache_size and rbd_cache_max_dirty is integer. The 8 parameters of Ceph are shown in Table 1.

Parameter name	Type	Value range
bluestore_cache_size_ssd	integer	1 GB to 10 GB
bluestore_cache_size_hdd	integer	1 GB to 10 GB
bluestore_cache_meta_ratio	float	0 to 1
bluestore_cache_kv_ratio	float	0 to 1
osd_max_write_size	integer	4 to 2000
osd_map_cache_size	integer	64 to 1024
rbd_cache_size	integer	1 MB to 64 MB
rbd_cache_max_dirty	integer	1 MB to 64 MB

TABLE 1

The method for randomly taking values of the 8 parameters of Ceph in S3.1 comprises the following steps: randomly taking values of the 8 configuration parameters of Ceph within their adjustable ranges, where the i-th parameter conf_i has the value range [lb_i, ub_i] and conf_i = random(lb_i, ub_i), i = 1, 2, …, 8; updating the parameter combination config = {conf₁, conf₂, conf₃, conf₄, conf₅, conf₆, conf₇, conf₈} into the Ceph system, and testing the read-write performance of the corresponding Ceph block storage system; composing each parameter combination config_i and its corresponding iops_i into a data item (config_i, iops_i), and taking all collected data items as the data set for constructing the Ceph performance prediction model.
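The random sampling of one parameter combination can be sketched as follows. This is a minimal illustration, assuming the ranges of Table 1, uniform sampling, and byte units for the GB and MB ranges; the names PARAM_RANGES, sample_config and make_data_item are hypothetical and do not come from the invention.

```python
import random

GB = 1024 ** 3
MB = 1024 ** 2

# (lower bound lb_i, upper bound ub_i, type) per parameter, from Table 1.
# Expressing the GB/MB ranges in bytes is an assumption of this sketch.
PARAM_RANGES = {
    "bluestore_cache_size_ssd": (1 * GB, 10 * GB, int),
    "bluestore_cache_size_hdd": (1 * GB, 10 * GB, int),
    "bluestore_cache_meta_ratio": (0.0, 1.0, float),
    "bluestore_cache_kv_ratio": (0.0, 1.0, float),
    "osd_max_write_size": (4, 2000, int),
    "osd_map_cache_size": (64, 1024, int),
    "rbd_cache_size": (1 * MB, 64 * MB, int),
    "rbd_cache_max_dirty": (1 * MB, 64 * MB, int),
}


def sample_config(rng):
    """Draw conf_i = random(lb_i, ub_i) for each of the 8 parameters."""
    config = {}
    for name, (lb, ub, kind) in PARAM_RANGES.items():
        if kind is int:
            config[name] = rng.randint(lb, ub)  # inclusive integer range
        else:
            config[name] = rng.uniform(lb, ub)  # float between lb and ub
    return config


def make_data_item(config, iops):
    """Pair a parameter combination with its measured IOPS: (config_i, iops_i)."""
    return (config, iops)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
example_config = sample_config(rng)
print(example_config)
```

Keeping the type next to each range lets the integer and float parameters share one sampling loop.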

The IOPS value in S3.2 is divided into 6 indices: random read IOPS, random write IOPS, sequential read IOPS, sequential write IOPS, mixed sequential read IOPS and mixed random read IOPS;

Further, the method for collecting the IOPS value corresponding to the parameters in S3.2 includes:

Step 1: using random(lb_i, ub_i), i = 1, 2, …, 8, to randomly generate the 8 parameter values;

Step 2: synchronizing the modified configuration parameters to the whole Ceph cluster by using the cluster management tool Ansible;

Step 3: measuring the performance of the block storage system by using the fio + rbd test tool;

Step 4: periodically executing the test task by using the crontab tool, repeating steps 1 to 3, and collecting the parameter combinations and the corresponding IOPS values.
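Step 3 ends with an IOPS figure that has to be read out of the fio report. The sketch below shows one way to do that, assuming fio was run with JSON output (`--output-format=json`) and that the report carries the usual `jobs[*].read.iops` and `jobs[*].write.iops` fields. The function name total_iops and the sample report are hypothetical; the report is hand-written, not measurement data.

```python
import json


def total_iops(report_text):
    """Sum read and write IOPS over all jobs in a fio JSON report."""
    report = json.loads(report_text)
    read_iops = sum(job["read"]["iops"] for job in report["jobs"])
    write_iops = sum(job["write"]["iops"] for job in report["jobs"])
    return read_iops, write_iops


# Hand-written stand-in for a fio report (assumed structure, made-up numbers).
SAMPLE_REPORT = json.dumps({
    "jobs": [
        {"jobname": "randread", "read": {"iops": 1200.5}, "write": {"iops": 0.0}},
        {"jobname": "randwrite", "read": {"iops": 0.0}, "write": {"iops": 800.25}},
    ]
})

read_iops, write_iops = total_iops(SAMPLE_REPORT)
print(read_iops, write_iops)
```

Each collected pair of parameter combination and IOPS value then becomes one data item of the data set.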

Further, the method for proving the non-linear relationship of the data sets in S2 is as follows:

f(config)＝ω₁conf₁+ω₂conf₂+...+ω₈conf₈+b

where b is a constant and ω₁ to ω₈ are coefficients. If the variables have a linear relationship, there must exist a set of coefficients and a constant such that the true values are constrained within the range of the predicted values obtained from the formula.
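The linearity check can be illustrated with an ordinary least-squares fit of f(config) = ω·config + b followed by a goodness-of-fit measure. The sketch below uses the coefficient of determination R² as that measure, which is an assumption of this example rather than the criterion stated above. It also uses two synthetic parameters instead of eight for brevity, and all function names are hypothetical.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_linear(configs, iops):
    """Least-squares fit of f(config) = w . config + b via the normal equations."""
    rows = [list(c) + [1.0] for c in configs]  # last column models the constant b
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, iops)) for i in range(k)]
    coeffs = solve_linear_system(xtx, xty)
    return coeffs[:-1], coeffs[-1]


def r_squared(configs, iops, weights, bias):
    """Coefficient of determination of the fitted linear model."""
    preds = [sum(w * x for w, x in zip(weights, c)) + bias for c in configs]
    mean_y = sum(iops) / len(iops)
    ss_res = sum((y - p) ** 2 for y, p in zip(iops, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in iops)
    return 1.0 - ss_res / ss_tot


# Synthetic 5 x 5 grid of two parameters (illustrative only).
grid = [(float(x1), float(x2)) for x1 in range(-2, 3) for x2 in range(-2, 3)]
linear_target = [2.0 * x1 + 3.0 * x2 + 1.0 for x1, x2 in grid]
product_target = [x1 * x2 for x1, x2 in grid]  # interaction term, not linear

w_lin, b_lin = fit_linear(grid, linear_target)
w_prod, b_prod = fit_linear(grid, product_target)
r2_linear = r_squared(grid, linear_target, w_lin, b_lin)
r2_product = r_squared(grid, product_target, w_prod, b_prod)
print(r2_linear, r2_product)
```

A fit that explains the linear target almost perfectly but explains none of the interaction target shows the kind of gap that indicates a nonlinear relationship between parameters and performance.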

Further, the method for constructing the performance prediction model by using the LSTM in S3 includes:

Further, the optimization method using EGA in S4 is as follows: setting the population size as M and the maximum number of iterations as T, and taking a parameter combination config = {conf₁, conf₂, conf₃, conf₄, conf₅, conf₆, conf₇, conf₈} as one individual in the population, where each parameter represents a gene of the individual and P(t) represents the population of the t-th generation; using the EGA algorithm, finding the individual elitist with the maximum fitness in the population before the genetic operations and storing its information, and, after the genetic operations, replacing the individual with the minimum fitness in the new population with elitist, so that elitist is retained in the next-generation population.

The pseudo code of the EGA algorithm is as follows.

Lines 9 and 16 of the pseudocode find the elite individual, elitist. Line 15 replaces the least-fit individual in the new population with elitist.
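The elitism mechanism described above can be sketched in Python as follows. The line numbers cited above refer to the original pseudocode, not to this sketch. Several details are assumptions of the example and are not stated in the source: binary tournament selection, single-point crossover, mutation by resampling a gene uniformly within its range, and at least two genes per individual. The default values follow Table 2, and toy_fitness is a hypothetical stand-in for the LSTM prediction that serves as the fitness in the method.

```python
import random


def ega(fitness, bounds, pop_size=20, max_iter=100, p_c=0.8, p_m=0.1, seed=0):
    """Elitist genetic algorithm that maximizes fitness over box bounds."""
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lb, ub) for lb, ub in bounds]

    def tournament(pop, scores):
        # Binary tournament selection (an assumed selection operator).
        i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
        return pop[i] if scores[i] >= scores[j] else pop[j]

    population = [random_individual() for _ in range(pop_size)]
    history = []
    for _ in range(max_iter):
        scores = [fitness(ind) for ind in population]
        # Record the elitist before any genetic operation.
        best = max(range(pop_size), key=scores.__getitem__)
        elitist = population[best][:]
        history.append(scores[best])

        # Selection, single-point crossover, and mutation.
        children = []
        while len(children) < pop_size:
            a = tournament(population, scores)[:]
            b = tournament(population, scores)[:]
            if rng.random() < p_c:
                cut = rng.randrange(1, len(bounds))  # needs >= 2 genes
                a[cut:], b[cut:] = b[cut:], a[cut:]
            for child in (a, b):
                for g, (lb, ub) in enumerate(bounds):
                    if rng.random() < p_m:
                        child[g] = rng.uniform(lb, ub)
                children.append(child)
        children = children[:pop_size]

        # Replace the least-fit child with the elitist so it survives.
        child_scores = [fitness(ind) for ind in children]
        worst = min(range(pop_size), key=child_scores.__getitem__)
        children[worst] = elitist
        population = children

    scores = [fitness(ind) for ind in population]
    best = max(range(pop_size), key=scores.__getitem__)
    return population[best], scores[best], history


def toy_fitness(config):
    """Hypothetical stand-in for the LSTM-predicted IOPS of a configuration."""
    return -sum((x - 0.5) ** 2 for x in config)


best_config, best_score, history = ega(toy_fitness, [(0.0, 1.0)] * 8)
print(best_score)
```

Because the elitist is copied into every new population, the best fitness recorded per generation can never decrease, which is the property that separates EGA from a plain GA.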

Analysis of the Experimental results of the invention

First, prediction model accuracy

In order to improve the accuracy of the Ceph performance prediction model and reduce the training time, the batch size of the LSTM model needs to be adjusted. The batch size is the number of samples used for one training step in the neural network. The size of this parameter affects how well and how fast the model is optimized, and directly affects GPU and memory usage. If the batch size is too small, the gradient fluctuates widely and the network does not converge easily; if it is too large, memory usage is too high, the gradient is inaccurate, and training takes a long time. The batch size is determined experimentally, as shown in FIG. 2.

As can be seen from FIG. 2, the accuracy of the model increases as the batch size increases up to 32, where the model accuracy is highest. When the batch size is greater than 32, the model accuracy decreases and the training time gradually increases. According to the experimental results, a batch size of 32 achieves the best training effect.

After determining the parameters of the LSTM model, to verify the accuracy of the performance prediction model, the present invention uses the 3000 sets of data acquired in section 3.1 as the data set of the performance prediction model, of which 80% is used as the training set, 10% as the validation set, and 10% as the test set.
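The split and the chosen batch size can be made concrete with a short sketch. It assumes the data items are shuffled before splitting and that batches are taken consecutively, neither of which is stated in the source; split_dataset and batches are hypothetical helper names, and the integers stand in for the 3000 data items.

```python
import random


def split_dataset(items, train=0.8, val=0.1, seed=0):
    """Shuffle and split items into training, validation, and test lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


def batches(items, batch_size=32):
    """Yield consecutive mini-batches of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


data = list(range(3000))  # placeholders for 3000 (config, iops) data items
train_set, val_set, test_set = split_dataset(data)
batches_per_epoch = sum(1 for _ in batches(train_set))
print(len(train_set), len(val_set), len(test_set), batches_per_epoch)
```

With 2400 training items and a batch size of 32, one pass over the training set corresponds to 75 weight updates.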

In order to verify the advantages and disadvantages of the Ceph performance prediction model, the performance prediction models are respectively established for the Ceph system by using the LSTM and the RF, and the accuracy of the two performance prediction models is analyzed and compared. The results of the experiment are shown in FIG. 3.

FIG. 3 shows the predicted values of LSTM and RF against the true values. The abscissa represents different parameter configurations and the ordinate represents IOPS values. LSTM and RF denote the predicted values of the performance models built with LSTM and RF, respectively. In terms of the overall trend, the predicted values obtained with both LSTM and RF promptly reflect the performance fluctuation caused by parameter changes, but the prediction curve obtained with RF differs noticeably from the true-value curve and deviates substantially from the true value at certain points. In order to compare the accuracy of the two models directly, the invention uses Error to evaluate model accuracy. In the comparative experiments, the Error values of the RF and LSTM models were calculated to be 0.56% and 0.28%, respectively. It follows that the prediction accuracy of LSTM is better than that of the prior RF method.

Second, performance comparison analysis

In order to evaluate the effect of the method on the adjustment of the Ceph system performance, the method is compared with an LSTM + GA method and an RF and GA based automatic Ceph adjustment method.

Before the experiment, the initial parameters of EGA are set: the mutation probability P_m, the crossover probability P_c, the population size M and the maximum number of iterations T. Mutation is essentially a deep search of the parameter configuration value space; if the mutation probability P_m is too large, the genetic algorithm degenerates into a random search algorithm, and because of the excessive randomness the EGA spends more time searching. The crossover probability P_c affects how quickly configuration schemes are replaced, and a higher crossover probability makes the algorithm more efficient. A larger population size M and maximum number of iterations T enlarge the search and improve search accuracy, but they also increase the time overhead and reduce search efficiency. After a number of experimental tests, the EGA parameters are set as shown in Table 2.

Parameter	Value
Maximum number of iterations T	100
Population size M	20
Crossover probability P_c	0.8
Mutation probability P_m	0.1

TABLE 2

An iteration trend graph of the LSTM + GA method, the LSTM + EGA method and the RF + GA method is shown in figure 4, wherein the abscissa represents the iteration times of the genetic algorithm, and the ordinate represents the read-write performance of the Ceph block storage system. In order to obtain more accurate experimental results, each method respectively takes the average value of 5 times of algorithm operation as the final experimental result.

It can be seen from fig. 4 that the read-write performance of the Ceph block storage system after parameter optimization is about 6750. The LSTM + GA method and the RF + GA method did not differ much in the first 20 generations, both methods reached a plateau around 60 generations, but RF + GA slightly lags behind LSTM + GA. The LSTM + EGA method can reach a steady state in about 40 generations, which shows that the convergence rate of the LSTM + EGA is faster than that of the LSTM + GA and the RF + GA. And the optimal value obtained by the LSTM + EGA method is superior to that obtained by the LSTM + GA method and the RF + GA method, which shows that the convergence precision of the LSTM + EGA is higher.

When the obtained optimal parameter combination is applied to a Ceph system in a real environment, the measured mean IOPS of the block storage system is 6612, which differs little from the predicted value and is within an acceptable range. The default parameter configuration reaches only 3971, so the tuned performance is about 1.7 times that of the default configuration.
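As a quick check on these figures, taking 6750 as the approximate predicted optimum read from FIG. 4, the speed-up over the default configuration and the relative gap between prediction and measurement work out as:

```latex
\frac{6612}{3971} \approx 1.67, \qquad
\frac{6750 - 6612}{6750} \approx 2.0\%
```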

Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.

Claims

1. A Ceph parameter tuning method based on LSTM and genetic algorithm is characterized in that: comprises the following steps:

S1, collecting a data set;

S2, proving the nonlinear relationship of the data set, thereby proving the complexity of Ceph tuning;

S3, constructing a performance prediction model by using LSTM;

S4, optimizing by using EGA to obtain a set of optimal parameters.

2. The Ceph parameter tuning method based on LSTM and genetic algorithm as claimed in claim 1, wherein: the method for collecting the data set in S1 includes:

3. The Ceph parameter tuning method based on LSTM and genetic algorithm as claimed in claim 2, wherein: the 8 parameters of Ceph are bluestore_cache_size_ssd, bluestore_cache_size_hdd, bluestore_cache_meta_ratio, bluestore_cache_kv_ratio, osd_max_write_size, osd_map_cache_size, rbd_cache_size and rbd_cache_max_dirty, respectively; the type of bluestore_cache_size_ssd and bluestore_cache_size_hdd is integer, the type of bluestore_cache_meta_ratio and bluestore_cache_kv_ratio is float, and the type of osd_max_write_size, osd_map_cache_size, rbd_cache_size and rbd_cache_max_dirty is integer.

4. The Ceph parameter tuning method based on LSTM and genetic algorithm as claimed in claim 1, wherein: the method for proving the non-linear relationship of the data sets in the step S2 is as follows:

f(config)＝ω₁conf₁+ω₂conf₂+...+ω₈conf₈+b

5. The Ceph parameter tuning method based on LSTM and genetic algorithm as claimed in claim 1, wherein: the method for constructing the performance prediction model by using the LSTM in the S3 comprises the following steps:

6. The Ceph parameter tuning method based on LSTM and genetic algorithm as claimed in claim 1, wherein: the method for optimizing by using EGA in S4 comprises the following steps: setting the population size as M and the maximum number of iterations as T, and taking a parameter combination config = {conf₁, conf₂, conf₃, conf₄, conf₅, conf₆, conf₇, conf₈} as one individual in the population, where each parameter represents a gene of the individual and P(t) represents the population of the t-th generation; using the EGA algorithm, finding the individual elitist with the maximum fitness in the population before the genetic operations and storing its information, and, after the genetic operations, replacing the individual with the minimum fitness in the new population with elitist, so that elitist is retained in the next-generation population.