CN115858660A - Parameter recommendation method and device and computer storage medium - Google Patents

Parameter recommendation method and device and computer storage medium

Info

Publication number
CN115858660A
CN115858660A
Authority
CN
China
Prior art keywords: parameter, performance, parameters, prediction model, determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111115232.8A
Other languages
Chinese (zh)
Inventor
乔于洋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Suzhou Software Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Suzhou Software Technology Co Ltd
Priority to CN202111115232.8A
Publication of CN115858660A
Legal status: Pending

Abstract

The application discloses a parameter recommendation method, an apparatus, and a computer storage medium. The method includes: determining a parameter input set of a distributed storage system and system parameters configured in at least one storage scenario; determining at least one performance index according to the parameter input set and the system parameters; performing model training according to the parameter input set, the system parameters, and the at least one performance index, and determining at least one performance prediction model and at least one updated parameter set, where each performance index corresponds to one performance prediction model and one updated parameter set; and recommending parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set. Because system parameters under different storage scenarios are considered during model training, optimal parameter recommendation under different storage scenarios can be achieved with the trained performance prediction models, and resource consumption can be reduced.

Description

Parameter recommendation method and device and computer storage medium
Technical Field
The present application relates to the field of distributed storage technologies, and in particular, to a parameter recommendation method and apparatus, and a computer storage medium.
Background
As the name implies, a distributed storage system stores data in a distributed manner across multiple independent devices. A traditional network storage system uses a centralized storage server to hold all data and cannot meet the demands of large-scale storage applications. A distributed storage system adopts a scalable architecture: multiple storage servers share the storage load, and location servers are used to locate stored information, which improves the reliability, availability, and access efficiency of the system while remaining easy to scale.
In the related art, improving the performance of a distributed storage system is generally approached at two levels: hardware optimization and software optimization. Optimization at the hardware level usually imposes hard requirements on the hardware, and the optimization cost is very high. At the software level, a distributed storage system such as Ceph exposes thousands of configurable parameters, and setting them depends to a large extent on the experience of a storage engineer; yet even an experienced storage engineer can hardly guarantee that the chosen parameters are the ones that actually optimize the performance of the distributed storage system.
At present, although many parameter tuning schemes exist for distributed storage systems, they rely excessively on the knowledge base and practical experience of technicians and struggle to accommodate the requirements of diverse storage scenarios.
Disclosure of Invention
The application provides a parameter recommendation method, a parameter recommendation apparatus, and a computer storage medium, which can recommend parameters for a distributed storage system in a targeted manner according to the performance indexes emphasized by different storage scenarios, reduce resource consumption, and improve parameter optimization efficiency.
The technical scheme of the application is realized as follows:
in a first aspect, an embodiment of the present application provides a parameter recommendation method, including:
determining a parameter input set of a distributed storage system and system parameters configured in at least one storage scene;
determining at least one performance indicator based on the input set of parameters and the system parameters;
performing model training according to the parameter input set, the system parameters and the at least one performance index, and determining at least one performance prediction model and at least one updated parameter set; each performance index corresponds to a performance prediction model and an updated parameter set, and the updated parameter set is obtained after the parameter input set is iteratively updated in the model training process;
and recommending parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set.
In a second aspect, an embodiment of the present application provides a parameter recommendation apparatus, which includes a determination unit, a training unit, and a recommendation unit, wherein,
the determining unit is configured to determine a parameter input set of the distributed storage system and system parameters configured in at least one storage scene; and determining at least one performance indicator based on the input set of parameters and the system parameters;
the training unit is configured to perform model training according to the parameter input set, the system parameters and the at least one performance index, and determine at least one performance prediction model and at least one updated parameter set; each performance index corresponds to a performance prediction model and an updated parameter set, and the updated parameter set is obtained after the parameter input set is iteratively updated in the model training process;
and the recommending unit is configured to recommend parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set.
In a third aspect, an embodiment of the present application further provides a parameter recommendation apparatus, which includes a memory and a processor, wherein,
the memory for storing a computer program operable on the processor;
the processor is configured to execute the parameter recommendation method according to the first aspect when the computer program is run.
In a fourth aspect, an embodiment of the present application provides a computer storage medium, where a computer program is stored, and the computer program, when executed by at least one processor, implements the parameter recommendation method according to the first aspect.
According to the parameter recommendation method, apparatus, and computer storage medium provided above, a parameter input set of a distributed storage system and system parameters configured in at least one storage scenario are determined; at least one performance index is determined according to the parameter input set and the system parameters; model training is performed according to the parameter input set, the system parameters, and the at least one performance index to determine at least one performance prediction model and at least one updated parameter set, where each performance index corresponds to one performance prediction model and one updated parameter set, and the updated parameter set is obtained by iteratively updating the parameter input set during model training; and parameters are recommended to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set. In this way, because system parameters under different storage scenarios are considered during model training, parameters can be recommended to the distributed storage system in a targeted manner according to the trained performance prediction models and updated parameter sets; no per-parameter analysis is needed, the whole process is automated, and manual intervention is avoided. In addition, adding the performance prediction model to the parameter optimization process improves parameter optimization efficiency and reduces resource consumption.
Drawings
Fig. 1 is a schematic flowchart of a parameter recommendation method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of another parameter recommendation method according to an embodiment of the present application;
fig. 3 is a schematic diagram of an application architecture of a parameter recommendation system according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of a parameter recommendation apparatus according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of another parameter recommendation device according to an embodiment of the present application;
fig. 6 is a schematic diagram of a specific hardware structure of a parameter recommendation apparatus according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a component of a parameter recommendation device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant application and are not limiting of the application. It should be further noted that, for the convenience of description, only the portions relevant to the related applications are shown in the drawings.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the present application only and is not intended to be limiting of the application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
It should be noted that the terms "first/second/third" in the embodiments of the present application are only used to distinguish similar objects and do not imply a specific ordering; where a specific order or sequence permits, "first/second/third" may be interchanged, so that the embodiments of the present application described herein can be implemented in an order other than that shown or described.
It should be understood that a distributed storage system (also referred to as "distributed storage software," "distributed storage cluster," "distributed file system," etc.) may be considered as a queuing model, and the interaction between a client and the distributed storage system is mainly the queued processing of Input/Output (I/O) instructions. Therefore, optimization of a distributed storage system often takes the queuing model as the entry point, mainly along two directions: increasing parallelism and reducing service time. Both are related to factors such as the hardware configuration of the server and operating-system settings, for example the processing capability of the central processing unit (CPU), disk parallelism, the IO path length in the software design, memory size, the memory reclamation mechanism, network bandwidth, and so on.
In the related art, there are many parameter tuning schemes for distributed storage systems. Most of them first delimit a tuning parameter range and then adjust in real time according to performance monitoring while the cluster is running, with an adjustment strategy that depends on the knowledge base and practical experience of technicians. Some schemes apply statistics and machine learning to parameter tuning: an input set is constructed by sampling the parameter value ranges, evaluation indexes are determined, and the best-performing parameter configuration is selected by trying different configurations. This approach can in theory find an optimal solution, has a notable tuning effect especially in nonlinear scenarios, can be automated, and places low demands on operation and maintenance personnel; however, the iteration process consumes a large amount of system resources, and its results cannot accommodate different hardware configurations, operating-system configurations, and so on. In addition, there are optimization schemes that first construct a prediction model and introduce it into the parameter training process, which can reduce resource consumption and shorten the time needed to find the optimal parameter combination, but such schemes are basically highly coupled with the configuration of the cluster.
Based on this, the embodiment of the present application provides a parameter recommendation method, whose basic idea is: determining a parameter input set of a distributed storage system and system parameters configured in at least one storage scenario; determining at least one performance index according to the parameter input set and the system parameters; performing model training according to the parameter input set, the system parameters, and the at least one performance index, and determining at least one performance prediction model and at least one updated parameter set, where each performance index corresponds to one performance prediction model and one updated parameter set, and the updated parameter set is obtained by iteratively updating the parameter input set during model training; and recommending parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set. In this way, because system parameters under different storage scenarios are considered during model training, parameters can be recommended to the distributed storage system in a targeted manner according to the trained performance prediction models and updated parameter sets; no per-parameter analysis is needed, the whole process is automated, and manual intervention is avoided. In addition, adding the performance prediction model to the parameter optimization process improves parameter optimization efficiency and reduces resource consumption.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
In an embodiment of the present application, fig. 1 shows a schematic flowchart of a parameter recommendation method provided in the embodiment of the present application. As shown in fig. 1, the method may include:
s101, determining a parameter input set of the distributed storage system and system parameters configured in at least one storage scene.
It should be noted that the parameter recommendation method provided in the embodiment of the present application is used to recommend, before a distributed storage system is created and in combination with the actual storage scenario, a set of optimized parameters that optimize the performance indexes of the distributed storage system. The method can be applied to a parameter recommendation apparatus, or to a device or system integrating such an apparatus. The electronic device may be a computer, a smart phone, a tablet computer, a notebook computer, a palmtop computer, a personal digital assistant (PDA), a navigation device, or the like, which is not limited in this embodiment.
In the embodiment of the present application, the parameter input set of the distributed storage system refers to configurable parameters of the distributed storage system itself. Taking Ceph as an example, its configurable parameters can number in the thousands, for example the maximum number of bytes written to the journal, journal_max_write_bytes, or the maximum number of operations stored in the queue at any time, journal_queue_max_ops.
It should be further noted that, in the embodiment of the present application, the initial parameter set may be a parameter set obtained by screening the large number of configurable parameters of the distributed storage system; that is, not every configurable parameter of the distributed storage system is taken as a parameter in the initial parameter set.
In addition, the system parameters herein refer to some intrinsic configuration parameters of the system under different storage scenarios (e.g. different operating systems, hardware configurations, and storage policies), such as: CPU core count, memory size, maximum thread count limit, etc. In different storage scenarios, even the same configuration parameters of the distributed storage system may have different performance indexes. In the embodiment of the present application, for system parameters, system parameters configured in different storage scenarios, for example, different storage scenarios such as bare metal block storage, virtual machine block storage, object storage, and full flash block storage, may be determined respectively.
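As a minimal sketch (all names, scenarios, and values here are hypothetical illustrations, not taken from the patent), the parameter input set of Ceph-style tunables and the per-scenario system parameters described above could be represented as plain mappings:

```python
# Hypothetical sketch: a parameter input set (tunables with candidate value
# ranges) plus system parameters recorded for each storage scenario.
parameter_input_set = {
    "journal_max_write_bytes": (1 << 20, 1 << 30),  # candidate value range
    "journal_queue_max_ops": (50, 5000),
}

system_parameters = {
    "bare_metal_block": {"cpu_cores": 32, "memory_gb": 128, "max_threads": 4096},
    "virtual_machine_block": {"cpu_cores": 16, "memory_gb": 64, "max_threads": 2048},
    "object_storage": {"cpu_cores": 32, "memory_gb": 256, "max_threads": 8192},
    "full_flash_block": {"cpu_cores": 64, "memory_gb": 512, "max_threads": 8192},
}

def training_rows(scenario: str) -> dict:
    """Merge the tunable ranges with one scenario's fixed system parameters."""
    return {**system_parameters[scenario], "tunables": parameter_input_set}
```

The same tunable ranges are then paired with each scenario's intrinsic configuration, matching the observation above that identical Ceph parameters can yield different performance under different scenarios.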
S102, determining at least one performance index according to the parameter input set and the system parameters.
It should be noted that after the parameter input set and the system parameters are determined, the performance index of the distributed storage system under different parameter combinations can be determined according to the parameter input set and the system parameters.
In some embodiments, the determining at least one performance indicator based on the input set of parameters and the system parameter may include:
building at least one test cluster corresponding to a storage scene according to the parameter input set and the system parameters;
in the running process of a test cluster corresponding to at least one storage scene, obtaining at least one performance index by using a performance test tool;
wherein the performance index comprises at least one of: input/output operations per second (IOPS), throughput, central processing unit (CPU) utilization, memory utilization, swap memory (SWAP) utilization, and read-write latency.
It should be noted that the parameters of the distributed storage system and the different values of the system parameters may affect the performance of the distributed storage system. This step mainly determines, through the parameter input set and the system parameters of the distributed storage system and by means of performance test tools such as perf and fio, the performance indexes of the distributed storage system under different parameter values and combinations. The performance index mainly refers to read-write indexes, including but not limited to input/output operations per second (IOPS), throughput, CPU utilization, memory utilization, swap memory (SWAP) utilization, and read-write latency.
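To make the index-collection step concrete, the following sketch parses fio's JSON output (fio run with `--output-format=json`) into the indexes named above. The field names follow recent fio releases but are assumptions here; adjust them to your fio version, and note the sample payload is fabricated for illustration:

```python
import json

# Sketch: pull IOPS, throughput, and latency out of a fio JSON report.
def extract_indexes(fio_json: str) -> dict:
    job = json.loads(fio_json)["jobs"][0]
    return {
        "read_iops": job["read"]["iops"],
        "write_iops": job["write"]["iops"],
        # fio reports bandwidth ("bw") in KiB/s in its JSON output.
        "throughput_kbps": job["read"]["bw"] + job["write"]["bw"],
        "read_lat_ns": job["read"]["lat_ns"]["mean"],
    }

# Fabricated sample payload standing in for a real fio run.
sample = json.dumps({"jobs": [{
    "read":  {"iops": 12000.0, "bw": 48000, "lat_ns": {"mean": 350000.0}},
    "write": {"iops": 4000.0,  "bw": 16000, "lat_ns": {"mean": 900000.0}},
}]})
```

Each test-cluster run under one parameter combination would produce one such record, giving the combination-to-index correspondence the method needs.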
S103, performing model training according to the parameter input set, the system parameters and the at least one performance index, and determining at least one performance prediction model and at least one updated parameter set.
It should be noted that, by performing model training using the parameter input set, the system parameters, and the at least one performance index as a training set, at least one performance prediction model and an updated parameter set can be obtained, where each performance index corresponds to one performance prediction model and one updated parameter set, and the updated parameter set is obtained after the parameter input set is iteratively updated in the model training process.
And S104, recommending parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set.
It should be noted that, after the foregoing steps, at least one performance prediction model and an updated parameter set corresponding to at least one performance index respectively are obtained. And then determining an optimal parameter set according to the at least one performance prediction model and the at least one updated parameter set, and further recommending parameters to the distributed storage system.
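The selection in S104 can be sketched as follows. This is an illustration only: the patent does not specify the model form, so plain callables stand in for trained performance prediction models, and the toy IOPS model below is invented for the example.

```python
# Sketch of the recommendation step: each performance index has a prediction
# model and a set of candidate (updated) parameter configurations; score every
# candidate with its model and recommend the highest-scoring one.
def recommend(models, candidate_sets, target_index):
    """models: {index_name: fn(params) -> predicted score, higher is better}
    candidate_sets: {index_name: [param_dict, ...]}"""
    model = models[target_index]
    return max(candidate_sets[target_index], key=model)

# Toy stand-in model: predicted IOPS grows with thread count up to a knee.
iops_model = lambda p: min(p["op_threads"], 16) * 1000

best = recommend(
    {"iops": iops_model},
    {"iops": [{"op_threads": 4}, {"op_threads": 12}, {"op_threads": 64}]},
    "iops",
)
```

A real deployment would choose `target_index` according to the index emphasized by the storage scenario (e.g., IOPS for block storage, throughput for object storage), which is exactly the scenario-targeted recommendation the method claims.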
It should be noted that, in order to improve the performance of a distributed storage system, two levels are generally considered: hardware optimization and software optimization. In one possible implementation, optimization is considered at the hardware level. Regarding disk type, the read-write performance of a solid state disk (SSD) is much higher than that of a mechanical hard disk (HDD), especially an SSD based on the Non-Volatile Memory Express (NVMe) protocol, which can fully exploit the advantage of multi-core CPUs and thereby greatly increase concurrency; but the cost is also considerable, so the application range of full-flash clusters is limited. In addition, if a cluster adopts full-flash deployment, disk IO is no longer the bottleneck; the length of the IO path, especially kernel-mode switching in the operating system, and the multi-node network communication inherent to a distributed system become the new bottlenecks. Some vendors introduce solutions such as Remote Direct Memory Access (RDMA) and the user-space Storage Performance Development Kit (SPDK) to address these new performance bottlenecks. However, these hardware-level approaches to improving distributed storage performance impose hard requirements on the hardware, with high economic and retrofit costs.
In another possible implementation, optimization is considered from a software level, taking distributed storage system Ceph as an example, as a common open-source distributed storage system, which provides as many as thousands of configurable parameters, and is deployed on different devices (such as HDD or SSD) under different storage scenarios (such as copy or erasure code), and when different operating systems (such as x86 or ARM) are adopted, the influence of the parameter setting values on the performance is related to these factors. Even experienced storage engineers cannot guarantee that the most suitable parameters are selected under different scenarios to exert the best performance of the distributed storage system.
Illustratively, Ceph has a parameter in FileStore, filestore_op_threads, which represents the number of IO threads. Setting it to a larger value can increase IO processing speed, but with too many threads, frequent thread switching may also hurt the performance of the distributed storage system. For another example, the BlueStore design of Ceph tries to avoid double writes as much as possible, but metadata and some smaller data (e.g., data smaller than 16K on HDD and 64K on SSD) are written into a key-value (kv) database; the indexing feature of the database accelerates reads and writes, but this part also reintroduces the double-write problem.
Based on this, in another possible implementation manner, the embodiment of the present application may recommend a most suitable parameter configuration for a set of distributed storage systems according to hardware configuration, system settings, storage scenarios, and the like. According to the method and the system, performance optimization is carried out on the distributed storage system from a software level, automatic parameter tuning is used as an entry point, a set of automatic parameter tuning method and system are provided, and an optimal parameter combination can be selected according to hardware configuration, a storage strategy and different use scenes so as to improve the storage performance of different distributed storage systems.
The embodiment of the application provides a parameter recommendation method, which includes: determining a parameter input set of a distributed storage system and system parameters configured in at least one storage scenario; determining at least one performance index according to the parameter input set and the system parameters; performing model training according to the parameter input set, the system parameters, and the at least one performance index, and determining at least one performance prediction model and at least one updated parameter set, where each performance index corresponds to one performance prediction model and one updated parameter set; and recommending parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set. Because system parameters under different storage scenarios are considered during model training, the trained performance prediction models can achieve optimal parameter recommendation under different storage scenarios, and resource consumption can be reduced.
In another embodiment of the present application, the method is mainly applied to a distributed software-defined storage (SDS) system that provides storage resource services at the infrastructure-as-a-service (IaaS) layer. For such a distributed SDS system, the number of relevant parameters is often large, the value range of each parameter is very wide, and the parameters that have a decisive influence on performance differ across usage scenarios.
Taking the distributed storage system Ceph as an example, the configurable parameters it provides can number in the thousands, but not every parameter affects performance, and the parameters that affect performance differ across storage scenarios. Therefore, the parameters need to be screened in some manner: only the parameters that affect, or strongly affect, the performance of the distributed storage system are retained, these parameters are then trained in a targeted manner, and finally an optimal set of these parameters is generated for creating the distributed storage system.
Therefore, in a specific example, the embodiment of the present application may determine the parameter input set of the distributed storage system by screening the parameter types of the distributed storage system in a certain manner. Referring to fig. 2, a schematic flow chart of another parameter recommendation method provided in the embodiment of the present application is shown. As shown in fig. 2, the method may include:
s201, acquiring an initial parameter set of the distributed storage system, randomly sampling parameters in the initial parameter set in a preset parameter value range, and determining a test sample.
S202, determining at least one performance index according to the test sample and the system parameters.
S203, performing model training on the test sample, the system parameters and the at least one performance index by using a second preset algorithm, and determining at least one intermediate performance prediction model and the prediction accuracy contribution of each parameter in the test sample to each intermediate performance prediction model.
And S204, analyzing the contribution of the determined prediction accuracy, and selecting a parameter input set from the test sample according to the analysis result.
It should be noted that the initial parameter set of the distributed storage system includes all configurable parameters of the distributed storage system (or a parameter set from which an experienced engineer has removed parameters that certainly have no influence on performance). Each configurable parameter has a specific value range, that is, a preset parameter value range, and randomly sampling each parameter within its preset value range yields a test sample.
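The random sampling of step S201 can be sketched as follows; the parameter names, ranges, and sample count are illustrative assumptions, not values from the patent:

```python
import random

# Sketch of S201: draw test samples by uniformly sampling each parameter
# inside its preset value range.
def sample_test_set(ranges, n_samples, seed=42):
    rng = random.Random(seed)  # fixed seed for reproducible sampling
    return [
        {name: rng.randint(lo, hi) for name, (lo, hi) in ranges.items()}
        for _ in range(n_samples)
    ]

ranges = {"journal_queue_max_ops": (50, 5000), "op_threads": (1, 64)}
test_samples = sample_test_set(ranges, n_samples=100)
```

Uniform sampling is only one choice; Latin hypercube or other space-filling designs would cover wide value ranges with fewer samples, at the cost of a slightly more involved sampler.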
It should be further noted that the system parameter may be a system parameter configured by the distributed storage system in at least one storage scenario; the at least one storage scenario may be bare metal block storage, virtual machine block storage, object storage, full flash block storage, and the like, and the embodiment of the present application is not particularly limited. Thus, after the test sample and the system parameters are determined, various parameter combinations of the parameters of the distributed storage system and the system parameters can be obtained according to the test sample and the system parameters, and then, under each parameter combination, the performance of the distributed storage system in the parameter combination can be determined, namely, each parameter combination corresponds to a group of performance indexes.
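The enumeration of parameter combinations described above can be sketched as a cross product of the sampled tunables with each scenario's system parameters; scenario names and values here are illustrative, not from the patent:

```python
import itertools

# Sketch: cross each sampled tunable configuration with each storage
# scenario's system parameters to enumerate the combinations whose
# performance indexes will then be measured.
def combinations(test_samples, system_parameters):
    for sample, (scenario, sys_params) in itertools.product(
            test_samples, system_parameters.items()):
        yield {**sample, **sys_params, "scenario": scenario}

test_samples = [{"op_threads": 4}, {"op_threads": 16}]
system_parameters = {"bare_metal_block": {"cpu_cores": 32},
                     "full_flash_block": {"cpu_cores": 64}}
combos = list(combinations(test_samples, system_parameters))
```

Each yielded combination corresponds to one test-cluster configuration and hence to one group of measured performance indexes.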
Specifically, a distributed storage system can be built according to the test samples and the system parameters and then run, and performance test tools such as perf and fio can be used to obtain the correspondence between a specific parameter combination and its performance indexes.
Here, the performance index mainly refers to read-write indexes, and may include but is not limited to one or more of the following: the number of read-write operations per second (IOPS), throughput (which may be expressed in mbps), CPU utilization, memory utilization, SWAP utilization, read-write latency, and so on. This embodiment mostly attends to three indexes, namely IOPS, mbps, and read-write latency; which performance indexes are actually collected can be set according to the actual situation, and this is not specifically limited in the embodiment of the present application.
After at least one group of performance indexes is obtained, the test sample, the system parameters and the performance indexes may be trained by using a second preset algorithm to obtain an intermediate performance prediction model. The intermediate performance prediction model is of the same type as the performance prediction model described in the foregoing embodiment, but cannot be used as the finally determined performance prediction model; rather, it serves as a partial basis for determining the parameter input set and is equivalent to an intermediate stage in determining the performance prediction model.
In the embodiments of the present application, the second preset algorithm is a random forest algorithm: training is performed through the random forest algorithm to obtain the intermediate performance prediction model, with the test sample and the system parameters as input and the performance indexes as output. A Random Forest (RF) is a classifier that contains multiple decision trees, each of which can be used for prediction; a more accurate and stable model is obtained by building and fusing the multiple decision trees. Because the relationship between parameter values and performance indexes is nonlinear, the random forest algorithm is well suited to modeling it.
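A minimal sketch of this step, assuming the scikit-learn library is used for the random forest (the text does not name a specific implementation), with synthetic data standing in for the test sample and performance indexes; the model's `feature_importances_` attribute plays the role of the per-parameter prediction accuracy contribution described below:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Columns: sampled storage-system parameters plus system parameters (inputs);
# target: one performance index, e.g. IOPS. Synthetic data for illustration.
X = rng.uniform(0, 1, size=(200, 5))
y = 1000 * X[:, 0] ** 2 + 50 * X[:, 2] + rng.normal(0, 1, 200)  # nonlinear in X[:, 0]

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# feature_importances_ serves as the contribution degree of each parameter
# to the prediction accuracy of this intermediate performance prediction model.
for i, imp in enumerate(model.feature_importances_):
    print(f"parameter {i}: contribution {imp:.3f}")
```

One such regressor would be trained per performance index, yielding the intermediate IOPS, mbps and read-write latency prediction models mentioned below.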
It should be further noted that the set of intermediate performance prediction models obtained in the embodiments of the present application includes at least one intermediate performance prediction model corresponding to at least one performance index; that is, for each performance index, one corresponding intermediate performance prediction model can be obtained, for example: an intermediate IOPS prediction model, an intermediate mbps prediction model, an intermediate read-write latency prediction model, and the like.
It should be further noted that, while the intermediate performance prediction model corresponding to each performance index is obtained, the contribution degree of each parameter to the prediction accuracy of each intermediate performance prediction model can also be obtained.
In some embodiments, before performing model training on the test samples, the system parameters, and the at least one performance indicator by using a second preset algorithm, and determining at least one intermediate performance prediction model and a prediction accuracy contribution degree of each parameter in the test samples to each intermediate performance prediction model, the method may further include:
carrying out mutation point detection on each type of parameter in the parameter input set against the at least one performance index;
and determining the parameter type corresponding to the mutation of at least one performance index as the parameter type for model training.
It should be noted that, before determining the contribution of each parameter to the prediction accuracy of each intermediate performance prediction model, in order to avoid calculating the prediction accuracy contribution of unnecessary parameters, a mutation point detection method may be used to determine whether the prediction accuracy contribution of a parameter needs to be examined, that is, whether the parameter needs to be used during model training.
In particular, during the training of the intermediate performance prediction model, a change in the value of a parameter may have a significant or a negligible impact on performance. Calculating the prediction accuracy contribution of every parameter for every performance result is meaningless when the change in performance is very slight; however, if a performance index mutates abruptly at a certain parameter value point, then analyzing the prediction accuracy contribution is necessary. Mutation point detection is precisely the process of finding the parameter types at which a performance index mutates. The parameter set corresponding to the parameter types whose influence on a performance index has a mutation point is used as the training data for obtaining the intermediate performance prediction model, and the prediction accuracy contribution of those parameters is then determined; parameters whose influence on the performance index has no mutation point are not used for training the intermediate performance prediction model.
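A toy sketch of mutation point detection follows (real implementations might use established change-point methods such as CUSUM or PELT; the threshold and data here are illustrative only):

```python
def has_mutation_point(values, perf, jump_threshold):
    """Return True if the performance index jumps abruptly between two
    adjacent parameter values (a simple change-point criterion; dedicated
    detectors such as PELT or CUSUM could be substituted)."""
    pairs = sorted(zip(values, perf))
    for (_, p0), (_, p1) in zip(pairs, pairs[1:]):
        if abs(p1 - p0) > jump_threshold:
            return True
    return False

# Performance barely moves with parameter A -> no mutation point, so A is
# excluded from training; performance jumps sharply with parameter B -> kept.
a_vals, a_perf = [1, 2, 3, 4, 5], [100, 101, 100, 102, 101]
b_vals, b_perf = [1, 2, 3, 4, 5], [100, 102, 480, 485, 490]
print(has_mutation_point(a_vals, a_perf, jump_threshold=50))  # False
print(has_mutation_point(b_vals, b_perf, jump_threshold=50))  # True
```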
Here, the prediction accuracy contribution of each parameter to the intermediate performance prediction model corresponding to each performance index may differ. For example, parameter 1 may have a large influence on performance A: a slight change in its value may cause performance A to swing from extremely poor to excellent, so parameter 1 is an important parameter for performance A. For performance B, however, parameter 1 may have little influence: within its value range, no matter what value parameter 1 takes, performance B will not change, or will change very little, so for performance B, parameter 1 can be regarded as an "irrelevant" parameter. The influence of a parameter on a given performance can be represented by its prediction accuracy contribution to the performance prediction model corresponding to that performance index: the larger the prediction accuracy contribution, the larger the influence of the parameter on the performance; conversely, the smaller the prediction accuracy contribution, the smaller the influence.
Therefore, the parameter input set can be selected from the initial parameter set according to the analysis result by analyzing the contribution degree of the prediction accuracy.
In some embodiments, the analyzing the contribution of the determined prediction accuracy and selecting the parameter input set from the test sample according to the analysis result may include:
and selecting candidate prediction accuracy rate contribution degrees of which the prediction accuracy rate contribution degrees are larger than a preset contribution degree threshold value from the determined prediction accuracy rate contribution degrees, and determining a first parameter set by using parameters corresponding to the candidate prediction accuracy rate contribution degrees.
It should be noted that, in the embodiments of the present application, parameters are screened according to their contribution to the prediction accuracy of the intermediate performance prediction model. A preset contribution threshold may be set; when the contribution of a parameter to the prediction accuracy of the intermediate performance prediction model exceeds the preset contribution threshold, the parameter has a large influence on performance and may be screened out as an important parameter. For example, the preset contribution threshold may be 50%, 60%, and the like, which is not specifically limited in the embodiments of the present application.
Specifically, the candidate prediction accuracy contributions whose values are greater than the preset contribution threshold are selected from the determined prediction accuracy contributions, and the parameters corresponding to these candidate contributions are determined; these are the screened important parameters, which form a first parameter set. The parameters are then randomly sampled within the preset parameter value range, completing one round of screening the parameter input set of the distributed storage system.
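The threshold-based screening described above can be sketched as follows (the parameter names and contribution values are hypothetical):

```python
def select_first_parameter_set(contributions, threshold):
    """Keep only parameters whose prediction accuracy contribution exceeds
    the preset contribution threshold."""
    return {name for name, c in contributions.items() if c > threshold}

# Hypothetical contribution degrees for one performance index.
contributions = {"param_1": 0.62, "param_2": 0.08, "param_3": 0.55, "param_4": 0.01}
first_parameter_set = select_first_parameter_set(contributions, threshold=0.5)
print(sorted(first_parameter_set))  # ['param_1', 'param_3']
```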
It should be noted that, for the system parameters, different system parameters also have different impacts on the performance of the distributed storage system. Therefore, while the prediction accuracy contribution of each parameter in the test sample to each performance prediction model is determined, the prediction accuracy contribution of each system parameter to each performance prediction model can also be determined, and by the same method, the system parameters whose prediction accuracy contribution to the performance prediction model corresponding to a certain performance index exceeds the contribution threshold are selected as the updated system parameters.
Therefore, multiple iterations can be performed in this way, and the parameter input set is continuously updated; that is, the first parameter set replaces the initial parameter set for continued model training.
Further, in some embodiments, the method further comprises:
determining the number of iterations of the at least one performance prediction model;
when the iteration count is smaller than the preset number of iterations, determining the first parameter set as the initial parameter set, incrementing the iteration count by 1, and returning to randomly sampling the parameters in the initial parameter set within the preset parameter value range to determine a new test sample;
and when the iteration times reach a preset iteration time, determining the first parameter set as the parameter input set, and determining a model obtained after the latest iteration update as the at least one performance prediction model.
It should be noted that the preset number of iterations may be determined by combining factors such as the complexity of the actual model and the test environment, and may be, for example, 10 times, 30 times, 100 times, and the like, which is not specifically limited in this embodiment of the application.
Each round of training, that is, one iteration, yields at least one intermediate performance prediction model. When the iteration count is smaller than the preset number of iterations, parameters continue to be screened: at least one intermediate performance prediction model and a corresponding first parameter set are determined, the first parameter set is taken as the initial parameter set, and the iteration count is incremented by 1. Then, according to the first parameter set obtained after the iterative update (that is, the updated initial parameter set), the method returns to randomly sampling the parameters within the preset parameter value range to determine a new test sample, and training continues to obtain a new group of at least one intermediate performance prediction model.
When the iteration count reaches the preset number of iterations, parameters no longer need to be screened; the first parameter set obtained in that round can be directly determined as the parameter input set, and the model obtained after the latest iterative update can be directly determined as the final at least one performance prediction model.
In addition, an alternative basis for stopping the iteration may be used: when the absolute value of the difference between the performance index predicted by the performance prediction model and the real performance index is smaller than an error threshold (or the loss function value of the model is smaller than a preset loss threshold), the prediction accuracy of the performance prediction model is already high, and no further iteration is performed.
That is, the initial parameter input set (which may also include system parameters) for training the performance prediction model is continuously updated by the performance prediction model itself (the performance prediction model in the process of screening parameters is referred to as an intermediate performance prediction model) until the final parameter input set is determined, which is an iterative process and can be completed by an Expectation-Maximization (EM) algorithm.
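The iterative process described above can be sketched as the following skeleton, where `sample`, `train` and `screen` are placeholders for the sampling, training and contribution-screening steps; the toy callbacks (purely illustrative) simply drop the lowest-contribution parameter each round:

```python
def iterate_parameter_screening(initial_set, max_iterations, sample, train, screen):
    """Skeleton of the iterative loop: sample -> train -> screen, repeated
    until the preset number of iterations is reached."""
    current_set = initial_set
    models = None
    for _ in range(max_iterations):
        test_sample = sample(current_set)           # random sampling in value ranges
        models, contributions = train(test_sample)  # intermediate prediction models
        current_set = screen(contributions)         # first parameter set
    # The set after the last round is the parameter input set; the models from
    # the last round are the final performance prediction models.
    return models, current_set

# Toy callbacks for demonstration only.
def sample(params):            # pretend to sample values for each parameter
    return {p: 0.5 for p in params}

def train(test_sample):        # pretend to train; contribution decays with index
    contributions = {p: 1.0 / (i + 1) for i, p in enumerate(sorted(test_sample))}
    return "model", contributions

def screen(contributions):     # drop the single lowest-contribution parameter
    worst = min(contributions, key=contributions.get)
    return [p for p in contributions if p != worst]

models, final_set = iterate_parameter_screening(
    ["p1", "p2", "p3", "p4", "p5"], max_iterations=2,
    sample=sample, train=train, screen=screen)
print(final_set)  # ['p1', 'p2', 'p3'] after two rounds
```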
Thus, the parameter input set is selected by combining the process of training the performance prediction model with the process of determining the parameter input set. The parameter input set serves, on the one hand, as the sample parameters for training the performance prediction model and, on the other hand, as the parameters to be optimized. The advantage of doing so is that, because the finally obtained performance prediction model is trained on the parameters that influence performance the most, the prediction results of the performance prediction model are more accurate; moreover, parameter optimization need not be performed on parameters that have no or little influence on performance, which reduces resource consumption.
In another specific example, the embodiments of the present application may further determine the parameter input set of the distributed storage system through IO stack analysis. Thus, in some embodiments, the determining the parameter input set of the distributed storage system may include:
carrying out IO stack analysis on the distributed storage system, and determining relevant parameters of an IO stack;
and randomly sampling the preset IO stack parameter value range of the related parameters of the IO stack to obtain a related parameter set, and determining the related parameter set as a parameter input set.
It should be noted that if some types of parameters are associated with the IO stack of the distributed storage system, these types of parameters will necessarily have an impact on the performance of the distributed storage system. Therefore, the IO stack of the distributed storage system can be analyzed to determine the relevant parameters of the IO stack, and the parameters can influence the response of the IO path and can be used as important parameters influencing the performance of the distributed storage system. The parameter input set of the distributed storage system can be obtained by randomly sampling the important parameters (namely, the related parameters of the IO stack) in the parameter value range (namely, the preset IO stack parameter value range).
However, when determining the parameter input set of the distributed storage system in this way, the developer is often required to have a deeper understanding of the implementation of the distributed storage system, and the experience of the developer is relatively relied on.
After determining the input set of parameters for the distributed storage system and the system parameters configured in the at least one storage scenario, at least one performance indicator may be determined based on the input set of parameters and the system parameters.
It should be noted that, in the embodiments of the present application, the process of determining the parameter input set is combined with the process of training the prediction model. The at least one performance prediction model finally determined here is the at least one performance prediction model obtained after the last iterative update in the foregoing iteration process, when the iteration count reaches the preset number; and the updated parameter set corresponding to each performance prediction model is the parameter set, within the parameter input set after the last iterative update, that was used for training the corresponding performance prediction model.
Exemplarily, it is assumed that a distributed storage system has 100 configurable parameters, namely parameter 1, parameter 2, parameter 3, ..., parameter 100, and that the 100 parameters are randomly sampled within the preset parameter value range to obtain a parameter input set. A performance test is then performed together with the system parameters (the system parameters may or may not be selected as parameters to be screened; in this example, it is assumed that they are not selected, and only the configurable parameters of the distributed storage system are screened) to obtain a plurality of corresponding performance index values, and it is assumed that only performance A, performance B, and performance C of the distributed storage system are measured. Then, after the foregoing iterative process (for example, 100 iterations), three prediction models corresponding to performances A, B, and C are finally obtained, since in the iterative process it is determined, based on the contribution of each parameter to the prediction accuracy of each performance prediction model, which parameters have a greater influence on a specific performance index.
Suppose that after 100 iterations the A prediction model corresponding to performance index A is finally obtained from a parameter input set consisting of parameters 1 to 15 and parameter 27; that is, the updated parameter set corresponding to the A prediction model is parameters 1 to 15 and parameter 27. Similarly, if the final B prediction model corresponding to performance index B is obtained from a parameter input set consisting of parameters 5 to 10 and parameters 70 to 75, the updated parameter set corresponding to the B prediction model is parameters 5 to 10 and parameters 70 to 75; and if the C prediction model corresponding to performance index C is finally obtained from a parameter input set consisting of parameters 60 to 73, the updated parameter set corresponding to the C prediction model is parameters 60 to 73. The remaining parameters have little influence on performance and are not used in the parameter input set for training the performance prediction models.
It should be further noted that, if the parameter input set is obtained based on the IO stack analysis in step S101, here, model training may be directly performed according to the parameter input set, the system parameters, and the at least one performance index, so as to obtain at least one performance prediction model and a parameter set (also referred to as an updated parameter set) respectively corresponding to the at least one performance prediction model.
In the embodiment of the application, the finally obtained performance prediction model is obtained according to the updated parameter set after repeated iteration updating, so that when the model is used for parameter recommendation, the optimal parameter set of parameters directly related to performance can be obtained. In addition, when the performance prediction model is trained, the system parameters are also used as the parameter input set to determine the performance index and use the performance index for training the performance prediction model. Therefore, when parameter recommendation is carried out, a set of configuration parameters which are matched with system parameters of a storage scene of the distributed storage system can be directly obtained, and the configuration parameters which enable the distributed storage system to have the best performance under a specific storage scene can be obtained.
Further, in some embodiments, during the model training process, the method may further include:
after the first parameter set is obtained, determining the prediction accuracy contribution of each parameter in the first parameter set corresponding to each performance index according to the prediction accuracy contribution of each parameter in the first parameter set to each performance prediction model and the corresponding relation between the performance prediction model and the performance index;
and under each performance index, selecting candidate prediction accuracy rate contribution degrees of which the prediction accuracy rate contribution degrees are larger than a preset contribution degree threshold value from the determined prediction accuracy rate contribution degrees, and determining parameters corresponding to the candidate prediction accuracy rate contribution degrees as an updated parameter set corresponding to each performance index.
It should be noted that, in the process of training the performance prediction model, after the first parameter set is obtained, the prediction accuracy contribution of each parameter in the first parameter set under each performance index can be determined according to the contribution of each parameter to the prediction accuracy of each performance prediction model and the correspondence between performance prediction models and performance indexes.
In this way, for each performance index, a first parameter set can be obtained, and a parameter in the first parameter set, for which the contribution degree of the prediction accuracy of the performance index is greater than the preset contribution degree threshold, is determined as the updated parameter set corresponding to the performance index. Thereby, the corresponding relation between the parameters and the performance indexes can be determined.
Further, after determining the at least one performance prediction model and the at least one updated parameter set, parameter recommendations may be made to the distributed storage system based on the at least one performance prediction model and the at least one target parameter set. Specifically, in some embodiments, the making parameter recommendations to the distributed storage system according to the at least one performance prediction model and the at least one target parameter set may include:
acquiring target system parameters configured by the distributed storage system in a preset storage scene;
according to a target performance index to be subjected to parameter recommendation, selecting a target performance prediction model corresponding to the target performance index from at least one performance prediction model, and selecting a target parameter set corresponding to the target performance index from at least one updating parameter set;
and performing parameter optimization on the target parameter set and the target system parameters by using a first preset algorithm according to the target performance prediction model to obtain an optimized parameter set corresponding to the target performance index, and recommending the optimized parameter set to the distributed storage system.
It should be noted that the parameter recommendation method provided in the embodiments of the present application can perform optimal parameter recommendation according to the current usage scenario. When performing parameter recommendation to the distributed storage system according to the at least one performance prediction model and the at least one target parameter set, the target system parameters configured by the distributed storage system in a preset storage scenario are first obtained, where the preset storage scenario is the specific storage scenario of the distributed storage system to be created, for example, which storage policy is adopted, how the hardware is configured, which operating system is selected, and the like; the emphasized performance indexes of each storage scenario differ.
Therefore, the target performance index needing parameter recommendation is obtained according to the preset storage scenario, the target performance prediction model corresponding to the target performance index is selected from the at least one performance prediction model, and the target parameter set corresponding to the target performance index is selected from the at least one updated parameter set. The target performance prediction model is then input into a first preset algorithm, parameter optimization is performed on the target parameter set and the target system parameters through the first preset algorithm to obtain an optimized parameter set corresponding to the target performance index, and the obtained optimized parameter set is recommended to the distributed storage system.
It should be noted that, in the embodiment of the present application, the first preset algorithm may be an EM algorithm, where the target parameter set and the target system parameters are input into the target performance prediction model, then the performance prediction model is input into the EM algorithm, and an optimized parameter set that optimizes the target performance index is obtained through iterative computation of the EM algorithm. In addition, the first preset algorithm may also be a bayesian algorithm, which is not specifically limited in this embodiment of the present application.
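As an illustrative stand-in for the EM or Bayesian optimization named above (which iterate over the prediction model rather than the real system), a plain random search over the target parameter set conveys the idea; the predictor, parameter names and ranges are hypothetical:

```python
import random

def optimize_parameters(predict, ranges, system_params, n_trials=500, seed=0):
    """Search the target parameter set for values that maximize the target
    performance index predicted by the model; a random search stands in here
    for the EM or Bayesian algorithm named in the text."""
    rnd = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        candidate = {k: rnd.uniform(lo, hi) for k, (lo, hi) in ranges.items()}
        score = predict({**candidate, **system_params})  # model, not real system
        if score > best_score:
            best_params, best_score = candidate, score
    return best_params, best_score

# Hypothetical predictor: the index improves as both parameters grow.
def fake_predictor(p):
    return p["queue_depth"] * 10 + p["threads"] * 5

ranges = {"queue_depth": (1.0, 64.0), "threads": (1.0, 16.0)}
best, score = optimize_parameters(fake_predictor, ranges, system_params={})
```

Because only the prediction model is evaluated, candidate combinations are scored without rebuilding or re-benchmarking the storage system, which is where the resource saving comes from.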
Briefly, in the parameter recommendation method provided in the embodiments of the present application, performance prediction models corresponding to different performance indexes are first trained, and the parameter input set used for training the performance prediction models is determined based on IO stack analysis and/or according to the prediction accuracy contribution of each parameter to the performance prediction model corresponding to each performance index during training. Therefore, when the performance prediction models are obtained, the correspondence between each parameter and each performance index is also obtained. After the performance prediction models are obtained, the performance index of greatest concern under the current scenario can be determined according to the specific storage scenario, such as the hardware configuration and the storage policy; the performance prediction model corresponding to that performance index is then selected, and the corresponding parameter input set, the system parameters and the performance prediction model are input into the first preset algorithm for iterative computation, finally yielding the optimal parameter set under the current storage scenario, which is recommended to the distributed storage system so that the distributed storage system can be created according to it.
That is to say, when parameter recommendation is performed in the embodiments of the present application, the corresponding performance prediction model can be selected according to the performance index emphasized by the actual storage scenario; according to the performance prediction model and the updated parameter set, the value combination of the parameters in the updated parameter set that optimizes the performance index is determined, that value combination is recommended as the optimal parameter combination, and the distributed storage system is created according to the recommended optimal parameter combination.
The embodiment provides a parameter recommendation method, which includes: determining a parameter input set of a distributed storage system and system parameters configured in at least one storage scenario; determining at least one performance index according to the parameter input set and the system parameters; performing model training according to the parameter input set, the system parameters and the at least one performance index, and determining at least one performance prediction model and at least one updated parameter set, where each performance index corresponds to one performance prediction model and one updated parameter set, and the updated parameter set is obtained by iteratively updating the parameter input set during model training; and recommending parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set. Therefore, the performance prediction model corresponding to the performance index emphasized by the current storage scenario can be selected for different storage scenarios, and at the same time the parameters having the largest influence on that performance index are known, so that only the important parameters are optimized in a targeted manner according to the performance prediction model, which reduces unnecessary resource consumption and improves the efficiency of parameter optimization. In addition, because parameter optimization is driven by the performance prediction model, resource consumption is reduced and the coupling of the system is reduced. Moreover, the embodiments of the present application obtain the correspondence between parameters and performance indexes during training of the performance prediction model, which helps research and development personnel carry out targeted modification of the IO stack, thereby improving optimization efficiency.
In another embodiment of the present application, refer to fig. 3, which shows an application architecture diagram of a parameter recommendation system provided in an embodiment of the present application. As shown in fig. 3, the parameter recommendation system may include three modules, respectively: a parameter input set construction module 301, a performance index acquisition module 302, and a model training and parameter optimization module 303.
The parameter input set construction module 301 is used to construct a parameter input set of the distributed storage system. The types of the parameters can be determined based on an IO stack analysis method or a characteristic contribution degree analysis method, and then the parameters of the types are randomly sampled to obtain a parameter input set of the distributed storage system.
The performance indicator obtaining module 302 is configured to determine a performance indicator according to a parameter input set of the distributed storage system and a system parameter.
The model training and parameter optimizing module 303 is configured to train the performance prediction model and perform parameter optimization according to the performance prediction model. The model training and parameter optimization module 303 may be divided into two sub-modules, namely a model training sub-module 303A and a parameter optimization sub-module 303B. The model training submodule 303A is configured to, on one hand, obtain a prediction accuracy contribution of the input parameter to the performance prediction model to update the parameter input set after analyzing the input parameter, and on the other hand, obtain the performance prediction model; the parameter optimization sub-module 303B is configured to optimize the hyper-parameters of the second preset algorithm in the process of training the performance prediction model.
The following will describe in detail the work flow of each module of the parameter recommendation system provided in the embodiment of the present application with reference to fig. 3, it should be noted that the work between the modules is not independently divided, for example: when the workflow of the parameter input set building module 301 is explained, the workflow needs to be specifically explained in combination with the model training and parameter optimization module 303.
As shown in fig. 3, the parameter input set building block 301 may correspond to steps S3011 to S3022, which are as follows:
and S3011, selecting a parameter type.
And S3012, randomly sampling the parameters.
It should be noted that the parameter recommendation method provided in the embodiments of the present application is applied to an SSD-based distributed storage system that provides resource services at the IaaS layer, to perform parameter recommendation for it. Such a system often involves a large number of configurable parameters: taking Ceph as an example, more than a thousand configurable parameters are provided, each with its own value range, so evaluating every parameter combination would waste a large amount of resources. Moreover, among these thousands of parameters, not every parameter has a great influence on the performance of the distributed storage system, so parameter optimization can be achieved well only by training on the parameters that influence performance significantly; training on parameters that have no or little influence on performance wastes system resources and consumes more time. Therefore, the embodiments of the present application first screen out the parameter types necessary as training samples.
Specifically, selecting parameter types means screening the parameters, which may be performed in either or both of the following two ways.
Method I: IO stack analysis. By analyzing the IO stack of the distributed storage system, the parameters related to the IO stack are determined; these parameters affect the response of the IO path and therefore the performance of the distributed storage system. This approach requires a developer to have a clear understanding of the architecture of the distributed storage system and to know which parameters correspond to which IO processes. For example, the design of Ceph involves double writes: the journal is written first, and the data is flushed to disk afterwards, so some parameters control how the journal is written, whether a cache is used, when data is flushed, and so on, all of which influence the performance of the distributed storage system. Parameter types screened out by IO stack analysis can be regarded as IO-stack-related types.
However, the number of parameters involved in a distributed storage system is huge, and not all users of the system are developers, which makes this kind of screening difficult. If all parameters were added to the training process of the performance prediction model, the amount of computation would be considerable and a great deal of computing resources would be consumed. In this case, parameter screening may be performed by the second method.
Method II: feature contribution analysis. During training of the performance prediction model, each prediction result can serve as an evaluation of all parameters participating in the training: parameters that contribute strongly to the accuracy of the prediction result are more likely to be selected for the next iteration.
Specifically, the method proceeds as follows: first, all configurable parameters of the distributed storage system are taken as candidate parameters; then all candidate parameters are randomly sampled within their value ranges, yielding the initial parameter input set of the distributed storage system.
Then, in the performance index obtaining module 302, the performance indexes are obtained according to the initial parameter input set and the system parameter input set; parameter types in the initial parameter input set that cause abrupt changes in performance can be screened out by a change-point detection method and used as parameters for training the performance prediction model. Subsequently, in the model training and parameter optimization module 303, a performance prediction model corresponding to each performance index is obtained by training on the initial parameter input set, the system parameter input set, and the performance indexes. The feature contribution is analyzed through each parameter's contribution to the prediction accuracy of the performance prediction model (also referred to as the feature contribution), so that the more important parameter types, i.e., those with a higher prediction accuracy contribution, can be determined and selected. The selected parameter types are randomly sampled within the preset parameter value ranges to obtain a new parameter input set; performance indexes are then determined from this parameter input set and the system parameters; the model training step is executed again to obtain the performance prediction model and each parameter's prediction accuracy contribution; the parameter types with higher contributions are determined again and the parameter input set is updated accordingly. This process is iterated until the number of iterations reaches the preset number.
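As an illustration of the iterative screening loop described above, the following sketch uses synthetic data, a hypothetical candidate count, and a stand-in benchmark function (in the real system the targets would come from cluster runs). It trains a random forest, keeps only parameters whose feature importance clears a threshold, resamples, and repeats:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
N_CANDIDATES = 10          # hypothetical number of candidate configurable parameters

def run_cluster(full_X):
    # Stand-in for benchmarking the storage cluster: here only
    # parameters 0, 1 and 2 actually drive the performance metric.
    noise = 0.01 * rng.standard_normal(len(full_X))
    return 2.0 * full_X[:, 0] + 1.5 * full_X[:, 1] - full_X[:, 2] + noise

kept = list(range(N_CANDIDATES))
for _ in range(3):                                   # preset iteration count
    X = rng.uniform(0.0, 1.0, (200, N_CANDIDATES))   # random sampling in value range
    y = run_cluster(X)
    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X[:, kept], y)                         # train on the surviving columns
    importances = model.feature_importances_
    # retain only parameters whose contribution clears a fixed threshold
    kept = [p for p, w in zip(kept, importances) if w >= 0.05]

print(sorted(kept))
```

With the strong synthetic signal, the three influential parameters survive all rounds while the noise parameters are dropped early.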
It should be noted that, the process of obtaining the performance index and performing the model training will be described in detail in the following steps, which are not described herein again.
As shown in fig. 3, the performance index obtaining module 302 may correspond to steps S3021 to S3024, which are as follows:
S3021, determining a parameter input set of the distributed storage system.
S3022, determining a system parameter input set.
The parameter input set of the distributed storage system is obtained by randomly sampling the parameters of the selected types within their value ranges.
The system parameter input set may include customized system parameters, such as system recommended configuration parameters in different storage scenarios. Here, the different storage scenarios may be, for example, bare metal block storage, virtual machine block storage, object storage, full flash block storage, and the like.
Illustratively, the customized system parameters may include: the maximum thread count limit (kernel.threads-max), the maximum process ID limit (kernel.pid_max), the physical memory usage strategy (vm.swappiness), the number of CPUs, the memory capacity, and the like.
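On Linux, the kernel-level parameters named above live under /proc/sys; a minimal sketch of collecting them might look as follows (the helper names are illustrative, not from the embodiment, and the values read are host-specific):

```python
import os

def read_sysctl(name, base="/proc/sys"):
    # "kernel.threads-max" maps to the file /proc/sys/kernel/threads-max
    path = os.path.join(base, *name.split("."))
    with open(path) as fh:
        return fh.read().strip()

def collect_system_params(base="/proc/sys"):
    # Gather the customized system parameters mentioned in the text;
    # os.cpu_count() stands in for the "CPU number" parameter.
    names = ["kernel.threads-max", "kernel.pid_max", "vm.swappiness"]
    params = {n: read_sysctl(n, base) for n in names}
    params["cpu_count"] = os.cpu_count()
    return params
```

The `base` argument exists only so the lookup can be pointed at a test directory on non-Linux hosts.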
According to the parameter recommendation method provided by the embodiment of the application, the optimal parameter combination can be determined according to hardware configuration, system setting, storage strategies, storage use scenes and the like, so that the storage performance of the distributed storage system in different use scenes is improved.
In terms of hardware configuration, a node deployed in the distributed storage system may use mechanical hard disks, solid state disks, or a hybrid deployment of the two. For example, in a common object storage scenario, the data disks are typically relatively low-cost mechanical hard disks with a solid state disk serving as a cache disk; during the double-write process, writing the journal on the SSD is faster than writing it on the HDD.
From the viewpoint of system configuration, the performance of the same distributed storage system may differ across operating systems, such as conventional x86 systems versus ARM systems. Whether the operating system enables huge pages (Huge Pages), the maximum number of processes, and similar settings all need to be adapted to the distributed storage system to achieve better performance. These are usually set to empirical values, but in the parameter recommendation method provided in the embodiment of the present application, such system settings can be added to the training process.
In terms of storage scenarios, different scenarios have different requirements on performance and configuration. For example, object storage requires the storage vendor to provide a large amount of space but has relatively low requirements on latency (i.e., read-write delay); for scenarios such as backup and migration, the requirement on data transmission time is relatively relaxed; whereas for high-performance block storage serving a database, the latency requirement is very strict. Standard hardware configurations also differ between scenarios: where cost sensitivity dominates, more and cheaper mechanical hard disks can be chosen to provide larger storage space, while in latency-sensitive scenarios, solid state disks with better performance should be used as far as possible to reduce the impact of IO bottlenecks.
From the perspective of storage policy, one feature of a distributed storage system is data reliability: each piece of data has multiple copies. This is also associated with the storage scenario; for example, object storage that aims to provide as much space as possible generally selects an erasure-code storage policy, while block storage generally selects a three-replica policy. In addition, to achieve data security, the storage policy must consider how the copies are distributed: whether they are placed on different servers in the same rack, on different racks, or even in different data centers. Differences in storage policy affect the performance of the distributed storage software, because they lead to different communication costs between the copies.
To provide recommendations for the optimal distributed storage system parameter configuration, these factors are abstracted into parameter variables. For example, whether the operating system is x86 or ARM can be written as the parameters is_x86=1, is_arm=0; likewise, the storage policy can be written as copy=3 for three replicas and copy=2 for erasure coding. Through such abstraction, the hardware configuration, storage policy, system configuration, and so on are added as concrete parameters to the training process of the performance prediction model.
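The abstraction above can be sketched as a small encoding function; the is_x86, is_arm, and copy names follow the text, while has_ssd_cache is an added illustrative flag:

```python
def encode_config(arch, storage_policy, has_ssd_cache):
    # Turn deployment facts into numeric parameter variables for training.
    return {
        "is_x86": 1 if arch == "x86" else 0,
        "is_arm": 1 if arch == "arm" else 0,
        # copy=3 for three replicas, copy=2 for erasure coding, per the text
        "copy": 3 if storage_policy == "replica3" else 2,
        "has_ssd_cache": int(has_ssd_cache),
    }

features = encode_config("x86", "replica3", True)
print(features)
```

The resulting dictionary can be concatenated with the sampled configurable parameters to form one training row.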
S3023, running the distributed storage system.
S3024, determining the performance indexes.
Through a performance testing tool (such as perf or fio), and according to combinations of the parameter input set of the distributed storage system with specific parameters in the system parameter input set, the performance indexes of the distributed storage system under different parameter combinations can be obtained by running the distributed storage system.
Here, the performance indexes of the distributed storage system mainly refer to read-write indexes and may include one or more of the following: IOPS, Mbps, CPU utilization, memory utilization, SWAP partition utilization, read-write latency, and the like. The IOPS may include sequential read-write IOPS and random read-write IOPS. In practical applications, IOPS, Mbps, and read-write latency are often of greatest concern.
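For illustration, when fio is run with --output-format=json, the read-write indexes above can be extracted from its JSON report roughly as follows; the field subset shown is an assumption based on fio's JSON layout, trimmed down to a hardcoded excerpt so the sketch is self-contained:

```python
import json

# Trimmed-down stand-in for the JSON that `fio --output-format=json` emits.
raw = """{"jobs": [{"read": {"iops": 12345.6, "bw": 49382,
                             "lat_ns": {"mean": 812000.0}}}]}"""
job = json.loads(raw)["jobs"][0]["read"]
metrics = {
    "read_iops": job["iops"],
    "read_bw_kib_s": job["bw"],
    "read_latency_ms": job["lat_ns"]["mean"] / 1e6,   # ns -> ms
}
print(metrics)
```

In a real run, `raw` would be the captured stdout of the fio process for one parameter combination, and the resulting `metrics` row would be appended to the training set.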
As shown in fig. 3, the model training and parameter optimization module 303 may correspond to steps S3031 to S3032, which are as follows:
S3031, training the performance prediction model.
Taking the parameter input set, the system parameter input set, and the performance indexes of the distributed storage system as the training set, a performance prediction model is constructed through a random forest algorithm. In this embodiment, the result may comprise a plurality of performance prediction models, one for each performance index.
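A minimal sketch of training one random-forest model per performance index, with synthetic stand-ins for the measured indexes (real targets would come from the benchmark runs of S3023/S3024):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(300, 6))        # rows: parameter + system-parameter vectors
targets = {                            # synthetic stand-ins for measured indexes
    "iops":    3.0 * X[:, 0] + X[:, 1],
    "latency": 1.0 / (0.1 + X[:, 0]) + X[:, 2],
}
# One regressor per performance index, as described in the text.
models = {
    name: RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    for name, y in targets.items()
}
print(sorted(models))
```

Each model can later be queried in place of re-running the cluster, which is exactly the decoupling benefit discussed below.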
Here, the parameter-type selection performed by the parameter input set building module 301 is combined with training of the performance prediction model: while obtaining the performance prediction models corresponding to the respective performance indexes, each parameter's contribution to the prediction accuracy of the corresponding model is obtained. Parameter types are then selected according to the scheme described above, until the preset number of iterations is reached; the performance prediction model obtained at that point is the final performance prediction model, and the determined parameter input set contains the parameters that have a large influence on the performance of the distributed storage system.
It should be further noted, by way of example, that if there were originally 50 configurable parameters, after iterative updating only 8 configurable parameters might remain in the parameter input set, namely parameter 1, parameter 2, ..., parameter 8, with 2 performance prediction models finally obtained: prediction model A and prediction model B. If parameters 1, 2, 3, 4, and 5 are used to train prediction model A, then when model A is used for parameter recommendation, only these 5 parameters need to be optimized to obtain the optimized parameter set for recommendation.
If parameters 3, 4, 5, 6, 7, and 8 are used to train prediction model B, then when model B is used for parameter recommendation, only these 6 parameters need to be optimized to obtain the optimized parameter set for recommendation. That is, the parameters that have a large impact may differ from one performance index to another.
In the embodiment of the application, the trained performance prediction model can directly participate in the process of parameter optimization and recommendation, so the step of obtaining performance indexes when constructing a new parameter set is omitted, which reduces tight coupling to a specific cluster to a certain extent.
This is because, without a performance prediction model, implementing parameter optimization and re-recommendation would require running the distributed storage system every time a performance index is needed, resulting in a huge workload. With a trained prediction model participating in the process, the performance indexes can be obtained directly from the model instead of by running the distributed storage system, which greatly reduces the workload.
In addition, without a performance prediction model, every result would be obtained on a fixed cluster, so the optimized and re-recommended parameters would be strongly influenced by that cluster. By taking factors such as hardware configuration and storage policy into account, the training process already captures how parameters behave when different clusters are running; the trained performance prediction model therefore incorporates many different hardware configurations and storage policies, and letting it participate in the optimization and re-recommendation process reduces the influence of any single cluster on the result. This lowers the coupling between the final recommended parameters and the cluster. That is, since the optimal recommended parameters are produced by the performance prediction model, the result is not accurate only on the training clusters; it has general reference value and holds on other clusters as well.
In the embodiment of the present application, since multiple performance prediction models for different performance indexes are finally obtained, a parameter that suits index A may be detrimental to index B. In that case, the parameter types and the specific model need to be selected according to the performance index of interest; for example, if the trained model targets index A, then the parameters beneficial to index A are selected. This is why multiple models are trained: models trained for different indexes can serve different storage requirements.
In the process of training the performance prediction model, each parameter's contribution to the prediction accuracy of the model can be determined directly through the random forest algorithm. Parameters whose prediction accuracy contribution exceeds a preset contribution threshold are taken as the screened sample parameters. The preset contribution threshold can be customized, for example set to 50%, and can be chosen according to how many iterations are acceptable; generally, retaining about 1/5 to 1/2 of the parameters in each iteration is a reasonable threshold setting.
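One reading of the retain-1/5-to-1/2 guidance is to keep a fixed fraction of the highest-contribution parameters each round; a toy sketch with hypothetical contribution scores:

```python
# Hypothetical contribution scores from one training round; the cut-off
# keeps the top half, which falls inside the 1/5 to 1/2 band above.
contributions = {"p1": 0.31, "p2": 0.22, "p3": 0.18, "p4": 0.09,
                 "p5": 0.08, "p6": 0.05, "p7": 0.04, "p8": 0.03}
keep_fraction = 0.5
n_keep = max(1, int(len(contributions) * keep_fraction))
kept = sorted(contributions, key=contributions.get, reverse=True)[:n_keep]
print(kept)  # -> ['p1', 'p2', 'p3', 'p4']
```

A fixed fraction keeps the shrink rate of the parameter set predictable across iterations, whereas a fixed absolute threshold can retain very different counts depending on how the contributions happen to be distributed.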
S3032, performing hyper-parameter optimization.
In the process of training the performance prediction model, the hyper-parameters of the random forest can be optimized through a hyper-parameter optimization algorithm. That is, in the embodiment of the present application, step S3031 and step S3032 may be performed simultaneously. Illustratively, the optimization of the hyper-parameters may be implemented by means of bayesian optimization, such as:
Input: f, X
D ← initSample(f, X)
for i ← |D| to T do
    g ← buildModel(D)
    x_i ← argmax_{x∈X} s(x, g)
    y_i ← f(x_i)
    D ← D ∪ {(x_i, y_i)}
end for

or alternatively:

Input: f, X, S, M
D ← initSample(f, X)
for i ← |D| to T do
    p(y|x, D) ← buildModel(M, D)
    x_i ← argmax_{x∈X} s(x, p(y|x, D))
    y_i ← f(x_i)
    D ← D ∪ {(x_i, y_i)}
end for
where X is the hyper-parameter search space of the random forest, and f denotes the objective function, taken as the mean AUC over 5-fold cross-validation of the random forest model.
Bayesian optimization of hyper-parameters is a common technical means in the field and will not be described here.
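As an illustration only (the embodiment defers to standard Bayesian optimization practice), here is a toy version of the loop above with a deliberately crude nearest-neighbour surrogate and a distance-based exploration bonus; it is a sketch of the control flow, not a real Bayesian optimizer:

```python
import random
random.seed(0)

def objective(x):
    # Stand-in for the real objective f: mean cross-validated AUC of the
    # random forest as a function of one hyper-parameter in [0, 1].
    return -(x - 0.3) ** 2

def surrogate(D, x):
    # Crude surrogate model: value of the nearest evaluated point.
    return min(D, key=lambda p: abs(p[0] - x))[1]

D = [(x, objective(x)) for x in (0.0, 0.5, 1.0)]    # initSample
for _ in range(20):                                 # the "for i <- |D| to T" loop
    candidates = [random.random() for _ in range(50)]
    def acquisition(x):
        exploration = min(abs(px - x) for px, _ in D)
        return surrogate(D, x) + 0.1 * exploration  # exploit + explore
    x_i = max(candidates, key=acquisition)          # argmax of s(x, model)
    D.append((x_i, objective(x_i)))                 # D <- D ∪ (x_i, y_i)

best_x, best_y = max(D, key=lambda p: p[1])
print(round(best_x, 2))
```

A production version would replace the nearest-neighbour surrogate with a Gaussian process or tree-based model and the distance bonus with a principled acquisition function such as expected improvement.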
Thus, a final performance prediction model is obtained through a second preset algorithm (such as a random forest algorithm) and a hyper-parameter optimization algorithm (such as a Bayesian optimization algorithm).
After the final performance prediction model is obtained, parameter recommendation only requires inputting the performance prediction model and the system parameters of the storage scenario concerned into the first preset algorithm; the recommended parameters matching the current storage scenario are then obtained by running the first preset algorithm.
That is to say, with the parameter recommendation system provided in the embodiment of the present application, an optimized parameter set and a performance prediction model are finally obtained. Thanks to the performance prediction model, even when parameter recommendation is needed for a new storage scenario, an optimized parameter set can be obtained by inputting the system parameters of the new scenario, the parameter input set, and the performance prediction model into the first preset algorithm.
In the embodiment of the application, under a specific use scenario, the corresponding performance prediction model is selected to recommend the optimal parameters. The configuration of the distributed storage system in that scenario (including hardware configuration, system configuration, and the like, such as whether an SSD is configured, the memory size, the number of CPUs, and the process limit) is obtained and fixed as parameters; the required prediction model is selected; and the parameter optimization algorithm is run. The resulting optimized parameter settings of the distributed storage system are the final recommended parameters. For example, in a latency-sensitive scenario, the performance prediction model trained with latency as the performance index is selected, the parameter optimization algorithm is run, and the parameter set corresponding to the best latency result is the recommended parameter setting for that scenario.
Illustratively, the first preset algorithm is an EM algorithm, and the parameter recommendation process is as follows:
X: initial parameter set
F: performance prediction model
S: procedure for adjusting the values in X
M(X, y): set of (X, y) pairs meeting Y_expectation
X_perfect: recommended parameter values
Y_expectation: required performance level to be reached

for n iterations:
    y = F(X)
    if y meets Y_expectation:
        M.append(X, y)
    X = S(X)
X_perfect = argmax_{(X, y) ∈ M} y
Thus, by running this algorithm, the optimal parameter recommendation conforming to the current scenario can be obtained.
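A hedged sketch of this recommendation loop, with a made-up quadratic stand-in for the performance prediction model F and a greedy variant of the adjustment step S (accepting only improving moves); the parameter names and the target peak are illustrative:

```python
import random
random.seed(1)

def F(X):
    # Stand-in performance prediction model: peaks at p1 = 4, p2 = 7.
    return 100.0 - (X["p1"] - 4.0) ** 2 - (X["p2"] - 7.0) ** 2

Y_expectation = 95.0                 # required performance level
X = {"p1": 0.0, "p2": 0.0}           # initial parameter set
y = F(X)
M = []                               # candidates meeting Y_expectation
for _ in range(2000):                # the for(n) loop
    cand = {k: v + random.uniform(-1.0, 1.0) for k, v in X.items()}  # S(X)
    y_cand = F(cand)
    if y_cand > y:                   # greedy variant of S: keep improving moves
        X, y = cand, y_cand
    if y_cand >= Y_expectation:
        M.append((cand, y_cand))     # M.append(X, y)

X_perfect, y_best = max(M, key=lambda t: t[1])   # argmax over M
print(round(y_best, 1))
```

Because F is a cheap model rather than a live cluster, thousands of candidate evaluations cost almost nothing, which is exactly the decoupling argument made earlier in the text.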
To sum up, the parameter recommendation method provided by the embodiment of the present application can be integrally divided into three parts: constructing a parameter input set; acquiring a performance index; model training and hyper-parameter optimization.
Part one: constructing the parameter input set. Taking Ceph as an example, the configurable parameters of a distributed storage system are numerous, reaching into the thousands, and each has its own value range; evaluating every parameter combination would undoubtedly waste a great amount of resources. In addition, not all parameters strongly influence the performance of the distributed storage system, and optimization can be performed well as long as the parameters with a significant influence on performance are considered. For the selection of the parameter input set, two schemes may therefore be adopted. One is to determine the IO-stack-related parameters by analyzing the IO stack of the distributed storage system; these parameters affect the response of the IO path and definitely affect performance, but this method requires a developer with a deep understanding of the implementation of the distributed storage system used. The other scheme is to take all parameters of the distributed storage system as candidates, randomly sample within their value ranges to construct a test sample, pass the sample to the second part to obtain performance indexes, and then to the third part to construct the performance prediction model; each parameter's contribution to the model's prediction accuracy is obtained along with the model, and analyzing these contributions (i.e., feature contribution analysis) selects the more important parameter set.
Part two: obtaining performance indexes. Using the parameter set obtained in the first part together with customized system parameters (such as the maximum thread count limit kernel.threads-max, the maximum process ID limit kernel.pid_max, and the physical memory usage strategy vm.swappiness), performance indexes are obtained through performance testing tools such as perf and fio. These are mainly read-write indexes, for example: IOPS, Mbps, CPU utilization, memory utilization, SWAP utilization, and read-write latency.
Part three: training and optimization, which can be subdivided into training the performance prediction model and optimizing the parameters. The purpose of training the performance prediction model is not only to analyze feature contributions but also to reduce resource consumption: the model trained on sampled data can directly participate in the parameter optimization process, omitting the step of obtaining performance indexes when constructing a new parameter set and thereby reducing tight coupling to a specific cluster to some extent. Random forest can be selected as the prediction method, since it handles the nonlinear relationship between parameter values and performance indexes well. The parameter optimization/recommendation algorithm selected in the embodiment of the present application can obtain an approximately optimal solution at a very small evaluation cost. This matches the practical constraint that the training set cannot be very rich, because obtaining the performance index under each parameter combination is a resource-consuming process; this constraint motivates both adding the prediction model to the parameter optimization loop and the choice of the parameter optimization/recommendation algorithm.
Briefly, the parameter recommendation method provided by the embodiment of the present application can be implemented by the following steps:
Step 1: constructing the parameter input set by randomly sampling within the parameter value ranges.
Step 2: constructing the system parameter input set: the system's recommended configuration parameters under several scenarios (bare metal block storage, virtual machine block storage, object storage, and full flash block storage).
Step 3: building a distributed storage cluster (i.e., the distributed storage system) and obtaining performance indexes, such as sequential read-write IOPS, random read-write IOPS, latency (i.e., read-write latency), and throughput.
Step 4: taking the parameter input set and the performance index results as the training set, constructing a performance prediction model for each performance index through the random forest algorithm, and obtaining each parameter's prediction accuracy contribution to the corresponding model.
Step 5: selecting the parameters whose contribution in step 4 exceeds 50% and repeating steps 1 to 4 to obtain a new performance prediction model and new prediction accuracy contributions. Steps 1 to 5 may be performed iteratively by the EM algorithm until a final performance prediction model is obtained. Through these steps, the performance prediction model used in the parameter optimization step is obtained, and at the same time the correspondence between parameters and performance indexes becomes available, which helps developers make targeted modifications to the software IO stack and improves optimization efficiency.
In the process of training the performance prediction model, the hyper-parameters of the random forest can be optimized through a Bayesian algorithm.
Step 6: parameter recommendation. A corresponding performance prediction model is selected according to the actual scenario; the system parameters and the performance prediction model are input into the first preset algorithm, and the optimal parameter set is obtained through iterative computation.
Compared with related-art schemes, the parameter recommendation method provided by the embodiment of the present application has at least the following advantages. Most existing parameter optimization schemes are tightly coupled with the system; in theory, parameter optimization must be redone for every differently configured system, which consumes resources. Some schemes require the tuning engineer to have a deep understanding of the distributed storage system, which raises the learning cost. The embodiment of the present application aims to automate each link of the parameter optimization system, from parameter set selection through model building to parameter recommendation, by reducing manual intervention as far as possible. Moreover, a non-black-box algorithm is selected, namely the random forest algorithm in the embodiment of the present application; such a strongly interpretable algorithm allows deep knowledge of each parameter to be obtained by analyzing the computation process, can also be used for performance bottleneck analysis, and provides a reasonable basis for subsequent optimization of the distributed storage system.
That is, the key points of the present application are:
(1) Providing recommendations for the optimal distributed storage system parameter configuration according to different storage design use scenarios and different configurations.
(2) Adding the performance prediction model to the parameter optimization, thereby reducing resource consumption and reducing coupling to the system.
(3) Obtaining the correspondence between parameters and performance indexes during training of the performance prediction model, which helps developers make targeted modifications to the IO stack of the distributed storage system and improves optimization efficiency.
This embodiment provides a parameter recommendation method, whose specific implementation has been elaborated above. It can be seen that the parameter recommendation method provided by this embodiment can select the corresponding performance prediction model according to the actual storage scenario, optimize and recommend the parameters that strongly influence the performance index associated with that model, and thus achieve targeted parameter recommendation. All links of parameter selection, model building, and parameter recommendation can be realized automatically, reducing the influence of manual intervention and removing the dependence on the skill level of specialists. Because the prediction model is added to the parameter optimization process, the resource consumption of the system is reduced and the coupling of the system can be lowered. At the same time, according to the correspondence between parameters and performance indexes obtained during model training, developers can modify the IO stack in a targeted manner, further improving optimization efficiency.
In another embodiment of the present application, referring to fig. 4, a schematic structural diagram of a parameter recommendation apparatus 40 provided in the embodiment of the present application is shown. As shown in fig. 4, the parameter recommending apparatus 40 may include: a determination unit 401, a training unit 402, and a recommendation unit 403, wherein,
a determining unit 401 configured to determine a parameter input set of the distributed storage system and system parameters configured in at least one storage scenario; and determining at least one performance indicator based on the input set of parameters and the system parameters;
a training unit 402 configured to perform model training according to the parameter input set, the system parameters, and the at least one performance indicator, and determine at least one performance prediction model and at least one updated parameter set; each performance index corresponds to one performance prediction model and one updated parameter set, the updated parameter set being obtained after the parameter input set is iteratively updated in the model training process;
a recommending unit 403 configured to recommend parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set.
In some embodiments, as shown in fig. 5, the parameter recommendation apparatus 40 may further include an obtaining unit 404 configured to obtain target system parameters configured by the distributed storage system in a preset storage scenario; according to a target performance index to be subjected to parameter recommendation, selecting a target performance prediction model corresponding to the target performance index from the at least one performance prediction model, and selecting a target parameter set corresponding to the target performance index from the at least one updating parameter set;
the recommending unit 403 is further configured to perform parameter optimization on the target parameter set and the target system parameters by using the first preset algorithm according to the target performance prediction model to obtain an optimized parameter set corresponding to the target performance index, and recommend the optimized parameter set to the distributed storage system.
In some embodiments, the determining unit 401 is further configured to perform input/output IO stack analysis on the distributed storage system, and determine a relevant parameter of the IO stack; and randomly sampling the preset IO stack parameter value range of the related parameters of the IO stack to obtain a related parameter set, and determining the related parameter set as the parameter input set.
In some embodiments, the determining unit 401 is further configured to obtain an initial parameter set of the distributed storage system, randomly sample the parameters in the initial parameter set within a preset value range to determine a test sample, and determine at least one performance indicator based on the test sample and the system parameters; perform model training on the test sample, the system parameters, and the at least one performance indicator using a second preset algorithm, determining at least one intermediate performance prediction model and each test-sample parameter's contribution to the prediction accuracy of each intermediate model; and analyze the determined prediction accuracy contributions and select the parameter input set from the test sample according to the analysis result.
In some embodiments, as shown in fig. 5, the parameter recommendation apparatus 40 may further include an analysis unit 405 configured to select, from the determined prediction accuracy contributions, the candidate prediction accuracy contributions that are greater than a preset contribution threshold, and determine a first parameter set from the parameters corresponding to the candidate prediction accuracy contributions;
the training unit 402 is further configured to determine the number of iterations of the at least one performance prediction model; when the number of iterations is smaller than a preset number of iterations, determine the first parameter set as the initial parameter set, increment the number of iterations by 1, and return to the step of randomly sampling the parameters in the initial parameter set within the preset parameter value range and determining a test sample; and when the number of iterations reaches the preset number of iterations, determine the first parameter set as the parameter input set, and determine the model obtained after the latest iterative update as the at least one performance prediction model.
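Taken together, the analysis and training units describe an iterative feature-selection loop: sample, train, score contributions, keep the parameters above the threshold, and repeat until a preset iteration count. A minimal sketch of that control flow follows; `train_and_score` is a hypothetical stand-in for the train-then-score steps described above.

```python
def select_parameter_input_set(initial_params, train_and_score, threshold, max_iterations):
    """Iteratively prune parameters whose prediction accuracy contribution
    falls at or below the threshold, for max_iterations rounds.
    train_and_score(params) -> {param_name: contribution}."""
    params = list(initial_params)
    for _ in range(max_iterations):
        contributions = train_and_score(params)
        params = [p for p in params if contributions[p] > threshold]
    return params

# Toy scoring function: each parameter has a fixed contribution.
fixed = {"a": 0.9, "b": 0.4, "c": 0.05}
score = lambda params: {p: fixed[p] for p in params}
selected = select_parameter_input_set(["a", "b", "c"], score, threshold=0.1, max_iterations=3)
```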
In some embodiments, the determining unit 401 is further configured to, after the first parameter set is obtained, determine the prediction accuracy contribution of each parameter in the first parameter set under each performance index according to the prediction accuracy contribution of each parameter in the first parameter set to each performance prediction model and the correspondence between the performance prediction models and the performance indexes; and, under each performance index, select from the determined prediction accuracy contributions the candidate prediction accuracy contributions that are greater than a preset contribution threshold, and determine the parameters corresponding to the candidate prediction accuracy contributions as the updated parameter set corresponding to that performance index.
In some embodiments, the determining unit 401 is further configured to set up a test cluster corresponding to the at least one storage scenario according to the parameter input set and the system parameters, and to obtain the at least one performance index by using a performance test tool while the test cluster corresponding to the at least one storage scenario is running; wherein the performance indicators include at least one of: input/output operations per second (IOPS), throughput, CPU utilization, memory utilization, swap (SWAP) utilization, and read/write latency.
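A common performance test tool for this purpose is fio, which can emit its results as JSON. The sketch below extracts a few of the listed indicators from such a report; the field layout is assumed from fio's JSON output format (it should be verified against the fio version in use), and the sample payload is fabricated.

```python
import json

def extract_indicators(fio_json_text):
    """Pull IOPS, throughput (KiB/s), and mean latency (ns) for read and
    write from a fio JSON report. Field names assumed from fio's JSON
    output; verify against your fio version."""
    job = json.loads(fio_json_text)["jobs"][0]
    out = {}
    for op in ("read", "write"):
        stats = job[op]
        out[op] = {
            "iops": stats["iops"],
            "bw_kib_s": stats["bw"],
            "lat_mean_ns": stats["lat_ns"]["mean"],
        }
    return out

# Fabricated sample payload in fio's JSON shape.
sample = json.dumps({"jobs": [{
    "read":  {"iops": 15000.0, "bw": 61440, "lat_ns": {"mean": 120000.0}},
    "write": {"iops": 5000.0,  "bw": 20480, "lat_ns": {"mean": 250000.0}},
}]})
indicators = extract_indicators(sample)
```

CPU, memory, and SWAP utilization would come from a separate system monitor (e.g. sar or /proc sampling) rather than from the IO benchmark itself.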
It is understood that, in this embodiment, a "unit" may be part of a circuit, part of a processor, part of a program or software, etc.; it may also be a module, or it may be non-modular. Moreover, the components in this embodiment may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional module.
Based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method of this embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Accordingly, the present embodiment provides a computer storage medium storing a computer program which, when executed by at least one processor, implements the parameter recommendation method of any one of the preceding embodiments.
Based on the above components of the parameter recommendation apparatus 40 and the computer storage medium, refer to fig. 6, which shows a specific hardware structure diagram of the parameter recommendation apparatus 40 according to an embodiment of the present application. As shown in fig. 6, the parameter recommendation apparatus 40 may include: a communication interface 601, a memory 602, and a processor 603, with the components coupled together by a bus system 604. It is understood that the bus system 604 is used to enable communication among these components; in addition to a data bus, the bus system 604 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, however, the various buses are labeled as bus system 604 in fig. 6. The communication interface 601 is configured to receive and send signals during information transmission to and reception from other external network elements;
a memory 602 for storing a computer program capable of running on the processor 603;
a processor 603 for, when running the computer program, performing:
determining a parameter input set of a distributed storage system and system parameters configured in at least one storage scene;
determining at least one performance index according to the parameter input set and the system parameters;
performing model training according to the parameter input set, the system parameters and the at least one performance index, and determining at least one performance prediction model and at least one updated parameter set; each performance index corresponds to one performance prediction model and one updated parameter set, and the updated parameter set is obtained by iteratively updating the parameter input set during model training;
and recommending parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set.
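The four steps the processor 603 performs amount to a small pipeline. The sketch below wires them together with stub callables to show the data flow; every function, class, and name here is a hypothetical stand-in for the corresponding step described above, not the patent's actual implementation.

```python
def recommend_parameters(storage_system, scenarios,
                         determine_input_set, measure_indicators,
                         train_models, recommend):
    """End-to-end flow: parameter input set -> performance indicators ->
    per-indicator models and updated parameter sets -> recommendation."""
    input_set = determine_input_set(storage_system)
    system_params = {s: storage_system.config(s) for s in scenarios}
    indicators = measure_indicators(input_set, system_params)
    models, updated_sets = train_models(input_set, system_params, indicators)
    return recommend(storage_system, models, updated_sets)

# Stub wiring to exercise the flow; all callables are placeholders.
class StubSystem:
    def config(self, scenario):
        return {"scenario": scenario}

result = recommend_parameters(
    StubSystem(), ["block", "object"],
    determine_input_set=lambda system: ["read_ahead_kb"],
    measure_indicators=lambda inp, sp: {"iops": [1.0]},
    train_models=lambda inp, sp, ind: ({"iops": "model"}, {"iops": ["read_ahead_kb"]}),
    recommend=lambda system, models, sets: {"recommended": sets["iops"]},
)
```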
It will be appreciated that the memory 602 in this embodiment can be volatile memory, non-volatile memory, or both. The non-volatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example, and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), and Direct Rambus RAM (DRRAM). The memory 602 of the systems and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
The processor 603 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated hardware logic circuits in the processor 603, or by instructions in the form of software. The processor 603 may be a general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be carried out directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 602, and the processor 603 reads the information in the memory 602 and completes the steps of the above method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application-Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field-Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, micro-controllers, microprocessors, other electronic units configured to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Optionally, as another embodiment, the processor 603 is further configured to perform the steps of the method of any of the previous embodiments when running the computer program.
Based on the composition and hardware structure diagram of the parameter recommendation apparatus 40, refer to fig. 7, which shows a composition structure diagram of a parameter recommendation device 70 provided in an embodiment of the present application. As shown in fig. 7, the parameter recommendation device 70 at least comprises the parameter recommendation apparatus 40 of any of the previous embodiments.
For the parameter recommendation device 70, a corresponding performance prediction model is selected for each storage scenario, and parameter recommendation is performed on the distributed storage system according to the parameter set corresponding to that performance prediction model. In this way, optimal parameters are recommended to the distributed storage system automatically in different storage scenarios, which improves the efficiency of parameter recommendation and reduces resource consumption.
The above description is only a preferred embodiment of the present application, and is not intended to limit the scope of the present application.
It should be noted that, in the present application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of another identical element in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description, and do not represent the advantages and disadvantages of the embodiments.
The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.
The features disclosed in the several product embodiments presented in this application can be combined arbitrarily, without conflict, to arrive at new product embodiments.
The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.
The above description is only for the specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present application, and shall be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for parameter recommendation, the method comprising:
determining a parameter input set of a distributed storage system and system parameters configured in at least one storage scene;
determining at least one performance indicator based on the input set of parameters and the system parameters;
performing model training according to the parameter input set, the system parameters and the at least one performance index, and determining at least one performance prediction model and at least one updated parameter set; each performance index corresponds to one performance prediction model and one updated parameter set, and the updated parameter set is obtained by iteratively updating the parameter input set during model training;
and recommending parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set.
2. The method of claim 1, wherein said making parameter recommendations to the distributed storage system based on the at least one performance prediction model and the at least one updated set of parameters comprises:
acquiring target system parameters configured by the distributed storage system in a preset storage scene;
according to a target performance index for which parameter recommendation is to be performed, selecting a target performance prediction model corresponding to the target performance index from the at least one performance prediction model, and selecting a target parameter set corresponding to the target performance index from the at least one updated parameter set;
and according to the target performance prediction model, performing parameter optimization on the target parameter set and the target system parameters by using a first preset algorithm to obtain an optimized parameter set corresponding to the target performance index, and recommending the optimized parameter set to the distributed storage system.
3. The method of claim 1, wherein determining the input set of parameters for the distributed storage system comprises:
performing input/output (IO) stack analysis on the distributed storage system, and determining related parameters of the IO stack;
and randomly sampling the related parameters of the IO stack within a preset IO stack parameter value range to obtain a related parameter set, and determining the related parameter set as the parameter input set.
4. The method of claim 1, wherein determining the input set of parameters for the distributed storage system comprises:
acquiring an initial parameter set of the distributed storage system, randomly sampling parameters in the initial parameter set in a preset parameter value range, and determining a test sample;
determining at least one performance index according to the test sample and the system parameter;
performing model training on the test sample, the system parameters and the at least one performance index by using a second preset algorithm, and determining at least one intermediate performance prediction model and the prediction accuracy contribution of each parameter in the test sample to each intermediate performance prediction model;
and analyzing the determined prediction accuracy contributions, and selecting the parameter input set from the test sample according to the analysis result.
5. The method of claim 4, wherein the analyzing the determined prediction accuracy contributions and selecting the parameter input set from the test sample according to the analysis result comprises:
selecting, from the determined prediction accuracy contributions, the candidate prediction accuracy contributions that are greater than a preset contribution threshold, and determining a first parameter set from the parameters corresponding to the candidate prediction accuracy contributions;
determining the number of iterations of the at least one performance prediction model;
when the number of iterations is smaller than a preset number of iterations, determining the first parameter set as the initial parameter set, incrementing the number of iterations by 1, and returning to the step of randomly sampling the parameters in the initial parameter set within the preset parameter value range and determining a test sample;
and when the number of iterations reaches the preset number of iterations, determining the first parameter set as the parameter input set, and determining the model obtained after the latest iterative update as the at least one performance prediction model.
6. The method of claim 5, wherein during the model training process, the method further comprises:
after the first parameter set is obtained, determining the prediction accuracy contribution of each parameter in the first parameter set under each performance index according to the prediction accuracy contribution of each parameter in the first parameter set to each performance prediction model and the correspondence between the performance prediction models and the performance indexes;
and, under each performance index, selecting from the determined prediction accuracy contributions the candidate prediction accuracy contributions that are greater than a preset contribution threshold, and determining the parameters corresponding to the candidate prediction accuracy contributions as the updated parameter set corresponding to that performance index.
7. The method of any of claims 1 to 5, wherein determining at least one performance indicator based on the input set of parameters and the system parameters comprises:
according to the parameter input set and the system parameters, building a test cluster corresponding to the at least one storage scene;
in the running process of the test cluster corresponding to the at least one storage scene, obtaining the at least one performance index by using a performance test tool;
wherein the performance indicators include at least one of: unit time reading and writing times IOPS, throughput, central processing unit CPU utilization rate, memory utilization rate, exchange memory SWAP utilization rate and reading and writing delay.
8. A parameter recommendation device is characterized in that the parameter recommendation device comprises a determination unit, a training unit and a recommendation unit, wherein,
the determining unit is configured to determine a parameter input set of the distributed storage system and system parameters configured in at least one storage scene; and determining at least one performance indicator based on the input set of parameters and the system parameters;
the training unit is configured to perform model training according to the parameter input set, the system parameters and the at least one performance index, and determine at least one performance prediction model and at least one updated parameter set; each performance index corresponds to one performance prediction model and one updated parameter set, and the updated parameter set is obtained by iteratively updating the parameter input set during model training;
the recommendation unit is configured to recommend parameters to the distributed storage system according to the at least one performance prediction model and the at least one updated parameter set.
9. A parameter recommendation apparatus, comprising a memory and a processor, wherein,
the memory for storing a computer program operable on the processor;
the processor, when running the computer program, is configured to perform the parameter recommendation method of any of claims 1 to 7.
10. A computer storage medium, characterized in that the computer storage medium stores a computer program which, when executed by at least one processor, implements the parameter recommendation method according to any one of claims 1 to 7.
CN202111115232.8A 2021-09-23 2021-09-23 Parameter recommendation method and device and computer storage medium Pending CN115858660A (en)

Publications (1)

Publication Number Publication Date
CN115858660A true CN115858660A (en) 2023-03-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination