CN117389603A - Application performance optimization method, electronic device and storage medium - Google Patents

Publication number
CN117389603A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210793932.0A
Other languages
Chinese (zh)
Inventor
周全
杨肖
陈律
喻之斌
关冰宇
李国徽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to CN202210793932.0A
Publication of CN117389603A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management

Abstract

The application discloses an application performance optimization method, an electronic device, and a storage medium, and relates to the field of computer technologies. The method determines a parameter value range for each operating parameter of an application, determines an initial configuration within those ranges, and then determines, based on the initial configuration, a target configuration for running the application such that the running performance data of the application meets an application performance optimization condition. The method does not need to build a large number of configurations by combining different parameter values of every operating parameter and then search that set step by step. Only one initial configuration has to be determined within the parameter value ranges, and the target configuration is derived from it, so the amount of computation is small and a target configuration that improves application performance can be determined quickly.

Description

Application performance optimization method, electronic device and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an application performance optimization method, an electronic device, and a storage medium.
Background
Applications deployed on cloud servers typically need to compute and process large amounts of data. At present, such applications usually process data using the default parameter values of their operating parameters. The default values often fail to make full use of the application's performance, whereas better parameter values can improve it considerably.
However, each application has dozens or even hundreds of operating parameters. If different parameter values are selected for each operating parameter to form configurations to be verified, a large number of such configurations is obtained. Executing them one by one, that is, running the application with each configuration to be verified and then selecting the optimal one, takes a very long time.
For dozens or even hundreds of operating parameters, how to quickly determine an optimal configuration formed by combining preferred parameter values of each operating parameter, so that the application can be run with that configuration and its performance improved, is a problem to be solved urgently.
Disclosure of Invention
The application performance optimization method, electronic device, and storage medium provided by the present application can quickly determine a target configuration that improves application performance.
In a first aspect, an embodiment of the present application provides an application performance optimization method, where the method may include: determining a parameter value range for a plurality of operating parameters based on default parameter values for the plurality of operating parameters of the application; determining an initial configuration based on parameter value ranges of the plurality of operating parameters; the initial configuration comprises parameter values of a plurality of operation parameters of the application, wherein the parameter value of each operation parameter is in a parameter value range of the corresponding operation parameter; determining a target configuration based on the initial configuration; wherein the target configuration meets the following conditions: when the application is operated by adopting the target configuration, the operation performance data of the application meets the application performance optimization condition.
In the application performance optimization method provided by the embodiments of the present application, a parameter value range is determined for each operating parameter of the application, an initial configuration is determined within those ranges, and a target configuration for running the application is then determined based on the initial configuration, so that the running performance data of the application meets the application performance optimization condition. The method does not need to build a large number of configurations by combining different parameter values of every operating parameter and then search that set step by step. Only one initial configuration has to be determined within the parameter value ranges, and the target configuration is derived from it, so the amount of computation is small and a target configuration that improves application performance can be determined quickly.
Optionally, the running performance data of the application may include, but is not limited to, the duration the application takes to process a set amount of data; the application performance optimization condition may include, but is not limited to, that this duration is less than or equal to a first set duration. These are only examples: the running performance data may take other forms and, correspondingly, the application performance optimization condition may include other conditions, which is not specifically limited in this application.
In one possible implementation, when determining the parameter value ranges of the plurality of operating parameters, the initial parameter value ranges of the plurality of operating parameters may be determined based on the default parameter values and the set first fluctuation ratio of the plurality of operating parameters of the application, respectively; generating a successful configuration by selecting parameter values within an initial parameter value range for each operating parameter; generating a first set number of simulated configurations based on the successful configuration; selecting a second set number of simulation configurations from the first set number of simulation configurations based on a distance between each simulation configuration and the successful configuration; and if the second set number of simulation configurations all meet the first set condition, determining parameter value ranges of a plurality of operating parameters based on the second set number of simulation configurations. Wherein, successful configuration refers to configuration satisfying a first setting condition, and the first setting condition is: when the application is operated by adopting the corresponding configuration, the duration of the application for processing the data of the set data quantity is less than or equal to the second set duration; the second set time period is longer than the first set time period.
When generating the successful configuration, the following operations may be repeated until a pending configuration satisfies the first setting condition: select a parameter value for each operating parameter within the initial parameter value range of that operating parameter, and obtain a pending configuration from the selected parameter values; determine whether the pending configuration satisfies the first setting condition and, if not, return to selecting a parameter value for each operating parameter within its initial parameter value range, where the newly selected parameter values differ from those already selected. Finally, the pending configuration that satisfies the first setting condition is taken as the successful configuration.
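The repeated selection described above can be sketched as a simple sampling loop. This is a minimal illustration rather than the patented implementation: the callback `run_application` (returning the measured duration), the uniform random selection, the `max_attempts` cap, and the toy timing function are all assumptions introduced here.

```python
import random

def find_successful_configuration(ranges, run_application, second_set_duration,
                                  max_attempts=100, seed=0):
    """Sample configurations inside the initial ranges until one satisfies
    the first setting condition (duration <= second set duration)."""
    rng = random.Random(seed)
    tried = set()  # remembers configurations that were already selected
    for _ in range(max_attempts):  # cap is an assumption; the text loops until success
        config = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        key = tuple(sorted(config.items()))
        if key in tried:
            continue  # re-selected values must differ from earlier ones
        tried.add(key)
        if run_application(config) <= second_set_duration:
            return config
    return None

def toy_run(config):
    # Hypothetical stand-in for running the application: more cores, less time.
    return 100.0 / config["cpu_cores"]

ranges = {"memory_gb": (5.0, 15.0), "cpu_cores": (2.5, 7.5)}
successful = find_successful_configuration(ranges, toy_run, second_set_duration=30.0)
```

In practice each call to `run_application` is an expensive real run, so the cap on attempts is worth keeping even though the text itself does not mention one.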
In the above embodiment, the fluctuation ratio of the parameters may be set, and the initial parameter value ranges of the plurality of operating parameters are respectively determined based on the default parameter values of the operating parameters; and searching the successful configuration in the initial parameter value range, and obtaining parameter value ranges of a plurality of operation parameters based on the parameter values of the operation parameters in the simulation configurations derived from the successful configuration. By the method, the parameter value range of each operation parameter can be determined more accurately, and the method is beneficial to searching for a proper target configuration in the subsequent steps more quickly.
In one possible implementation manner, when generating the first set number of simulation configurations based on the successful configuration, the successful configuration and the initial parameter value range of each operation parameter may be input into the parameter range model to obtain the first set number of simulation configurations output by the parameter range model; the parameter range model is obtained by training a virtual sample generation network VSGNet by adopting a set test sample.
In another possible implementation manner, when generating the first set number of simulation configurations based on the successful configuration, the intermediate parameter value range of each operation parameter may be determined based on the parameter value of each operation parameter included in the successful configuration and the set second fluctuation ratio; generating a first set number of simulation configurations by performing a plurality of selection of parameter values within the intermediate parameter value range for each operating parameter; the second fluctuation ratio is smaller than the first fluctuation ratio.
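The second approach (no model, only a narrower fluctuation ratio around the successful configuration) can be sketched as follows. The symmetric multiplicative range `value * (1 - r)` to `value * (1 + r)` and the uniform sampling are assumptions; the text does not state how the ratio is applied or how values are selected.

```python
import random

def generate_simulated_configurations(successful, second_fluctuation_ratio,
                                      first_set_number, seed=0):
    """Build an intermediate range around each parameter value of the
    successful configuration and sample `first_set_number` configurations."""
    rng = random.Random(seed)
    intermediate = {
        name: (value * (1 - second_fluctuation_ratio),
               value * (1 + second_fluctuation_ratio))
        for name, value in successful.items()
    }
    return [
        {name: rng.uniform(low, high) for name, (low, high) in intermediate.items()}
        for _ in range(first_set_number)
    ]

# Illustrative values only.
successful = {"memory_gb": 10.0, "threads": 20.0}
simulated = generate_simulated_configurations(successful, 0.1, first_set_number=5)
```

Because the second fluctuation ratio is smaller than the first, the sampled configurations stay close to a configuration that is already known to satisfy the first setting condition.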
The two methods for generating the simulation configuration provided by the embodiment of the application are convenient for a user to flexibly select according to actual needs.
In one possible implementation, for an i-th operating parameter of the plurality of operating parameters, the parameter value range of the i-th operating parameter may be determined by: selecting a minimum parameter value from the parameter values of the ith operating parameter respectively contained in the second set number of simulation configurations as the lower limit of the parameter value range of the ith operating parameter, and selecting a maximum parameter value as the upper limit of the parameter value range of the ith operating parameter to obtain the parameter value range of the ith operating parameter; and i is taken from 1 to N, wherein N is the number of the plurality of operation parameters.
Through the process, the parameter value ranges corresponding to the operation parameters of the application can be rapidly determined.
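The per-parameter minimum and maximum described above can be written directly. The sketch below assumes each configuration is a dictionary that maps a parameter name to its value; the parameter names and numbers are illustrative.

```python
def ranges_from_configurations(configurations):
    """For each operating parameter, use the smallest value seen across the
    configurations as the lower limit and the largest as the upper limit."""
    names = configurations[0].keys()
    return {
        name: (min(c[name] for c in configurations),
               max(c[name] for c in configurations))
        for name in names
    }

# Hypothetical second-set-number simulation configurations.
selected = [
    {"memory_gb": 8, "threads": 16},
    {"memory_gb": 12, "threads": 24},
    {"memory_gb": 9, "threads": 30},
]
value_ranges = ranges_from_configurations(selected)
# value_ranges == {"memory_gb": (8, 12), "threads": (16, 30)}
```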
In one possible implementation, when determining the initial configuration based on the parameter value ranges of the plurality of operating parameters, a third set number of candidate initial configurations may be generated by performing a plurality of selections of parameter values within the parameter value ranges of each operating parameter; running an application by adopting each candidate initial configuration, and recording the execution duration of each candidate initial configuration; the execution time of any one candidate initial configuration is the time used by the application to process the data of the set data volume when the application is operated by adopting any one candidate initial configuration; and taking the candidate initial configuration with the shortest execution duration as the initial configuration.
The initial configuration determined by the above process is the candidate with the shortest execution duration among those tested; starting the search from this configuration allows a suitable target configuration to be found more quickly.
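A minimal sketch of this selection step is shown below. The uniform sampling, the function names, and the toy timing function `toy_run` are assumptions made for illustration; in a real deployment `run_application` would launch the application and measure how long it takes to process the set amount of data.

```python
import random

def sample_configurations(ranges, count, seed=0):
    """Draw `count` candidate initial configurations inside the value ranges."""
    rng = random.Random(seed)
    return [
        {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(count)
    ]

def choose_initial_configuration(candidates, run_application):
    """Run every candidate, record its execution duration, keep the fastest."""
    timed = [(run_application(c), c) for c in candidates]
    duration, best = min(timed, key=lambda pair: pair[0])
    return best, duration

def toy_run(config):
    # Hypothetical duration model, used only to make the sketch runnable.
    return 120.0 / config["threads"] + abs(config["memory_gb"] - 12.0)

ranges = {"memory_gb": (8.0, 12.0), "threads": (16.0, 30.0)}
candidates = sample_configurations(ranges, count=8)  # 8 plays the role of the third set number
initial, initial_duration = choose_initial_configuration(candidates, toy_run)
```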
In one possible implementation, when determining the target configuration based on the initial configuration, the following steps may be repeated with the initial configuration as the base configuration: generate a candidate configuration based on the base configuration; run the application with the generated candidate configuration and acquire the running performance data of the application; if the running performance data does not meet the set application performance optimization condition, take the current candidate configuration as the new base configuration and return to the step of generating a candidate configuration based on the base configuration; if the running performance data meets the set application performance optimization condition, take the current candidate configuration as the target configuration.
In the above embodiment, a loop search starting from the initial configuration yields a target configuration with which the performance of the application reaches or approaches the optimum.
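The loop search can be sketched as below: a candidate that meets the optimization condition becomes the target configuration, and a candidate that does not becomes the next base configuration. The callbacks `generate_candidate` and `run_application`, the `max_iterations` cap, and the toy functions are hypothetical and not taken from the source.

```python
def search_target_configuration(initial, generate_candidate, run_application,
                                first_set_duration, max_iterations=50):
    """Iterate from the initial configuration until a candidate's duration
    is less than or equal to the first set duration."""
    base = initial
    for _ in range(max_iterations):  # cap is an assumption, not in the text
        candidate = generate_candidate(base)
        if run_application(candidate) <= first_set_duration:
            return candidate  # optimization condition met: target configuration
        base = candidate      # otherwise the candidate becomes the new base
    return None

def toy_generate(base):
    # Hypothetical candidate generator: try two more threads each round.
    return {"threads": base["threads"] + 2}

def toy_run(config):
    # Hypothetical duration model: duration falls as threads increase.
    return 120.0 / config["threads"]

target = search_target_configuration({"threads": 4}, toy_generate, toy_run,
                                     first_set_duration=10.0)
# target == {"threads": 12}
```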
In one possible implementation, when generating the candidate configuration based on the base configuration, a fourth set number of virtual configurations may be generated according to the base configuration; repeating the following operations until the ratio of the maximum confidence coefficient obtained by two adjacent times is within a set ratio range: selecting a virtual configuration which is not selected and has the smallest sample distance from the fourth set number of virtual configurations; running the application by adopting the selected virtual configuration, obtaining the execution time of the application, and determining the corresponding maximum confidence coefficient according to the execution time; and when the ratio of the maximum confidence coefficient obtained in the two adjacent times is within a set ratio range, the virtual configuration selected when the maximum confidence coefficient is obtained in the next time in the two adjacent times is used as the candidate configuration.
Through the process, among the multiple virtual configurations generated based on the basic configuration, the candidate configuration with the maximum confidence degree meeting the condition is determined, and whether the running performance data of the application meets the set application performance optimization condition or not when the candidate configuration is executed is judged, so that the proper target configuration can be searched more quickly.
In one possible implementation manner, when a fourth set number of virtual configurations are generated according to the basic configuration, the basic configuration may be input into the distance measurement model to obtain a first set number of virtual configurations output by the distance measurement model; and selecting a fourth set number of virtual configurations from the first set number of virtual configurations according to the sample distance of each virtual configuration.
Wherein the distance metric model may be obtained by:
training the VSGNet with set test samples and adjusting the hyperparameters of the VSGNet to obtain the hyperparameter values used for model training and the parameter range model; the hyperparameter values of the VSGNet include a threshold on the cumulative number of times the VSGNet reaches the convergence condition during training, a threshold on the percentage change of each operating parameter in the convergence condition, the number of simulation samples output by the VSGNet generator each time, and a threshold on the number of simulation samples, among those output by the generator each time, that meet a second set condition; training the parameter range model with training data according to the hyperparameter values used for model training to obtain the distance metric model; the training data is obtained from the configuration with the shortest execution duration among the configurations whose execution duration has been determined.
In the above embodiment, the parameter range model uses a VSGNet, which generally includes a generator and a discriminator. The output of the generator serves as the input of the discriminator; the generator's loss function guides the training of the generator, and the discriminator's loss function guides the training of the discriminator. The two do not guide each other through adversarial training, so training is faster and takes less time.
In one possible implementation, when generating the candidate configuration based on the basic configuration, parameter value search ranges of the plurality of operating parameters may be determined based on the parameter values of the plurality of operating parameters included in the basic configuration and the set third fluctuation ratio, respectively, and the candidate configuration may be generated based on the parameter value search ranges of the plurality of operating parameters.
In a second aspect, embodiments of the present application provide an electronic device comprising a processor and a memory; a memory for storing computer executable instructions; a processor for reading and executing computer-executable instructions stored in the memory such that the method as set forth in any one of the possible designs of the first aspect is performed.
In a third aspect, embodiments of the present application provide a chip, including a processor and a power supply circuit; the power supply circuit is for powering the processor, the processor for reading and executing the computer executable instructions, such that the method proposed by any one of the possible designs of the above-mentioned first aspect is executed.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium having stored therein computer-executable instructions for causing a computer to perform the method set forth in any one of the possible designs of the first aspect.
In a fifth aspect, embodiments of the present application provide a computer program product comprising computer-executable instructions for causing a computer to perform the method as set forth in any one of the possible designs of the first aspect.
For the technical effects achieved by any one of the second to fifth aspects, refer to the description of the beneficial effects of the first aspect; details are not repeated here.
Drawings
FIG. 1 is a schematic diagram of an application scenario in an embodiment of the present application;
FIG. 2 is a flowchart of an application performance optimization method according to an embodiment of the present application;
FIG. 3 is a flowchart of another method for optimizing application performance according to an embodiment of the present application;
FIG. 4 is a flowchart of another method for optimizing application performance according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an application performance optimization device according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of another application performance optimization device according to an embodiment of the present application;
FIG. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
FIG. 8 is a schematic structural diagram of a chip according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in detail below with reference to the accompanying drawings. The terminology used in the description section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
Before the embodiments provided herein are described, some terms used in this application are explained to facilitate understanding by those skilled in the art; these explanations do not limit the terms.
(1) Operating parameters: an application typically runs under set operating parameters, and an application may have multiple operating parameters, such as the amount of memory that the application may occupy, the number of processor cores that the application may use, the number of threads that the application occupies when running, and so on.
(2) Configuration: the combination of parameter values for a plurality of operating parameters employed by an application at runtime is referred to as configuration. When each operating parameter uses a default parameter value, the resulting combination of parameter values may be referred to as a default configuration.
In the embodiments of the present application, "plurality" means two or more and may therefore also be understood as "at least two". "At least one" may be understood as one or more, for example one, two, or more. For example, "including at least one" means including one, two, or more, without limiting which ones are included: if at least one of A, B, and C is included, then what is included may be A; B; C; A and B; A and C; B and C; or A, B, and C. "And/or" describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean that A exists alone, that A and B both exist, or that B exists alone. Unless otherwise specified, the character "/" generally indicates an "or" relationship between the associated objects.
Unless stated to the contrary, the embodiments of the present application refer to ordinal terms such as "first," "second," etc., for distinguishing between multiple objects and not for defining a sequence, timing, priority, or importance of the multiple objects.
Each application has dozens or even hundreds of operating parameters. If different parameter values are selected for each operating parameter to form configurations to be verified, a large number of such configurations is obtained, and executing them one by one, that is, running the application with each configuration to be verified and then selecting the optimal one, takes a very long time. To quickly determine an optimal configuration formed by combining preferred parameter values of each operating parameter, the embodiments of the present application provide an application performance optimization method that can be used to determine the optimal configuration of an application with a plurality of operating parameters and thereby improve its performance. By way of example, such applications may include applications deployed on a big data platform of a cloud server, such as the popular big data frameworks Spark, Redis, and Kafka, which have many key operating parameters.
The application performance optimization method provided by the embodiments of the present application can be applied to the application scenario shown in FIG. 1. The application scenario shown in FIG. 1 may be understood as a cloud server system, which may include a server cluster comprising a router 100, a switch 200, and a plurality of servers 300. The servers 300 in the server cluster may be divided into a master node and slave nodes, where the master node is used for scheduling, monitoring, and publishing tasks, and the actual configuration and programs run on the slave nodes. Applications such as Spark, Redis, and Kafka may be deployed on the same server or on different servers.
The application performance optimization method provided by the embodiments of the present application can be used to determine the optimal configuration of applications such as Spark, Redis, and Kafka so as to improve their performance. The method may be performed by any of the servers 300 in FIG. 1 or by another electronic device.
Fig. 2 shows a flowchart of an application performance optimization method according to an embodiment of the present application. As shown in fig. 2, the application performance optimization method may include the steps of:
S201, determining parameter value ranges of a plurality of operation parameters based on default parameter values of the plurality of operation parameters of the application.
An application may have a plurality of operating parameters, each operating parameter being provided with a default parameter value. For example, assume an application has the following operating parameters: the amount of memory that an application can occupy, the number of processor cores that an application can use, the number of threads that an application occupies when running. The default parameter value for the memory size may be 10G, the default parameter value for the number of processor cores may be 5, and the default parameter value for the number of threads occupied during run-time may be 20.
In determining the parameter value ranges of the plurality of operating parameters, initial parameter value ranges of the plurality of operating parameters may be determined based on the default parameter value of each operating parameter of the application and the set first fluctuation ratio, and a successful configuration may be generated by selecting a parameter value within the initial parameter value range of each operating parameter. The successful configuration meets a first setting condition, which is: when the application is run with the corresponding configuration, the duration the application takes to process the set amount of data is less than or equal to the second set duration.
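As a worked illustration of deriving the initial ranges, the sketch below applies a fluctuation ratio to the example default values given above (10G of memory, 5 processor cores, 20 threads). The symmetric multiplicative form `default * (1 - r)` to `default * (1 + r)`, the ratio of 0.5, and the parameter names are assumptions; the text does not specify how the first fluctuation ratio is applied.

```python
def initial_ranges(defaults, first_fluctuation_ratio):
    """Return {name: (low, high)} with low/high = default * (1 -/+ ratio)."""
    if not 0 < first_fluctuation_ratio < 1:
        raise ValueError("ratio is assumed to lie strictly between 0 and 1")
    return {
        name: (value * (1 - first_fluctuation_ratio),
               value * (1 + first_fluctuation_ratio))
        for name, value in defaults.items()
    }

# Default values from the example in the text; the names are illustrative.
defaults = {"memory_gb": 10, "cpu_cores": 5, "threads": 20}
ranges = initial_ranges(defaults, 0.5)  # 0.5 is an illustrative ratio
# ranges == {"memory_gb": (5.0, 15.0), "cpu_cores": (2.5, 7.5), "threads": (10.0, 30.0)}
```

Integer-valued parameters such as core and thread counts would need rounding before use, which this sketch leaves out.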
Based on the successful configuration, a first set number of simulated configurations is generated. In some embodiments, a range of intermediate parameter values for each operating parameter may be determined based on the parameter values for each operating parameter included in the successful configuration and the set second fluctuation ratio, respectively, and a first set number of simulated configurations may be generated by selecting parameter values within the range of intermediate parameter values for each operating parameter; wherein the second fluctuation ratio is smaller than the first fluctuation ratio. In other embodiments, the successful configuration and the initial parameter value range for each operating parameter may be input to the parameter range model to obtain a first set number of simulated configurations output by the parameter range model; the parameter range model is obtained by training a virtual sample generation network (virtual sample generation networks, VSGNet) by using a set test sample.
And selecting a second set number of simulation configurations from the first set number of simulation configurations based on the distance between each simulation configuration and the successful configuration, and determining parameter value ranges of a plurality of operation parameters based on the second set number of simulation configurations if the second set number of simulation configurations meet the first set condition.
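The distance-based selection might look like the following sketch. The text does not define the distance measure or state that the nearest configurations are the ones kept, so the Euclidean distance and the "keep the closest" rule used here are both assumptions.

```python
import math

def select_nearest(simulated, successful, second_set_number):
    """Keep the `second_set_number` simulation configurations that lie
    closest to the successful configuration (Euclidean distance assumed)."""
    def distance(config):
        return math.sqrt(sum((config[name] - successful[name]) ** 2
                             for name in successful))
    return sorted(simulated, key=distance)[:second_set_number]

# Illustrative values only.
successful = {"a": 0, "b": 0}
simulated = [{"a": 3, "b": 4}, {"a": 1, "b": 0}, {"a": 0, "b": 2}]
nearest = select_nearest(simulated, successful, second_set_number=2)
# nearest == [{"a": 1, "b": 0}, {"a": 0, "b": 2}]
```

With parameters on very different scales (for example, gigabytes of memory versus thread counts), an unnormalized distance is dominated by the largest-valued parameter, so normalizing each parameter first may be preferable.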
S202, determining initial configuration based on parameter value ranges of a plurality of operation parameters.
The initial configuration includes parameter values of the plurality of operation parameters, any one of the plurality of operation parameters is taken as a first operation parameter, and the parameter value of the first operation parameter is within a parameter value range of the first operation parameter.
For example, by selecting a plurality of sets of parameter values within a parameter value range of each operation parameter, a third set number of candidate initial configurations may be generated, each candidate initial configuration may be used to operate the application, and an execution duration of each candidate initial configuration may be recorded. The execution time of the first candidate initial configuration is the time used by the application to process the data of the set data volume when the application is operated by adopting the first candidate initial configuration; the first candidate initial configuration is any one of the third set number of candidate initial configurations. And taking the candidate initial configuration with the shortest execution duration as the initial configuration.
S203, determining a target configuration based on the initial configuration.
The target configuration meets the following condition: when the application is run with the target configuration, the running performance data of the application meets the application performance optimization condition. Illustratively, the running performance data of the application may include, but is not limited to, the duration the application takes to process the set amount of data, and the application performance optimization condition may include, but is not limited to, that this duration is less than or equal to the first set duration. The first set duration is shorter than the second set duration. These are only examples: the running performance data may take other forms and, correspondingly, the application performance optimization condition may include other conditions, which is not specifically limited in the embodiments of the present application.
In some embodiments, the initial configuration may be used as the base configuration, and the following operations are repeated: generate a candidate configuration based on the base configuration, run the application with the generated candidate configuration, and acquire the running performance data of the application.
If the running performance data of the application does not meet the set application performance optimization condition, the current candidate configuration is taken as the new base configuration, and the operation of generating a candidate configuration based on the base configuration is executed again; if the running performance data meets the set application performance optimization condition, the current candidate configuration is taken as the target configuration.
The target configuration is then output, and each operation parameter of the application is configured with reference to it, so that the application runs with the target configuration and its performance can be greatly improved.
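The repeated operations above can be sketched as a simple loop. This is a minimal illustration only: the callables `generate_candidate`, `run_application`, and `meets_condition` are hypothetical placeholders for the steps described in the text, and the `max_rounds` cap is an added assumption so that the sketch always terminates.

```python
def find_target_config(initial_config, generate_candidate, run_application,
                       meets_condition, max_rounds=100):
    """Iterate from the initial configuration until a candidate meets the condition.

    All three callables are hypothetical stand-ins for the steps in the text.
    Returns the target configuration, or None if max_rounds is exhausted.
    """
    base = initial_config
    for _ in range(max_rounds):
        candidate = generate_candidate(base)        # candidate from the base configuration
        performance = run_application(candidate)    # running performance data
        if meets_condition(performance):
            return candidate                        # this candidate is the target configuration
        base = candidate                            # otherwise it becomes the new base
    return None
```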
According to the application performance optimization method, the initial configuration is determined by determining the parameter value range of each operation parameter of the application, and the target configuration for operating the application is determined based on the initial configuration in the parameter value range of the operation parameter, so that the operation performance data of the application can meet the application performance optimization conditions. According to the method and the device, a large number of configurations formed by combining different parameter values do not need to be selected for each operation parameter, then the target configuration is gradually searched from the large number of configurations, only one initial configuration is needed to be determined in the parameter value range of each operation parameter, and then the target configuration for operating the application is obtained according to the initial configuration, so that the operation amount is small, and the target configuration capable of improving the application performance can be rapidly determined.
In order to facilitate understanding of the technical solution provided in the embodiments of the present application, the application performance optimization method provided in the embodiments of the present application is described below by using two detailed specific examples.
In one embodiment, as shown in fig. 3, the application performance optimization method provided in the embodiment of the present application may include the following steps:
S301, determining an initial parameter value range of each operation parameter based on default parameter values of each operation parameter of the application.
In one embodiment, the initial parameter value range of each operating parameter may be determined based on the default parameter value of that operating parameter and a set first fluctuation ratio. The first fluctuation ratio may be, for example, 50%; that is, the initial parameter value range of each operating parameter of the application may be set to span from 50% below to 50% above its default parameter value. For example, if the default parameter value of a certain operating parameter is 10, the parameter value range of that operating parameter may be [5,15]; as another example, if the default parameter value of the number of threads occupied by the application at runtime is 20, the parameter value range of that operating parameter may be [10,30]. In another embodiment, the initial parameter value range of each operating parameter may be set to span 40%, 60%, or any other percentage above and below its default parameter value.
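As a minimal sketch of the range computation described above, the following function derives an initial parameter value range from a default value and a fluctuation ratio. The function name `initial_range` and its argument names are illustrative assumptions, not part of the embodiment.

```python
def initial_range(default_value, fluctuation_ratio=0.5):
    """Return (lower, upper) spanning default_value +/- fluctuation_ratio."""
    delta = abs(default_value) * fluctuation_ratio
    return (default_value - delta, default_value + delta)


# Defaults of 10 and 20 with a 50% first fluctuation ratio, as in the text
print(initial_range(10))  # (5.0, 15.0)
print(initial_range(20))  # (10.0, 30.0)
```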
S302, a successful configuration is generated by selecting parameter values within the initial parameter value range for each operating parameter.
The parameter value of each operation parameter is selected within the parameter value range of that operation parameter, and a configuration to be determined is obtained from the selected parameter values. If the configuration to be determined does not meet the first setting condition, the parameter value of each operation parameter is selected again within its parameter value range; if the configuration to be determined meets the first setting condition, it is taken as the successful configuration. The first setting condition is: when the application runs with the configuration to be determined, the duration for the application to process the data of the set data amount is less than or equal to the second set duration.
Illustratively, after a configuration to be determined is obtained, it is executed; that is, the application runs with the parameter value of each operation parameter in the configuration to be determined and processes the data of the set data amount. If all the data are processed within the second set duration, the execution is considered successful; if the data are not fully processed within the second set duration, or a fault occurs during execution, the execution is considered unsuccessful. If the configuration to be determined is executed successfully, it is taken as the successful configuration; if not, the parameter value of each operation parameter is reselected to generate a new configuration to be determined, until a configuration that executes successfully is obtained.
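The retry procedure of S302 can be sketched as follows. This is an illustration under stated assumptions: `run_application` is a hypothetical callable that returns the processing duration or raises `RuntimeError` on a fault, parameter values are drawn uniformly at random (the text does not fix the selection method), and `max_attempts` is an added safeguard that the text does not mention.

```python
import random


def find_successful_config(ranges, run_application, second_set_duration,
                           max_attempts=1000, rng=None):
    """Sample configurations until one finishes within second_set_duration.

    ranges: dict mapping parameter name -> (lower, upper).
    run_application: hypothetical callable returning the elapsed duration,
        or raising RuntimeError if the run faults.
    """
    rng = rng or random.Random()
    for _ in range(max_attempts):
        # Configuration to be determined: one value per parameter
        config = {name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
        try:
            duration = run_application(config)
        except RuntimeError:
            continue  # a fault during execution counts as unsuccessful
        if duration <= second_set_duration:
            return config  # successful configuration
    raise RuntimeError("no successful configuration found")
```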
S303, generating NG simulation configurations based on the successful configuration.
Where NG may be a preset integer value. In some embodiments, an intermediate parameter value range of each operating parameter may be determined based on the parameter value of that operating parameter in the successful configuration and a set second fluctuation ratio, and the NG simulation configurations may be generated by selecting parameter values within the intermediate parameter value range of each operating parameter. The second fluctuation ratio is smaller than the first fluctuation ratio. For example, if the first fluctuation ratio is 50%, the second fluctuation ratio may be 25% or 30%.
S304, selecting N simulation configurations from the NG simulation configurations based on the distance between each simulation configuration and the successful configuration.
Wherein N is an integer value less than NG; the Manhattan distance between each simulation configuration and the successful configuration can be respectively determined, the NG simulation configurations are ordered according to the order of the Manhattan distance from large to small, and the first N simulation configurations are selected.
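The selection in S304 can be illustrated with a short sketch. Configurations are represented here as equal-length lists of numbers, and the function names are illustrative assumptions.

```python
def manhattan_distance(a, b):
    """Sum of absolute per-dimension differences between two configurations."""
    return sum(abs(x - y) for x, y in zip(a, b))


def select_farthest(simulations, successful, n):
    """Return the n simulation configurations farthest from the successful one."""
    ranked = sorted(simulations,
                    key=lambda cfg: manhattan_distance(cfg, successful),
                    reverse=True)  # largest distance first
    return ranked[:n]
```

For instance, with a successful configuration `[10, 20]` and simulation configurations `[[11, 20], [15, 25], [10, 18], [5, 30]]`, the distances are 1, 10, 2, and 15, so selecting N = 2 yields `[[5, 30], [15, 25]]`.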
S305, judging whether all N simulation configurations meet a first setting condition; if yes, go to step S306; if not, the process returns to step S303.
Respectively executing N simulation configurations, if each simulation configuration is executed successfully, determining that the N simulation configurations all meet the first setting condition, and executing step S306; if any simulation configuration in the N simulation configurations is not successfully executed, the step S303 is executed again.
S306, determining parameter value ranges of a plurality of operation parameters based on the N simulation configurations.
Each of the plurality of operating parameters may be taken as a first operating parameter in turn, and the parameter value range of the first operating parameter may be determined by: and selecting the minimum parameter value from the parameter values of the first operation parameters contained in the N simulation configurations as the lower limit of the parameter value range of the first operation parameters, and selecting the maximum parameter value as the upper limit of the parameter value range of the first operation parameters to obtain the parameter value range of the first operation parameters.
For example, assuming that N is 3, the first operation parameter is the number of threads occupied by the application runtime, and the number of threads occupied by the application runtime in the three simulation configurations is 18, 16, and 20, respectively, where the minimum parameter value is 16, and the maximum parameter value is 20, it may be obtained that the parameter value range corresponding to the number of threads occupied by the application runtime is [16,20].
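The range derivation in S306 amounts to a per-parameter minimum and maximum. The sketch below assumes each configuration is a dictionary keyed by parameter name; the key `"threads"` is an illustrative name for the number of threads occupied by the application at runtime.

```python
def parameter_ranges(configs):
    """Return per-parameter (min, max) over a list of configuration dicts."""
    names = configs[0].keys()
    return {name: (min(c[name] for c in configs),
                   max(c[name] for c in configs))
            for name in names}


# The three simulation configurations from the example above
configs = [{"threads": 18}, {"threads": 16}, {"threads": 20}]
print(parameter_ranges(configs))  # {'threads': (16, 20)}
```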
S307, determining an initial configuration based on the parameter value ranges of the plurality of operation parameters.
In some embodiments, M candidate initial configurations may be generated by selecting a plurality of sets of parameter values within a parameter value range of each operation parameter, running an application with each candidate initial configuration, and recording an execution duration of each candidate initial configuration, where the candidate initial configuration with the shortest execution duration is used as the initial configuration. The execution duration of any one candidate initial configuration is the duration used by the application to process the data of the set data volume when the application is operated by adopting the candidate initial configuration.
In other embodiments, M candidate initial configurations may be generated by selecting multiple sets of parameter values within the parameter value range of each operating parameter, and the initial configuration may be determined from the M candidate initial configurations by an initialization strategy. The initialization strategy may comprise the following steps: determine the sample distance of each of the M candidate initial configurations through a data generation quality evaluator (DGQE); from the candidate initial configurations among the M that have not yet been executed, select the one with the minimum sample distance as the minimum configuration; execute the minimum configuration and acquire its execution duration. The candidate initial configurations executed so far, together with their execution durations, are taken as a training set. The training set is input into a Gaussian process (GP) model, and the maximum confidence of the current GP model is obtained. It is then judged whether the current maximum confidence meets a set confidence condition, which may be that the ratio of the current maximum confidence to the maximum confidence obtained in the previous round lies within a set ratio range. If the condition is not met, the step of selecting the unexecuted candidate initial configuration with the minimum sample distance is executed again to re-determine the minimum configuration; if the condition is met, or all M candidate initial configurations have been executed, the current minimum configuration is taken as the initial configuration.
S308, determining target configuration through a Bayesian optimization process based on the initial configuration.
Wherein the bayesian optimization (bayesian optimization, BO) process may comprise the steps of:
A candidate configuration is generated based on the initial configuration, the generated candidate configuration is executed, and it is judged whether the execution duration of the candidate configuration is less than or equal to the first set duration. The first set duration may be an integer multiple of the shortest duration; for example, the first set duration $T_{max}$ may be expressed in terms of the shortest duration $runtime_{best}$ as $T_{max} = 5 \times runtime_{best}$, where $runtime_{best}$ may be a value set based on historical experience.
If the execution duration $runtime$ of the candidate configuration is greater than $T_{max}$, execution of the candidate configuration is stopped and its execution duration is set to $10 \times runtime_{best}$. The current candidate configuration is then taken as the base configuration, and the steps of generating a new candidate configuration based on the base configuration and executing it are repeated until the execution duration of a candidate configuration is less than or equal to the first set duration; that candidate configuration is taken as the finally determined target configuration and is output.
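The timeout rule above can be sketched as a small helper. The factors 5 and 10 are the example values from the text; the function name and the returned `(duration, accepted)` pair are illustrative assumptions.

```python
def recorded_duration(measured, runtime_best, timeout_factor=5, penalty_factor=10):
    """Apply the timeout rule to one candidate run.

    Returns (duration_to_record, accepted). A run longer than
    T_max = timeout_factor * runtime_best is recorded with the penalty
    duration penalty_factor * runtime_best and is not accepted.
    """
    t_max = timeout_factor * runtime_best
    if measured > t_max:
        return penalty_factor * runtime_best, False
    return measured, True
```

With `runtime_best = 20`, the threshold is 100: a run of 120 is recorded as 200 and rejected, while a run of 80 is recorded as 80 and accepted.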
In some embodiments, when the first candidate configuration is generated, the initial configuration and its execution duration may be taken as the initial sample. The initial sample is used as training data for the GP model, the expected optimal configuration is selected through the GP model and the DGQE, and that optimal configuration is taken as the first candidate configuration. When a subsequent candidate configuration is generated, the parameter value search ranges of the plurality of operation parameters may be determined based on the parameter values of those operation parameters in the current candidate configuration and a set third fluctuation ratio. Illustratively, assuming that the third fluctuation ratio is 25% and taking the number of threads occupied by the application at runtime as the operation parameter, if the number of threads occupied by the application at runtime in the initial configuration is 16, the corresponding parameter value search range may be [12,20]. Parameter values of the operation parameters can be randomly selected within their parameter value search ranges to generate NG random configurations; the sample distance between each of the NG random configurations and the current candidate configuration is determined, and the configuration with the minimum sample distance is selected as the new candidate configuration.
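The generation of a subsequent candidate configuration can be sketched as follows. This is only an illustration: in the text the sample distance comes from the DGQE, whereas here `sample_distance` is a hypothetical callable supplied by the caller, and uniform random sampling is an assumption.

```python
import random


def search_range(value, third_ratio=0.25):
    """Search range spanning value +/- third_ratio around the current value."""
    delta = abs(value) * third_ratio
    return (value - delta, value + delta)


def next_candidate(current, sample_distance, ng=20, third_ratio=0.25, rng=None):
    """Draw ng random configurations near `current` and keep the closest one.

    sample_distance: hypothetical callable (config, current) -> float,
        standing in for the DGQE-based sample distance.
    """
    rng = rng or random.Random()
    ranges = {name: search_range(value, third_ratio)
              for name, value in current.items()}
    randoms = [{name: rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}
               for _ in range(ng)]
    return min(randoms, key=lambda cfg: sample_distance(cfg, current))


print(search_range(16))  # (12.0, 20.0)
```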
After the new candidate configuration is obtained, it is executed; if its execution duration is less than or equal to the first set duration, the current candidate configuration is taken as the target configuration. Through this process, the target configuration can be determined and output rapidly.
In one embodiment, as shown in fig. 4, the application performance optimization method provided in the embodiment of the present application may include the following steps:
S401, constructing a virtual sample generation network comprising a generator and a discriminator.
The virtual sample generation network is referred to as the VSGNet network; both the generator and the discriminator may be deep neural networks (DNN).
S402, training a virtual sample generation network by adopting a test sample set, and determining a super-parameter value of model training to obtain a parameter range model.
Set test samples are taken; for example, the samples [1, …, 1], [0, …, 0], and [-1, …, -1] can be used as test samples for the VSGNet network, where the dimension of each sample equals the number of operating parameters of the application. Random noise data following the normal distribution N(0, 1) are generated, and a test sample set is obtained based on the test samples and the random noise data.
A first round of training is performed on the VSGNet network using the test sample set; that is, the test samples in the test sample set are used as the input of the VSGNet network to obtain the simulation samples output by the generator of the VSGNet network and the probability, output by the discriminator of the VSGNet network, that a simulation sample can be taken as a real sample. The loss value of the generator of the VSGNet network is determined, and the first round of training is stopped once the convergence condition has been reached a cumulative C times. The convergence condition may be that the loss value of the generator of the VSGNet network converges, which can be expressed as:
$GLoss < P \times n$
where $P$ is the percentage change of each operating parameter of the application, $n$ is the dimension of the test sample, i.e., the number of operating parameters of the application, $P \times n$ is the overall percentage change of all operating parameters of the application, and $GLoss$ is the loss value of the generator of the VSGNet network, which can be expressed as:
where $NG$ is the number of simulation samples output by the generator of the VSGNet network, $n$ is the dimension of the test samples, i.e., the number of operating parameters of the application, $nrdata_j$ represents the value of the $j$-th dimension in the input test sample, $nddata_{ij}$ represents the random noise data, and $generator(nddata_{ij})$ represents the simulation sample output by the generator of the VSGNet network under the influence of the random noise data.
The NG simulation samples output by the generator of the VSGNet network in the first round of training are acquired, and it is judged whether the number of simulation samples among them that meet a second set condition reaches the set number ENG. The second set condition may be $GLoss < e \times n$, where $e$ is a preset value that may be any decimal between 0 and 1.
If the number of simulation samples meeting the second set condition among the NG simulation samples is smaller than ENG, the VSGNet network is trained again, until the number of simulation samples meeting the second set condition among the NG simulation samples output by the generator in a round of training is greater than or equal to ENG; training then stops, completing the model training process of the first iteration period. In each round of training, the generator of the VSGNet network outputs NG simulation samples. For example, a second round of training is performed on the VSGNet network using the test samples, and is stopped once the convergence condition has been reached a cumulative C times. The NG simulation samples output by the generator in the second round of training are acquired; if the number of them meeting the second set condition is greater than or equal to ENG, training stops and the model training process of the first iteration period is complete.
Because the value of each dimension of the test samples input into the VSGNet network lies within [-1,1], a simulation sample output by the generator of the VSGNet network whose value in every dimension lies within [-1,1] can be considered a legal simulation sample; if the value of any dimension of a simulation sample lies outside [-1,1], the simulation sample may be considered an illegal simulation sample.
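The legality check described above reduces to a bounds test on every dimension. A minimal sketch, with the function name `is_legal` chosen for illustration:

```python
def is_legal(sample, lower=-1.0, upper=1.0):
    """A simulation sample is legal if every dimension lies in [lower, upper]."""
    return all(lower <= value <= upper for value in sample)


print(is_legal([0.2, -1.0, 0.9]))  # True
print(is_legal([0.2, -1.3, 0.9]))  # False
```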
From the NG simulation samples obtained in the last round of training of the first iteration period, the ENG simulation samples meeting the second set condition are selected, and the time consumption of each simulation sample is obtained.
Wherein, the time consumption of each simulation sample is obtained by the following way:
and respectively executing the ENG simulation samples, and acquiring the time consumption for executing each simulation sample. Taking any one of the ENG simulation samples as an example for explanation, the simulation sample is called a first simulation sample, the numerical value of the operation parameter of each dimension in the first simulation sample is respectively used as the numerical value of each operation parameter of the application, the application is operated, the application processes the data with the set data amount, and the time length used by the application to process the data is recorded as time consumption. Wherein the data of the set data amount is data which simulates the processing required by the application in the real running environment.
The ENG simulation samples and the time consumption of executing each simulation sample are input into the data generation quality evaluator (DGQE), and the training value corresponding to the ENG simulation samples is determined through the DGQE. The training value corresponding to the ENG simulation samples may be the target range value $target_{range}$ of each of the ENG simulation samples, and the target range value of any one simulation sample can be expressed as $target_{range} = (\text{time consumption} \times \text{illegality rate}) / \text{normalized Manhattan distance}$. Here the normalized Manhattan distance refers to the Manhattan distance between the simulation sample and the input test sample; for example, the normalized Manhattan distance between a simulation sample and a test sample may be determined from the proportion of change in each dimension between the two samples. The illegality rate is the probability that the simulation sample is an illegal simulation sample. The illegality rate No-Legitimacy of any one simulation sample can be expressed by the following formula:
where $nrdata_j$ represents the value of the $j$-th dimension in the input test sample and $gdata_j$ represents the value of the $j$-th dimension in the output simulation sample. When the value of the $j$-th dimension in the input test sample is 0, the illegality rate is determined with reference to the value of the $j$-th dimension in the output simulation sample; when the value of the $j$-th dimension in the input test sample is not 0, the illegality rate is determined with reference to the ratio of the difference between the value 1 and the value of the $j$-th dimension in the output simulation sample.
The cumulative number of times C that the convergence condition must be reached in the above steps, the percentage change P of each operation parameter in the convergence condition, the number NG of simulation samples output by the generator of the VSGNet network in each round of training, and the minimum number ENG of simulation samples meeting the second set condition serve as the hyperparameters of model training; each hyperparameter takes a set initial value in the model training process of the first iteration period. To gradually reduce the training value corresponding to the ENG simulation samples, after each iteration period the values of the hyperparameters are adjusted through a Bayesian optimization algorithm, and the adjusted values are used to train the VSGNet network in the next iteration period, until the set number of iteration periods is reached. By way of example, the number of iteration periods may be 130; that is, after the hyperparameters have been adjusted 130 times, the training process stops, the current values of the hyperparameters are taken as the hyperparameter values of model training, and the trained VSGNet network may be called the VSGNet_range model.
The embodiment of the application adopts VSGNet, which comprises a generator and a discriminator, with the output of the generator used as the input of the discriminator. Unlike a generative adversarial network (GAN), in which the generator and the discriminator are trained adversarially, in the embodiment of the application the loss function of the VSGNet generator guides the training of the generator and the loss function of the VSGNet discriminator guides the training of the discriminator; the two do not guide each other through adversarial training, so the training process is faster and takes less time.
S403, determining an initial parameter value range of each operation parameter based on the default parameter value of each operation parameter of the application.
S404, generating a successful configuration by selecting parameter values within the initial parameter value range for each operating parameter.
And randomly selecting the parameter value of each operation parameter in the initial parameter value range of each operation parameter respectively, and obtaining the configuration to be determined based on the combination of the parameter values of each operation parameter. Executing the configuration to be determined, namely enabling the application to operate by adopting the parameter value of each operation parameter in the configuration to be determined, processing the data of the set data amount, and if the data are completely processed within the first set time length, considering that the execution is successful; if the data is not processed completely within the first set time period or faults occur in the execution process, the execution is considered to be unsuccessful. If the configuration to be determined is successfully executed, the configuration to be determined is used as successful configuration; and if the execution of the configuration to be determined is unsuccessful, reselecting the parameter value of each operation parameter to generate a new configuration to be determined until successful configuration of successful execution is obtained.
S405, taking the successful configuration and the initial parameter value range of each operation parameter as the input of the parameter range model, and obtaining the NG simulation configurations output by the parameter range model.
The parameter range model is the VSGNet_range model described above.
S406, selecting, from the NG simulation configurations, the N simulation configurations farthest from the successful configuration, and executing the selected N simulation configurations.
Each configuration is an n-dimensional array. The distance between each of the NG simulation configurations and the successful configuration is determined; illustratively, the Manhattan distance may be used, that is, the Manhattan distance between each simulation configuration and the successful configuration is determined. The NG simulation configurations are sorted by distance from largest to smallest, the first N simulation configurations are selected, and the selected N simulation configurations are executed.
S407, judging whether all the N simulation configurations are successfully executed; if yes, go to step S408; if not, the process returns to step S404.
S408, according to the N simulation configurations, determining the parameter value range of each operation parameter.
For each operating parameter, the following operations may be performed: the minimum value $config_{min}$ of the operating parameter in the N simulation configurations is taken as the lower limit of its parameter value range, and the maximum value $config_{max}$ of the operating parameter in the N simulation configurations is taken as the upper limit, so that the parameter value range of the operating parameter can be expressed as $[config_{min}, config_{max}]$.
S409, training the parameter range model according to the determined hyper-parameter value to obtain a distance measurement model.
Wherein the distance metric model may be referred to as the VSGNet_distance model.
In some embodiments, by repeatedly executing step S404, a plurality of successful configurations may be obtained, and the configuration with the shortest execution time is selected from the obtained successful configurations as the current optimal configuration, and the current optimal configuration is normalized, that is, the values of the operation parameters are mapped into the range of [ -1,1] according to the maximum value and the minimum value of the operation parameters of each dimension, so as to obtain the normalized configuration.
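The normalization step can be sketched as a per-parameter linear mapping. The text only states that values are mapped into [-1,1] using each parameter's maximum and minimum; the linear min-max form used here, and the handling of a zero-width range, are assumptions.

```python
def normalize(config, ranges):
    """Map each parameter linearly from its (lower, upper) range to [-1, 1].

    config: dict mapping parameter name -> value.
    ranges: dict mapping parameter name -> (lower, upper).
    """
    normalized = {}
    for name, value in config.items():
        lower, upper = ranges[name]
        if upper == lower:
            normalized[name] = 0.0  # assumption: centre a zero-width range
        else:
            normalized[name] = 2.0 * (value - lower) / (upper - lower) - 1.0
    return normalized


print(normalize({"threads": 18}, {"threads": (16, 20)}))  # {'threads': 0.0}
```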
The normalized configuration and random noise data following the normal distribution N(0, 1) constitute a training data set. The VSGNet_range model is trained with the training data set; that is, the training data in the training data set are used as the input of the VSGNet_range model to obtain the virtual configuration output by the generator of the VSGNet_range model and the probability, output by the discriminator of the VSGNet_range model, that this virtual configuration can be taken as a real configuration, hereinafter referred to as the real probability.
The training data set is used to perform the first round of training on the VSGNet_range model in order to produce the VSGNet_distance model; that is, the normalized configuration and the random noise data are used as the input of the generator of the VSGNet_range model to obtain the virtual configurations output by the generator and the probability, output by the discriminator, that a virtual configuration can be taken as a real configuration. The loss value GLoss of the generator of the VSGNet_range model is determined, and the first round of training is stopped once the first setting condition has been reached a cumulative C times.
The NG virtual configurations output by the generator of the VSGNet_range model in the first round of training of the VSGNet_distance model are acquired, and it is judged whether the number of virtual configurations meeting the second setting condition among the NG virtual configurations reaches ENG.
If the number of virtual configurations meeting the second setting condition among the NG virtual configurations is smaller than ENG, the network parameters of the VSGNet_range model are adjusted and the next round of training is performed on the VSGNet_range model, until the number of virtual configurations meeting the second setting condition among the NG virtual configurations output by the generator of the VSGNet_range model in a round of training is greater than or equal to ENG; training then stops, completing the model training process of the first iteration period. In each round of training, the generator outputs NG virtual configurations. For example, a second round of training is performed on the VSGNet_range model and is stopped once the convergence condition has been reached a cumulative C times. The NG virtual configurations output by the generator of the VSGNet_range model in the second round of training are acquired; if the number of them meeting the second setting condition is greater than or equal to ENG, training stops and the model training process of the first iteration period is complete.
From the NG virtual configurations obtained in the last round of training of the first iteration period, the ENG virtual configurations meeting the second setting condition are selected. The ENG virtual configurations and the time consumption of each virtual configuration are input into the DGQE, and the distance value corresponding to the ENG virtual configurations is determined through the DGQE. The distance value corresponding to the ENG virtual configurations may be the average of the sample distances of the ENG virtual configurations, and the sample distance of any one virtual configuration can be expressed as sample distance = (real probability × cosine distance) / normalized Manhattan distance. Here the cosine distance is the cosine distance between the virtual configuration and the input normalized configuration, and the normalized Manhattan distance is the Manhattan distance between the virtual configuration and the input normalized configuration.
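The sample distance expression can be illustrated with a short sketch. Two definitions here are assumptions, since the text does not spell them out: the cosine distance is taken as one minus the cosine similarity, and the Manhattan distance is normalized by dividing by the number of dimensions. All function names are illustrative.

```python
import math


def cosine_distance(a, b):
    """Assumed definition: 1 - cosine similarity of the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # assumption: treat a zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def normalized_manhattan(a, b):
    """Assumed normalization: Manhattan distance divided by the dimension."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def sample_distance(real_probability, virtual, reference):
    """(real probability * cosine distance) / normalized Manhattan distance."""
    denominator = normalized_manhattan(virtual, reference)
    if denominator == 0.0:
        raise ValueError("virtual and reference configurations are identical")
    return real_probability * cosine_distance(virtual, reference) / denominator
```

For a virtual configuration `[0.5, 0.5]`, a normalized configuration `[0.5, -0.5]`, and a real probability of 0.8, the cosine distance is 1, the normalized Manhattan distance is 0.5, and the sample distance is 1.6 under these assumed definitions.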
After each iteration period, the current optimal configuration is reselected, a new standardized configuration is generated, and the new standardized configuration is used to perform the model training of the next iteration period on the VSGNet_range model, until the distance value corresponding to the obtained ENG virtual configurations converges, yielding the VSGNet_distance model.
S410, generating M candidate initial configurations based on the parameter value range of each operation parameter.
Illustratively, each of the M candidate initial configurations may be obtained as follows: a value of each operation parameter is selected within the parameter value range of that operation parameter, by random sampling or in another manner, and the selected values of the operation parameters are combined to obtain one candidate initial configuration.
S411, executing M candidate initial configurations, and selecting the candidate initial configuration with the shortest execution duration as the initial configuration.
The generated M candidate initial configurations are executed separately, the execution duration of each of the M candidate initial configurations is acquired, and the candidate initial configuration with the shortest execution duration among the M candidate initial configurations is selected as the initial configuration.
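Steps S410 and S411 can be sketched as follows. This is only an illustration under stated assumptions: parameter ranges are assumed to be continuous (low, high) pairs sampled uniformly, the parameter names in the usage note are invented, and `run_application` is a hypothetical callable that runs the application with a configuration and returns the execution duration.

```python
import random


def generate_candidates(param_ranges, m, rng):
    """S410: draw M candidate initial configurations by uniform random sampling.

    param_ranges maps a parameter name to its (low, high) value range.
    """
    return [
        {name: rng.uniform(low, high) for name, (low, high) in param_ranges.items()}
        for _ in range(m)
    ]


def select_initial_configuration(candidates, run_application):
    """S411: execute every candidate and keep the one with the shortest duration."""
    timed = [(run_application(config), config) for config in candidates]
    best_duration, best_config = min(timed, key=lambda pair: pair[0])
    return best_config, best_duration
```

A caller might pass `random.Random(0)` as `rng` and a range table such as `{"executor_memory_gb": (2.0, 16.0)}`; both are placeholders rather than values taken from the application.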
S412, generating candidate configurations based on the initial configuration.
The initial configuration is input into the VSGNet_distance model, the NG virtual configurations output by the VSGNet_distance model are obtained, and the W virtual configurations with the smallest sample distances are selected from the NG virtual configurations. Specifically, the sample distance of each of the NG virtual configurations is determined, the NG virtual configurations are sorted in ascending order of sample distance, and the first W virtual configurations in that order are selected.
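The selection of the W virtual configurations with the smallest sample distances amounts to a sort followed by a truncation. A minimal sketch, assuming the sample distances have already been computed and are passed in alongside the configurations (the function name is hypothetical):

```python
def select_w_smallest(virtual_configs, sample_distances, w):
    """Return the W virtual configurations with the smallest sample distances.

    Ties are broken by original position, so the result is deterministic.
    """
    ranked = sorted(zip(sample_distances, range(len(virtual_configs))))
    return [virtual_configs[index] for _, index in ranked[:w]]
```

Sorting index pairs rather than the configurations themselves avoids comparing configuration objects when two sample distances are equal.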
Among the W virtual configurations, the not-yet-executed virtual configuration with the smallest sample distance is selected as the minimum configuration, and the minimum configuration is executed. The minimum configuration and its execution duration are input into the Gaussian process (GP) model, and the current maximum confidence of the GP model is determined. It is then judged whether a set confidence condition is met, the set confidence condition being that the ratio of the current maximum confidence to the maximum confidence obtained in the previous round is within a set ratio range, or that all W configurations have been executed. If the condition is not met, the minimum configuration is reselected from the not-yet-executed configurations among the W virtual configurations; if it is met, candidate configurations are obtained based on the current minimum configuration. For example, the current minimum configuration and its corresponding execution duration may be used as an initial sample, the initial sample is used as training data for the GP model, the predicted optimal configuration is selected through the GP model and the DGQE, and that optimal configuration is used as the candidate configuration.
S413, executing the generated candidate configuration.
S414, monitoring whether the execution duration of the candidate configuration is longer than a first set duration; if yes, step S415 is performed, and if no, step S416 is performed.
The first set duration may be an integer multiple of the shortest duration. For example, the first set duration T_max may be 5 times the shortest duration runtime_best, which can be expressed as T_max = 5 * runtime_best, where runtime_best may be a value set based on historical experience.
S415, the execution of the candidate configuration is stopped, and the candidate configuration is returned to step S412 as a new initial configuration.
After execution of the candidate configuration is stopped, the execution duration of the candidate configuration may be set to a fixed value, for example 10 * runtime_best. That is, denoting the execution duration of the candidate configuration as runtime, if runtime > T_max is detected, execution of the candidate configuration is stopped and its execution duration is set to 10 * runtime_best.
S416, taking the current candidate configuration as the finally determined target configuration, and outputting the target configuration.
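The loop formed by steps S412 to S416 can be sketched as below. This is a simplified illustration, not the patented procedure: `generate_candidate` and `run_application` are hypothetical callables, `run_application` is assumed to return the measured duration or `None` when the run was aborted at the time limit, and the `max_rounds` safety bound is an addition that does not appear in the description.

```python
def search_target_configuration(initial_config, generate_candidate, run_application,
                                runtime_best, max_rounds=100):
    """Repeat S412-S415 until a candidate finishes within T_max, then return it (S416)."""
    t_max = 5 * runtime_best       # first set duration, T_max = 5 * runtime_best
    penalty = 10 * runtime_best    # fixed duration recorded for a stopped candidate
    base = initial_config
    history = []                   # (candidate, recorded execution duration) pairs
    for _ in range(max_rounds):
        candidate = generate_candidate(base)          # S412
        runtime = run_application(candidate, t_max)   # S413 and S414
        if runtime is None or runtime > t_max:
            history.append((candidate, penalty))      # S415: stop and penalize
            base = candidate                          # candidate becomes new base
            continue
        history.append((candidate, runtime))
        return candidate, history                     # S416: target configuration
    raise RuntimeError("no candidate finished within the first set duration")
```

Recording the penalized duration in `history` keeps the failed candidates available as training data for whatever model proposes the next candidate.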
The application performance optimization method provided by the embodiment of the application can automatically search the optimal configuration of the big data application under the given data volume to be processed, and provide automatic performance optimization service for the application. The number of candidate configurations to be executed in the process is small, so that time can be saved, the target configuration can be quickly determined, a large number of training samples are not required to be acquired in the model training process, and cost can be saved.
Based on the same technical concept as the foregoing method embodiments, an embodiment of the present application further provides an application performance optimization apparatus. In some embodiments, as shown in fig. 5, the application performance optimization apparatus 500 may include a parameter range search module 501, a sampling module 502, a configuration search module 503, a data generation quality evaluator 504, and a virtual sample generation network 505. The default parameter values of the plurality of operation parameters of the application and the data of the set data amount are input into the application performance optimization apparatus 500, and the target configuration output by the application performance optimization apparatus 500 can be obtained.
The parameter range searching module 501 is configured to determine parameter value ranges of a plurality of operating parameters based on default parameter values of the plurality of operating parameters of the application.
The sampling module 502 is configured to determine an initial configuration based on the parameter value ranges of the plurality of operation parameters. The initial configuration comprises parameter values of the plurality of operation parameters, wherein the parameter value of any one of the plurality of operation parameters is within the parameter value range of that operation parameter.
A configuration search module 503 for determining a target configuration based on the initial configuration; the target configuration meets the following conditions: when the target configuration is adopted to run the application, the running performance data of the application meets the application performance optimization condition. Wherein the running performance data of the application comprises a time length for the application to process the data of the set data amount; the application performance optimization condition includes that a duration of time taken for the application to process the data of the set data amount is less than or equal to the first set duration of time.
In some embodiments, the virtual sample generation network 505 includes a generator and a discriminator. The parameter range search module 501 may be specifically configured to: train the virtual sample generation network 505 using a test sample set and determine the hyper-parameter values for model training, to obtain a parameter range model; during model training, the model training target is determined by the data generation quality evaluator 504. The module then determines an initial parameter value range of each operation parameter based on the default parameter value of each operation parameter of the application, and generates a configuration to be determined by randomly selecting parameter values within the initial parameter value range of each operation parameter. The application is run with the parameter value of each operation parameter in the configuration to be determined, to process data of a set data amount; if all the data are processed within a first set duration, the configuration to be determined is taken as a successful configuration. The successful configuration and the initial parameter value range of each operation parameter are used as input of the parameter range model, and NG simulation configurations output by the parameter range model are obtained. From the NG simulation configurations, the N simulation configurations farthest from the successful configuration are selected and executed; if all N simulation configurations are executed successfully, the parameter value range of each operation parameter is determined according to the N simulation configurations.
The sampling module 502 may be specifically configured to: the parameter range model trained by the virtual sample generation network 505 is further trained according to the determined hyper-parameter values to obtain a distance metric model, and during the training process, a training target is determined by the data generation quality evaluator 504. And generating M candidate initial configurations based on the parameter value range of each operation parameter, executing the M candidate initial configurations, and selecting the candidate initial configuration with the shortest execution duration as the initial configuration.
The configuration search module 503 may specifically be configured to determine the target configuration using Bayesian optimization. The Bayesian optimization process may include: taking the initial configuration as a basic configuration, and repeatedly performing the following steps: generating a candidate configuration based on the basic configuration through the distance metric model trained from the virtual sample generation network 505; running the application with the generated candidate configuration, and acquiring the execution duration of the candidate configuration; if the execution duration of the candidate configuration is longer than the first set duration, taking the current candidate configuration as a new basic configuration, generating a new candidate configuration based on the new basic configuration through the distance metric model, and executing the new candidate configuration. These steps are repeated until the execution duration of a candidate configuration is shorter than or equal to the first set duration, at which point the current candidate configuration is taken as the target configuration and the target configuration is output.
In other embodiments, as shown in fig. 6, an application performance optimization apparatus 600 provided in an embodiment of the present application may include a parameter range search module 601, a sampling module 602, a configuration search module 603, a data generation quality evaluator 604, and a random sampler 605. The default parameter values of the plurality of operation parameters and the data of the set data amount of the application are input into the application performance optimization apparatus 600, and the target configuration output by the application performance optimization apparatus 600 can be obtained.
The parameter range search module 601 may be configured to: determine an initial parameter value range of each operation parameter based on the default parameter value of each operation parameter of the application, and generate a configuration to be determined by randomly selecting, through the random sampler 605, a parameter value within the initial parameter value range of each operation parameter. The application is run with the parameter value of each operation parameter in the configuration to be determined, to process data of a set data amount; if all the data are processed within a first set duration, the configuration to be determined is taken as a successful configuration. An intermediate parameter value range of each operation parameter is then determined based on the parameter value of that operation parameter contained in the successful configuration and a set second fluctuation proportion, and NG simulation configurations are generated by selecting parameter values within the intermediate parameter value range of each operation parameter. Through the data generation quality evaluator 604, the N simulation configurations farthest from the successful configuration are selected from the NG simulation configurations, and the selected N simulation configurations are executed; if all N simulation configurations are executed successfully, the parameter value range of each operation parameter is determined according to the N simulation configurations.
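The two range computations in this variant can be sketched as follows. Two points are assumptions rather than statements of the description: the intermediate range is taken to be the successful value plus or minus the fluctuation proportion (with positive parameter values), and the final range is taken to be the per-parameter minimum and maximum over the N simulation configurations, which is the rule recited in claim 7. Function names are hypothetical.

```python
def intermediate_range(successful_config, fluctuation):
    """Range around each successful value: value * (1 -/+ fluctuation).

    Assumes positive parameter values, so the lower bound is below the upper bound.
    """
    return {
        name: (value * (1 - fluctuation), value * (1 + fluctuation))
        for name, value in successful_config.items()
    }


def range_from_simulations(simulated_configs):
    """Per-parameter (min, max) over the successfully executed simulation configurations."""
    names = simulated_configs[0].keys()
    return {
        name: (
            min(config[name] for config in simulated_configs),
            max(config[name] for config in simulated_configs),
        )
        for name in names
    }
```

Taking the extremes of configurations that are known to succeed keeps the resulting range inside the region that has actually been verified by execution.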
The sampling module 602 may be configured to: generate M candidate initial configurations by selecting, through the random sampler 605, a plurality of sets of parameter values within the parameter value range of each operation parameter, and determine the initial configuration from the M candidate initial configurations through an initialization strategy. The initialization strategy may comprise the following steps. The data generation quality evaluator 604 determines the sample distance of each of the M candidate initial configurations, selects, from the not-yet-executed candidate initial configurations among the M candidate initial configurations, the candidate initial configuration with the smallest sample distance as the minimum configuration, executes the minimum configuration, and acquires the execution duration of the minimum configuration. The candidate initial configurations executed so far, together with the execution duration of each, are used as a training set. The training set is input into the Gaussian process (GP) model, and the current maximum confidence of the GP model is obtained. It is then judged whether the current maximum confidence meets a set confidence condition, which may be that the ratio of the current maximum confidence to the maximum confidence obtained in the previous round is within a set ratio range. If the condition is not met, the process returns to the step of selecting, from the not-yet-executed candidate initial configurations among the M candidate initial configurations, the candidate initial configuration with the smallest sample distance as the minimum configuration, and the minimum configuration is re-determined; if the condition is met, or all M candidate initial configurations have been executed, the current minimum configuration is taken as the initial configuration.
The configuration search module 603 may be configured to determine the target configuration using Bayesian optimization. The Bayesian optimization process may include: taking the initial configuration as a basic configuration, and repeatedly performing the following steps: generating a candidate configuration based on the basic configuration through the random sampler 605; running the application with the generated candidate configuration, and acquiring the execution duration of the candidate configuration; if the execution duration of the candidate configuration is longer than the first set duration, taking the current candidate configuration as a new basic configuration, generating a new candidate configuration based on the new basic configuration through the random sampler 605, and executing the new candidate configuration. These steps are repeated until the execution duration of a candidate configuration is shorter than or equal to the first set duration, at which point the current candidate configuration is taken as the target configuration and the target configuration is output.
Based on the same technical concept as the foregoing method embodiments, an embodiment of the present application further provides an electronic device, which may be the server 300 shown in fig. 1 or another electronic device. The electronic device may be used to implement the functions of the method embodiment shown in fig. 2, and can therefore achieve the beneficial effects of that method embodiment.
In some embodiments, the electronic device 700 may be configured as shown in fig. 7, including a processor 701 and a memory 702 coupled to the processor 701. The processor 701 and the memory 702 may be connected to each other through a bus, and the processor 701 may be a general-purpose processor such as a microprocessor, or another conventional processor. The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on.
The memory 702 may be used to store software programs and modules, and the processor 701 executes the software programs and modules stored in the memory 702 to perform various functional applications and data processing of the electronic device 700, such as the application performance optimization method provided in the embodiments of the present application.
The memory 702 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system, application programs of at least one application, and the like; the data storage area may be used to store various configurations used or generated in the application performance optimization process, and the like. In addition, the memory 702 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
The processor 701 in the electronic device 700 is configured to execute computer instructions or programs stored in the memory 702 to perform the functions of the method embodiment shown in fig. 2. When the electronic device 700 is used to implement the method shown in fig. 2, the processor 701 is configured to: determining a parameter value range for a plurality of operating parameters based on default parameter values for the plurality of operating parameters of the application; determining an initial configuration based on parameter value ranges of the plurality of operating parameters; the initial configuration comprises parameter values of a plurality of operation parameters of the application, wherein the parameter value of each operation parameter is in a corresponding parameter value range; determining a target configuration based on the initial configuration; the target configuration meets the following conditions: when the target configuration is adopted to run the application, the running performance data of the application meets the application performance optimization condition. Wherein, the running performance data of the application can comprise the duration of time for the application to process the data of the set data amount; the application performance optimization condition may include the application processing the data of the set data amount for a duration less than or equal to the first set duration.
In some embodiments, the processor 701 may include one or more processing units, and the different processing units may be separate devices or may be integrated in one or more processors. The processor 701 may further include a controller, where the controller may generate an operation control signal according to the instruction operation code and the timing signal, to complete instruction fetching and instruction execution control.
It should be understood that the structures illustrated in the embodiments of the present application do not constitute a specific limitation on the electronic device. In other embodiments of the present application, the electronic device may include more or less components than illustrated, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Based on the same technical concept as the foregoing method embodiments, an embodiment of the present application further provides a chip, which may be a computing chip. The chip can be used to implement the functions of the method embodiment shown in fig. 2, and can therefore achieve the beneficial effects of that method embodiment.
In some embodiments, the chip 800 may be configured as shown in fig. 8, including a processor 801 and a power supply circuit 802 coupled to the processor 801. The processor 801 and the power supply circuit 802 may be connected to each other through a bus, and the processor 801 may be a general-purpose processor such as a microprocessor, or another conventional processor. The bus may be a peripheral component interconnect (PCI) bus, an extended industry standard architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. The power supply circuit 802 is used to supply power to the processor 801 via the bus.
The processor 801 may be connected to a memory provided outside the chip or to a memory provided inside the chip, and execute software programs and modules stored in the memory, thereby performing various functional applications and data processing of the chip 800, such as an application performance optimization method provided in the embodiments of the present application.
In some embodiments, the processor 801 may include one or more processing units, and the different processing units may be separate devices or may be integrated into one or more processors. The processor 801 may further include a controller, which may generate operation control signals according to the instruction operation code and the timing signals, to complete instruction fetching and instruction execution control.
The steps of the method in the embodiments of the present application may be implemented by means of hardware, or may be implemented by means of a processor executing a computer program or instructions. The computer program or instructions may constitute a computer program product. Embodiments of the present application also provide a computer program product comprising computer-executable instructions. In one embodiment, the computer-executable instructions are for causing a computer to perform the functions of the method embodiments shown in fig. 2-4.
The computer executable instructions may be stored in a computer readable storage medium, and embodiments of the present application also provide a computer readable storage medium having the executable instructions stored therein. In one embodiment, the computer-executable instructions are for causing a computer to perform the functions of the method embodiments shown in fig. 2-4.
The computer readable storage medium provided by the embodiments of the present application may be a random access memory (RAM), a flash memory, a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a CD-ROM, or any other form of computer readable storage medium known in the art.
The computer-executable instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium. For example, the computer program or instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device, such as a server or data center, that integrates one or more available media. The available medium may be a magnetic medium, e.g., a floppy disk, hard disk, or magnetic tape; an optical medium, such as a digital video disc (DVD); or a semiconductor medium, such as a solid state disk.
In the various embodiments of the application, unless otherwise specified or logically conflicting, terms and/or descriptions are consistent across embodiments and may be referenced by one another, and features of different embodiments may be combined to form new embodiments according to their inherent logical relationships. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a series of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not explicitly listed or inherent to such process, method, article, or apparatus.
Although the present application has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary of the arrangements defined in the appended claims and are to be construed as covering any and all modifications, variations, combinations, or equivalents that are within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made in the present application without departing from the scope of the application. Thus, if such modifications and variations of the embodiments of the present application fall within the scope of the claims and the equivalents thereof, the present application is intended to encompass such modifications and variations.

Claims (16)

1. An application performance optimization method, the method comprising:
determining a parameter value range for a plurality of operating parameters based on default parameter values for the plurality of operating parameters of the application;
determining an initial configuration based on the parameter value ranges of the plurality of operating parameters; the initial configuration comprises parameter values of the plurality of operating parameters, wherein the parameter value of any one of the plurality of operating parameters is within the parameter value range of the any one operating parameter;
determining a target configuration based on the initial configuration; the target configuration meets the following conditions: when the target configuration is adopted to run the application, the running performance data of the application meets the application performance optimization condition.
2. The method of claim 1, wherein the application's performance data includes a length of time for the application to process data of a set amount of data; the application performance optimization condition includes that a duration of time for the application to process the data of the set data amount is less than or equal to a first set duration of time.
3. The method of claim 2, wherein determining the parameter value range for the plurality of operating parameters based on the default parameter values for the plurality of operating parameters of the application comprises:
determining initial parameter value ranges of a plurality of operation parameters respectively based on default parameter values of the plurality of operation parameters of the application and a set first fluctuation proportion;
generating a successful configuration by selecting parameter values within an initial parameter value range for each operating parameter; the successful configuration meets a first setting condition;
generating a first set number of simulated configurations based on the successful configuration;
selecting a second set number of simulation configurations from the first set number of simulation configurations based on a distance between each simulation configuration and the successful configuration;
if the second set number of simulation configurations all meet the first set condition, determining parameter value ranges of the plurality of operating parameters based on the second set number of simulation configurations;
wherein, the first setting condition is: when the application is operated by adopting corresponding configuration, the duration of the application for processing the data of the set data quantity is less than or equal to the second set duration; the second set time length is longer than the first set time length.
4. A method according to claim 3, wherein said generating a successful configuration by selecting parameter values within an initial parameter value range for each operating parameter comprises:
repeating the following operations until the determined configuration to be determined meets the first setting condition:
selecting a parameter value for each operating parameter within an initial parameter value range for each operating parameter;
obtaining configuration to be determined based on the selected parameter value of each operation parameter;
determining whether the obtained configuration to be determined meets the first setting condition, if not, returning to re-execute the selection of the parameter value of each operation parameter in the initial parameter value range of each operation parameter, wherein the re-selected parameter value of each operation parameter is different from the selected parameter value of each operation parameter;
and taking the configuration to be determined meeting the first setting condition as the successful configuration.
5. The method of claim 3 or 4, wherein the generating a first set number of simulated configurations based on the successful configuration comprises:
inputting the successful configuration and the initial parameter value range of each operation parameter into a parameter range model to obtain a first set number of simulation configurations output by the parameter range model; the parameter range model is obtained by training a virtual sample generation network VSGNet by adopting a set test sample.
6. The method of claim 3 or 4, wherein the generating a first set number of simulated configurations based on the successful configuration comprises:
determining an intermediate parameter value range of each operation parameter based on the parameter value of each operation parameter contained in the successful configuration and the set second fluctuation proportion;
generating the first set number of simulation configurations by performing a plurality of selection of parameter values within an intermediate parameter value range for each operating parameter; the second fluctuation ratio is smaller than the first fluctuation ratio.
7. The method of any of claims 3-6, wherein determining the parameter value ranges for the plurality of operating parameters based on the second set number of simulation configurations comprises:
for an ith operating parameter of the plurality of operating parameters, determining a parameter value range for the ith operating parameter by:
selecting a minimum parameter value from the parameter values of the ith operating parameter respectively contained in the second set number of simulation configurations as the lower limit of the parameter value range of the ith operating parameter, and selecting a maximum parameter value as the upper limit of the parameter value range of the ith operating parameter to obtain the parameter value range of the ith operating parameter;
wherein i ranges from 1 to N, and N is the number of the plurality of operating parameters.
8. The method of any of claims 1-7, wherein determining an initial configuration based on a range of parameter values for the plurality of operating parameters comprises:
generating a third set number of candidate initial configurations by performing a plurality of selections of parameter values within the parameter value range for each operating parameter;
running the application with each candidate initial configuration, and recording the execution duration of each candidate initial configuration; the execution duration of a candidate initial configuration is the time the application takes to process a set volume of data when run with that candidate initial configuration;
and taking the candidate initial configuration with the shortest execution duration as the initial configuration.
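Claim 8 amounts to sampling candidates, timing each one, and keeping the fastest. The sketch below assumes uniform sampling inside each range and a caller-supplied `run_application` callable that returns the execution duration for a configuration; both are illustrative assumptions, not part of the claim.

```python
import random

def pick_initial_configuration(ranges, count, run_application, rng=None):
    """Return (best_config, best_duration) among `count` sampled candidates.

    `ranges` maps parameter name -> (low, high).
    `run_application` is a hypothetical callable: config -> duration.
    """
    rng = rng or random.Random()
    best_config, best_duration = None, float("inf")
    for _ in range(count):
        # Assumption: each parameter value is drawn uniformly from its range.
        candidate = {n: rng.uniform(lo, hi) for n, (lo, hi) in ranges.items()}
        duration = run_application(candidate)
        if duration < best_duration:
            best_config, best_duration = candidate, duration
    return best_config, best_duration

# Stand-in for a real run: pretend the application is fastest at x == 5.
def fake_run(config):
    return abs(config["x"] - 5.0)

best, duration = pick_initial_configuration(
    {"x": (0.0, 10.0)}, count=20, run_application=fake_run, rng=random.Random(1)
)
```

In practice `run_application` would launch the application on the set volume of data and measure wall-clock time; the stand-in above only makes the sketch runnable.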
9. The method according to any one of claims 1-8, wherein said determining a target configuration based on said initial configuration comprises:
taking the initial configuration as a basic configuration;
the following steps are repeatedly performed:
generating candidate configurations based on the base configuration;
running the application by adopting the generated candidate configuration, and acquiring running performance data of the application;
if the running performance data of the application does not meet the set application performance optimization condition, taking the current candidate configuration as a new basic configuration, and returning to the operation of generating candidate configurations based on the basic configuration; and if the running performance data of the application meets the set application performance optimization condition, taking the current candidate configuration as the target configuration.
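The iteration in claim 9 can be sketched as a loop that promotes each unsatisfactory candidate to the next basic configuration and stops at the first candidate whose performance meets the optimization condition. The three callables and the `max_rounds` safety bound are assumptions introduced for illustration; the claim itself specifies no iteration limit.

```python
def search_target_configuration(initial_config, generate_candidate,
                                run_application, meets_condition,
                                max_rounds=100):
    """Iterate from `initial_config` until a candidate meets the condition.

    generate_candidate: base config -> candidate config (hypothetical)
    run_application:    config -> running performance data (hypothetical)
    meets_condition:    performance data -> bool (hypothetical)
    max_rounds:         safety bound added for this sketch only
    """
    base = initial_config
    for _ in range(max_rounds):
        candidate = generate_candidate(base)
        performance = run_application(candidate)
        if meets_condition(performance):
            return candidate      # condition met: this is the target
        base = candidate          # condition not met: new basic configuration
    raise RuntimeError("optimization condition not met within max_rounds")

# Toy usage: each round lowers x by 1; performance is x; stop once x <= 7.
target = search_target_configuration(
    {"x": 10},
    generate_candidate=lambda base: {"x": base["x"] - 1},
    run_application=lambda cfg: cfg["x"],
    meets_condition=lambda perf: perf <= 7,
)
```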
10. The method of claim 9, wherein the generating candidate configurations based on the base configuration comprises:
generating a fourth set number of virtual configurations according to the basic configuration;
repeating the following operations until the ratio of the maximum confidence coefficients obtained in two consecutive iterations is within a set ratio range:
selecting a virtual configuration which is not selected and has the smallest sample distance from the fourth set number of virtual configurations;
running the application by adopting the selected virtual configuration to obtain the execution duration of the application, and determining the corresponding maximum confidence coefficient according to the execution duration;
and when the ratio of the maximum confidence coefficients obtained in two consecutive iterations is within the set ratio range, taking the virtual configuration selected in the later of the two iterations as the candidate configuration.
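The stopping rule of claim 10 can be sketched as below. The claim does not define how the maximum confidence coefficient is computed from the execution duration or how the sample distance is measured, so both are passed in as hypothetical callables; the default ratio range and the `None` return when every virtual configuration has been tried are likewise assumptions of this sketch.

```python
def choose_candidate(virtual_configs, sample_distance, run_application,
                     confidence_from_duration, ratio_range=(0.95, 1.05)):
    """Try virtual configurations in order of increasing sample distance.

    Stops when the ratio of the current confidence to the previous one
    falls inside `ratio_range`, and returns the configuration selected
    in the later of those two iterations.
    """
    low, high = ratio_range
    previous = None
    # Sorting once is equivalent to repeatedly picking the unselected
    # configuration with the smallest sample distance.
    for config in sorted(virtual_configs, key=sample_distance):
        duration = run_application(config)
        confidence = confidence_from_duration(duration)
        if previous not in (None, 0) and low <= confidence / previous <= high:
            return config
        previous = confidence
    return None  # assumption: no candidate if the ratio never stabilises

# Toy usage with made-up distances, durations, and confidences.
confidences = {1: 0.5, 2: 0.8, 3: 0.82, 4: 0.9}
chosen = choose_candidate(
    [{"id": 3}, {"id": 1}, {"id": 2}, {"id": 4}],
    sample_distance=lambda cfg: cfg["id"],
    run_application=lambda cfg: float(cfg["id"]),
    confidence_from_duration=lambda d: confidences[int(d)],
)
```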
11. The method of claim 10, wherein generating a fourth set number of virtual configurations from the base configuration comprises:
inputting the basic configuration into a distance metric model to obtain a first set number of virtual configurations output by the distance metric model;
and selecting a fourth set number of virtual configurations from the first set number of virtual configurations according to the sample distance of each virtual configuration.
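The selection step of claim 11 can be sketched as a nearest-k filter. The claim only says the selection is made "according to the sample distance"; this sketch assumes the configurations with the smallest distances are kept and that the distance is Euclidean distance to the basic configuration. Both choices, and the function names, are illustrative assumptions.

```python
import heapq
import math

def euclidean_distance(a, b):
    """Euclidean distance between two configurations with the same keys."""
    return math.sqrt(sum((a[k] - b[k]) ** 2 for k in a))

def select_nearest(base_config, virtual_configs, count):
    """Keep the `count` virtual configurations closest to `base_config`.

    Assumption: a smaller sample distance is preferred.
    """
    return heapq.nsmallest(
        count,
        virtual_configs,
        key=lambda cfg: euclidean_distance(base_config, cfg),
    )

# Toy usage: four virtual configurations, keep the two nearest the base.
base = {"x": 0, "y": 0}
virtual = [
    {"x": 3, "y": 4},
    {"x": 1, "y": 0},
    {"x": 0, "y": 2},
    {"x": 10, "y": 10},
]
nearest = select_nearest(base, virtual, count=2)
```

Parameters on very different scales would dominate an unnormalised Euclidean distance, so a real implementation would likely normalise each parameter first.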
12. The method of claim 11, wherein the distance metric model is obtained by:
training the VSGNet with set test samples and adjusting the hyper-parameters of the VSGNet, to obtain hyper-parameter values for model training and a parameter range model; the hyper-parameter values of the VSGNet comprise a threshold on the cumulative number of times the VSGNet reaches a convergence condition during training, a threshold on the percentage change of each operating parameter in the convergence condition, the number of simulation samples output by the VSGNet generator each time, and a threshold on the number of simulation samples, among those output by the VSGNet generator each time, that meet a second set condition;
training the parameter range model with training data according to the hyper-parameter values for model training, to obtain the distance metric model; the training data is obtained from the configuration with the shortest execution duration among the configurations whose execution durations have been determined.
13. An electronic device comprising a processor and a memory; the memory has a computer program stored thereon; the processor is configured to read the computer program stored in the memory and execute it so that the method according to any one of claims 1 to 12 is performed.
14. A chip, comprising a processor and a power supply circuit; the power supply circuit is configured to power the processor, the processor being configured to execute a computer program to implement the method of any one of claims 1 to 12.
15. A computer-readable storage medium, characterized in that it stores computer-executable instructions for causing a computer to perform the method according to any one of claims 1-12.
16. A computer program product comprising computer executable instructions for causing a computer to perform the method of any one of claims 1 to 12.
CN202210793932.0A 2022-07-05 2022-07-05 Application performance optimization method, electronic device and storage medium Pending CN117389603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210793932.0A CN117389603A (en) 2022-07-05 2022-07-05 Application performance optimization method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210793932.0A CN117389603A (en) 2022-07-05 2022-07-05 Application performance optimization method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN117389603A (en) 2024-01-12

Family

ID=89461868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210793932.0A Pending CN117389603A (en) 2022-07-05 2022-07-05 Application performance optimization method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN117389603A (en)

Similar Documents

Publication Publication Date Title
US11556778B2 (en) Automated generation of machine learning models
US20200042899A1 (en) Parallel Development and Deployment for Machine Learning Models
US20220414426A1 (en) Neural Architecture Search Method and Apparatus, Device, and Medium
CN110941424B (en) Compiling parameter optimization method and device and electronic equipment
US20120158623A1 (en) Visualizing machine learning accuracy
US20200150941A1 (en) Heterogenous computer system optimization
US20160077860A1 (en) Virtual machine placement determination device, virtual machine placement determination method, and virtual machine placement determination program
CN115018081B (en) Feature selection method, application program prediction method and device
CN106610989B (en) Search keyword clustering method and device
KR20200117690A (en) Method and Apparatus for Completing Knowledge Graph Based on Convolutional Learning Using Multi-Hop Neighborhoods
US11372379B2 (en) Computer system and control method
CN112990461B (en) Method, device, computer equipment and storage medium for constructing neural network model
CN111008705B (en) Searching method, device and equipment
CN117389603A (en) Application performance optimization method, electronic device and storage medium
EP4339843A1 (en) Neural network optimization method and apparatus
US11676050B2 (en) Systems and methods for neighbor frequency aggregation of parametric probability distributions with decision trees using leaf nodes
CN111737347B (en) Method and device for sequentially segmenting data on Spark platform
CN110059880B (en) Service discovery method and device
CN112783854A (en) Method and device for acquiring configuration parameters of database
US20240005160A1 (en) Methods and systems for optimizing a peak memory usage of an artificial neural network graph
CN113759869B (en) Intelligent household appliance testing method and device
CN116069471B (en) Deterministic scheduling method and device for tasks and electronic equipment
CN112861951B (en) Image neural network parameter determining method and electronic equipment
US11928562B2 (en) Framework for providing improved predictive model
CN117435516B (en) Test case priority ordering method and system

Legal Events

Date Code Title Description
PB01 Publication