CN116382903A - Resource allocation optimization method and device for big data platform scheduling system - Google Patents


Info

Publication number
CN116382903A
Authority
CN
China
Prior art keywords: model, global, individuals, rbnn, local
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310315141.1A
Other languages
Chinese (zh)
Inventor
付源
刘京玮
杨华峰
左腾
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Citic Bank Corp Ltd
Original Assignee
China Citic Bank Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by China Citic Bank Corp Ltd filed Critical China Citic Bank Corp Ltd
Priority to CN202310315141.1A
Publication of CN116382903A

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a method and a device for optimizing resource allocation of a big data platform scheduling system. The method comprises the following steps: setting an optimization target for resource allocation of the big data platform scheduling system and determining the parameters and constraints to be optimized; generating a global model dataset by a sampling method and training on it to obtain a global RBNN model; performing global iterative optimization on the global RBNN model with L-SHADE to obtain the final-generation population individuals; obtaining a local model dataset from the final-generation population individuals by a filling criterion and training on it to obtain a local RBNN model; and performing local iterative optimization on the local RBNN model with L-SHADE to obtain an optimization result, which serves as the resource allocation strategy. The method abstracts the per-tenant resource allocation strategy of the big data platform scheduling system into an RBNN model and performs global and local optimization on the model, thereby obtaining the optimal resource allocation strategy.

Description

Resource allocation optimization method and device for big data platform scheduling system
Technical Field
The invention relates to the technical field of big data processing, in particular to a method and a device for optimizing resource allocation of a big data platform scheduling system.
Background
Existing methods for optimizing system resource allocation in big data platform scheduling systems cannot solve the high-complexity, high-dimensional, highly nonlinear resource allocation optimization problem. The difficulties of this optimization are as follows:
1. The big data platform only provides operators and computing-power services to each tenant and does not know the characteristics of each tenant's jobs on the platform, so the scheduling resource allocation problem is difficult to abstract into a mathematical model.
2. The big data platform has numerous tenants, so resource allocation is a high-dimensional constrained optimization problem. It is relatively complex and difficult for a generic surrogate model to model accurately, so optimization either cannot be performed or yields poor results.
Most current methods for optimizing scheduling systems use an evolutionary algorithm (e.g., particle swarm optimization) on its own, but population-based evolutionary algorithms require a large number of real objective function evaluations (i.e., the scheduling system must actually run with the specified parameters to obtain a result, and roughly 100 times the problem dimension of real-time evaluations are needed). If each tenant is treated as one problem dimension, then for a platform with numerous tenants this causes massive cost waste that is often unacceptable.
Disclosure of Invention
In view of the foregoing, the present invention provides a method and apparatus for optimizing resource allocation of a big data platform scheduling system, so as to solve at least one of the above problems.
According to a first aspect of the present invention, there is provided a method for optimizing resource allocation of a big data platform scheduling system, the method comprising: taking minimization of the total job running time as the optimization target of resource allocation in the big data platform scheduling system, where the parameters to be optimized are the resource quotas of the tenants in the big data platform, the problem dimension is the number of tenants D, and the constraint is that the sum of the tenant quotas is smaller than the total resources of the big data platform scheduling system; generating a global model dataset in the variable space by a sampling method, and training on the global model dataset with an RBNN to obtain a global RBNN model; performing global iterative optimization on the global RBNN model with L-SHADE to obtain the final-generation population individuals; obtaining a local model dataset from the final-generation population individuals by a filling criterion, and training on the local model dataset with an RBNN to obtain a local RBNN model; and performing local iterative optimization on the local RBNN model with L-SHADE to obtain an optimization result, which serves as the resource allocation strategy for the tenants of the big data platform scheduling system.
Preferably, in the method of this embodiment of the invention, performing global iterative optimization on the global RBNN model with L-SHADE to obtain the final-generation population individuals comprises: performing population update iterations on the global RBNN model with L-SHADE; judging whether the generation number is an integer multiple of the problem dimension; in response to the generation number being an integer multiple of the problem dimension, selecting the optimal individual in the current population according to fitness value, calculating the Euclidean distances between all individuals in the population and the optimal individual, eliminating a preset number of the farthest individuals, and retraining the global RBNN model with an RBNN on the remaining individuals; and in response to the generation number not being an integer multiple of the problem dimension, keeping the global RBNN model unchanged and continuing the population update iterations on the global RBNN model with L-SHADE.
Preferably, in the method of this embodiment of the invention, obtaining a local model dataset from the final-generation population individuals by a filling criterion, and training on it with an RBNN to obtain a local RBNN model, comprises: selecting the top x individuals by fitness value in the final-generation population after global iterative optimization, and also selecting the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset; and training on the local model dataset with an RBNN to obtain the local RBNN model.
Preferably, in the method of this embodiment of the invention, when selecting the top x individuals by fitness value from the final-generation population, x is D/2; when selecting the best individual of each of the last y generations, y is D/2; and the size of the resulting local model dataset is D.
Preferably, in the method of this embodiment of the invention, selecting the top x individuals by fitness value in the final-generation population after global iterative optimization, and also selecting the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset, comprises: selecting the top x individuals by fitness value in the final-generation population after global iterative optimization, and also selecting the best individual of each of the last y generations produced during global iterative optimization; calculating the variance and mean of each dimension's elements over these individuals; and generating, from a Gaussian model based on the variances and means, a local model dataset consisting of D new individuals.
According to a second aspect of the present invention, there is also provided a resource allocation optimizing apparatus for a big data platform scheduling system, the apparatus comprising: a model setting unit for taking minimization of the total job running time as the optimization target of resource allocation in the big data platform scheduling system, where the parameters to be optimized are the resource quotas of the tenants in the big data platform, the problem dimension is the number of tenants D, and the constraint is that the sum of the tenant quotas is smaller than the total resources of the big data platform scheduling system; a global model training unit for generating a global model dataset in the variable space by a sampling method and training on it with an RBNN to obtain a global RBNN model; a global optimization unit for performing global iterative optimization on the global RBNN model with L-SHADE to obtain the final-generation population individuals; a local model training unit for obtaining a local model dataset from the final-generation population individuals by a filling criterion and training on it with an RBNN to obtain a local RBNN model; and a local optimization unit for performing local iterative optimization on the local RBNN model with L-SHADE to obtain an optimization result, which serves as the resource allocation strategy for the tenants of the big data platform scheduling system.
Preferably, the global optimization unit in the apparatus of this embodiment comprises: an update iteration module for performing population update iterations on the global RBNN model with L-SHADE, and for continuing those iterations when the judging module determines that the generation number is not an integer multiple of the problem dimension; a judging module for judging whether the generation number is an integer multiple of the problem dimension; and a model evolution module for, in response to the generation number being an integer multiple of the problem dimension, selecting the optimal individual in the current population according to fitness value, calculating the Euclidean distances between all individuals in the population and the optimal individual, eliminating a preset number of the farthest individuals, and retraining the global RBNN model with an RBNN on the remaining individuals.
Preferably, the local model training unit in the apparatus of this embodiment is specifically configured to: select the top x individuals by fitness value in the final-generation population after global iterative optimization, and also select the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset; and train on the local model dataset with an RBNN to obtain the local RBNN model.
Preferably, in the apparatus of this embodiment, when selecting the top x individuals by fitness value from the final-generation population, x is D/2; when selecting the best individual of each of the last y generations, y is D/2; and the size of the resulting local model dataset is D.
Preferably, the local model training unit in the apparatus of this embodiment selecting the top x individuals by fitness value in the final-generation population after global iterative optimization, and also selecting the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset, comprises: selecting the top x individuals by fitness value in the final-generation population after global iterative optimization, and also selecting the best individual of each of the last y generations produced during global iterative optimization; calculating the variance and mean of each dimension's elements over these individuals; and generating, from a Gaussian model based on the variances and means, a local model dataset consisting of D new individuals.
According to a third aspect of the present invention there is provided an electronic device comprising a memory, a processor and a computer program stored on said memory and executable on said processor, the processor implementing the steps of the above method when executing said computer program.
According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the above method.
According to a fifth aspect of the present invention there is provided a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the above method.
According to the above technical solution, in resource allocation for a big data platform scheduling system, minimization of the total job running time is taken as the optimization target, the parameters to be optimized are the resource quotas of the tenants in the big data platform, the problem dimension is the number of tenants D, and the constraint is that the sum of the tenant quotas is smaller than the total resources of the big data platform scheduling system; by combining an evolutionary algorithm with neural-network deep learning, the high-dimensional optimization problem can be handled at lower cost.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, it being obvious that the drawings in the following description are only some embodiments of the invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art. In the drawings:
fig. 1 is a schematic flow chart of a method for optimizing resource allocation of a big data platform scheduling system according to an embodiment of the present application;
fig. 2 is a flow chart of a method for optimizing resource allocation of a big data platform scheduling system according to another embodiment of the present application;
FIG. 3 is a flow chart illustrating the composition of a local model dataset using a fill strategy provided by an embodiment of the present application;
fig. 4 is a schematic diagram of a method for optimizing resource allocation of a big data platform scheduling system according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a resource allocation optimizing device of a big data platform scheduling system according to an embodiment of the present application;
FIG. 6 is a schematic diagram of a global optimization unit according to an embodiment of the present disclosure;
fig. 7 is a schematic block diagram of a system configuration of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present invention and their descriptions herein are for the purpose of explaining the present invention, but are not to be construed as limiting the invention.
Fig. 1 is a schematic flow chart of a method for optimizing resource allocation of a big data platform scheduling system according to an embodiment of the present application, where the method includes:
step S101: and minimizing the total operation running time as an optimization target of the large data platform scheduling system resource allocation, wherein the parameter to be optimized is the resource quota of each tenant in the large data platform, the number of the problem dimensions is the number D of the tenants, and the constraint is that the sum of the quotas of each tenant is smaller than the total resource amount of the large data platform scheduling system.
Step S102: generate a global model dataset in the variable space by a sampling method, and train on the global model dataset with a radial basis function neural network (Radial Basis Neural Network, RBNN) to obtain a global RBNN model.
In this embodiment, the sampling method may, for example, be Latin hypercube sampling (Latin hypercube sampling, LHS), used to generate the global model dataset in the variable space. Latin hypercube sampling is an approximately random sampling method over a multidimensional parameter distribution and belongs to stratified sampling techniques. Stratified sampling is a statistical method of drawing samples from a population (also called a "parent"): the sampling units are divided into strata according to some characteristic or rule, and samples are then drawn independently and randomly from each stratum. This ensures that the structure of the sample resembles that of the population and improves estimation accuracy. Thus, with Latin hypercube sampling, the required preparation data can be derived from only a small number of real sample points.
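As an illustration only (the patent does not specify an implementation), Latin hypercube sampling of tenant-quota vectors might be sketched with SciPy's `qmc` module; the dimension D and the quota range [0, 100] below are hypothetical values:

```python
import numpy as np
from scipy.stats import qmc

D = 4                                   # number of tenants (hypothetical)
N = 10 * D                              # dataset size of 10D, as in the text
sampler = qmc.LatinHypercube(d=D, seed=0)
unit_samples = sampler.random(n=N)      # N points in [0, 1)^D, one per stratum
# scale each tenant quota to a hypothetical range [0, 100]
samples = qmc.scale(unit_samples, l_bounds=[0.0] * D, u_bounds=[100.0] * D)
```

Each of the N samples would then be run on the real scheduling system to obtain its total job running time, forming the (input, fitness) pairs of the global model dataset.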
Preferably, the size of the global model dataset can be set to 10 times the problem dimension, i.e., 10D, so that the global RBNN model is trained from a global dataset made up of only a small number of real sample points; real sample points require actual runs, so their computational cost is high.
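A minimal RBF-network surrogate in the spirit of the RBNN described here can be sketched as follows. This is our own simplified construction (Gaussian hidden units centred on the training points, linear output weights solved by least squares), not the patent's exact model:

```python
import numpy as np

def train_rbnn(X, y, sigma=1.0):
    """Fit a minimal RBF network: Gaussian hidden units centred on the
    training points, linear output weights solved by least squares."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
    Phi = np.exp(-d2 / (2.0 * sigma ** 2))      # hidden-layer activations
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return X.copy(), sigma, w                   # centres, width, output weights

def predict_rbnn(model, X_new):
    centres, sigma, w = model
    d2 = ((X_new[:, None, :] - centres[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-d2 / (2.0 * sigma ** 2)) @ w
```

Such a surrogate is cheap to evaluate, so the evolutionary algorithm can query it in place of the expensive real scheduling-system runs.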
Step S103: and performing global iterative optimization on the global RBNN model by using the L-SHADE to obtain the final generation population individuals.
In this embodiment, the L-SHADE algorithm (success-history based adaptive differential evolution with linear population size reduction) is adopted to optimize the global RBNN model. This algorithm has been shown to be more efficient and accurate on complex optimization problems and to handle higher problem dimensions than traditional evolutionary algorithms, so it gives better optimization results for the high-dimensional resource allocation optimization problem of a big data platform scheduling system.
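A distinguishing feature of L-SHADE is its linear population size reduction: the population shrinks from an initial size down to a minimum over the evaluation budget. A sketch of that schedule, with `n_init` and the budget as illustrative parameters:

```python
def lshade_pop_size(nfe, max_nfe, n_init, n_min=4):
    """Population size after nfe of max_nfe fitness evaluations:
    shrinks linearly from n_init down to n_min (L-SHADE's reduction rule)."""
    return round(n_init + (n_min - n_init) * nfe / max_nfe)
```

After each generation, the worst individuals are removed until the population matches this target size, concentrating later evaluations on the most promising region.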
Step S104: and obtaining a local model data set by adopting filling criteria according to the population individuals of the last generation, and training the local model data set by using RBNN to obtain a local RBNN model.
Step S105: and carrying out local iterative optimization on the local RBNN model by using the L-SHADE to obtain an optimization result, and taking the optimization result as a strategy for distributing resources of each tenant of the large data platform scheduling system.
According to the method, the resource allocation strategy for the tenants of the big data platform scheduling system is abstracted into an RBNN model, and global and local optimization are performed on the model with the L-SHADE algorithm to obtain the optimal resource allocation strategy; compared with the prior art, the implementation cost is lower and the effect better.
Fig. 2 is a schematic flow chart of a method for optimizing resource allocation of a big data platform scheduling system according to another embodiment of the present application, where the method includes the following steps:
step S201: and minimizing the total operation running time as an optimization target of the large data platform scheduling system resource allocation, wherein the parameter to be optimized is the resource quota of each tenant in the large data platform, the number of the problem dimensions is the number D of the tenants, and the constraint is that the sum of the quotas of each tenant is smaller than the total resource amount of the large data platform scheduling system.
Step S202: generate a global model dataset in the variable space by a sampling method, and train on the global model dataset with the RBNN to obtain a global RBNN model.
Step S203: perform population update iterations on the global RBNN model with L-SHADE.
Step S204: judge whether the generation number is an integer multiple of the problem dimension; if so, proceed to step S205; otherwise, keep the global RBNN model unchanged and return to step S203.
Step S205: select the optimal individual in the current population according to fitness value, and calculate the Euclidean distances between all individuals in the population and the optimal individual.
Step S206: eliminate a preset number of the farthest individuals, retrain the global RBNN model with the RBNN on the remaining individuals, and thereby update the global RBNN model.
In this embodiment, after the Euclidean distances between all individuals in the population and the optimal individual are calculated in step S205, the individuals can be ranked by distance, for example from farthest to nearest, and a preset number of the farthest individuals removed, for example the farthest 10% of all individuals. These can be regarded as low-value individuals that do not help train the model. The 10% setting can be adjusted as needed or by experiment; generally a balance point must be found, because removing too many individuals loses information, while removing too few slows convergence toward the optimum.
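The pruning step above might be sketched as follows, assuming minimization (total job running time), so the optimal individual is the one with the lowest fitness; the 10% drop fraction matches the example in the text:

```python
import numpy as np

def prune_population(pop, fitness, drop_frac=0.10):
    """Drop the drop_frac of individuals farthest (Euclidean distance)
    from the best individual; minimization of running time assumed."""
    best = pop[np.argmin(fitness)]
    dist = np.linalg.norm(pop - best, axis=1)
    n_keep = len(pop) - int(len(pop) * drop_frac)
    keep = np.argsort(dist)[:n_keep]            # nearest individuals survive
    return pop[keep], fitness[keep]
```

The surviving individuals then serve as the training set when the global RBNN model is retrained in step S206.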
Step S207: judging whether the iterative optimization is completed, if not, returning to the step S203, and if so, entering the step S208.
Step S208: select the top x individuals by fitness value in the final-generation population after global iterative optimization, and also select the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset. In this embodiment, collecting some individuals from other generations avoids overfitting and an overly narrow range of the local dataset variables.
Preferably, in this embodiment, x may take the value D/2; when selecting the best individual of each of the last y generations, y may likewise take the value D/2; and the size of the resulting local model dataset is D.
Further preferably, as shown in fig. 3, taking x and y equal to D/2 as an example, this step may further include the following sub-steps:
Step S2081: select the D/2 best individuals in the final-generation population after global iterative optimization, and also select the best individual of each of the last D/2 generations produced during global iterative optimization.
Step S2082: the variance and mean of the dimensional elements in these individuals are calculated.
Step S2083: a local model dataset consisting of D new individuals is generated from the Gaussian model based on the variance and the mean.
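Steps S2082 and S2083 can be sketched as follows; the per-dimension Gaussian construction is our reading of the text, not code from the patent:

```python
import numpy as np

def gaussian_fill(elite, D, rng=None):
    """Filling criterion sketch: the per-dimension mean and standard
    deviation of the elite individuals parameterise a Gaussian from
    which D new individuals are drawn (steps S2082-S2083)."""
    rng = np.random.default_rng(rng)
    mu = elite.mean(axis=0)         # mean of each dimension's elements
    sd = elite.std(axis=0)          # spread of each dimension's elements
    return rng.normal(mu, sd, size=(D, elite.shape[1]))
```

The D generated individuals would then be evaluated and used as the local model dataset for training the local RBNN model in step S209.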
Step S209: the local model dataset is trained using the RBNN to obtain a local RBNN model.
Step S210: and carrying out local iterative optimization on the local RBNN model by using the L-SHADE to obtain an optimization result, and taking the optimization result as a strategy for distributing resources of each tenant of the large data platform scheduling system.
In this step, performing local iterative optimization on the local RBNN model with L-SHADE proceeds as in steps S203-S207 above and is not repeated here. Iterative optimization of the local RBNN model yields the optimization result, namely the optimal resource allocation strategy for the tenants of the big data platform scheduling system.
Fig. 4 is a schematic diagram of the above resource allocation optimization method for a big data platform scheduling system. As can be seen from fig. 4, the present application is divided into two phases: the first is a global optimization phase and the second a local optimization phase, and both use the L-SHADE algorithm for iterative optimization. First, Latin hypercube sampling is used to obtain a global dataset, which is then used to train a global RBNN model; this model can output a resource allocation strategy for the big data platform scheduling system, and can adopt a model-updating strategy in which low-value sample points are continuously removed during optimization and the remaining sample points are used to continuously update the global surrogate model. Next, the global RBNN model is iteratively optimized with the L-SHADE algorithm to obtain the final-generation population individuals. Then the local dataset is filled using the filling strategy, and a local RBNN model is trained on it. Finally, the local RBNN model is iteratively optimized with the L-SHADE algorithm to obtain the optimal solution, which is the optimal resource allocation strategy for the tenants of the big data platform scheduling system.
Fig. 5 is a schematic structural diagram of a resource allocation optimizing device for a big data platform scheduling system according to an embodiment of the present application. The device comprises a model setting unit 510, a global model training unit 520, a global optimization unit 530, a local model training unit 540, and a local optimization unit 550, connected in sequence.
The model setting unit 510 is configured to take minimization of the total job running time as the optimization target of resource allocation in the big data platform scheduling system, where the parameters to be optimized are the resource quotas of the tenants in the big data platform, the problem dimension is the number of tenants D, and the constraint is that the sum of the tenant quotas is smaller than the total resources of the big data platform scheduling system.
The global model training unit 520 is configured to generate a global model dataset in the variable space by a sampling method and train on it with the RBNN to obtain a global RBNN model.
The global optimization unit 530 is configured to perform global iterative optimization on the global RBNN model with L-SHADE to obtain the final-generation population individuals.
The local model training unit 540 is configured to obtain a local model dataset from the final-generation population individuals by the filling criterion and train on it with the RBNN to obtain a local RBNN model.
The local optimization unit 550 is configured to perform local iterative optimization on the local RBNN model with L-SHADE to obtain an optimization result, which serves as the resource allocation strategy for the tenants of the big data platform scheduling system.
Preferably, as shown in fig. 6, the global optimization unit 530 may further include:
the update iteration module 531 is configured to perform population update iteration on the global RBNN model by using L-segment, and when the judgment unit judges that the population algebra is not an integer multiple of the number of the problem dimensions, continue to perform population update iteration on the global RBNN model by using L-segment;
the judging module 532 is configured to judge whether the generation number is an integer multiple of the problem dimension;
the model evolution module 533 is configured to, in response to the generation number being an integer multiple of the problem dimension, select the optimal individual in the current population according to fitness value, calculate the Euclidean distances between all individuals in the population and the optimal individual, eliminate a preset number of the farthest individuals, and retrain the global RBNN model with the RBNN on the remaining individuals.
Preferably, the local model training unit 540 may be specifically configured to: select the top x individuals by fitness value in the final-generation population after global iterative optimization, and also select the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset; and train on the local model dataset with the RBNN to obtain the local RBNN model.
Preferably, when the local model training unit 540 selects the top x individuals by fitness value from the final-generation population, x is D/2; when it selects the best individual of each of the last y generations, y is D/2; and the size of the resulting local model dataset is D.
Preferably, the local model training unit 540 selecting the top x individuals by fitness value in the final-generation population after global iterative optimization, and also selecting the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset, comprises: selecting the top x individuals by fitness value in the final-generation population after global iterative optimization, and also selecting the best individual of each of the last y generations produced during global iterative optimization; calculating the variance and mean of each dimension's elements over these individuals; and generating, from a Gaussian model based on the variances and means, a local model dataset consisting of D new individuals.
In this method, the resource allocation strategy of each tenant of the big data platform scheduling system is abstracted into an RBNN model, and global and local optimization are carried out on the model using the L-SHADE algorithm to obtain the optimal resource allocation strategy; compared with the prior art, the method has lower implementation cost and better effect.
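As a toy end-to-end sketch, the two-stage flow could be wired together as below: fit a surrogate of total job runtime, search it globally with a simple differential-evolution loop (a crude stand-in for L-SHADE, which the patent actually uses), build a local dataset around the elites with the Gaussian fill criterion, and refine. The quadratic objective and all names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)
D = 3                                     # tenants = problem dimensions

def job_runtime(q):                       # hypothetical measured objective
    return ((q - 0.4) ** 2).sum(axis=-1)

def rbf_fit(X, y, s=0.5):                 # tiny RBF surrogate, all points as centres
    g = lambda A, B: np.exp(-((A[:, None] - B[None]) ** 2).sum(-1) / (2 * s * s))
    w = np.linalg.solve(g(X, X) + 1e-8 * np.eye(len(X)), y)
    return lambda Q: g(Q, X) @ w

def de_search(f, pop, gens=60):           # crude DE loop, not real L-SHADE
    for _ in range(gens):
        a, b = rng.permutation(len(pop)), rng.permutation(len(pop))
        trial = np.clip(pop + 0.6 * (pop[a] - pop[b]), 0.0, 1.0)
        keep = f(trial) < f(pop)          # greedy selection
        pop = np.where(keep[:, None], trial, pop)
    return pop

X = rng.random((40, D))
global_model = rbf_fit(X, job_runtime(X))               # global surrogate
final_pop = de_search(global_model, rng.random((20, D)))
elite = final_pop[np.argsort(global_model(final_pop))[:D]]
local = np.clip(rng.normal(elite.mean(0), elite.std(0) + 1e-3,
                           (4 * D, D)), 0.0, 1.0)       # Gaussian fill criterion
local_model = rbf_fit(local, job_runtime(local))        # local surrogate
refined = de_search(local_model, local)
best = refined[np.argmin(local_model(refined))]         # quota vector returned
```

In the patented method the quota vector would additionally be checked against the constraint that the tenant quotas sum to less than the platform's total resources.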
An embodiment of the invention further provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the above method when executing the program.
An embodiment of the invention further provides a computer program product comprising a computer program/instruction which, when executed by a processor, implements the steps of the above method.
An embodiment of the invention further provides a computer-readable storage medium storing a computer program for executing the above method.
As shown in fig. 7, the electronic device 600 may further include: a communication module 110, an input unit 120, an audio processor 130, a display 160, and a power supply 170. It is noted that the electronic device 600 need not include all of the components shown in fig. 7; in addition, the electronic device 600 may include components not shown in fig. 7, for which reference may be made to the related art.
As shown in fig. 7, the central processor 100, sometimes also referred to as a controller or operation controller, may include a microprocessor or other processor device and/or logic device; the central processor 100 receives inputs and controls the operation of the various components of the electronic device 600.
The memory 140 may be, for example, one or more of a buffer, a flash memory, a hard drive, removable media, volatile memory, non-volatile memory, or another suitable device. It may store the information described above as well as programs for processing that information, and the central processor 100 can execute the programs stored in the memory 140 to store or process information.
The input unit 120 provides an input to the central processor 100. The input unit 120 is, for example, a key or a touch input device. The power supply 170 is used to provide power to the electronic device 600. The display 160 is used for displaying display objects such as images and characters. The display may be, for example, but not limited to, an LCD display.
The memory 140 may be a solid-state memory such as read-only memory (ROM), random-access memory (RAM), a SIM card, or the like. It may also be a memory that retains information even when powered down and that can be selectively erased and rewritten with new data, an example of which is sometimes referred to as an EPROM. The memory 140 may also be some other type of device. The memory 140 includes a buffer memory 141 (sometimes referred to as a buffer) and may include an application/function storage 142 for storing application programs and function programs, or flows for the central processor 100 to execute operations of the electronic device 600.
The memory 140 may also include a data store 143, the data store 143 for storing data, such as contacts, digital data, pictures, sounds, and/or any other data used by the electronic device. The driver storage 144 of the memory 140 may include various drivers of the electronic device for communication functions and/or for performing other functions of the electronic device (e.g., messaging applications, address book applications, etc.).
The communication module 110 is a transmitter/receiver 110 that transmits and receives signals via an antenna 111. A communication module (transmitter/receiver) 110 is coupled to the central processor 100 to provide an input signal and receive an output signal, which may be the same as in the case of a conventional mobile communication terminal.
Based on different communication technologies, a plurality of communication modules 110, such as a cellular network module, a bluetooth module, and/or a wireless local area network module, etc., may be provided in the same electronic device. The communication module (transmitter/receiver) 110 is also coupled to a speaker 131 and a microphone 132 via an audio processor 130 to provide audio output via the speaker 131 and to receive audio input from the microphone 132 to implement usual telecommunication functions. The audio processor 130 may include any suitable buffers, decoders, amplifiers and so forth. In addition, the audio processor 130 is also coupled to the central processor 100 so that sound can be recorded locally through the microphone 132 and so that sound stored locally can be played through the speaker 131.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The principles and embodiments of the present invention have been described in detail with reference to specific examples, which are provided only to facilitate understanding of the method and its core ideas; meanwhile, those skilled in the art may vary the specific embodiments and application scope in accordance with the ideas of the present invention, and the contents of this description should therefore not be construed as limiting the invention.

Claims (13)

1. A method for optimizing resource allocation of a big data platform scheduling system, characterized by comprising the following steps:
minimizing the total job running time as the optimization target of resource allocation of the big data platform scheduling system, wherein the parameters to be optimized are the resource quotas of each tenant in the big data platform, the number of problem dimensions equals the number of tenants D, and the constraint is that the sum of the tenant quotas is smaller than the total resource amount of the big data platform scheduling system;
generating a global model dataset in a variable space by using a resource method, and training the global model dataset by using RBNN to obtain a global RBNN model;
performing global iterative optimization on the global RBNN model using L-SHADE to obtain the final-generation population individuals;
obtaining a local model dataset from the final-generation population individuals by applying a fill criterion, and training the local model dataset using RBNN to obtain a local RBNN model;
performing local iterative optimization on the local RBNN model using L-SHADE to obtain an optimization result, and taking the optimization result as the resource allocation strategy for each tenant of the big data platform scheduling system.
2. The method for optimizing resource allocation of a big data platform scheduling system according to claim 1, wherein performing global iterative optimization on the global RBNN model using L-SHADE to obtain the final-generation population individuals comprises:
performing population update iteration on the global RBNN model using L-SHADE;
determining whether the generation number of the population is an integer multiple of the number of problem dimensions;
in response to the generation number being an integer multiple of the number of problem dimensions, selecting the optimal individual in the current population according to fitness value, calculating the Euclidean distance between every individual in the population and the optimal individual, discarding a preset number of the farthest individuals, and retraining the global RBNN model using RBNN on the remaining individuals;
in response to the generation number not being an integer multiple of the number of problem dimensions, keeping the global RBNN model unchanged and continuing the population update iteration on the global RBNN model using L-SHADE.
3. The method for optimizing resource allocation of a big data platform scheduling system according to claim 1, wherein obtaining a local model dataset from the final-generation population individuals by applying a fill criterion and training the local model dataset using RBNN to obtain a local RBNN model comprises:
selecting the top x individuals of the final-generation population after global iterative optimization, ranked by fitness value from high to low, together with the best individual of each of the last y generations produced during global iterative optimization, to form a local model dataset, and training the local model dataset using RBNN to obtain a local RBNN model.
4. The method for optimizing resource allocation of a big data platform scheduling system according to claim 3, wherein when the top x individuals ranked by fitness value are selected from the final-generation population, x is taken as D/2; when the best individual of each of the last y generations is selected, y is taken as D/2; the resulting local model dataset has size D.
5. The method for optimizing resource allocation of a big data platform scheduling system according to claim 3, wherein selecting the top x individuals ranked by fitness value from the final-generation population after global iterative optimization, together with the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset comprises:
selecting the top x individuals ranked by fitness value from the final-generation population after global iterative optimization, and the best individual of each of the last y generations produced during global iterative optimization;
calculating the variance and mean of each dimension element over these individuals;
generating, from a Gaussian model based on the variance and mean, a local model dataset consisting of D new individuals.
6. A resource allocation optimization device for a big data platform scheduling system, characterized in that the device comprises:
a model setting unit configured to minimize the total job running time as the optimization target of resource allocation of the big data platform scheduling system, wherein the parameters to be optimized are the resource quotas of each tenant in the big data platform, the number of problem dimensions equals the number of tenants D, and the constraint is that the sum of the tenant quotas is smaller than the total resource amount of the big data platform scheduling system;
a global model training unit configured to generate a global model dataset in the variable space using a resource method, and train the global model dataset using RBNN to obtain a global RBNN model;
a global optimization unit configured to perform global iterative optimization on the global RBNN model using L-SHADE to obtain the final-generation population individuals;
a local model training unit configured to obtain a local model dataset from the final-generation population individuals by applying a fill criterion, and train the local model dataset using RBNN to obtain a local RBNN model;
a local optimization unit configured to perform local iterative optimization on the local RBNN model using L-SHADE to obtain an optimization result, and take the optimization result as the resource allocation strategy for each tenant of the big data platform scheduling system.
7. The big data platform scheduling system resource allocation optimization device of claim 6, wherein the global optimization unit comprises:
an update iteration module configured to perform population update iteration on the global RBNN model using L-SHADE, and to continue the population update iteration on the global RBNN model using L-SHADE when the judging module determines that the generation number of the population is not an integer multiple of the number of problem dimensions;
a judging module configured to determine whether the generation number of the population is an integer multiple of the number of problem dimensions;
a model evolution module configured to: in response to the generation number being an integer multiple of the number of problem dimensions, select the optimal individual in the current population according to fitness value, calculate the Euclidean distance between every individual in the population and the optimal individual, discard a preset number of the farthest individuals, and retrain the global RBNN model using RBNN on the remaining individuals.
8. The resource allocation optimization device for a big data platform scheduling system according to claim 6, wherein the local model training unit is specifically configured to:
select the top x individuals of the final-generation population after global iterative optimization, ranked by fitness value from high to low, together with the best individual of each of the last y generations produced during global iterative optimization, to form a local model dataset, and train the local model dataset using RBNN to obtain a local RBNN model.
9. The resource allocation optimization device for a big data platform scheduling system according to claim 8, wherein when the top x individuals ranked by fitness value are selected from the final-generation population, x is taken as D/2; when the best individual of each of the last y generations is selected, y is taken as D/2; the resulting local model dataset has size D.
10. The resource allocation optimization device for a big data platform scheduling system according to claim 8, wherein the local model training unit selecting the top x individuals ranked by fitness value from the final-generation population after global iterative optimization, together with the best individual of each of the last y generations produced during global iterative optimization, to form the local model dataset comprises:
selecting the top x individuals ranked by fitness value from the final-generation population after global iterative optimization, and the best individual of each of the last y generations produced during global iterative optimization;
calculating the variance and mean of each dimension element over these individuals;
generating, from a Gaussian model based on the variance and mean, a local model dataset consisting of D new individuals.
11. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when executing the computer program.
12. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any one of claims 1 to 5.
13. A computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the method of any of claims 1 to 5.
CN202310315141.1A 2023-03-28 2023-03-28 Resource allocation optimization method and device for big data platform scheduling system Pending CN116382903A (en)


Publications (1)

Publication Number Publication Date
CN116382903A true CN116382903A (en) 2023-07-04

Family

ID=86965049



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination