CN116048785A - Elastic resource allocation method based on supervised learning and reinforcement learning - Google Patents

Elastic resource allocation method based on supervised learning and reinforcement learning

Info

Publication number
CN116048785A
CN116048785A (application CN202211623990.5A)
Authority
CN
China
Prior art keywords
neural network
training
resource allocation
data
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211623990.5A
Other languages
Chinese (zh)
Inventor
李丽娜
许一鸣
李念峰
黄盛奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University
Original Assignee
Changchun University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University filed Critical Changchun University
Priority to CN202211623990.5A
Publication of CN116048785A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an elastic resource allocation method based on supervised learning and reinforcement learning, which comprises the following steps: step one, dividing a first training set and a first test set based on crawled and processed test data, and simultaneously constructing a second training set; step two, establishing a first neural network, training and tuning it on the first training set based on a supervised learning technique, and optimizing the trained supervised model through an RMSprop optimization algorithm and the first test set; step three, establishing a second neural network, initializing it, training and tuning it on the second training set based on a reinforcement learning technique, and optimizing the trained reinforcement learning model through an Adam optimization algorithm; step four, performing resource allocation on a real data set through the trained and optimized second neural network, and outputting an optimal resource allocation result. The neural network is first initialized with the supervised learning technique and then refined with the reinforcement learning technique, which effectively improves the accuracy of resource allocation.

Description

Elastic resource allocation method based on supervised learning and reinforcement learning
Technical Field
The invention relates to an elastic resource allocation method based on supervised learning and reinforcement learning, and belongs to the field of data stream processing.
Background
With the tremendous growth of network data and the increasing real-time requirements of data processing, techniques for data stream processing are continually evolving. In a distributed stream processing system (hereinafter DSPS), a data stream processing application (hereinafter DSPA) is generally represented as a directed acyclic graph (DAG). Vertices in the DAG represent the operators that process the data stream, and edges represent the data streams between them. As it receives data, an operator processes and transforms the incoming data into a new data stream.
Each operator is allocated a certain amount of resources (such as threads) and processes data in parallel; the number of resources on an operator is called its parallelism. If the parallelism is too small, the operator's data processing capacity is weak, leading to long processing delays and data loss; if the parallelism is too large, the resource utilization of the DSPS is low. In addition, because streaming data is time-varying and bursty, static resource allocation cannot meet real-time processing requirements. An accurate, dynamic resource allocation method is therefore both necessary and critical.
Common operator resource allocation methods are mainly based on thresholds, queuing-theory models, control-theory models, and the like, and suffer from inaccurate resource adjustment, high algorithmic complexity, and other problems. In recent years, deep learning techniques, including deep reinforcement learning models, have been applied to resource scheduling and show better performance on resource placement problems. However, the problem of dynamic operator resource allocation, that is, parallelism adjustment, remains unsolved, and existing methods cannot accurately and efficiently adapt to the varying processing demands of stream data.
Popular supervised learning methods train models with labels, but obtaining labels is time-consuming and difficult, and if the distribution of the training set differs greatly from that of the real data, model performance suffers. Deep reinforcement learning is model-free learning and adapts well to the dynamic characteristics of data; however, the initial parameters of the deep neural network are random, so achieving good model performance requires long training, and training may fail to converge and never reach its target.
Disclosure of Invention
The invention designs and develops an elastic resource allocation method based on supervised learning and reinforcement learning. The method initializes a deep neural network through supervised learning and then combines it with reinforcement learning to perform parallel operator resource allocation, thereby effectively ensuring the accuracy of resource allocation. It can be deployed in an actual data stream processing platform as its resource scheduling component, improving system resource utilization and saving system energy consumption at a low resource-adjustment cost.
The technical scheme provided by the invention is as follows:
an elastic resource allocation method based on supervised learning and reinforcement learning, comprising:
step one, dividing a first training set and a first testing set based on the test data obtained and processed by the web crawler, and constructing a second training set at the same time;
step two, establishing a first neural network, training and tuning it on the first training set based on a supervised learning technique, and optimizing the trained supervised learning model through an RMSprop optimization algorithm and the first test set;
thirdly, establishing a second neural network, initializing the second neural network, training and tuning the second neural network through a second training set based on a reinforcement learning technology, and tuning a trained reinforcement learning model through an Adam optimization algorithm;
and fourthly, performing resource allocation on the real data set through the second neural network after training and optimizing, and outputting an optimal resource allocation result.
Preferably, the first step includes:
dividing the test data into a first part and a second part, wherein for the first part of the test data the optimal parallelism of an operator is obtained through an optimal allocation algorithm according to the operator's data processing capacity and cache occupancy; the optimal parallelism serves as the operator's optimal resource allocation value and as the supervised learning label; the operator's cache occupancy is recalculated, and the optimal parallelism and the cache occupancy are output together by the algorithm;
taking as a data set the set of quadruples consisting of the first-part test data, the cache occupancy returned by the optimal algorithm, the optimal solution, and the single-thread CPU data processing capacity, and dividing it 8:2 into a first training set and a first test set;
and taking as a second training set the set of pairs formed by the second-part test data and the single-thread CPU data processing capacity.
Preferably, the second step includes:
performing supervised learning through a first training set, training and parameter adjustment on a first neural network model by adopting a data batch training mode to obtain a first neural network meeting optimal resource allocation, and storing network parameters;
in the training process, for the first neural network after each optimization, verifying through the first test set how close it is to the optimal resource allocation target, and continuing to optimize the first neural network until the expected target is met, at which point training ends.
Preferably, the third step includes:
constructing a second neural network with the same structure as the first neural network, reloading the optimal network parameters of the first neural network, and initializing the second neural network;
based on the initialized second neural network, combining reinforcement learning technology, taking the current data in the second training set, the CPU data processing amount of a single thread and the buffer occupation amount after the previous data processing as the input data of an operator, and training and optimizing the second neural network.
Preferably, in the first step, the optimal number of allocated threads corresponding to each datum is calculated through the optimal allocation algorithm, and the cache occupancy β after T threads are allocated to process the data is obtained. The optimal algorithm model (given in the original as two formula images) is defined in terms of T, the optimal number of allocated threads; C_cpu, the data throughput of a single thread; C_cache, the cache capacity of the operator; D, the amount of input data to be processed by the operator; and β⁻, the cache occupancy before the data is processed.
Preferably, in the second step, training the first neural network with the supervised learning technique includes:
inputting the triple (D, β, C_cpu) to the first neural network, whose second Dense layer outputs the operator resource allocation result T̂; comparing T̂ with T, calculating the error value between them through a cross-entropy function, and updating the network parameters of the first neural network through the error value and a feed-forward gradient descent algorithm.
Preferably, in the third step, training the second neural network after supervised-learning initialization through the reinforcement learning technique includes:
in each training step t, according to the operator state s_t, selecting one parallelism from a given parallelism set through the second neural network; after the corresponding resource allocation is executed, the operator state transitions to the state s_{t+1} of the next training step t+1, and the operator receives a reward r_t that guides the second neural network toward selecting the optimal resource allocation parallelism T'. The corresponding cache occupancy β' (given in the original as a formula image) is determined by D', the data to be processed, and β'⁻, the cache occupancy before the data is processed.
Preferably, in the fourth step, a differentiated reward value r_t is set for the reinforcement learning. The reward calculation model (given in the original as formula images) distinguishes two cases: when D' + β'⁻ > C_cache, the reward is computed from one of three sub-cases, with r_t = one_reward when the allocation is optimal; when D' + β'⁻ ≤ C_cache, a separate formula applies. In the formulas, overflow_penalty is the reward value for each missing resource, waste_penalty is the reward value for each surplus resource, and one_reward is the reward value when the resource allocation is optimal.
The beneficial effects of the invention are as follows:
the invention utilizes the deep neural network, combines the supervised learning technology with the model and the reinforcement learning technology without the model, thereby realizing the dynamic self-adaptive allocation of operator resources, improving the accuracy of resource allocation and reducing the time consumption and the cost of resource allocation.
According to the invention, supervised learning and reinforcement learning are combined, so that the step of preparing labels is avoided compared with the independent supervised learning, and the real-time processing requirement of stream data is met; compared with independent reinforcement learning, the training time is reduced, and the training efficiency and the model accuracy are effectively improved.
In the streaming computing platform, the self-adaptive elastic resource scheduling facing to operators is realized, the accurate resource allocation is beneficial to improving the performance of resource scheduling, and the resource use can be saved, so that the energy consumption of a system is further saved.
Drawings
Fig. 1 is a flow chart of an intelligent resource allocation method according to the present invention.
Fig. 2 is a schematic diagram of an intelligent resource allocation model according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings to enable those skilled in the art to practice the invention by referring to the description.
As shown in figs. 1-2, the present invention provides an elastic resource allocation method based on supervised learning and reinforcement learning. The method initializes a deep neural network through supervised learning and performs parallel operator resource allocation in combination with reinforcement learning, which effectively ensures the accuracy of resource allocation, greatly reduces the training time of deep reinforcement learning, and balances accuracy, efficiency, and low cost of resource allocation. The method comprises the following steps:
Step one, acquiring real data and preprocessing it; appropriately expanding the data volume according to the distribution characteristics and range of the real data to meet the data requirements of the model; taking the expanded data as test data and dividing it into two parts;
the first part of the test data is used as operator input data; according to the operator's data processing capacity and cache occupancy, an optimal allocation algorithm obtains the operator's optimal parallelism, i.e., the thread number T, and at the same time recalculates the operator's cache occupancy β; the thread number and the cache occupancy are output together by the algorithm, and T serves as the optimal value of operator resource allocation and as the supervised learning label;
each operator input datum, the cache occupancy returned by the optimal algorithm, the single-thread CPU data processing amount, and the optimal allocation thread number form a quadruple, and all quadruples are then divided 8:2 into a first training set and a first test set;
the optimal allocation thread number corresponding to each datum is calculated through the optimal allocation algorithm, and the cache occupancy β after the data is processed is obtained at the same time. The optimal algorithm model (given in the original as two formula images) is defined in terms of T, the optimal number of allocated threads; C_cpu, the data throughput of a single thread; C_cache, the cache capacity of the operator; D, the amount of input data to be processed by the operator; and β⁻, the cache occupancy before the data is processed.
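To make the optimal-allocation step concrete, the following sketch shows one way the quantities T and β could be computed in Python. The patent gives its formulas only as images, so the specific rule used here, allocating the fewest threads for which the unprocessed remainder fits back into the operator cache, capped at the maximum parallelism, is an assumption consistent with the variable definitions above; the constants match the values given later in the embodiment.

import math

C_CPU = 5000    # per-thread data throughput per time step (tuples)
C_CACHE = 2000  # operator cache capacity (tuples)
N_MAX = 8       # upper limit of allocatable threads

def optimal_allocation(d, beta_prev):
    """Return (T, beta): assumed optimal thread count and resulting cache occupancy."""
    pending = d + beta_prev                       # data that must be handled in this step
    t = math.ceil(max(pending - C_CACHE, 0) / C_CPU)
    t = min(t, N_MAX)
    beta = max(pending - t * C_CPU, 0)            # what remains buffered afterwards
    return t, beta

# Example: 12300 incoming tuples with 800 tuples already buffered
print(optimal_allocation(12300, 800))             # -> (3, 0) under these assumptions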
Step two, a first neural network is established, a first training set is input to train and tune the network offline through supervised learning, the network is continuously optimized by a first testing set in the training process, and then network parameters meeting expected targets are stored;
training and parameter tuning are carried out on the first neural network model with the first training set through supervised learning in a data-batch training mode, so as to obtain a first neural network satisfying optimal resource allocation, and the network parameters are saved, which includes the following:
in the training process, the first test set is used to verify, after each optimization, how close the first neural network is to the expected optimal resource allocation target, and optimization continues until the expected target is met, at which point training ends;
inputs (D, beta, C) cpu ) Triad to first neural network, 2 nd Dense layer of first neural network outputting operator resource allocation result
Figure BDA0003999933030000063
Then will->
Figure BDA0003999933030000064
And comparing the error value with T, calculating an error value between the error value and the T through a cross entropy function, and updating the network parameters of the first neural network through the error value and a feedforward gradient descent algorithm.
Step three, taking the second part of test data divided in the step one as a second training set directly, then establishing a second neural network, and reloading the network parameters after supervision training as initial parameters of the network;
and (3) taking the second part of the test data divided in the step one as a second training set directly, constructing a second neural network with the same structure as the first neural network, and reloading the optimal network parameters of the first neural network, namely initializing the second neural network.
Step four, based on the second neural network initialized by supervised learning, which serves as the policy network for reinforcement learning, the reinforcement learning technique is further applied and the network parameters are updated and optimized with the second training set; the reinforcement learning training includes:
in each training step t, according to the operator state s_t, i.e., the operator's input data and cache occupancy, one parallelism is selected from a given parallelism set through the second neural network, i.e., the policy network; after the resource allocation corresponding to that parallelism is executed, the operator state transitions to s_{t+1} (i.e., the operator state of the next training step t+1), and the operator receives the reward r_t, whose value guides the second neural network toward selecting the optimal resource allocation parallelism;
based on the parameter-initialized second neural network, and further combining the reinforcement learning technique, the current datum in the second training set, the single-thread CPU data processing amount, and the cache occupancy after the previous datum was processed are taken as the operator's input data, and the second neural network is trained and optimized to obtain a second neural network close to the optimal resource allocation target;
training the second neural network after supervised-learning initialization through the reinforcement learning technique includes:
in each training step t, according to the operator state s_t, one parallelism is selected from a given parallelism set through the second neural network; after the corresponding resource allocation is executed, the operator state transitions to the state s_{t+1} of the next training step t+1, and the operator receives the reward r_t, which guides the second neural network toward selecting the optimal resource allocation parallelism T'; the corresponding cache occupancy β' (given in the original as a formula image) is determined by D', the data to be processed, and β'⁻, the cache occupancy before the data is processed;
a differentiated reward value r_t is set for the reinforcement learning; the reward calculation model, formulas (1)-(4) (given in the original as formula images), is organized as follows:
when D' + β'⁻ > C_cache, formula (1) gives the reward value for insufficient resource allocation, where overflow_penalty is the reward value for each missing resource; formula (2) gives the reward value for excessive resource allocation, where waste_penalty is the reward value for each surplus resource; and formula (3) gives the reward when the allocation is optimal, r_t = one_reward;
when D' + β'⁻ ≤ C_cache, formula (4) applies.
Step five, the trained second neural network is tested on the real data sets to verify whether an approximately optimal operator resource allocation result, i.e., operator parallelism, can be obtained.
The collected real data and the operator's cache occupancy are input to the trained second neural network to obtain the operator parallelism; the allocation result is compared with the optimal solution to check the effect of the model's resource allocation.
Examples
The neural network is initialized with a supervised learning technique, and the network parameters are then trained efficiently and accurately by further combining a reinforcement learning technique, which effectively improves the accuracy of resource allocation. The embodiment comprises the following steps:
operating environment: in the present invention, as a preference, python software is run with version number: python 3.7.12, framework for deep learning: the operating system of theano 1.0.5 and lasagne 0.1: windows 10, hardware configuration: processor AMD Ryzen 94900HS Radeon Graphics 3.00GHz, memory 32G RAM, graphics card NVIDIA GeForce RTX 2060with Max-Q design.
Initialization settings: the processing capacity of a single resource is set according to the resource demand of stream data processing and the total system resources in the application scenario. Specifically, with the data ranging from 1 to 40000 tuples in each data processing time step, the data processing capacity C_cpu of each thread resource is set to 5000 tuples and the per-operator cache capacity C_cache is set to 2000 tuples; from the corresponding formula (given in the original as an image), the upper limit N of allocatable resources is calculated to be 8.
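As a worked check of this upper limit (the formula itself is shown only as an image), the stated values already determine N: with at most 40000 tuples arriving per time step and each thread processing 5000 tuples, eight threads suffice. The expression below is an assumed form that reproduces the stated value.

import math
D_MAX, C_CPU = 40000, 5000
N = math.ceil(D_MAX / C_CPU)   # -> 8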
For intuitive presentation, the intelligent resource allocation model is composed of a first neural network and a second neural network: the first neural network is trained with the supervised learning technique, and the second neural network is trained with the deep reinforcement learning technique. During model training, the first and second neural networks are trained with the first and second training sets, respectively, both generated from the real data.
Because the second neural network is initialized with the trained parameters of the first neural network, the two networks are in a progressive relationship: the network parameters saved after training the first neural network must be loaded before the second neural network is trained, and the training of the second neural network is then carried out. The procedure specifically comprises the following steps:
step one, building a training and testing data set;
collecting real data: using the Python web crawler tool tweety, English tweet texts related to COVID-19 are crawled from Twitter according to their tweet IDs to generate the COVID19_twitter real data set, covering the time range 2021-08-01 to 2021-10-01;
the tweet IDs are obtained from the data text files for the corresponding time range in the GitHub project (https://github.com/thepanacealab/covid19_twitter). The downloaded text data include the number of likes, platform, text, comments, retweet information, ID, and creation time of each tweet; in the present invention, as a preference, the raw data are the retweet counts of the tweets.
The raw data are preprocessed: the data are sorted by time and the number of retweets per minute is counted, giving the following data distribution: data in the range 0-300 account for 99.8%. This portion of the data is scaled by a factor of 133, giving an expanded data range of 0-40000, and, following the random distribution characteristic, the data volume is expanded to 10.01 million records, which serve as the test data.
Further, the test data are divided into a part of 10 thousand records and a part of 10 million records: for the first part, the operator parallelism and cache occupancy are obtained with the optimal resource allocation algorithm and, together with the single-thread CPU data processing amount and the test data, form quadruples, which are then divided 8:2 into a first training set and a first test set for training and optimizing the first neural network; the second part of the test data is used directly as the second training set to train the second neural network.
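A minimal sketch of how the first-part test data could be turned into labelled quadruples and split 8:2 is given below. The patent does not name its data-handling routines, so the field order of the quadruple, the shuffle before splitting, and the reuse of optimal_allocation and C_CPU from the earlier sketch are assumptions.

import random

def build_supervised_sets(first_part_data, split_ratio=0.8):
    """Label each datum with its optimal thread count and split into train/test sets."""
    samples, beta = [], 0
    for d in first_part_data:
        t_opt, beta_next = optimal_allocation(d, beta)   # label and next cache occupancy
        samples.append((d, beta, C_CPU, t_opt))          # quadruple (data, cache, C_cpu, label T)
        beta = beta_next                                 # carry the occupancy to the next datum
    random.shuffle(samples)
    cut = int(len(samples) * split_ratio)
    return samples[:cut], samples[cut:]                  # first training set, first test set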
Step two, two real data sets are established and used for testing the performance of the intelligent resource allocation model;
1) Real data set 1: the data source is the text in the Beijing_2016_HourlyPM25 data set, whose content is the hourly air quality index for 2016; the specific data can be downloaded from the website (http://www.stateair.net/web/assets/history/1/Beijing_2016_HourlyPM25_created20170201.csv).
The downloaded data include city, time, air quality index, and duration; in the present invention, as a preference, the raw data are the air quality index.
The raw data are processed: 45 invalid values smaller than 0 are removed, then every 10 values are summed and the sum is used as one unit of data for operator processing. After this operation, values larger than 4000 are removed and the remaining data are scaled by a factor of 10, yielding the first real data set.
2) Real data set 2: the data source is the text in the SN_m_tot_V2.0 data set, whose content is sunspot-related data from January 1749 to 2021; the specific data can be downloaded from the website (https://github.com/karenlmasters/ASTR204jupyterAssignment/blob/master/SN_m_tot_V2.0.txt).
The downloaded data comprise the year and month, the fractional year at the middle of the corresponding month, the monthly mean total sunspot number, and the standard deviation of the monthly mean sunspot number; in the present invention, as a preference, the raw data are the monthly mean total sunspot number.
The raw data are processed by retaining the 99.8% of values that lie in the range 0-300 and scaling them by a factor of 133, yielding the second real data set.
Step three, constructing an intelligent resource allocation model;
the first and second neural networks are built as DNNs; the network structure is shown in figure 2: the first layer is an Input layer, the second layer is a Dense layer, and the third layer is a Dense layer that also serves as the output layer.
First neural network: layer 1 is the Input layer with 1 neuron; layer 2 is a Dense layer with 20 neurons; layer 3 is a Dense layer with 9 neurons that serves as the output layer and outputs the parallelism result of the operator's resource allocation.
Input format: [None, 3], where 3 represents the three kinds of input data: the streaming data, the cache occupancy after the previous resource allocation was executed, and the single-thread CPU data processing amount; None represents the number of groups input at a time.
Output format: [None, 9], where 9 represents the probability values of the 9 parallelism options output by the neural network after processing the data; None represents the number of groups output at a time.
A Softmax function is used at the second Dense layer of the first neural network to output the probability values of the parallelism options, and the parallelism corresponding to the maximum probability value is selected as the operator's resource allocation result; the error value with respect to the parallelism obtained by the optimal allocation algorithm is then calculated with a cross-entropy function, and the network parameters are adjusted with the feed-forward gradient descent method of deep learning.
The second neural network is identical to the first neural network in structure and in input and output formats.
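Under the Theano 1.0.5 / Lasagne 0.1 environment named in the operating environment above, the three-layer network and its cross-entropy loss could be declared as in the sketch below. The shapes follow the stated input/output formats [None, 3] and [None, 9]; the rectifier nonlinearity on the first Dense layer is an assumption, since the text specifies only the Softmax on the output layer.

import theano
import theano.tensor as TT
import lasagne

input_var = TT.matrix('inputs')     # batches of (D, beta, C_cpu) triples
target_var = TT.ivector('targets')  # optimal parallelism labels T

# Input [None, 3] -> Dense(20) -> Dense(9, softmax), as described above
l_in = lasagne.layers.InputLayer(shape=(None, 3), input_var=input_var)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=20,
                                  nonlinearity=lasagne.nonlinearities.rectify)  # assumed
l_out = lasagne.layers.DenseLayer(l_hid, num_units=9,
                                  nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(l_out)                  # per-parallelism probabilities
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
params = lasagne.layers.get_all_params(l_out, trainable=True)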
Based on the second neural network, a reinforcement learning model is designed, including a set of states, a set of actions, and a reward function.
State set: the operator's resource allocation state set is {(D', β'⁻, C_cpu)}, where the triple (D', β'⁻, C_cpu) represents the state of the system and consists of the data D' to be processed, the current cache occupancy β'⁻, and the single-thread CPU data throughput.
Action set: the operator's resource allocation action set (given in the original as a formula image) contains the parallelism values together with a no-change action, where N is the maximum number of allocatable resources, i.e., the maximum parallelism. When the action value is T', with 1 ≤ T' ≤ N, the parallelism of the resource allocation is set to T'; for the remaining action (shown in the original as a formula image), no action is taken, i.e., the parallelism of the resource allocation remains unchanged.
Reward function: since the resource allocation result can match, exceed, or fall short of the optimal allocation value, differentiated reward values are set; the specific reward functions are given in formulas (1)-(4), where one_reward is set to 1 and overflow_penalty and waste_penalty are both set to -1.
At the second Dense layer of the second neural network, a Softmax function outputs the probability values of all parallelism options in the current state, and the parallelism corresponding to the maximum probability value is selected as the operator's resource allocation result; the loss value is then calculated with the loss function log π_θ(s_t, a_t)·v_t, where v_t is the accumulated discounted reward (its formula is given in the original as an image);
finally, combining the loss value, the network parameters, and the learning rate, the network parameters are adjusted with the feed-forward gradient descent method.
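The policy-gradient quantities described above can be sketched as follows. The cumulative discounted reward v_t is shown only as an image in the original, so the standard discounted sum and the discount factor are assumptions; the negative sign turns the stated objective log π_θ(s_t, a_t)·v_t into a loss suitable for gradient descent.

import numpy as np

GAMMA = 0.99  # discount factor; not specified in the text, assumed here

def discounted_returns(rewards, gamma=GAMMA):
    """v_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for each step t of an episode."""
    v = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        v[t] = running
    return v

def policy_gradient_loss(log_prob_taken, returns):
    """REINFORCE-style loss: minimizing it maximizes log pi_theta(s_t, a_t) * v_t."""
    return -np.mean(log_prob_taken * returns)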
Step four, training the first neural network. The first neural network is trained and tuned with the data of the first training set combined with the supervised learning technique; the learning rate is set to 0.0004, the data batch size to 32, and the number of training epochs to 40; the network parameters are updated with the RMSprop algorithm, with the decay factor ρ set to 0.9 and the fuzz factor ε set to 10⁻⁹.
The trained first neural network is tested with the first test set and achieves the expected performance: the accuracy of resource allocation reaches about 96.4%, i.e., the allocations are close to the optimal parallelism, and the trained network parameters are saved.
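Continuing the Lasagne sketch above, the supervised training step with the stated hyperparameters (learning rate 0.0004, batch size 32, 40 epochs, RMSprop with ρ = 0.9 and ε = 10⁻⁹) could be compiled roughly as follows; the mini-batch loop is an assumed helper, not code from the patent.

# RMSprop updates with the hyperparameters given above
updates = lasagne.updates.rmsprop(loss, params, learning_rate=0.0004,
                                  rho=0.9, epsilon=1e-9)
train_fn = theano.function([input_var, target_var], loss, updates=updates)

def train_supervised(train_x, train_y, epochs=40, batch_size=32):
    """Mini-batch supervised training of the first neural network (sketch)."""
    n = len(train_x)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            xb = train_x[start:start + batch_size]
            yb = train_y[start:start + batch_size]
            train_fn(xb, yb)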
Step five, training the second neural network. The second neural network is initialized by reloading the parameters of the first neural network; the data of the second training set and the operator's cache occupancy form a triple that serves as the operator's state, and training and parameter tuning are carried out by further combining the reinforcement learning technique. The goal of reinforcement learning is to maximize the cumulative reward value, i.e., to encourage optimal operator resource allocation; specifically, according to the cumulative reward value of each training iteration, the network parameters of the second neural network are adjusted and updated, improving the accuracy of resource allocation.
During training, the Adam algorithm is used to update the policy network parameters; the learning rate is set to 0.0004, the number of training iterations to 500, and each training iteration uses 20000 data.
Step six, comparative testing of the intelligent resource allocation model. The trained and optimized second neural network is taken as the trained intelligent resource allocation model, and resource allocation tests are carried out on the 2 real data sets respectively to obtain the resource allocation result sets, i.e., the operator parallelism sets.
Comparative example
In the invention, the intelligent resource allocation model is compared with two models: a supervised learning model based on the first neural network and a reinforcement learning model based on the second neural network; their parameter settings, training sets, test sets, and numbers of training iterations are the same as those of the model of the invention.
In the present invention, two main indicators are used to measure the performance of the models: accuracy and cumulative reward value. The accuracy indicates the degree to which each method's allocation results match the required optimal allocation values; the greater the accuracy, the greater the probability of allocating the optimal result during training. The cumulative reward value generally reflects the deviation between the resource allocation result and the desired optimal allocation; the larger the cumulative reward value, the closer the allocation is to the optimal resource allocation.
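For clarity, the accuracy indicator can be read as a simple exact-match rate against the optimal allocation; the helper below is an illustrative assumption, not code from the patent.

def allocation_accuracy(predicted, optimal):
    """Fraction of data items whose allocated parallelism equals the optimal parallelism."""
    matches = sum(1 for p, o in zip(predicted, optimal) if p == o)
    return matches / len(optimal)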
The evaluation results of the reinforcement learning model are: accuracy on the COVID19_twitter real data set: 41.9%, with a reward value of -50.86; accuracy on the Beijing_2016_HourlyPM25 real data set: 71.78%, with a reward value of 22.24; accuracy on the SN_m_tot_V2.0 real data set: 63.99%, with a reward value of 5.55.
The evaluation results of the model of the invention are: accuracy on the COVID19_twitter real data set: 91.95%, with a reward value of 83.87; accuracy on the Beijing_2016_HourlyPM25 real data set: 95.4%, with a reward value of 90.6; accuracy on the SN_m_tot_V2.0 real data set: 94.11%, with a reward value of 88.82.
As can be seen from the comparison results, the accuracy and reward value of the model of the invention are far superior to those of the reinforcement learning model. This shows that, compared with the reinforcement learning model, the error between the obtained resource allocation results and the optimal values is smaller, and it also reflects that the model of the method converges faster, verifying that the combination of supervised learning and reinforcement learning in a resource allocation model based on a deep neural network can effectively perform high-quality and efficient resource allocation.
In summary, facing the problem of dynamic operator resource allocation in stream data processing, the method provides an intelligent resource allocation approach based on supervised learning and reinforcement learning: first, a supervised learning technique is used to train and tune a deep neural network; then, the model parameters trained by supervised learning are used to initialize the policy network of a deep reinforcement learning model; the policy network is trained and optimized with the reinforcement learning technique, and the trained policy network serves as the intelligent resource allocation model. Compared with a method based on reinforcement learning alone, for time-varying and bursty stream data the method obtains better resource allocation performance at a smaller training cost. In a streaming computing platform, such elastic intelligent resource scheduling helps reduce the cost of resource scheduling and, while meeting application resource requirements, improves resource utilization and reduces system energy consumption.
Although embodiments of the present invention have been disclosed above, it is not limited to the details and embodiments shown, it is well suited to various fields of use, and further modifications may be readily apparent to those skilled in the art, without departing from the general concepts defined by the claims and the equivalents thereof, and therefore the invention is not limited to the specific details and illustrations shown and described herein.

Claims (8)

1. An elastic resource allocation method based on supervised learning and reinforcement learning is characterized by comprising the following steps:
step one, dividing a first training set and a first testing set based on the test data obtained and processed by the web crawler, and constructing a second training set at the same time;
step two, establishing a first neural network, training and tuning it on the first training set based on a supervised learning technique, and optimizing the trained supervised learning model through an RMSprop optimization algorithm and the first test set;
thirdly, establishing a second neural network, initializing the second neural network, training and tuning the second neural network through a second training set based on a reinforcement learning technology, and tuning a trained reinforcement learning model through an Adam optimization algorithm;
and fourthly, performing resource allocation on the real data set through the second neural network after training and optimizing, and outputting an optimal resource allocation result.
2. The method for elastic resource allocation based on supervised learning and reinforcement learning as set forth in claim 1, wherein the first step includes:
dividing the test data into a first part and a second part, wherein for the first part of the test data the optimal parallelism of an operator is obtained through an optimal allocation algorithm according to the operator's data processing capacity and cache occupancy; the optimal parallelism serves as the operator's optimal resource allocation value and as the supervised learning label; the operator's cache occupancy is recalculated, and the optimal parallelism and the cache occupancy are output together by the algorithm;
taking as a data set the set of quadruples consisting of the first-part test data, the cache occupancy returned by the optimal algorithm, the optimal solution, and the single-thread CPU data processing capacity, and dividing it 8:2 into a first training set and a first test set;
and taking as a second training set the set of pairs formed by the second-part test data and the single-thread CPU data processing capacity.
3. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 2, wherein the second step comprises:
performing supervised learning with the first training set, training and tuning the first neural network model in a data-batch training mode to obtain a first neural network satisfying optimal resource allocation, and saving the network parameters;
in the training process, for the first neural network after each optimization, verifying through the first test set how close it is to the optimal resource allocation target, and continuing to optimize the first neural network until the expected target is met, at which point training ends.
4. The method for elastic resource allocation based on supervised learning and reinforcement learning as set forth in claim 3, wherein the step three includes:
constructing a second neural network with the same structure as the first neural network, reloading the optimal network parameters of the first neural network, and initializing the second neural network;
based on the initialized second neural network, combining reinforcement learning technology, taking the current data in the second training set, the CPU data processing amount of a single thread and the buffer occupation amount after the previous data processing as the input data of an operator, and training and optimizing the second neural network.
5. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 4, wherein in the first step the optimal number of allocated threads corresponding to each datum is calculated through an optimal allocation algorithm, and the cache occupancy β after T threads are allocated to process the data is obtained; the optimal algorithm model (given in the original as two formula images) is defined in terms of T, the optimal number of allocated threads, C_cpu, the data throughput of a single thread, C_cache, the cache capacity of the operator, D, the amount of input data to be processed by the operator, and β⁻, the cache occupancy before the data is processed.
6. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 5, wherein in the second step training the first neural network with the supervised learning technique comprises:
inputting the triple (D, β, C_cpu) to the first neural network, whose second Dense layer outputs the operator resource allocation result T̂; comparing T̂ with T, calculating the error value between them through a cross-entropy function, and updating the network parameters of the first neural network through the error value and a feed-forward gradient descent algorithm.
7. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 6, wherein in the third step training the second neural network after supervised-learning initialization through the reinforcement learning technique comprises:
in each training step t, according to the operator state s_t, selecting one parallelism from a given parallelism set through the second neural network; after the corresponding resource allocation is executed, the operator state transitions to the state s_{t+1} of the next training step t+1, and the operator receives a reward r_t that guides the second neural network toward selecting the optimal resource allocation parallelism T'; the corresponding cache occupancy β' (given in the original as a formula image) is determined by D', the data to be processed, and β'⁻, the cache occupancy before the data is processed.
8. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 7, wherein in the fourth step a differentiated reward value r_t is set for the reinforcement learning; the reward calculation model (given in the original as formula images) distinguishes two cases: when D' + β'⁻ > C_cache, the reward is computed from one of three sub-cases, with r_t = one_reward when the allocation is optimal; when D' + β'⁻ ≤ C_cache, a separate formula applies; in the formulas, overflow_penalty is the reward value for each missing resource, waste_penalty is the reward value for each surplus resource, and one_reward is the reward value when the resource allocation is optimal.
CN202211623990.5A 2022-12-15 2022-12-15 Elastic resource allocation method based on supervised learning and reinforcement learning Pending CN116048785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211623990.5A CN116048785A (en) 2022-12-15 2022-12-15 Elastic resource allocation method based on supervised learning and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211623990.5A CN116048785A (en) 2022-12-15 2022-12-15 Elastic resource allocation method based on supervised learning and reinforcement learning

Publications (1)

Publication Number Publication Date
CN116048785A true CN116048785A (en) 2023-05-02

Family

ID=86117240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211623990.5A Pending CN116048785A (en) 2022-12-15 2022-12-15 Elastic resource allocation method based on supervised learning and reinforcement learning

Country Status (1)

Country Link
CN (1) CN116048785A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663610A (en) * 2023-08-02 2023-08-29 荣耀终端有限公司 Scheduling network training method, task scheduling method and related equipment
CN116663610B (en) * 2023-08-02 2023-12-19 荣耀终端有限公司 Scheduling network training method, task scheduling method and related equipment

Similar Documents

Publication Publication Date Title
CN110809772B (en) System and method for improving optimization of machine learning models
US9015083B1 (en) Distribution of parameter calculation for iterative optimization methods
CN107330902B (en) Chaotic genetic BP neural network image segmentation method based on Arnold transformation
CN113570039B (en) Block chain system based on reinforcement learning optimization consensus
WO2024087512A1 (en) Graph neural network compression method and apparatus, and electronic device and storage medium
CN108564592A (en) Based on a variety of image partition methods for being clustered to differential evolution algorithm of dynamic
CN112884236B (en) Short-term load prediction method and system based on VDM decomposition and LSTM improvement
CN116048785A (en) Elastic resource allocation method based on supervised learning and reinforcement learning
CN112001556A (en) Reservoir downstream water level prediction method based on deep learning model
CN112287990A (en) Model optimization method of edge cloud collaborative support vector machine based on online learning
CN115437795A (en) Video memory recalculation optimization method and system for heterogeneous GPU cluster load perception
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
CN116389270A (en) DRL (dynamic random link) joint optimization client selection and bandwidth allocation based method in federal learning
CN114972850A (en) Distribution inference method and device for multi-branch network, electronic equipment and storage medium
CN115543626A (en) Power defect image simulation method adopting heterogeneous computing resource load balancing scheduling
Alnowibet et al. An efficient algorithm for data parallelism based on stochastic optimization
CN112381591A (en) Sales prediction optimization method based on LSTM deep learning model
CN110880773A (en) Power grid frequency modulation control method based on combination of data driving and physical model driving
CN113705929B (en) Spring festival holiday load prediction method based on load characteristic curve and typical characteristic value fusion
CN115392441A (en) Method, apparatus, device and medium for on-chip adaptation of quantized neural network model
CN114969148A (en) System access amount prediction method, medium and equipment based on deep learning
CN109493065A (en) A kind of fraudulent trading detection method of Behavior-based control incremental update
CN110929849B (en) Video detection method and device based on neural network model compression
CN113379533A (en) Method, device, equipment and storage medium for improving circulating loan quota
CN111522240A (en) Four-rotor aircraft model, identification method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination