CN116048785A - Elastic resource allocation method based on supervised learning and reinforcement learning - Google Patents

Elastic resource allocation method based on supervised learning and reinforcement learning

Info

Publication number
CN116048785A
CN116048785A (application CN202211623990.5A)
Authority
CN
China
Prior art keywords
neural network
training
resource allocation
data
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211623990.5A
Other languages
Chinese (zh)
Inventor
李丽娜
许一鸣
李念峰
黄盛奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changchun University
Original Assignee
Changchun University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changchun University filed Critical Changchun University
Priority to CN202211623990.5A
Publication of CN116048785A
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an elastic resource allocation method based on supervised learning and reinforcement learning, which comprises the following steps: step one, dividing a first training set and a first test set based on crawled and processed test data, and simultaneously constructing a second training set; step two, establishing a first neural network, training and tuning it on the first training set based on a supervised learning technique, and optimizing the trained supervised model through an RMSprop optimization algorithm and the first test set; step three, establishing a second neural network, initializing it, training and tuning it on the second training set based on a reinforcement learning technique, and optimizing the trained reinforcement learning model through an Adam optimization algorithm; step four, performing resource allocation on a real data set through the trained and optimized second neural network, and outputting an optimal resource allocation result. The neural network is first initialized with the supervised learning technique and then refined with the reinforcement learning technique, which effectively improves the accuracy of resource allocation.

Description

Elastic resource allocation method based on supervised learning and reinforcement learning
Technical Field
The invention relates to an elastic resource allocation method based on supervised learning and reinforcement learning, and belongs to the field of data stream processing.
Background
With the tremendous growth of network data and the increasing real-time requirements of data processing, techniques for data stream processing are continually evolving. In a distributed stream processing system (hereinafter DSPS), a data stream processing application (hereinafter DSPA) is generally represented as a directed acyclic graph (DAG). Vertices in the DAG represent the operators that process the data stream, and edges represent the data streams between them. As it receives data, an operator processes and transforms the incoming data into a new data stream.
Each operator is allocated a certain amount of resources (such as threads) and processes data in parallel; the number of resources on an operator is called its parallelism. If the parallelism is too small, the operator's data processing capacity is weak, leading to long processing delays and data loss; if the parallelism is too large, the resource utilization of the DSPS is low. In addition, because streaming data is time-varying and bursty, static resource allocation cannot meet real-time processing requirements. An accurate, dynamic resource allocation method is therefore both necessary and critical.
Common operator resource allocation methods are mainly based on thresholds, queuing-theory models, control-theory models, and the like, and suffer from inaccurate resource adjustment, high algorithmic complexity, and other problems. In recent years, deep learning techniques, including deep reinforcement learning models, have been applied to resource scheduling and show better performance on resource placement problems. However, the problem of dynamic operator resource allocation, that is, parallelism adjustment, remains unsolved, and existing methods cannot accurately and efficiently adapt to the varying processing demands of stream data.
Popular supervised learning methods train models with labels, but obtaining labels is time-consuming and difficult, and if the distribution of the training set differs greatly from that of the real data, model performance suffers. Deep reinforcement learning is model-free learning and adapts well to the dynamic characteristics of data; however, the initial parameters of the deep neural network are random, so achieving good model performance requires long training, and training may fail to converge and never reach its target.
Disclosure of Invention
The invention designs and develops an elastic resource allocation method based on supervised learning and reinforcement learning. The method initializes a deep neural network through supervised learning and then combines it with reinforcement learning to perform parallel operator resource allocation, thereby effectively ensuring the accuracy of resource allocation. It can be deployed in an actual data stream processing platform as its resource scheduling component, improving system resource utilization and saving system energy consumption at a low resource-adjustment cost.
The technical scheme provided by the invention is as follows:
an elastic resource allocation method based on supervised learning and reinforcement learning, comprising:
step one, dividing a first training set and a first testing set based on the test data obtained and processed by the web crawler, and constructing a second training set at the same time;
step two, establishing a first neural network, training and tuning it on the first training set based on a supervised learning technique, and optimizing the trained supervised learning model through an RMSprop optimization algorithm and the first test set;
thirdly, establishing a second neural network, initializing the second neural network, training and tuning the second neural network through a second training set based on a reinforcement learning technology, and tuning a trained reinforcement learning model through an Adam optimization algorithm;
and fourthly, performing resource allocation on the real data set through the second neural network after training and optimizing, and outputting an optimal resource allocation result.
Preferably, the first step includes:
dividing the test data into a first part and a second part, wherein for the first part of the test data the optimal parallelism of an operator is obtained through an optimal allocation algorithm according to the operator's data processing capacity and cache occupancy; the optimal parallelism serves as the operator's optimal resource allocation value and as the supervised learning label; the operator's cache occupancy is recalculated, and the optimal parallelism and the cache occupancy are output together by the algorithm;
taking as a data set the set of quadruples consisting of the first-part test data, the cache occupancy returned by the optimal algorithm, the optimal solution, and the single-thread CPU data processing capacity, and dividing it 8:2 into a first training set and a first test set;
and taking as a second training set the set of pairs formed by the second-part test data and the single-thread CPU data processing capacity.
Preferably, the second step includes:
performing supervised learning through a first training set, training and parameter adjustment on a first neural network model by adopting a data batch training mode to obtain a first neural network meeting optimal resource allocation, and storing network parameters;
in the training process, for the first neural network after each optimization, verifying through the first test set how close it is to the optimal resource allocation target, and continuing to optimize the first neural network until the expected target is met, at which point training ends.
Preferably, the third step includes:
constructing a second neural network with the same structure as the first neural network, reloading the optimal network parameters of the first neural network, and initializing the second neural network;
based on the initialized second neural network, combining reinforcement learning technology, taking the current data in the second training set, the CPU data processing amount of a single thread and the buffer occupation amount after the previous data processing as the input data of an operator, and training and optimizing the second neural network.
Preferably, in the first step, the optimal number of allocated threads corresponding to each datum is calculated through the optimal allocation algorithm, and the cache occupancy β after T threads are allocated to process the data is obtained. The optimal algorithm model (given in the original as two formula images) is defined in terms of T, the optimal number of allocated threads; C_cpu, the data throughput of a single thread; C_cache, the cache capacity of the operator; D, the amount of input data to be processed by the operator; and β⁻, the cache occupancy before the data is processed.
Preferably, in the second step, training the first neural network with the supervised learning technique includes:
inputting the triple (D, β, C_cpu) to the first neural network, whose second Dense layer outputs the operator resource allocation result T̂; comparing T̂ with T, calculating the error value between them through a cross-entropy function, and updating the network parameters of the first neural network through the error value and a feed-forward gradient descent algorithm.
Preferably, in the third step, training the second neural network after supervised-learning initialization through the reinforcement learning technique includes:
in each training step t, according to the operator state s_t, selecting one parallelism from a given parallelism set through the second neural network; after the corresponding resource allocation is executed, the operator state transitions to the state s_{t+1} of the next training step t+1, and the operator receives a reward r_t that guides the second neural network toward selecting the optimal resource allocation parallelism T'. The corresponding cache occupancy β' (given in the original as a formula image) is determined by D', the data to be processed, and β'⁻, the cache occupancy before the data is processed.
Preferably, in the fourth step, a differentiated reward value r_t is set for the reinforcement learning. The reward calculation model (given in the original as formula images) distinguishes two cases: when D' + β'⁻ > C_cache, the reward is computed from one of three sub-cases, with r_t = one_reward when the allocation is optimal; when D' + β'⁻ ≤ C_cache, a separate formula applies. In the formulas, overflow_penalty is the reward value for each missing resource, waste_penalty is the reward value for each surplus resource, and one_reward is the reward value when the resource allocation is optimal.
The beneficial effects of the invention are as follows:
the invention utilizes the deep neural network, combines the supervised learning technology with the model and the reinforcement learning technology without the model, thereby realizing the dynamic self-adaptive allocation of operator resources, improving the accuracy of resource allocation and reducing the time consumption and the cost of resource allocation.
According to the invention, supervised learning and reinforcement learning are combined, so that the step of preparing labels is avoided compared with the independent supervised learning, and the real-time processing requirement of stream data is met; compared with independent reinforcement learning, the training time is reduced, and the training efficiency and the model accuracy are effectively improved.
In the streaming computing platform, the self-adaptive elastic resource scheduling facing to operators is realized, the accurate resource allocation is beneficial to improving the performance of resource scheduling, and the resource use can be saved, so that the energy consumption of a system is further saved.
Drawings
Fig. 1 is a flow chart of an intelligent resource allocation method according to the present invention.
Fig. 2 is a schematic diagram of an intelligent resource allocation model according to an embodiment of the present invention.
Detailed Description
The present invention is described in further detail below with reference to the drawings to enable those skilled in the art to practice the invention by referring to the description.
As shown in figs. 1-2, the present invention provides an elastic resource allocation method based on supervised learning and reinforcement learning. The method initializes a deep neural network through supervised learning and performs parallel operator resource allocation in combination with reinforcement learning, which effectively ensures the accuracy of resource allocation, greatly reduces the training time of deep reinforcement learning, and balances accuracy, efficiency, and low cost of resource allocation. The method comprises the following steps:
Step one, acquiring real data and preprocessing it; appropriately expanding the data volume according to the distribution characteristics and range of the real data to meet the data requirements of the model; taking the expanded data as test data and dividing it into two parts;
the first part of the test data is used as operator input data; according to the operator's data processing capacity and cache occupancy, an optimal allocation algorithm obtains the operator's optimal parallelism, i.e., the thread number T, and at the same time recalculates the operator's cache occupancy β; the thread number and the cache occupancy are output together by the algorithm, and T serves as the optimal value of operator resource allocation and as the supervised learning label;
each operator input datum, the cache occupancy returned by the optimal algorithm, the single-thread CPU data processing amount, and the optimal allocation thread number form a quadruple, and all quadruples are then divided 8:2 into a first training set and a first test set;
the optimal allocation thread number corresponding to each datum is calculated through the optimal allocation algorithm, and the cache occupancy β after the data is processed is obtained at the same time. The optimal algorithm model (given in the original as two formula images) is defined in terms of T, the optimal number of allocated threads; C_cpu, the data throughput of a single thread; C_cache, the cache capacity of the operator; D, the amount of input data to be processed by the operator; and β⁻, the cache occupancy before the data is processed.
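To make the optimal-allocation step concrete, the following sketch shows one way the quantities T and β could be computed in Python. The patent gives its formulas only as images, so the specific rule used here, allocating the fewest threads for which the unprocessed remainder fits back into the operator cache, capped at the maximum parallelism, is an assumption consistent with the variable definitions above; the constants match the values given later in the embodiment.

import math

C_CPU = 5000    # per-thread data throughput per time step (tuples)
C_CACHE = 2000  # operator cache capacity (tuples)
N_MAX = 8       # upper limit of allocatable threads

def optimal_allocation(d, beta_prev):
    """Return (T, beta): assumed optimal thread count and resulting cache occupancy."""
    pending = d + beta_prev                       # data that must be handled in this step
    t = math.ceil(max(pending - C_CACHE, 0) / C_CPU)
    t = min(t, N_MAX)
    beta = max(pending - t * C_CPU, 0)            # what remains buffered afterwards
    return t, beta

# Example: 12300 incoming tuples with 800 tuples already buffered
print(optimal_allocation(12300, 800))             # -> (3, 0) under these assumptions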
Step two, a first neural network is established, a first training set is input to train and tune the network offline through supervised learning, the network is continuously optimized by a first testing set in the training process, and then network parameters meeting expected targets are stored;
training and parameter tuning are carried out on the first neural network model with the first training set through supervised learning in a data-batch training mode, so as to obtain a first neural network satisfying optimal resource allocation, and the network parameters are saved, which includes the following:
in the training process, the first test set is used to verify, after each optimization, how close the first neural network is to the expected optimal resource allocation target, and optimization continues until the expected target is met, at which point training ends;
inputs (D, beta, C) cpu ) Triad to first neural network, 2 nd Dense layer of first neural network outputting operator resource allocation result
Figure BDA0003999933030000063
Then will->
Figure BDA0003999933030000064
And comparing the error value with T, calculating an error value between the error value and the T through a cross entropy function, and updating the network parameters of the first neural network through the error value and a feedforward gradient descent algorithm.
Step three, taking the second part of test data divided in the step one as a second training set directly, then establishing a second neural network, and reloading the network parameters after supervision training as initial parameters of the network;
and (3) taking the second part of the test data divided in the step one as a second training set directly, constructing a second neural network with the same structure as the first neural network, and reloading the optimal network parameters of the first neural network, namely initializing the second neural network.
Step four, based on the second neural network initialized by supervised learning, which serves as the policy network for reinforcement learning, the reinforcement learning technique is further applied and the network parameters are updated and optimized with the second training set; the reinforcement learning training includes:
in each training step t, according to the operator state s_t, i.e., the operator's input data and cache occupancy, one parallelism is selected from a given parallelism set through the second neural network, i.e., the policy network; after the resource allocation corresponding to that parallelism is executed, the operator state transitions to s_{t+1} (i.e., the operator state of the next training step t+1), and the operator receives the reward r_t, whose value guides the second neural network toward selecting the optimal resource allocation parallelism;
based on the parameter-initialized second neural network, and further combining the reinforcement learning technique, the current datum in the second training set, the single-thread CPU data processing amount, and the cache occupancy after the previous datum was processed are taken as the operator's input data, and the second neural network is trained and optimized to obtain a second neural network close to the optimal resource allocation target;
training the second neural network after supervised-learning initialization through the reinforcement learning technique includes:
in each training step t, according to the operator state s_t, one parallelism is selected from a given parallelism set through the second neural network; after the corresponding resource allocation is executed, the operator state transitions to the state s_{t+1} of the next training step t+1, and the operator receives the reward r_t, which guides the second neural network toward selecting the optimal resource allocation parallelism T'; the corresponding cache occupancy β' (given in the original as a formula image) is determined by D', the data to be processed, and β'⁻, the cache occupancy before the data is processed;
a differentiated reward value r_t is set for the reinforcement learning; the reward calculation model, formulas (1)-(4) (given in the original as formula images), is organized as follows:
when D' + β'⁻ > C_cache, formula (1) gives the reward value for insufficient resource allocation, where overflow_penalty is the reward value for each missing resource; formula (2) gives the reward value for excessive resource allocation, where waste_penalty is the reward value for each surplus resource; and formula (3) gives the reward when the allocation is optimal, r_t = one_reward;
when D' + β'⁻ ≤ C_cache, formula (4) applies.
Step five, the trained second neural network is tested on the real data sets to verify whether an approximately optimal operator resource allocation result, i.e., operator parallelism, can be obtained.
The collected real data and the operator's cache occupancy are input to the trained second neural network to obtain the operator parallelism; the allocation result is compared with the optimal solution to check the effect of the model's resource allocation.
Examples
The neural network is initialized with a supervised learning technique, and the network parameters are then trained efficiently and accurately by further combining a reinforcement learning technique, which effectively improves the accuracy of resource allocation. The embodiment comprises the following steps:
operating environment: in the present invention, as a preference, python software is run with version number: python 3.7.12, framework for deep learning: the operating system of theano 1.0.5 and lasagne 0.1: windows 10, hardware configuration: processor AMD Ryzen 94900HS Radeon Graphics 3.00GHz, memory 32G RAM, graphics card NVIDIA GeForce RTX 2060with Max-Q design.
Initialization settings: the processing capacity of a single resource is set according to the resource demand of stream data processing and the total system resources in the application scenario. Specifically, with the data ranging from 1 to 40000 tuples in each data processing time step, the data processing capacity C_cpu of each thread resource is set to 5000 tuples and the per-operator cache capacity C_cache is set to 2000 tuples; from the corresponding formula (given in the original as an image), the upper limit N of allocatable resources is calculated to be 8.
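As a worked check of this upper limit (the formula itself is shown only as an image), the stated values already determine N: with at most 40000 tuples arriving per time step and each thread processing 5000 tuples, eight threads suffice. The expression below is an assumed form that reproduces the stated value.

import math
D_MAX, C_CPU = 40000, 5000
N = math.ceil(D_MAX / C_CPU)   # -> 8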
For intuitive presentation, the intelligent resource allocation model is composed of a first neural network and a second neural network: the first neural network is trained with the supervised learning technique, and the second neural network is trained with the deep reinforcement learning technique. During model training, the first and second neural networks are trained with the first and second training sets, respectively, both generated from the real data.
Because the second neural network is initialized with the trained parameters of the first neural network, the two networks are in a progressive relationship: the network parameters saved after training the first neural network must be loaded before the second neural network is trained, and the training of the second neural network is then carried out. The procedure specifically comprises the following steps:
step one, building a training and testing data set;
collecting real data: using the Python web crawler tool tweety, English tweet texts related to COVID-19 are crawled from Twitter according to their tweet IDs to generate the COVID19_twitter real data set, covering the time range 2021-08-01 to 2021-10-01;
the tweet IDs are obtained from the data text files for the corresponding time range in the GitHub project (https://github.com/thepanacealab/covid19_twitter). The downloaded text data include the number of likes, platform, text, comments, retweet information, ID, and creation time of each tweet; in the present invention, as a preference, the raw data are the retweet counts of the tweets.
The raw data are preprocessed: the data are sorted by time and the number of retweets per minute is counted, giving the following data distribution: data in the range 0-300 account for 99.8%. This portion of the data is scaled by a factor of 133, giving an expanded data range of 0-40000, and, following the random distribution characteristic, the data volume is expanded to 10.01 million records, which serve as the test data.
Further, the test data are divided into a part of 10 thousand records and a part of 10 million records: for the first part, the operator parallelism and cache occupancy are obtained with the optimal resource allocation algorithm and, together with the single-thread CPU data processing amount and the test data, form quadruples, which are then divided 8:2 into a first training set and a first test set for training and optimizing the first neural network; the second part of the test data is used directly as the second training set to train the second neural network.
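A minimal sketch of how the first-part test data could be turned into labelled quadruples and split 8:2 is given below. The patent does not name its data-handling routines, so the field order of the quadruple, the shuffle before splitting, and the reuse of optimal_allocation and C_CPU from the earlier sketch are assumptions.

import random

def build_supervised_sets(first_part_data, split_ratio=0.8):
    """Label each datum with its optimal thread count and split into train/test sets."""
    samples, beta = [], 0
    for d in first_part_data:
        t_opt, beta_next = optimal_allocation(d, beta)   # label and next cache occupancy
        samples.append((d, beta, C_CPU, t_opt))          # quadruple (data, cache, C_cpu, label T)
        beta = beta_next                                 # carry the occupancy to the next datum
    random.shuffle(samples)
    cut = int(len(samples) * split_ratio)
    return samples[:cut], samples[cut:]                  # first training set, first test set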
Step two, two real data sets are established and used for testing the performance of the intelligent resource allocation model;
1) Real data set 1: the data source is the text in the Beijing_2016_HourlyPM25 data set, whose content is the hourly air quality index for 2016; the specific data can be downloaded from the website (http://www.stateair.net/web/assets/history/1/Beijing_2016_HourlyPM25_created20170201.csv).
The downloaded data include city, time, air quality index, and duration; in the present invention, as a preference, the raw data are the air quality index.
The raw data are processed: 45 invalid values smaller than 0 are removed, then every 10 values are summed and the sum is used as one unit of data for operator processing. After this operation, values larger than 4000 are removed and the remaining data are scaled by a factor of 10, yielding the first real data set.
2) Real data set 2: the data source is the text in the SN_m_tot_V2.0 data set, whose content is sunspot-related data from January 1749 to 2021; the specific data can be downloaded from the website (https://github.com/karenlmasters/ASTR204jupyterAssignment/blob/master/SN_m_tot_V2.0.txt).
The downloaded data comprise the year and month, the fractional year at the middle of the corresponding month, the monthly mean total sunspot number, and the standard deviation of the monthly mean sunspot number; in the present invention, as a preference, the raw data are the monthly mean total sunspot number.
The raw data are processed by retaining the 99.8% of values that lie in the range 0-300 and scaling them by a factor of 133, yielding the second real data set.
Step three, constructing an intelligent resource allocation model;
the first and second neural networks are built as DNNs; the network structure is shown in figure 2: the first layer is an Input layer, the second layer is a Dense layer, and the third layer is a Dense layer that also serves as the output layer.
First neural network: layer 1 is the Input layer with 1 neuron; layer 2 is a Dense layer with 20 neurons; layer 3 is a Dense layer with 9 neurons that serves as the output layer and outputs the parallelism result of the operator's resource allocation.
Input format: [None, 3], where 3 represents the three kinds of input data: the streaming data, the cache occupancy after the previous resource allocation was executed, and the single-thread CPU data processing amount; None represents the number of groups input at a time.
Output format: [None, 9], where 9 represents the probability values of the 9 parallelism options output by the neural network after processing the data; None represents the number of groups output at a time.
A Softmax function is used at the second Dense layer of the first neural network to output the probability values of the parallelism options, and the parallelism corresponding to the maximum probability value is selected as the operator's resource allocation result; the error value with respect to the parallelism obtained by the optimal allocation algorithm is then calculated with a cross-entropy function, and the network parameters are adjusted with the feed-forward gradient descent method of deep learning.
The second neural network is identical to the first neural network in structure and in input and output formats.
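Under the Theano 1.0.5 / Lasagne 0.1 environment named in the operating environment above, the three-layer network and its cross-entropy loss could be declared as in the sketch below. The shapes follow the stated input/output formats [None, 3] and [None, 9]; the rectifier nonlinearity on the first Dense layer is an assumption, since the text specifies only the Softmax on the output layer.

import theano
import theano.tensor as TT
import lasagne

input_var = TT.matrix('inputs')     # batches of (D, beta, C_cpu) triples
target_var = TT.ivector('targets')  # optimal parallelism labels T

# Input [None, 3] -> Dense(20) -> Dense(9, softmax), as described above
l_in = lasagne.layers.InputLayer(shape=(None, 3), input_var=input_var)
l_hid = lasagne.layers.DenseLayer(l_in, num_units=20,
                                  nonlinearity=lasagne.nonlinearities.rectify)  # assumed
l_out = lasagne.layers.DenseLayer(l_hid, num_units=9,
                                  nonlinearity=lasagne.nonlinearities.softmax)

prediction = lasagne.layers.get_output(l_out)                  # per-parallelism probabilities
loss = lasagne.objectives.categorical_crossentropy(prediction, target_var).mean()
params = lasagne.layers.get_all_params(l_out, trainable=True)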
Based on the second neural network, a reinforcement learning model is designed, including a set of states, a set of actions, and a reward function.
State set: the operator's resource allocation state set is {(D', β'⁻, C_cpu)}, where the triple (D', β'⁻, C_cpu) represents the state of the system and consists of the data D' to be processed, the current cache occupancy β'⁻, and the single-thread CPU data throughput.
Action set: the operator's resource allocation action set (given in the original as a formula image) contains the parallelism values together with a no-change action, where N is the maximum number of allocatable resources, i.e., the maximum parallelism. When the action value is T', with 1 ≤ T' ≤ N, the parallelism of the resource allocation is set to T'; for the remaining action (shown in the original as a formula image), no action is taken, i.e., the parallelism of the resource allocation remains unchanged.
Reward function: since the resource allocation result can match, exceed, or fall short of the optimal allocation value, differentiated reward values are set; the specific reward functions are given in formulas (1)-(4), where one_reward is set to 1 and overflow_penalty and waste_penalty are both set to -1.
At the second Dense layer of the second neural network, a Softmax function outputs the probability values of all parallelism options in the current state, and the parallelism corresponding to the maximum probability value is selected as the operator's resource allocation result; the loss value is then calculated with the loss function log π_θ(s_t, a_t)·v_t, where v_t is the accumulated discounted reward (its formula is given in the original as an image);
finally, combining the loss value, the network parameters, and the learning rate, the network parameters are adjusted with the feed-forward gradient descent method.
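The policy-gradient quantities described above can be sketched as follows. The cumulative discounted reward v_t is shown only as an image in the original, so the standard discounted sum and the discount factor are assumptions; the negative sign turns the stated objective log π_θ(s_t, a_t)·v_t into a loss suitable for gradient descent.

import numpy as np

GAMMA = 0.99  # discount factor; not specified in the text, assumed here

def discounted_returns(rewards, gamma=GAMMA):
    """v_t = r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ... for each step t of an episode."""
    v = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        v[t] = running
    return v

def policy_gradient_loss(log_prob_taken, returns):
    """REINFORCE-style loss: minimizing it maximizes log pi_theta(s_t, a_t) * v_t."""
    return -np.mean(log_prob_taken * returns)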
Step four, training the first neural network. The first neural network is trained and tuned with the data of the first training set combined with the supervised learning technique; the learning rate is set to 0.0004, the data batch size to 32, and the number of training epochs to 40; the network parameters are updated with the RMSprop algorithm, with the decay factor ρ set to 0.9 and the fuzz factor ε set to 10⁻⁹.
The trained first neural network is tested with the first test set and achieves the expected performance: the accuracy of resource allocation reaches about 96.4%, i.e., the allocations are close to the optimal parallelism, and the trained network parameters are saved.
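Continuing the Lasagne sketch above, the supervised training step with the stated hyperparameters (learning rate 0.0004, batch size 32, 40 epochs, RMSprop with ρ = 0.9 and ε = 10⁻⁹) could be compiled roughly as follows; the mini-batch loop is an assumed helper, not code from the patent.

# RMSprop updates with the hyperparameters given above
updates = lasagne.updates.rmsprop(loss, params, learning_rate=0.0004,
                                  rho=0.9, epsilon=1e-9)
train_fn = theano.function([input_var, target_var], loss, updates=updates)

def train_supervised(train_x, train_y, epochs=40, batch_size=32):
    """Mini-batch supervised training of the first neural network (sketch)."""
    n = len(train_x)
    for _ in range(epochs):
        for start in range(0, n, batch_size):
            xb = train_x[start:start + batch_size]
            yb = train_y[start:start + batch_size]
            train_fn(xb, yb)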
Step five, training the second neural network. The second neural network is initialized by reloading the parameters of the first neural network; the data of the second training set and the operator's cache occupancy form a triple that serves as the operator's state, and training and parameter tuning are carried out by further combining the reinforcement learning technique. The goal of reinforcement learning is to maximize the cumulative reward value, i.e., to encourage optimal operator resource allocation; specifically, according to the cumulative reward value of each training iteration, the network parameters of the second neural network are adjusted and updated, improving the accuracy of resource allocation.
During training, the Adam algorithm is used to update the policy network parameters; the learning rate is set to 0.0004, the number of training iterations to 500, and each training iteration uses 20000 data.
Step six, comparative testing of the intelligent resource allocation model. The trained and optimized second neural network is taken as the trained intelligent resource allocation model, and resource allocation tests are carried out on the 2 real data sets respectively to obtain the resource allocation result sets, i.e., the operator parallelism sets.
Comparative example
In the invention, the intelligent resource allocation model is compared with two models: a supervised learning model based on the first neural network and a reinforcement learning model based on the second neural network; their parameter settings, training sets, test sets, and numbers of training iterations are the same as those of the model of the invention.
In the present invention, two main indicators are used to measure the performance of the models: accuracy and cumulative reward value. The accuracy indicates the degree to which each method's allocation results match the required optimal allocation values; the greater the accuracy, the greater the probability of allocating the optimal result during training. The cumulative reward value generally reflects the deviation between the resource allocation result and the desired optimal allocation; the larger the cumulative reward value, the closer the allocation is to the optimal resource allocation.
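For clarity, the accuracy indicator can be read as a simple exact-match rate against the optimal allocation; the helper below is an illustrative assumption, not code from the patent.

def allocation_accuracy(predicted, optimal):
    """Fraction of data items whose allocated parallelism equals the optimal parallelism."""
    matches = sum(1 for p, o in zip(predicted, optimal) if p == o)
    return matches / len(optimal)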
The evaluation results of the reinforcement learning model are: accuracy on the COVID19_twitter real data set: 41.9%, with a reward value of -50.86; accuracy on the Beijing_2016_HourlyPM25 real data set: 71.78%, with a reward value of 22.24; accuracy on the SN_m_tot_V2.0 real data set: 63.99%, with a reward value of 5.55.
The evaluation results of the model of the invention are: accuracy on the COVID19_twitter real data set: 91.95%, with a reward value of 83.87; accuracy on the Beijing_2016_HourlyPM25 real data set: 95.4%, with a reward value of 90.6; accuracy on the SN_m_tot_V2.0 real data set: 94.11%, with a reward value of 88.82.
As can be seen from the comparison results, the accuracy and reward value of the model of the invention are far superior to those of the reinforcement learning model. This shows that, compared with the reinforcement learning model, the error between the obtained resource allocation results and the optimal values is smaller, and it also reflects that the model of the method converges faster, verifying that the combination of supervised learning and reinforcement learning in a resource allocation model based on a deep neural network can effectively perform high-quality and efficient resource allocation.
In summary, facing the problem of dynamic operator resource allocation in stream data processing, the method provides an intelligent resource allocation approach based on supervised learning and reinforcement learning: first, a supervised learning technique is used to train and tune a deep neural network; then, the model parameters trained by supervised learning are used to initialize the policy network of a deep reinforcement learning model; the policy network is trained and optimized with the reinforcement learning technique, and the trained policy network serves as the intelligent resource allocation model. Compared with a method based on reinforcement learning alone, for time-varying and bursty stream data the method obtains better resource allocation performance at a smaller training cost. In a streaming computing platform, such elastic intelligent resource scheduling helps reduce the cost of resource scheduling and, while meeting application resource requirements, improves resource utilization and reduces system energy consumption.
Although embodiments of the present invention have been disclosed above, it is not limited to the details and embodiments shown, it is well suited to various fields of use, and further modifications may be readily apparent to those skilled in the art, without departing from the general concepts defined by the claims and the equivalents thereof, and therefore the invention is not limited to the specific details and illustrations shown and described herein.

Claims (8)

1. An elastic resource allocation method based on supervised learning and reinforcement learning is characterized by comprising the following steps:
step one, dividing a first training set and a first testing set based on the test data obtained and processed by the web crawler, and constructing a second training set at the same time;
step two, establishing a first neural network, training and tuning it on the first training set based on a supervised learning technique, and optimizing the trained supervised learning model through an RMSprop optimization algorithm and the first test set;
thirdly, establishing a second neural network, initializing the second neural network, training and tuning the second neural network through a second training set based on a reinforcement learning technology, and tuning a trained reinforcement learning model through an Adam optimization algorithm;
and fourthly, performing resource allocation on the real data set through the second neural network after training and optimizing, and outputting an optimal resource allocation result.
2. The method for elastic resource allocation based on supervised learning and reinforcement learning as set forth in claim 1, wherein the first step includes:
dividing the test data into a first part and a second part, wherein for the first part of the test data the optimal parallelism of an operator is obtained through an optimal allocation algorithm according to the operator's data processing capacity and cache occupancy; the optimal parallelism serves as the operator's optimal resource allocation value and as the supervised learning label; the operator's cache occupancy is recalculated, and the optimal parallelism and the cache occupancy are output together by the algorithm;
taking as a data set the set of quadruples consisting of the first-part test data, the cache occupancy returned by the optimal algorithm, the optimal solution, and the single-thread CPU data processing capacity, and dividing it 8:2 into a first training set and a first test set;
and taking as a second training set the set of pairs formed by the second-part test data and the single-thread CPU data processing capacity.
3. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 2, wherein the second step comprises:
performing supervised learning with the first training set, training and tuning the first neural network model in a data-batch training mode to obtain a first neural network satisfying optimal resource allocation, and saving the network parameters;
in the training process, for the first neural network after each optimization, verifying through the first test set how close it is to the optimal resource allocation target, and continuing to optimize the first neural network until the expected target is met, at which point training ends.
4. The method for elastic resource allocation based on supervised learning and reinforcement learning as set forth in claim 3, wherein the step three includes:
constructing a second neural network with the same structure as the first neural network, reloading the optimal network parameters of the first neural network, and initializing the second neural network;
based on the initialized second neural network, combining reinforcement learning technology, taking the current data in the second training set, the CPU data processing amount of a single thread and the buffer occupation amount after the previous data processing as the input data of an operator, and training and optimizing the second neural network.
5. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 4, wherein in the first step the optimal number of allocated threads corresponding to each datum is calculated through an optimal allocation algorithm, and the cache occupancy β after T threads are allocated to process the data is obtained; the optimal algorithm model (given in the original as two formula images) is defined in terms of T, the optimal number of allocated threads, C_cpu, the data throughput of a single thread, C_cache, the cache capacity of the operator, D, the amount of input data to be processed by the operator, and β⁻, the cache occupancy before the data is processed.
6. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 5, wherein in the second step training the first neural network with the supervised learning technique comprises:
inputting the triple (D, β, C_cpu) to the first neural network, whose second Dense layer outputs the operator resource allocation result T̂; comparing T̂ with T, calculating the error value between them through a cross-entropy function, and updating the network parameters of the first neural network through the error value and a feed-forward gradient descent algorithm.
7. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 6, wherein in the third step training the second neural network after supervised-learning initialization through the reinforcement learning technique comprises:
in each training step t, according to the operator state s_t, selecting one parallelism from a given parallelism set through the second neural network; after the corresponding resource allocation is executed, the operator state transitions to the state s_{t+1} of the next training step t+1, and the operator receives a reward r_t that guides the second neural network toward selecting the optimal resource allocation parallelism T'; the corresponding cache occupancy β' (given in the original as a formula image) is determined by D', the data to be processed, and β'⁻, the cache occupancy before the data is processed.
8. The method for elastic resource allocation based on supervised learning and reinforcement learning according to claim 7, wherein in the fourth step a differentiated reward value r_t is set for the reinforcement learning; the reward calculation model (given in the original as formula images) distinguishes two cases: when D' + β'⁻ > C_cache, the reward is computed from one of three sub-cases, with r_t = one_reward when the allocation is optimal; when D' + β'⁻ ≤ C_cache, a separate formula applies; in the formulas, overflow_penalty is the reward value for each missing resource, waste_penalty is the reward value for each surplus resource, and one_reward is the reward value when the resource allocation is optimal.
CN202211623990.5A 2022-12-15 2022-12-15 Elastic resource allocation method based on supervised learning and reinforcement learning Pending CN116048785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211623990.5A CN116048785A (en) 2022-12-15 2022-12-15 Elastic resource allocation method based on supervised learning and reinforcement learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211623990.5A CN116048785A (en) 2022-12-15 2022-12-15 Elastic resource allocation method based on supervised learning and reinforcement learning

Publications (1)

Publication Number Publication Date
CN116048785A true CN116048785A (en) 2023-05-02

Family

ID=86117240

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211623990.5A Pending CN116048785A (en) 2022-12-15 2022-12-15 Elastic resource allocation method based on supervised learning and reinforcement learning

Country Status (1)

Country Link
CN (1) CN116048785A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116663610A (en) * 2023-08-02 2023-08-29 荣耀终端有限公司 Scheduling network training method, task scheduling method and related equipment
CN116663610B (en) * 2023-08-02 2023-12-19 荣耀终端有限公司 Scheduling network training method, task scheduling method and related equipment

Similar Documents

Publication Publication Date Title
CN110809772B (en) System and method for improving optimization of machine learning models
US9015083B1 (en) Distribution of parameter calculation for iterative optimization methods
CN107330902B (en) Chaotic genetic BP neural network image segmentation method based on Arnold transformation
CN113570039B (en) Block chain system based on reinforcement learning optimization consensus
WO2024087512A1 (en) Graph neural network compression method and apparatus, and electronic device and storage medium
CN108564592A (en) Based on a variety of image partition methods for being clustered to differential evolution algorithm of dynamic
CN112884236B (en) Short-term load prediction method and system based on VDM decomposition and LSTM improvement
CN116048785A (en) Elastic resource allocation method based on supervised learning and reinforcement learning
CN112001556A (en) Reservoir downstream water level prediction method based on deep learning model
CN112287990A (en) Model optimization method of edge cloud collaborative support vector machine based on online learning
CN115437795A (en) Video memory recalculation optimization method and system for heterogeneous GPU cluster load perception
CN116187835A (en) Data-driven-based method and system for estimating theoretical line loss interval of transformer area
CN116389270A (en) DRL (dynamic random link) joint optimization client selection and bandwidth allocation based method in federal learning
CN114972850A (en) Distribution inference method and device for multi-branch network, electronic equipment and storage medium
CN115543626A (en) Power defect image simulation method adopting heterogeneous computing resource load balancing scheduling
Alnowibet et al. An efficient algorithm for data parallelism based on stochastic optimization
CN112381591A (en) Sales prediction optimization method based on LSTM deep learning model
CN110880773A (en) Power grid frequency modulation control method based on combination of data driving and physical model driving
CN113705929B (en) Spring festival holiday load prediction method based on load characteristic curve and typical characteristic value fusion
CN115392441A (en) Method, apparatus, device and medium for on-chip adaptation of quantized neural network model
CN114969148A (en) System access amount prediction method, medium and equipment based on deep learning
CN109493065A (en) A kind of fraudulent trading detection method of Behavior-based control incremental update
CN110929849B (en) Video detection method and device based on neural network model compression
CN113379533A (en) Method, device, equipment and storage medium for improving circulating loan quota
CN111522240A (en) Four-rotor aircraft model, identification method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination