CN115499511A - Micro-service active scaling method based on spatio-temporal graph neural network load prediction - Google Patents

Micro-service active scaling method based on spatio-temporal graph neural network load prediction

Info

Publication number
CN115499511A
Authority
CN
China
Prior art keywords
micro
service
gat
network
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211442766.6A
Other languages
Chinese (zh)
Other versions
CN115499511B (en)
Inventor
郑烇
李峥
李江明
陈双武
杨坚
杨锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202211442766.6A
Publication of CN115499511A
Application granted
Publication of CN115499511B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of cloud computing, and discloses a micro-service active scaling method based on spatio-temporal graph neural network load prediction. A spatio-temporal graph neural network is introduced to predict the workload, which better captures the spatial relations between different micro-services in a micro-service scenario, so that more accurate predictions can be made. Based on accurate workload prediction, micro-service scaling decisions can better balance the computing resources occupied by the micro-services against the quality of service they provide.

Description

Micro-service active scaling method based on spatio-temporal graph neural network load prediction
Technical Field
The invention relates to the field of cloud computing, in particular to a micro-service active scaling method based on spatio-temporal graph neural network load prediction.
Background
With the rapid development of network services, the services provided by network application service providers have become increasingly complex and feature-rich, while services are expanded and iterated on rapidly. Under this trend, the micro-service architecture arose. In the micro-service architecture, the whole network application program is divided into a number of mutually independent micro-services, which call other micro-services through network requests to acquire the information they need. Compared with traditional network applications, the micro-service architecture realizes application modularization and has higher scalability, fault tolerance and maintainability. In a cloud data center, allocating more computing resources to a micro-service yields better service quality, but allocating too many computing resources results in low resource utilization and therefore resource waste. The data center must therefore provide an efficient elastic scaling scheme for micro-services, so as to meet their quality-of-service requirements while improving the utilization of computing resources as much as possible to reduce operating costs. Dynamic resource scheduling of micro-services is thus of great interest to academia and industry, and automatic elastic scaling of micro-services is one concrete realization of it. Current elastic scaling schemes fall mainly into two categories: threshold-based reactive algorithms and prediction-based proactive algorithms. Reactive algorithms such as SmartVM can only react after workload changes have occurred; they therefore suffer from hysteresis, are prone to jitter when the workload changes rapidly, and scale frequently, causing unnecessary overhead. Proactive algorithms such as HANSEL rely more on the accuracy of workload prediction. Existing prediction algorithms are mainly based on regression theory or traditional neural networks; they can only predict from the historical time series of the micro-service workload and cannot reflect the spatial relations among micro-services. Therefore, a model that can simultaneously embody the temporal and spatial relations of the micro-service workload is needed for prediction, with elastic scaling of the micro-services then performed based on the workload prediction.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a micro-service active scaling method based on spatio-temporal graph neural network load prediction, which, by scaling and scheduling the computing resources occupied by micro-services, improves the utilization of computing resources to reduce the operating cost of the cloud computing center while guaranteeing the service quality of the micro-services as far as possible.
In order to solve the technical problem, the invention adopts the following technical scheme:
a micro-service active scaling method based on space-time diagram neural network load prediction comprises the following steps:
step one, modeling the micro-service architecture:
the whole micro-service architecture comprises N micro-services and a set of micro-services
Figure 899400DEST_PATH_IMAGE001
Ith microservice
Figure 217248DEST_PATH_IMAGE002
Is represented by
Figure 732543DEST_PATH_IMAGE003
(ii) a Wherein the content of the first and second substances,
Figure 655500DEST_PATH_IMAGE004
representing microservices
Figure 965259DEST_PATH_IMAGE002
The work load of (a) is,
Figure 770404DEST_PATH_IMAGE005
representing microservices
Figure 823810DEST_PATH_IMAGE002
The computing resources of (a) are set up,
Figure 601273DEST_PATH_IMAGE006
representing microservices
Figure 81933DEST_PATH_IMAGE002
The quality of service of (c); fixed calling relation exists among micro services, and set of calling relation exists
Figure 639954DEST_PATH_IMAGE007
Figure 169155DEST_PATH_IMAGE008
(ii) a Wherein the relationship is invoked
Figure 129021DEST_PATH_IMAGE009
Representing microservices
Figure 780582DEST_PATH_IMAGE002
To micro service
Figure 560319DEST_PATH_IMAGE010
Calling relationship of, micro-service
Figure 893212DEST_PATH_IMAGE002
When the workload of (2) changes, the relationship is called
Figure 707584DEST_PATH_IMAGE009
Will micro-serve
Figure 530046DEST_PATH_IMAGE010
The workload of (2) changes;
wherein, the ith micro-service
Figure 731833DEST_PATH_IMAGE002
Attributes may also be expressed as
Figure 930733DEST_PATH_IMAGE011
I.e. increase by one
Figure 599612DEST_PATH_IMAGE012
The attributes of the data are then compared to the attributes,
Figure 592976DEST_PATH_IMAGE012
representing microservices
Figure 284988DEST_PATH_IMAGE002
The identification of (a);
step two, predicting the working load of the micro-service:
constructing and training a spatio-temporal graph neural network consisting of a GAT network and a GRU network, recorded as the GAT-GRU network;
in a GAT-GRU network, the input comprises input data
Figure 287579DEST_PATH_IMAGE013
And set of calling relationships
Figure 810965DEST_PATH_IMAGE014
Wherein
Figure 912913DEST_PATH_IMAGE015
Representing the length of the time series of the input data,
Figure 154538DEST_PATH_IMAGE016
representing the number of microservices;
Figure 695241DEST_PATH_IMAGE017
representing the characteristic number of the micro-service workload, wherein the characteristics of the micro-service workload comprise the CPU occupancy rate and the memory occupancy rate of the micro-service; firstly processing input data by a GAT layer GAT-1, then inputting the hidden state output by the GAT-1 into a GRU layer, then processing the hidden state output by the GRU layer by another GAT layer GAT-2 as the input of the GRU layer of the next time sequence, finally merging the hidden state output by each time sequence GRU, processing the merged state by a prediction layer, and finally outputtingRequired prediction data
Figure 10816DEST_PATH_IMAGE018
(ii) a Wherein
Figure 611561DEST_PATH_IMAGE019
Representing the temporal length of the predicted data;
step three, scaling decision of micro-service level:
adopting a DDPG model to decide whether each micro-service is scaled based on the prediction of the micro-service workload;
the environment state of the DDPG model comprises the resource occupation condition and the service quality condition of each micro service obtained from the prediction data; the resource occupation condition comprises the CPU occupancy rate, the memory occupancy rate and the number of the work copies of the microservice; the quality of service condition comprises an average request response time of the microservice;
the action set of the DDPG model comprises the capacity reduction, maintenance or capacity expansion of each micro-service; when the action value is larger than 1, carrying out capacity expansion, and taking the number of the working copies as the action value and rounding down; when the action value is smaller than-1, carrying out capacity reduction, wherein the number of the working copies is the action value and rounded up; maintaining the number of microservice working copies unchanged when the action value is between-1 and 1;
the reward of the DDPG model is the reciprocal of the weighted average of the average occupancy rate of the CPU of each micro service, the average occupancy rate of the memory and the normalized request response time.
Further, the prediction layer is formed by connecting several fully-connected layers in series.
Compared with the prior art, the invention has the beneficial technical effects that:
due to the fact that the space-time diagram neural network is introduced to predict the working load, the space relation among different micro services in a micro service scene is better reflected, and therefore more accurate prediction can be made. Based on accurate prediction of workload, the computational resources occupied by the micro-services and the quality of service provided can be better balanced through micro-service scaling decisions. And because the invention is based on the active expansion of prediction, can respond to the change of the working load in advance, prevent the system from difficult to respond in time and cause the service quality collapse or resource waste when the request quantity appears and changes greatly.
Drawings
FIG. 1 is a block diagram of the GAT-GRU network for micro-service workload prediction in accordance with the present invention;
FIG. 2 is an internal workflow diagram of the GRU network;
FIG. 3 is a flowchart illustrating an embodiment of micro-service resource scheduling according to the present invention.
Detailed Description
A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
The implementation of the invention is based on the combination of a spatio-temporal graph neural network and a DDPG (Deep Deterministic Policy Gradient) model.
The spatio-temporal graph neural network is composed of a GRU (Gated Recurrent Unit) network and a GAT (Graph Attention Network) network.
The transformation of the GRU network is shown as follows:

$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$ ; (1)

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$ ; (2)

$\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h)$ ; (3)

$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ ; (4)

wherein $\odot$ represents element-wise matrix multiplication; $r_t$, $z_t$ and $\tilde{h}_t$ respectively represent the reset gate, the update gate and the cell gate in the GRU network; $W_r$, $W_z$ and $W_h$ are the parameters of the reset gate, the update gate and the cell gate respectively; $b_r$, $b_z$ and $b_h$ are the offsets corresponding to the reset gate, the update gate and the cell gate respectively, and are continuously learned in the training process; $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, and $x_t$ and $h_t$ are respectively the input and output of the GRU network at time $t$.
The internal workflow of the GRU network is shown in FIG. 2.
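For concreteness, equations (1)-(4) translate almost line for line into code. The following is a minimal PyTorch sketch, assuming the usual conventions that the gate activation is the sigmoid function and that $W_r$, $W_z$ and $W_h$ act on the concatenation $[h_{t-1}, x_t]$; the class and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    """From-scratch GRU cell following equations (1)-(4)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # W_r, W_z, W_h act on [h_{t-1}, x_t]; the Linear layers also
        # hold the learnable offsets b_r, b_z, b_h.
        self.w_r = nn.Linear(hidden_size + input_size, hidden_size)
        self.w_z = nn.Linear(hidden_size + input_size, hidden_size)
        self.w_h = nn.Linear(hidden_size + input_size, hidden_size)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        hx = torch.cat([h_prev, x_t], dim=-1)
        r_t = torch.sigmoid(self.w_r(hx))                 # (1) reset gate
        z_t = torch.sigmoid(self.w_z(hx))                 # (2) update gate
        h_tilde = torch.tanh(
            self.w_h(torch.cat([r_t * h_prev, x_t], dim=-1)))  # (3) cell gate
        return (1 - z_t) * h_prev + z_t * h_tilde         # (4) new hidden state
```

In practice the built-in torch.nn.GRUCell implements the same recurrence; the explicit version is shown only to mirror the equations.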
The transformation of the GAT network is shown as follows:

$\vec{h}'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W \vec{h}_j\right)$ ; (5)

$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\top}[W\vec{h}_i \parallel W\vec{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\top}[W\vec{h}_i \parallel W\vec{h}_k]\right)\right)}$ ; (6)

wherein $H = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}$ represents the input data of the GAT network; $\vec{h}_i$ is the feature vector of node $i$ in the GAT network; nodes $j$ and $k$ are neighbor nodes of node $i$; $\vec{h}_j$ and $\vec{h}_k$ are the feature vectors of nodes $j$ and $k$; $\vec{h}_i$, $\vec{h}_j$ and $\vec{h}_k$ all have length $F$; $H' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}$ represents the output data of the GAT network; $\vec{h}'_i$ is the feature vector output after graph attention aggregation of node $i$, and has length $F'$; $\mathrm{LeakyReLU}$ is a non-linear function and $\sigma$ is the activation function; $W$ is a weight matrix of shape $F' \times F$; $\mathcal{N}_i$ represents the set of neighbor nodes of node $i$; the symbol $\parallel$ represents vector concatenation; $\vec{a}$ is a weight vector of length $2F'$, and $\vec{a}^{\top}$ is its transpose. The attention coefficient $\alpha_{ij}$ finally obtained from formula (6) represents the extent to which node $i$ is affected by its neighbor node $j$. When a multi-head attention mechanism is adopted, the input data is processed simultaneously by several identical GAT networks, and the average or the concatenation of their outputs is taken as the output.
When the GAT network is applied to micro-service workload prediction in the present invention, the nodes of the GAT network are the micro-services.
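As a sketch, a single-head version of equations (5) and (6) can be written as follows. The dense adjacency mask built from the call-relation set $E$, and the choice of the sigmoid function for the activation $\sigma$, are assumptions of this sketch rather than details fixed by the patent; self-loops are added so that every node has at least one neighbor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayerSketch(nn.Module):
    """Single-head graph attention layer following equations (5) and (6)."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.W = nn.Linear(in_features, out_features, bias=False)  # W: shape F' x F
        self.a = nn.Parameter(torch.empty(2 * out_features))       # a: length 2F'
        nn.init.normal_(self.a, std=0.1)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, F) node features; adj: (N, N), adj[i, j] = 1 when node j
        # is a neighbor of node i (derived from the call-relation set E).
        n = h.size(0)
        adj = adj + torch.eye(n, device=h.device)        # add self-loops
        wh = self.W(h)                                   # W h_j for all nodes, (N, F')
        # a^T [W h_i || W h_j] for every pair (i, j): an (N, N) score matrix
        pair = torch.cat([wh.unsqueeze(1).expand(n, n, -1),
                          wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(pair @ self.a)
        e = e.masked_fill(adj == 0, float("-inf"))       # restrict to neighbors N_i
        alpha = torch.softmax(e, dim=-1)                 # (6) attention coefficients
        return torch.sigmoid(alpha @ wh)                 # (5) aggregate and activate
```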
The above two networks are combined to predict the workload of the micro-services.
The DDPG model is a reinforcement learning algorithm and is used to make the scaling decisions for the micro-services.
The DDPG model comprises four networks, respectively the $Q$ network, the target network $Q'$, the policy network $\mu$ and the target policy network $\mu'$. The $\mu$ network is used to convert the input environment state into an action value; the $Q$ network is used to score the action value provided by the $\mu$ network under the corresponding environment state; and the $Q'$ and $\mu'$ networks are used respectively to prevent the $Q$ network and the $\mu$ network from fluctuating too much during training. The main workflow of the DDPG model is as follows:
(1) Randomly initialize the parameters $\theta^{Q}$ of the $Q$ network and the parameters $\theta^{\mu}$ of the $\mu$ network;
(2) Initialize the parameters $\theta^{Q'}$ of the $Q'$ network and the parameters $\theta^{\mu'}$ of the $\mu'$ network, letting $\theta^{Q'}$ take the same value as $\theta^{Q}$ and $\theta^{\mu'}$ take the same value as $\theta^{\mu}$;
(3) Initialize the memory cache;
(4) For each round:
(5) Initialize a random variable following a normal distribution with mean 0;
(6) Obtain the initial state $s_1$ from the environment;
(7) For each time step $t$:
(8) Select the action $a_t = \mu(s_t)$ and add the random variable;
(9) Execute the action $a_t$ on the environment, and observe the reward $r_t$ and the new state $s_{t+1}$;
(10) Store $(s_t, a_t, r_t, s_{t+1})$ into the memory cache;
(11) Select $M$ records from the memory cache, the $i$-th of which is recorded as $(s_i, a_i, r_i, s_{i+1})$;
(12) Calculate the target value for each record separately: $y_i = r_i + \gamma\, Q'(s_{i+1}, \mu'(s_{i+1}))$;
(13) Update the $Q$ network by minimizing the loss function $L$: $L = \frac{1}{M}\sum_{i}\left(y_i - Q(s_i, a_i)\right)^2$;
(14) Update the $\mu$ network: $\nabla_{\theta^{\mu}} J \approx \frac{1}{M}\sum_{i}\nabla_{a}Q(s, a)\big|_{s=s_i,\,a=\mu(s_i)}\,\nabla_{\theta^{\mu}}\mu(s)\big|_{s=s_i}$, wherein $\mu$ is the policy represented by the $\mu$ network;
(15) Soft update the $Q'$ and $\mu'$ networks;
(16) End the time step; if the state is not the final state and the time does not exceed the range, return to step (7) and execute the next time step;
(17) End the round, return to step (4), and enter the next round.
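Steps (11)-(15) condense into one update routine. The sketch below assumes PyTorch modules for the four networks and a plain Python list of tensor tuples as the memory cache; all names are illustrative stand-ins, and the critic is assumed to return a column of scores.

```python
import random
import torch

def ddpg_update(buffer, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch_size=64, gamma=0.99, tau=0.005):
    # (11) select M records (s_i, a_i, r_i, s_{i+1}) from the memory cache
    s, a, r, s_next = map(torch.stack, zip(*random.sample(buffer, batch_size)))

    # (12) target values y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    with torch.no_grad():
        y = r.view(-1, 1) + gamma * target_critic(s_next, target_actor(s_next))

    # (13) update Q by minimizing L = (1/M) sum_i (y_i - Q(s_i, a_i))^2
    critic_loss = ((y - critic(s, a)) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # (14) update mu along the deterministic policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # (15) soft-update the target networks Q' and mu'
    for net, target in ((critic, target_critic), (actor, target_actor)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```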
In a scenario where micro-services are deployed in a cloud computing center, computing resources need to be provided to many micro-services at the same time. Generally, the more computing resources a micro-service obtains, the stronger its capacity to provide services and the better its service quality can be guaranteed. However, the cloud computing center cannot allocate computing resources to micro-services without limit in order to improve their service quality, as this would lead to an unlimited increase in its operating cost. Therefore, when the workload of a micro-service is high, more computing resources need to be allocated to it to guarantee its service quality, and when the micro-service is relatively idle, some computing resources should be reclaimed to prevent the waste caused by low resource utilization.
Therefore, an active resource scheduling method is needed, which predicts the future workload of the micro-service by monitoring its workload data in real time, and then determines whether to scale the micro-service according to the predicted workload. The specific method is described as follows:
(1) Modeling the micro-service architecture. Suppose the whole micro-service architecture contains $N$ micro-services, and use $MS$ to represent the set of micro-services, then $MS = \{ms_1, ms_2, \ldots, ms_N\}$. For the $i$-th micro-service $ms_i$, its attributes are indicated by $(id_i, w_i, c_i, q_i)$, wherein $id_i$ represents the identification of micro-service $ms_i$; $w_i$ represents the workload of $ms_i$; $c_i$ represents the computing resources of $ms_i$; and $q_i$ represents the quality of service of $ms_i$. Besides, fixed calling relations also exist between the micro-services; they are determined at the time of micro-service design, and the set of calling relations is expressed as $E$, $E = \{e_{ij}\}$, wherein $e_{ij}$ represents the calling relation from micro-service $ms_i$ to micro-service $ms_j$. Due to the existence of the calling relation $e_{ij}$, a change in the workload of $ms_i$ also causes a certain impact on $ms_j$.
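For illustration only, the model of step (1) can be written down directly; every class and field name below is invented for this sketch and does not come from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class MicroService:
    ms_id: str          # id_i: identification of the micro-service
    workload: float     # w_i: workload (e.g. requests per second)
    resources: float    # c_i: computing resources currently allocated
    qos: float          # q_i: quality of service (e.g. average response time)

@dataclass
class MicroServiceArchitecture:
    services: dict = field(default_factory=dict)   # ms_id -> MicroService
    calls: set = field(default_factory=set)        # E = {e_ij} as (i, j) id pairs

    def add_call(self, caller: str, callee: str) -> None:
        """Record the fixed calling relation e_ij from caller to callee."""
        self.calls.add((caller, callee))

    def downstream_of(self, ms_id: str) -> list:
        """Micro-services whose workload is impacted when ms_id's workload changes."""
        return [j for (i, j) in self.calls if i == ms_id]
```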
(2) Predicting the micro-service workload. A model combining a graph attention network and a recurrent neural network is adopted to predict the micro-service workload. Specifically, a spatio-temporal graph neural network combining a GAT network and a GRU network is constructed and trained to perform workload prediction; it is referred to in the invention as the GAT-GRU network.
In the GAT-GRU network constructed by the invention, the input data of the network is $X \in \mathbb{R}^{T \times N \times F}$ and $E$, wherein $T$ represents the time-series length of the input data, $N$ represents the number of micro-services, and $F$ represents the number of features of the micro-service workload; here each $N \times F$ slice of $X$ corresponds to the input data $H$ of the GAT network in the preamble, and $E$ represents the set of calling relations of the entire micro-service architecture. The CPU occupancy rate and the memory occupancy rate of the micro-service are mainly considered, so $F = 2$. The input data is first processed through a GAT layer, and the hidden state it outputs is then input into the GRU layer. The hidden state output by the GRU layer is then processed by another GAT layer to serve as the hidden input of the GRU of the next time step. Finally, the hidden outputs of the GRU at each time step are merged, processed through a prediction layer, and the required prediction data $\hat{X} \in \mathbb{R}^{T' \times N \times F}$ is output, wherein $T'$ represents the time-series length of the predicted data; here each $N \times F$ slice of $\hat{X}$ corresponds to the output data $H'$ of the GAT network in the preamble. The prediction layer is formed by connecting several fully-connected layers in series. The structure of the whole network is shown in FIG. 1.
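Combining the two earlier sketches (GATLayerSketch and GRUCellSketch) gives one possible rendering of this dataflow. The layer sizes, the zero initial hidden state and the use of the mean as the merge operation are assumptions of this sketch; the patent only fixes the ordering GAT-1 → GRU → GAT-2 and a fully-connected prediction layer.

```python
import torch
import torch.nn as nn

class GATGRUSketch(nn.Module):
    def __init__(self, n_feats: int = 2, hidden: int = 32, horizon: int = 1):
        super().__init__()
        self.gat_in = GATLayerSketch(n_feats, hidden)   # GAT-1 on raw features
        self.gru = GRUCellSketch(hidden, hidden)
        self.gat_h = GATLayerSketch(hidden, hidden)     # GAT-2 on hidden states
        # prediction layer: several fully-connected layers in series
        self.predict = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * n_feats))
        self.horizon, self.n_feats = horizon, n_feats

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (T, N, F) workload history; adj: (N, N) built from call relations E
        T, N, _ = x.shape
        h = torch.zeros(N, self.gru.w_r.out_features, device=x.device)
        states = []
        for t in range(T):
            g = self.gat_in(x[t], adj)   # GAT-1 processes the input of step t
            h = self.gru(g, h)           # GRU step
            states.append(h)
            h = self.gat_h(h, adj)       # GAT-2: hidden input for the next step
        merged = torch.stack(states).mean(dim=0)   # merge the hidden states
        out = self.predict(merged)                 # prediction layer
        return out.view(N, self.horizon, self.n_feats).permute(1, 0, 2)  # (T', N, F)
```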
(3) Scaling decision at the micro-service level. A DDPG model is employed to decide whether each micro-service is scaled, based on the prediction of the micro-service workload. The environment state comprises the resource occupation condition and the quality-of-service condition of each micro-service; the resource occupation condition specifically comprises the CPU occupancy rate, the memory occupancy rate and the number of working copies; the quality-of-service condition specifically comprises the average request response time. The action set comprises three choices for each micro-service: capacity reduction, maintenance and capacity expansion. When the action value is greater than 1, capacity expansion is carried out, and the number of working copies is adjusted by the action value rounded down; when the action value is less than -1, capacity reduction is carried out, and the number of working copies is adjusted by the action value rounded up; when the action value is between -1 and 1, the number of micro-service working copies is kept unchanged. The reward is the reciprocal of the weighted average of the average CPU occupancy rate, the average memory occupancy rate and the normalized request response time of each micro-service.
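Reading the action-value rule as an adjustment of the current number of working copies (one plausible interpretation of the rounding described above), the decision mapping and the reward come down to a few lines; the floor of one copy and the equal weights are illustrative choices, not taken from the patent.

```python
import math

def apply_action(n_copies: int, action: float) -> int:
    """Map one micro-service's DDPG action value to a new working-copy count."""
    if action > 1:                       # capacity expansion: round the value down
        return n_copies + math.floor(action)
    if action < -1:                      # capacity reduction: round the value up
        return max(1, n_copies + math.ceil(action))
    return n_copies                      # between -1 and 1: keep unchanged

def reward(cpu_avg: float, mem_avg: float, resp_norm: float,
           w: tuple = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Reciprocal of the weighted average of average CPU occupancy, average
    memory occupancy and normalized request response time."""
    return 1.0 / (w[0] * cpu_avg + w[1] * mem_avg + w[2] * resp_norm)
```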
Examples
The method is deployed on the micro-service architecture of a cloud computing center, and acquires the information of all micro-service nodes in the architecture and the calling relations between micro-services. Then, at every time interval, the micro-service active scaling method according to the invention performs horizontal scaling control on each micro-service, i.e. controls the number of working copies of each micro-service. The specific implementation is shown in FIG. 3 and mainly includes the following steps:
monitoring the occupation of micro-service resources: the method mainly monitors the number of working copies of each micro-service, and the CPU occupancy rate and memory occupancy rate of all micro-service working copies;
predicting the micro-service workload: the resource occupation of all micro-services at the next moment is predicted from the resource occupation of all micro-services over the past period, specifically using the GAT-GRU network to predict the micro-service workload. The GAT-GRU network is then continuously trained with the actual micro-service workload data collected at the next moment, so that its prediction capability keeps improving;
performing a micro-service-level scaling decision: after the micro-service workload at the next moment is predicted, the predicted workload data is input into the DDPG model to obtain the output action values, and the capacity expansion operation, capacity reduction operation or no operation is then determined for each micro-service according to its action value. The networks are then trained and updated according to the DDPG algorithm. After a period of training, the DDPG model reaches a relatively stable state and provides better scaling decisions;
performing horizontal scaling on all micro-services in the micro-service architecture according to the scaling decisions: the number of micro-service working copies is directly controlled by the working-copy controller, so that the scaling decisions are applied to the micro-service architecture.
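The four steps above form a periodic control loop. The sketch below is illustrative glue only: monitor, predictor, agent and controller are hypothetical stand-ins for the metric monitor, the trained GAT-GRU network, the DDPG model and the working-copy controller, and apply_action is the mapping sketched earlier.

```python
import time

def scaling_loop(monitor, predictor, agent, controller, adj, interval_s: int = 60):
    history = []                                    # sliding window of observations
    while True:
        history.append(monitor.collect())           # per-service CPU/memory/copies
        window = history[-predictor.window_len:]    # last T observations
        per_service_state = predictor.predict(window, adj)  # step two: forecast
        for ms_id, state in per_service_state.items():
            action = agent.act(state)               # step three: DDPG action value
            target = apply_action(controller.copies(ms_id), action)
            controller.set_copies(ms_id, target)    # horizontal scaling
        time.sleep(interval_s)                      # wait for the next interval
```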
The circles in FIG. 3 represent working copies of micro-services; the circles within one box represent working copies of the same micro-service; unfilled circles represent working copies in a normal operating state, black filled circles represent working copies in an initialized state, and dotted filled circles represent working copies in a destroyed state. The micro-service architecture of FIG. 3 contains four micro-services, for which the present invention (1) monitors resource usage data; (2) predicts their future workload; (3) makes a horizontal scaling decision for each micro-service; and (4) applies the horizontal scaling decision to the micro-service architecture through the working-copy controller, adjusting the number of working copies of each micro-service.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description is set out in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted only for clarity. Those skilled in the art should take the description as a whole, and the technical solutions of the embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.

Claims (1)

1. A micro-service active scaling method based on spatio-temporal graph neural network load prediction, comprising the following steps:

step one, modeling the micro-service architecture:

the whole micro-service architecture comprises N micro-services, and the set of micro-services is $MS = \{ms_1, ms_2, \ldots, ms_N\}$; the $i$-th micro-service $ms_i$ is represented as $ms_i = (w_i, c_i, q_i)$; wherein $w_i$ represents the workload of micro-service $ms_i$, $c_i$ represents the computing resources of micro-service $ms_i$, and $q_i$ represents the quality of service of micro-service $ms_i$; fixed calling relations exist among the micro-services, and the set of calling relations is $E$, $E = \{e_{ij}\}$; wherein the calling relation $e_{ij}$ represents the call from micro-service $ms_i$ to micro-service $ms_j$, and when the workload of micro-service $ms_i$ changes, the workload of micro-service $ms_j$ changes through the calling relation $e_{ij}$;

step two, predicting the micro-service workload:

constructing and training a spatio-temporal graph neural network consisting of a GAT network and a GRU network, recorded as the GAT-GRU network;

in the GAT-GRU network, the input comprises the input data $X \in \mathbb{R}^{T \times N \times F}$ and the set of calling relations $E$, wherein $T$ represents the time-series length of the input data, $N$ represents the number of micro-services, and $F$ represents the number of features of the micro-service workload; the features of the micro-service workload comprise the CPU occupancy rate and the memory occupancy rate of the micro-service; the input data is first processed by a GAT layer GAT-1, the hidden state output by GAT-1 is then input into a GRU layer, the hidden state output by the GRU layer is then processed by another GAT layer GAT-2 to serve as the input of the GRU layer of the next time step, and finally the hidden states output by the GRU at each time step are merged and processed by a prediction layer, outputting the required prediction data $\hat{X} \in \mathbb{R}^{T' \times N \times F}$; wherein $T'$ represents the time-series length of the predicted data;

step three, scaling decision at the micro-service level:

adopting a DDPG model to decide whether each micro-service is scaled, based on the prediction of the micro-service workload;

the environment state of the DDPG model comprises the resource occupation condition and the quality-of-service condition of each micro-service obtained from the prediction data; the resource occupation condition comprises the CPU occupancy rate, the memory occupancy rate and the number of working copies of the micro-service; the quality-of-service condition comprises the average request response time of the micro-service;

the action set of the DDPG model comprises capacity reduction, maintenance or capacity expansion of each micro-service; when the action value is greater than 1, capacity expansion is carried out, and the number of working copies is adjusted by the action value rounded down; when the action value is less than -1, capacity reduction is carried out, and the number of working copies is adjusted by the action value rounded up; when the action value is between -1 and 1, the number of micro-service working copies is maintained unchanged;

the reward of the DDPG model is the reciprocal of the weighted average of the average CPU occupancy rate, the average memory occupancy rate and the normalized request response time of each micro-service.
CN202211442766.6A 2022-11-18 2022-11-18 Micro-service active scaling method based on spatio-temporal graph neural network load prediction Active CN115499511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211442766.6A CN115499511B (en) Micro-service active scaling method based on spatio-temporal graph neural network load prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211442766.6A CN115499511B (en) Micro-service active scaling method based on spatio-temporal graph neural network load prediction

Publications (2)

Publication Number Publication Date
CN115499511A true CN115499511A (en) 2022-12-20
CN115499511B CN115499511B (en) 2023-03-24

Family

ID=85116144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211442766.6A Active CN115499511B (en) Micro-service active scaling method based on spatio-temporal graph neural network load prediction

Country Status (1)

Country Link
CN (1) CN115499511B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116257363A (en) * 2023-05-12 2023-06-13 中国科学技术大学先进技术研究院 Resource scheduling method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190379605A1 (en) * 2018-06-08 2019-12-12 Cisco Technology, Inc. Inferring device load and availability in a network by observing weak signal network based metrics
CN112199150A (en) * 2020-08-13 2021-01-08 北京航空航天大学 Online application dynamic capacity expansion and contraction method based on micro-service calling dependency perception
US20210266358A1 (en) * 2020-02-24 2021-08-26 Netapp, Inc. Quality of service (qos) settings of volumes in a distributed storage system
CN114020326A (en) * 2021-11-04 2022-02-08 砺剑防务技术(新疆)有限公司 Micro-service response time prediction method and system based on graph neural network
WO2022167840A1 (en) * 2021-02-04 2022-08-11 Telefonaktiebolaget Lm Ericsson (Publ) Profiling workloads using graph based neural networks in a cloud native environment
CN115037749A (en) * 2022-06-08 2022-09-09 山东省计算中心(国家超级计算济南中心) Performance-aware intelligent multi-resource cooperative scheduling method and system for large-scale micro-service

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190379605A1 (en) * 2018-06-08 2019-12-12 Cisco Technology, Inc. Inferring device load and availability in a network by observing weak signal network based metrics
US20210266358A1 (en) * 2020-02-24 2021-08-26 Netapp, Inc. Quality of service (qos) settings of volumes in a distributed storage system
CN112199150A (en) * 2020-08-13 2021-01-08 北京航空航天大学 Online application dynamic capacity expansion and contraction method based on micro-service calling dependency perception
WO2022167840A1 (en) * 2021-02-04 2022-08-11 Telefonaktiebolaget Lm Ericsson (Publ) Profiling workloads using graph based neural networks in a cloud native environment
CN114020326A (en) * 2021-11-04 2022-02-08 砺剑防务技术(新疆)有限公司 Micro-service response time prediction method and system based on graph neural network
CN115037749A (en) * 2022-06-08 2022-09-09 山东省计算中心(国家超级计算济南中心) Performance-aware intelligent multi-resource cooperative scheduling method and system for large-scale micro-service

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
耿德胜 (Geng Desheng): "Container-level elastic resource provisioning method for micro-service architecture", 《信息与电脑(理论版)》 (Information & Computer (Theory Edition)) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116257363A (en) * 2023-05-12 2023-06-13 中国科学技术大学先进技术研究院 Resource scheduling method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115499511B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN111835827B (en) Internet of things edge computing task unloading method and system
Liu et al. A hierarchical framework of cloud resource allocation and power management using deep reinforcement learning
CN110780938B (en) Computing task unloading method based on differential evolution in mobile cloud environment
CN113852432B (en) Spectrum Prediction Sensing Method Based on RCS-GRU Model
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation
CN115499511B (en) Micro-service active scaling method based on spatio-temporal graph neural network load prediction
Golshani et al. Proactive auto-scaling for cloud environments using temporal convolutional neural networks
Gali et al. A Distributed Deep Meta Learning based Task Offloading Framework for Smart City Internet of Things with Edge-Cloud Computing.
CN116126534A (en) Cloud resource dynamic expansion method and system
CN113902116A (en) Deep learning model-oriented reasoning batch processing optimization method and system
Bian et al. Neural task scheduling with reinforcement learning for fog computing systems
Qazi et al. Towards quantum computing algorithms for datacenter workload predictions
Chai et al. A computation offloading algorithm based on multi-objective evolutionary optimization in mobile edge computing
da Silva et al. Online machine learning for auto-scaling in the edge computing
CN113553149A (en) Cloud server cluster load scheduling method, system, terminal and storage medium
CN116009990A (en) Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism
CN115883371A (en) Virtual network function placement method based on learning optimization method in edge-cloud collaborative system
CN115934349A (en) Resource scheduling method, device, equipment and computer readable storage medium
Liu et al. Hidden markov model based spot price prediction for cloud computing
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
WO2023272726A1 (en) Cloud server cluster load scheduling method and system, terminal, and storage medium
Nguyen et al. Reinforcement learning for maintenance decision-making of multi-state component systems with imperfect maintenance
Damaševičius et al. Short time prediction of cloud server round-trip time using a hybrid neuro-fuzzy network
Kumaran et al. Deep Reinforcement Learning algorithms for Low Latency Edge Computing Systems
Jananee et al. Allocation of cloud resources based on prediction and performing auto-scaling of workload

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant