CN115499511A - Micro-service active scaling method based on spatio-temporal graph neural network load prediction - Google Patents

Micro-service active scaling method based on spatio-temporal graph neural network load prediction

Info

Publication number
CN115499511A
Authority
CN
China
Prior art keywords
micro
service
gat
network
representing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211442766.6A
Other languages
Chinese (zh)
Other versions
CN115499511B (en)
Inventor
郑烇
李峥
李江明
陈双武
杨坚
杨锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Original Assignee
Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Artificial Intelligence of Hefei Comprehensive National Science Center
Priority to CN202211442766.6A
Publication of CN115499511A
Application granted
Publication of CN115499511B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1001 Protocols in which an application is distributed across nodes in the network for accessing one among a plurality of replicated servers
    • H04L67/1004 Server selection for load balancing
    • H04L67/1008 Server selection for load balancing based on parameters of servers, e.g. available memory or workload
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to the field of cloud computing, and discloses a micro-service active scaling method based on spatio-temporal graph neural network load prediction. A spatio-temporal graph neural network is introduced to predict the workload, which better captures the spatial relations between different micro-services in a micro-service scenario, so that more accurate predictions can be made. Based on accurate workload prediction, micro-service scaling decisions can better balance the computing resources occupied by the micro-services against the quality of service they provide.

Description

Micro-service active scaling method based on spatio-temporal graph neural network load prediction
Technical Field
The invention relates to the field of cloud computing, in particular to a micro-service active scaling method based on spatio-temporal graph neural network load prediction.
Background
With the rapid development of network services, the services provided by network application service providers have become increasingly complex and feature-rich, while services are expanded and iterated on rapidly. Under this trend, the micro-service architecture arose. In the micro-service architecture, the whole network application program is divided into a number of mutually independent micro-services, which call other micro-services through network requests to acquire the information they need. Compared with traditional network applications, the micro-service architecture realizes application modularization and has higher scalability, fault tolerance and maintainability. In a cloud data center, allocating more computing resources to a micro-service yields better service quality, but allocating too many computing resources results in low resource utilization and therefore resource waste. The data center must therefore provide an efficient elastic scaling scheme for micro-services, so as to meet their quality-of-service requirements while improving the utilization of computing resources as much as possible to reduce operating costs. Dynamic resource scheduling of micro-services is thus of great interest to academia and industry, and automatic elastic scaling of micro-services is one concrete realization of it. Current elastic scaling schemes fall mainly into two categories: threshold-based reactive algorithms and prediction-based proactive algorithms. Reactive algorithms such as SmartVM can only react after workload changes have occurred; they therefore suffer from hysteresis, are prone to jitter when the workload changes rapidly, and scale frequently, causing unnecessary overhead. Proactive algorithms such as HANSEL rely more on the accuracy of workload prediction. Existing prediction algorithms are mainly based on regression theory or traditional neural networks; they can only predict from the historical time series of the micro-service workload and cannot reflect the spatial relations among micro-services. Therefore, a model that can simultaneously embody the temporal and spatial relations of the micro-service workload is needed for prediction, with elastic scaling of the micro-services then performed based on the workload prediction.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a micro-service active scaling method based on spatio-temporal graph neural network load prediction, which, by scaling and scheduling the computing resources occupied by micro-services, improves the utilization of computing resources to reduce the operating cost of the cloud computing center while guaranteeing the service quality of the micro-services as far as possible.
In order to solve the technical problem, the invention adopts the following technical scheme:
a micro-service active scaling method based on space-time diagram neural network load prediction comprises the following steps:
step one, modeling the micro-service architecture:
the whole micro-service architecture comprises N micro-services and a set of micro-services
Figure 899400DEST_PATH_IMAGE001
Ith microservice
Figure 217248DEST_PATH_IMAGE002
Is represented by
Figure 732543DEST_PATH_IMAGE003
(ii) a Wherein the content of the first and second substances,
Figure 655500DEST_PATH_IMAGE004
representing microservices
Figure 965259DEST_PATH_IMAGE002
The work load of (a) is,
Figure 770404DEST_PATH_IMAGE005
representing microservices
Figure 823810DEST_PATH_IMAGE002
The computing resources of (a) are set up,
Figure 601273DEST_PATH_IMAGE006
representing microservices
Figure 81933DEST_PATH_IMAGE002
The quality of service of (c); fixed calling relation exists among micro services, and set of calling relation exists
Figure 639954DEST_PATH_IMAGE007
Figure 169155DEST_PATH_IMAGE008
(ii) a Wherein the relationship is invoked
Figure 129021DEST_PATH_IMAGE009
Representing microservices
Figure 780582DEST_PATH_IMAGE002
To micro service
Figure 560319DEST_PATH_IMAGE010
Calling relationship of, micro-service
Figure 893212DEST_PATH_IMAGE002
When the workload of (2) changes, the relationship is called
Figure 707584DEST_PATH_IMAGE009
Will micro-serve
Figure 530046DEST_PATH_IMAGE010
The workload of (2) changes;
wherein, the ith micro-service
Figure 731833DEST_PATH_IMAGE002
Attributes may also be expressed as
Figure 930733DEST_PATH_IMAGE011
I.e. increase by one
Figure 599612DEST_PATH_IMAGE012
The attributes of the data are then compared to the attributes,
Figure 592976DEST_PATH_IMAGE012
representing microservices
Figure 284988DEST_PATH_IMAGE002
The identification of (a);
step two, predicting the working load of the micro-service:
constructing and training a spatio-temporal graph neural network consisting of a GAT network and a GRU network, recorded as the GAT-GRU network;
in a GAT-GRU network, the input comprises input data
Figure 287579DEST_PATH_IMAGE013
And set of calling relationships
Figure 810965DEST_PATH_IMAGE014
Wherein
Figure 912913DEST_PATH_IMAGE015
Representing the length of the time series of the input data,
Figure 154538DEST_PATH_IMAGE016
representing the number of microservices;
Figure 695241DEST_PATH_IMAGE017
representing the characteristic number of the micro-service workload, wherein the characteristics of the micro-service workload comprise the CPU occupancy rate and the memory occupancy rate of the micro-service; firstly processing input data by a GAT layer GAT-1, then inputting the hidden state output by the GAT-1 into a GRU layer, then processing the hidden state output by the GRU layer by another GAT layer GAT-2 as the input of the GRU layer of the next time sequence, finally merging the hidden state output by each time sequence GRU, processing the merged state by a prediction layer, and finally outputtingRequired prediction data
Figure 10816DEST_PATH_IMAGE018
(ii) a Wherein
Figure 611561DEST_PATH_IMAGE019
Representing the temporal length of the predicted data;
step three, scaling decision of micro-service level:
adopting a DDPG model to decide whether each micro-service is scaled based on the prediction of the micro-service workload;
the environment state of the DDPG model comprises the resource occupation condition and the service quality condition of each micro service obtained from the prediction data; the resource occupation condition comprises the CPU occupancy rate, the memory occupancy rate and the number of the work copies of the microservice; the quality of service condition comprises an average request response time of the microservice;
the action set of the DDPG model comprises the capacity reduction, maintenance or capacity expansion of each micro-service; when the action value is larger than 1, carrying out capacity expansion, and taking the number of the working copies as the action value and rounding down; when the action value is smaller than-1, carrying out capacity reduction, wherein the number of the working copies is the action value and rounded up; maintaining the number of microservice working copies unchanged when the action value is between-1 and 1;
the reward of the DDPG model is the reciprocal of the weighted average of the average occupancy rate of the CPU of each micro service, the average occupancy rate of the memory and the normalized request response time.
Further, the prediction layer is formed by connecting several fully-connected layers in series.
Compared with the prior art, the invention has the beneficial technical effects that:
due to the fact that the space-time diagram neural network is introduced to predict the working load, the space relation among different micro services in a micro service scene is better reflected, and therefore more accurate prediction can be made. Based on accurate prediction of workload, the computational resources occupied by the micro-services and the quality of service provided can be better balanced through micro-service scaling decisions. And because the invention is based on the active expansion of prediction, can respond to the change of the working load in advance, prevent the system from difficult to respond in time and cause the service quality collapse or resource waste when the request quantity appears and changes greatly.
Drawings
FIG. 1 is a block diagram of the GAT-GRU network for micro-service workload prediction in accordance with the present invention;
FIG. 2 is an internal workflow diagram of the GRU network;
FIG. 3 is a flowchart illustrating an embodiment of micro-service resource scheduling according to the present invention.
Detailed Description
A preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.
The implementation of the invention is based on the combination of a spatio-temporal graph neural network and a DDPG (Deep Deterministic Policy Gradient) model.
The spatio-temporal graph neural network is composed of a GRU (Gated Recurrent Unit) network and a GAT (Graph Attention Network) network.
The transformation of the GRU network is shown as follows:

$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$ ; (1)

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$ ; (2)

$\tilde{h}_t = \tanh(W_h \cdot [r_t \odot h_{t-1}, x_t] + b_h)$ ; (3)

$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ ; (4)

wherein $\odot$ represents element-wise matrix multiplication; $r_t$, $z_t$ and $\tilde{h}_t$ respectively represent the reset gate, the update gate and the cell gate in the GRU network; $W_r$, $W_z$ and $W_h$ are the parameters of the reset gate, the update gate and the cell gate respectively; $b_r$, $b_z$ and $b_h$ are the offsets corresponding to the reset gate, the update gate and the cell gate respectively, and are continuously learned in the training process; $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, and $x_t$ and $h_t$ are respectively the input and output of the GRU network at time $t$.
The internal workflow of the GRU network is shown in FIG. 2.
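For concreteness, equations (1)-(4) translate almost line for line into code. The following is a minimal PyTorch sketch, assuming the usual conventions that the gate activation is the sigmoid function and that $W_r$, $W_z$ and $W_h$ act on the concatenation $[h_{t-1}, x_t]$; the class and variable names are illustrative, not taken from the patent.

```python
import torch
import torch.nn as nn

class GRUCellSketch(nn.Module):
    """From-scratch GRU cell following equations (1)-(4)."""
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # W_r, W_z, W_h act on [h_{t-1}, x_t]; the Linear layers also
        # hold the learnable offsets b_r, b_z, b_h.
        self.w_r = nn.Linear(hidden_size + input_size, hidden_size)
        self.w_z = nn.Linear(hidden_size + input_size, hidden_size)
        self.w_h = nn.Linear(hidden_size + input_size, hidden_size)

    def forward(self, x_t: torch.Tensor, h_prev: torch.Tensor) -> torch.Tensor:
        hx = torch.cat([h_prev, x_t], dim=-1)
        r_t = torch.sigmoid(self.w_r(hx))                 # (1) reset gate
        z_t = torch.sigmoid(self.w_z(hx))                 # (2) update gate
        h_tilde = torch.tanh(
            self.w_h(torch.cat([r_t * h_prev, x_t], dim=-1)))  # (3) cell gate
        return (1 - z_t) * h_prev + z_t * h_tilde         # (4) new hidden state
```

In practice the built-in torch.nn.GRUCell implements the same recurrence; the explicit version is shown only to mirror the equations.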
The transformation of the GAT network is shown as follows:

$\vec{h}'_i = \sigma\left(\sum_{j \in \mathcal{N}_i} \alpha_{ij} W \vec{h}_j\right)$ ; (5)

$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\top}[W\vec{h}_i \parallel W\vec{h}_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(\vec{a}^{\top}[W\vec{h}_i \parallel W\vec{h}_k]\right)\right)}$ ; (6)

wherein $H = \{\vec{h}_1, \vec{h}_2, \ldots, \vec{h}_N\}$ represents the input data of the GAT network; $\vec{h}_i$ is the feature vector of node $i$ in the GAT network; nodes $j$ and $k$ are neighbor nodes of node $i$; $\vec{h}_j$ and $\vec{h}_k$ are the feature vectors of nodes $j$ and $k$; $\vec{h}_i$, $\vec{h}_j$ and $\vec{h}_k$ all have length $F$; $H' = \{\vec{h}'_1, \vec{h}'_2, \ldots, \vec{h}'_N\}$ represents the output data of the GAT network; $\vec{h}'_i$ is the feature vector output after graph attention aggregation of node $i$, and has length $F'$; $\mathrm{LeakyReLU}$ is a non-linear function and $\sigma$ is the activation function; $W$ is a weight matrix of shape $F' \times F$; $\mathcal{N}_i$ represents the set of neighbor nodes of node $i$; the symbol $\parallel$ represents vector concatenation; $\vec{a}$ is a weight vector of length $2F'$, and $\vec{a}^{\top}$ is its transpose. The attention coefficient $\alpha_{ij}$ finally obtained from formula (6) represents the extent to which node $i$ is affected by its neighbor node $j$. When a multi-head attention mechanism is adopted, the input data is processed simultaneously by several identical GAT networks, and the average or the concatenation of their outputs is taken as the output.
When the GAT network is applied to micro-service workload prediction in the present invention, the nodes of the GAT network are the micro-services.
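As a sketch, a single-head version of equations (5) and (6) can be written as follows. The dense adjacency mask built from the call-relation set $E$, and the choice of the sigmoid function for the activation $\sigma$, are assumptions of this sketch rather than details fixed by the patent; self-loops are added so that every node has at least one neighbor.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATLayerSketch(nn.Module):
    """Single-head graph attention layer following equations (5) and (6)."""
    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.W = nn.Linear(in_features, out_features, bias=False)  # W: shape F' x F
        self.a = nn.Parameter(torch.empty(2 * out_features))       # a: length 2F'
        nn.init.normal_(self.a, std=0.1)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: (N, F) node features; adj: (N, N), adj[i, j] = 1 when node j
        # is a neighbor of node i (derived from the call-relation set E).
        n = h.size(0)
        adj = adj + torch.eye(n, device=h.device)        # add self-loops
        wh = self.W(h)                                   # W h_j for all nodes, (N, F')
        # a^T [W h_i || W h_j] for every pair (i, j): an (N, N) score matrix
        pair = torch.cat([wh.unsqueeze(1).expand(n, n, -1),
                          wh.unsqueeze(0).expand(n, n, -1)], dim=-1)
        e = F.leaky_relu(pair @ self.a)
        e = e.masked_fill(adj == 0, float("-inf"))       # restrict to neighbors N_i
        alpha = torch.softmax(e, dim=-1)                 # (6) attention coefficients
        return torch.sigmoid(alpha @ wh)                 # (5) aggregate and activate
```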
The above two networks are combined to predict the workload of the micro-services.
The DDPG model is a reinforcement learning algorithm and is used to make the scaling decisions for the micro-services.
The DDPG model comprises four networks, respectively the $Q$ network, the target network $Q'$, the policy network $\mu$ and the target policy network $\mu'$. The $\mu$ network is used to convert the input environment state into an action value; the $Q$ network is used to score the action value provided by the $\mu$ network under the corresponding environment state; and the $Q'$ and $\mu'$ networks are used respectively to prevent the $Q$ network and the $\mu$ network from fluctuating too much during training. The main workflow of the DDPG model is as follows:
(1) Randomly initialize the parameters $\theta^{Q}$ of the $Q$ network and the parameters $\theta^{\mu}$ of the $\mu$ network;
(2) Initialize the parameters $\theta^{Q'}$ of the $Q'$ network and the parameters $\theta^{\mu'}$ of the $\mu'$ network, letting $\theta^{Q'}$ take the same value as $\theta^{Q}$ and $\theta^{\mu'}$ take the same value as $\theta^{\mu}$;
(3) Initialize the memory cache;
(4) For each round:
(5) Initialize a random variable following a normal distribution with mean 0;
(6) Obtain the initial state $s_1$ from the environment;
(7) For each time step $t$:
(8) Select the action $a_t = \mu(s_t)$ and add the random variable;
(9) Execute the action $a_t$ on the environment, and observe the reward $r_t$ and the new state $s_{t+1}$;
(10) Store $(s_t, a_t, r_t, s_{t+1})$ into the memory cache;
(11) Select $M$ records from the memory cache, the $i$-th of which is recorded as $(s_i, a_i, r_i, s_{i+1})$;
(12) Calculate the target value for each record separately: $y_i = r_i + \gamma\, Q'(s_{i+1}, \mu'(s_{i+1}))$;
(13) Update the $Q$ network by minimizing the loss function $L$: $L = \frac{1}{M}\sum_{i}\left(y_i - Q(s_i, a_i)\right)^2$;
(14) Update the $\mu$ network: $\nabla_{\theta^{\mu}} J \approx \frac{1}{M}\sum_{i}\nabla_{a}Q(s, a)\big|_{s=s_i,\,a=\mu(s_i)}\,\nabla_{\theta^{\mu}}\mu(s)\big|_{s=s_i}$, wherein $\mu$ is the policy represented by the $\mu$ network;
(15) Soft update the $Q'$ and $\mu'$ networks;
(16) End the time step; if the state is not the final state and the time does not exceed the range, return to step (7) and execute the next time step;
(17) End the round, return to step (4), and enter the next round.
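Steps (11)-(15) condense into one update routine. The sketch below assumes PyTorch modules for the four networks and a plain Python list of tensor tuples as the memory cache; all names are illustrative stand-ins, and the critic is assumed to return a column of scores.

```python
import random
import torch

def ddpg_update(buffer, actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch_size=64, gamma=0.99, tau=0.005):
    # (11) select M records (s_i, a_i, r_i, s_{i+1}) from the memory cache
    s, a, r, s_next = map(torch.stack, zip(*random.sample(buffer, batch_size)))

    # (12) target values y_i = r_i + gamma * Q'(s_{i+1}, mu'(s_{i+1}))
    with torch.no_grad():
        y = r.view(-1, 1) + gamma * target_critic(s_next, target_actor(s_next))

    # (13) update Q by minimizing L = (1/M) sum_i (y_i - Q(s_i, a_i))^2
    critic_loss = ((y - critic(s, a)) ** 2).mean()
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # (14) update mu along the deterministic policy gradient
    actor_loss = -critic(s, actor(s)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # (15) soft-update the target networks Q' and mu'
    for net, target in ((critic, target_critic), (actor, target_actor)):
        for p, tp in zip(net.parameters(), target.parameters()):
            tp.data.mul_(1 - tau).add_(tau * p.data)
```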
In a scenario where micro-services are deployed in a cloud computing center, computing resources need to be provided to many micro-services at the same time. Generally, the more computing resources a micro-service obtains, the stronger its capacity to provide services and the better its service quality can be guaranteed. However, the cloud computing center cannot allocate computing resources to micro-services without limit in order to improve their service quality, as this would lead to an unlimited increase in its operating cost. Therefore, when the workload of a micro-service is high, more computing resources need to be allocated to it to guarantee its service quality, and when the micro-service is relatively idle, some computing resources should be reclaimed to prevent the waste caused by low resource utilization.
Therefore, an active resource scheduling method is needed, which predicts the future workload of the micro-service by monitoring its workload data in real time, and then determines whether to scale the micro-service according to the predicted workload. The specific method is described as follows:
(1) Modeling the micro-service architecture. Suppose the whole micro-service architecture contains $N$ micro-services, and use $MS$ to represent the set of micro-services, then $MS = \{ms_1, ms_2, \ldots, ms_N\}$. For the $i$-th micro-service $ms_i$, its attributes are indicated by $(id_i, w_i, c_i, q_i)$, wherein $id_i$ represents the identification of micro-service $ms_i$; $w_i$ represents the workload of $ms_i$; $c_i$ represents the computing resources of $ms_i$; and $q_i$ represents the quality of service of $ms_i$. Besides, fixed calling relations also exist between the micro-services; they are determined at the time of micro-service design, and the set of calling relations is expressed as $E$, $E = \{e_{ij}\}$, wherein $e_{ij}$ represents the calling relation from micro-service $ms_i$ to micro-service $ms_j$. Due to the existence of the calling relation $e_{ij}$, a change in the workload of $ms_i$ also causes a certain impact on $ms_j$.
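For illustration only, the model of step (1) can be written down directly; every class and field name below is invented for this sketch and does not come from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class MicroService:
    ms_id: str          # id_i: identification of the micro-service
    workload: float     # w_i: workload (e.g. requests per second)
    resources: float    # c_i: computing resources currently allocated
    qos: float          # q_i: quality of service (e.g. average response time)

@dataclass
class MicroServiceArchitecture:
    services: dict = field(default_factory=dict)   # ms_id -> MicroService
    calls: set = field(default_factory=set)        # E = {e_ij} as (i, j) id pairs

    def add_call(self, caller: str, callee: str) -> None:
        """Record the fixed calling relation e_ij from caller to callee."""
        self.calls.add((caller, callee))

    def downstream_of(self, ms_id: str) -> list:
        """Micro-services whose workload is impacted when ms_id's workload changes."""
        return [j for (i, j) in self.calls if i == ms_id]
```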
(2) Predicting the micro-service workload. A model combining a graph attention network and a recurrent neural network is adopted to predict the micro-service workload. Specifically, a spatio-temporal graph neural network combining a GAT network and a GRU network is constructed and trained to perform workload prediction; it is referred to in the invention as the GAT-GRU network.
In the GAT-GRU network constructed by the invention, the input data of the network is $X \in \mathbb{R}^{T \times N \times F}$ and $E$, wherein $T$ represents the time-series length of the input data, $N$ represents the number of micro-services, and $F$ represents the number of features of the micro-service workload; here each $N \times F$ slice of $X$ corresponds to the input data $H$ of the GAT network in the preamble, and $E$ represents the set of calling relations of the entire micro-service architecture. The CPU occupancy rate and the memory occupancy rate of the micro-service are mainly considered, so $F = 2$. The input data is first processed through a GAT layer, and the hidden state it outputs is then input into the GRU layer. The hidden state output by the GRU layer is then processed by another GAT layer to serve as the hidden input of the GRU of the next time step. Finally, the hidden outputs of the GRU at each time step are merged, processed through a prediction layer, and the required prediction data $\hat{X} \in \mathbb{R}^{T' \times N \times F}$ is output, wherein $T'$ represents the time-series length of the predicted data; here each $N \times F$ slice of $\hat{X}$ corresponds to the output data $H'$ of the GAT network in the preamble. The prediction layer is formed by connecting several fully-connected layers in series. The structure of the whole network is shown in FIG. 1.
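Combining the two earlier sketches (GATLayerSketch and GRUCellSketch) gives one possible rendering of this dataflow. The layer sizes, the zero initial hidden state and the use of the mean as the merge operation are assumptions of this sketch; the patent only fixes the ordering GAT-1 → GRU → GAT-2 and a fully-connected prediction layer.

```python
import torch
import torch.nn as nn

class GATGRUSketch(nn.Module):
    def __init__(self, n_feats: int = 2, hidden: int = 32, horizon: int = 1):
        super().__init__()
        self.gat_in = GATLayerSketch(n_feats, hidden)   # GAT-1 on raw features
        self.gru = GRUCellSketch(hidden, hidden)
        self.gat_h = GATLayerSketch(hidden, hidden)     # GAT-2 on hidden states
        # prediction layer: several fully-connected layers in series
        self.predict = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, horizon * n_feats))
        self.horizon, self.n_feats = horizon, n_feats

    def forward(self, x: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # x: (T, N, F) workload history; adj: (N, N) built from call relations E
        T, N, _ = x.shape
        h = torch.zeros(N, self.gru.w_r.out_features, device=x.device)
        states = []
        for t in range(T):
            g = self.gat_in(x[t], adj)   # GAT-1 processes the input of step t
            h = self.gru(g, h)           # GRU step
            states.append(h)
            h = self.gat_h(h, adj)       # GAT-2: hidden input for the next step
        merged = torch.stack(states).mean(dim=0)   # merge the hidden states
        out = self.predict(merged)                 # prediction layer
        return out.view(N, self.horizon, self.n_feats).permute(1, 0, 2)  # (T', N, F)
```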
(3) Scaling decision at the micro-service level. A DDPG model is employed to decide whether each micro-service is scaled, based on the prediction of the micro-service workload. The environment state comprises the resource occupation condition and the quality-of-service condition of each micro-service; the resource occupation condition specifically comprises the CPU occupancy rate, the memory occupancy rate and the number of working copies; the quality-of-service condition specifically comprises the average request response time. The action set comprises three choices for each micro-service: capacity reduction, maintenance and capacity expansion. When the action value is greater than 1, capacity expansion is carried out, and the number of working copies is adjusted by the action value rounded down; when the action value is less than -1, capacity reduction is carried out, and the number of working copies is adjusted by the action value rounded up; when the action value is between -1 and 1, the number of micro-service working copies is kept unchanged. The reward is the reciprocal of the weighted average of the average CPU occupancy rate, the average memory occupancy rate and the normalized request response time of each micro-service.
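Reading the action-value rule as an adjustment of the current number of working copies (one plausible interpretation of the rounding described above), the decision mapping and the reward come down to a few lines; the floor of one copy and the equal weights are illustrative choices, not taken from the patent.

```python
import math

def apply_action(n_copies: int, action: float) -> int:
    """Map one micro-service's DDPG action value to a new working-copy count."""
    if action > 1:                       # capacity expansion: round the value down
        return n_copies + math.floor(action)
    if action < -1:                      # capacity reduction: round the value up
        return max(1, n_copies + math.ceil(action))
    return n_copies                      # between -1 and 1: keep unchanged

def reward(cpu_avg: float, mem_avg: float, resp_norm: float,
           w: tuple = (1 / 3, 1 / 3, 1 / 3)) -> float:
    """Reciprocal of the weighted average of average CPU occupancy, average
    memory occupancy and normalized request response time."""
    return 1.0 / (w[0] * cpu_avg + w[1] * mem_avg + w[2] * resp_norm)
```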
Examples
The method is deployed on the micro-service architecture of a cloud computing center, and acquires the information of all micro-service nodes in the architecture and the calling relations between micro-services. Then, at every time interval, the micro-service active scaling method according to the invention performs horizontal scaling control on each micro-service, i.e. controls the number of working copies of each micro-service. The specific implementation is shown in FIG. 3 and mainly includes the following steps:
monitoring the occupation of micro-service resources: the method mainly monitors the number of working copies of each micro-service, and the CPU occupancy rate and memory occupancy rate of all micro-service working copies;
predicting the micro-service workload: the resource occupation of all micro-services at the next moment is predicted from the resource occupation of all micro-services over the past period, specifically using the GAT-GRU network to predict the micro-service workload. The GAT-GRU network is then continuously trained with the actual micro-service workload data collected at the next moment, so that its prediction capability keeps improving;
performing a micro-service-level scaling decision: after the micro-service workload at the next moment is predicted, the predicted workload data is input into the DDPG model to obtain the output action values, and the capacity expansion operation, capacity reduction operation or no operation is then determined for each micro-service according to its action value. The networks are then trained and updated according to the DDPG algorithm. After a period of training, the DDPG model reaches a relatively stable state and provides better scaling decisions;
performing horizontal scaling on all micro-services in the micro-service architecture according to the scaling decisions: the number of micro-service working copies is directly controlled by the working-copy controller, so that the scaling decisions are applied to the micro-service architecture.
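The four steps above form a periodic control loop. The sketch below is illustrative glue only: monitor, predictor, agent and controller are hypothetical stand-ins for the metric monitor, the trained GAT-GRU network, the DDPG model and the working-copy controller, and apply_action is the mapping sketched earlier.

```python
import time

def scaling_loop(monitor, predictor, agent, controller, adj, interval_s: int = 60):
    history = []                                    # sliding window of observations
    while True:
        history.append(monitor.collect())           # per-service CPU/memory/copies
        window = history[-predictor.window_len:]    # last T observations
        per_service_state = predictor.predict(window, adj)  # step two: forecast
        for ms_id, state in per_service_state.items():
            action = agent.act(state)               # step three: DDPG action value
            target = apply_action(controller.copies(ms_id), action)
            controller.set_copies(ms_id, target)    # horizontal scaling
        time.sleep(interval_s)                      # wait for the next interval
```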
The circles in FIG. 3 represent working copies of micro-services; the circles within one box represent working copies of the same micro-service; unfilled circles represent working copies in a normal operating state, black filled circles represent working copies in an initialized state, and dotted filled circles represent working copies in a destroyed state. The micro-service architecture of FIG. 3 contains four micro-services, for which the present invention (1) monitors resource usage data; (2) predicts their future workload; (3) makes a horizontal scaling decision for each micro-service; and (4) applies the horizontal scaling decision to the micro-service architecture through the working-copy controller, adjusting the number of working copies of each micro-service.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein, and any reference signs in the claims are not intended to be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present description is set out in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted only for clarity. Those skilled in the art should take the description as a whole, and the technical solutions of the embodiments may be suitably combined to form other embodiments understandable to those skilled in the art.

Claims (1)

1. A micro-service active scaling method based on spatio-temporal graph neural network load prediction, comprising the following steps:

step one, modeling the micro-service architecture:

the whole micro-service architecture comprises N micro-services, and the set of micro-services is $MS = \{ms_1, ms_2, \ldots, ms_N\}$; the $i$-th micro-service $ms_i$ is represented as $ms_i = (w_i, c_i, q_i)$; wherein $w_i$ represents the workload of micro-service $ms_i$, $c_i$ represents the computing resources of micro-service $ms_i$, and $q_i$ represents the quality of service of micro-service $ms_i$; fixed calling relations exist among the micro-services, and the set of calling relations is $E$, $E = \{e_{ij}\}$; wherein the calling relation $e_{ij}$ represents the call from micro-service $ms_i$ to micro-service $ms_j$, and when the workload of micro-service $ms_i$ changes, the workload of micro-service $ms_j$ changes through the calling relation $e_{ij}$;

step two, predicting the micro-service workload:

constructing and training a spatio-temporal graph neural network consisting of a GAT network and a GRU network, recorded as the GAT-GRU network;

in the GAT-GRU network, the input comprises the input data $X \in \mathbb{R}^{T \times N \times F}$ and the set of calling relations $E$, wherein $T$ represents the time-series length of the input data, $N$ represents the number of micro-services, and $F$ represents the number of features of the micro-service workload; the features of the micro-service workload comprise the CPU occupancy rate and the memory occupancy rate of the micro-service; the input data is first processed by a GAT layer GAT-1, the hidden state output by GAT-1 is then input into a GRU layer, the hidden state output by the GRU layer is then processed by another GAT layer GAT-2 to serve as the input of the GRU layer of the next time step, and finally the hidden states output by the GRU at each time step are merged and processed by a prediction layer, outputting the required prediction data $\hat{X} \in \mathbb{R}^{T' \times N \times F}$; wherein $T'$ represents the time-series length of the predicted data;

step three, scaling decision at the micro-service level:

adopting a DDPG model to decide whether each micro-service is scaled, based on the prediction of the micro-service workload;

the environment state of the DDPG model comprises the resource occupation condition and the quality-of-service condition of each micro-service obtained from the prediction data; the resource occupation condition comprises the CPU occupancy rate, the memory occupancy rate and the number of working copies of the micro-service; the quality-of-service condition comprises the average request response time of the micro-service;

the action set of the DDPG model comprises capacity reduction, maintenance or capacity expansion of each micro-service; when the action value is greater than 1, capacity expansion is carried out, and the number of working copies is adjusted by the action value rounded down; when the action value is less than -1, capacity reduction is carried out, and the number of working copies is adjusted by the action value rounded up; when the action value is between -1 and 1, the number of micro-service working copies is maintained unchanged;

the reward of the DDPG model is the reciprocal of the weighted average of the average CPU occupancy rate, the average memory occupancy rate and the normalized request response time of each micro-service.
CN202211442766.6A 2022-11-18 2022-11-18 Micro-service active scaling method based on spatio-temporal graph neural network load prediction Active CN115499511B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211442766.6A CN115499511B (en) Micro-service active scaling method based on spatio-temporal graph neural network load prediction

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211442766.6A CN115499511B (en) Micro-service active scaling method based on spatio-temporal graph neural network load prediction

Publications (2)

Publication Number Publication Date
CN115499511A true CN115499511A (en) 2022-12-20
CN115499511B CN115499511B (en) 2023-03-24

Family

ID=85116144

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211442766.6A Active CN115499511B (en) Micro-service active scaling method based on spatio-temporal graph neural network load prediction

Country Status (1)

Country Link
CN (1) CN115499511B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116257363A (en) * 2023-05-12 2023-06-13 中国科学技术大学先进技术研究院 Resource scheduling method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190379605A1 (en) * 2018-06-08 2019-12-12 Cisco Technology, Inc. Inferring device load and availability in a network by observing weak signal network based metrics
CN112199150A (en) * 2020-08-13 2021-01-08 北京航空航天大学 Online application dynamic capacity expansion and contraction method based on micro-service calling dependency perception
US20210266358A1 (en) * 2020-02-24 2021-08-26 Netapp, Inc. Quality of service (qos) settings of volumes in a distributed storage system
CN114020326A (en) * 2021-11-04 2022-02-08 砺剑防务技术(新疆)有限公司 Micro-service response time prediction method and system based on graph neural network
WO2022167840A1 (en) * 2021-02-04 2022-08-11 Telefonaktiebolaget Lm Ericsson (Publ) Profiling workloads using graph based neural networks in a cloud native environment
CN115037749A (en) * 2022-06-08 2022-09-09 山东省计算中心(国家超级计算济南中心) Performance-aware intelligent multi-resource cooperative scheduling method and system for large-scale micro-service

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190379605A1 (en) * 2018-06-08 2019-12-12 Cisco Technology, Inc. Inferring device load and availability in a network by observing weak signal network based metrics
US20210266358A1 (en) * 2020-02-24 2021-08-26 Netapp, Inc. Quality of service (qos) settings of volumes in a distributed storage system
CN112199150A (en) * 2020-08-13 2021-01-08 北京航空航天大学 Online application dynamic capacity expansion and contraction method based on micro-service calling dependency perception
WO2022167840A1 (en) * 2021-02-04 2022-08-11 Telefonaktiebolaget Lm Ericsson (Publ) Profiling workloads using graph based neural networks in a cloud native environment
CN114020326A (en) * 2021-11-04 2022-02-08 砺剑防务技术(新疆)有限公司 Micro-service response time prediction method and system based on graph neural network
CN115037749A (en) * 2022-06-08 2022-09-09 山东省计算中心(国家超级计算济南中心) Performance-aware intelligent multi-resource cooperative scheduling method and system for large-scale micro-service

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
耿德胜 (Geng Desheng): "Container-level elastic resource provisioning method for micro-service architecture", 《信息与电脑(理论版)》 (Information & Computer (Theory Edition)) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116257363A (en) * 2023-05-12 2023-06-13 中国科学技术大学先进技术研究院 Resource scheduling method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN115499511B (en) 2023-03-24

Similar Documents

Publication Publication Date Title
CN111835827B (en) Internet of things edge computing task unloading method and system
Liu et al. A hierarchical framework of cloud resource allocation and power management using deep reinforcement learning
CN110780938B (en) Computing task unloading method based on differential evolution in mobile cloud environment
CN113852432B (en) Spectrum Prediction Sensing Method Based on RCS-GRU Model
CN115686846B (en) Container cluster online deployment method integrating graph neural network and reinforcement learning in edge calculation
CN115499511B (en) Micro-service active scaling method based on spatio-temporal graph neural network load prediction
Golshani et al. Proactive auto-scaling for cloud environments using temporal convolutional neural networks
Gali et al. A Distributed Deep Meta Learning based Task Offloading Framework for Smart City Internet of Things with Edge-Cloud Computing.
CN116126534A (en) Cloud resource dynamic expansion method and system
CN113902116A (en) Deep learning model-oriented reasoning batch processing optimization method and system
Bian et al. Neural task scheduling with reinforcement learning for fog computing systems
Qazi et al. Towards quantum computing algorithms for datacenter workload predictions
Chai et al. A computation offloading algorithm based on multi-objective evolutionary optimization in mobile edge computing
da Silva et al. Online machine learning for auto-scaling in the edge computing
CN113553149A (en) Cloud server cluster load scheduling method, system, terminal and storage medium
CN116009990A (en) Cloud edge collaborative element reinforcement learning computing unloading method based on wide attention mechanism
CN115883371A (en) Virtual network function placement method based on learning optimization method in edge-cloud collaborative system
CN115934349A (en) Resource scheduling method, device, equipment and computer readable storage medium
Liu et al. Hidden markov model based spot price prediction for cloud computing
CN113157344B (en) DRL-based energy consumption perception task unloading method in mobile edge computing environment
WO2023272726A1 (en) Cloud server cluster load scheduling method and system, terminal, and storage medium
Nguyen et al. Reinforcement learning for maintenance decision-making of multi-state component systems with imperfect maintenance
Damaševičius et al. Short time prediction of cloud server round-trip time using a hybrid neuro-fuzzy network
Kumaran et al. Deep Reinforcement Learning algorithms for Low Latency Edge Computing Systems
Jananee et al. Allocation of cloud resources based on prediction and performing auto-scaling of workload

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant