KR20230089509A

KR20230089509A - Bidirectional Long Short-Term Memory based web application workload prediction method and apparatus

Info

Publication number: KR20230089509A
Application number: KR1020220027330A
Authority: KR
Inventors: 유명식; 당꽝녓밍
Original assignee: 숭실대학교산학협력단
Priority date: 2021-12-13
Filing date: 2022-03-03
Publication date: 2023-06-20

Abstract

Disclosed are a method and an apparatus for predicting a web application workload based on bidirectional long short-term memory (Bi-LSTM). The apparatus comprises a processor and a memory connected to the processor. The memory stores program instructions executed by the processor to: output a first hidden state by inputting, in the forward direction into a first LSTM, an input sequence that is time-series data on application metrics, including the application throughput and average response time requested by users, and on pod information of all currently running nodes; output a second hidden state by inputting the input sequence in the backward direction into a second LSTM; calculate, through an auto scaler, a future HTTP request workload using the first and second hidden states; and calculate, through the auto scaler, the number of pods required for provisioning or deprovisioning based on the future HTTP request workload. A prediction speed faster than that of ARIMA can thereby be achieved.

Description

Bi-LSTM (Bidirectional Long Short-Term Memory) based web application workload prediction method and apparatus

The present invention relates to a method and apparatus for predicting a web application workload based on Bi-LSTM.

A key element of cloud computing is virtualization technology, which allows cloud providers to run multiple operating systems and applications on the same computing device, such as a server.

Traditionally, hypervisor-based virtualization has been used to create virtual machines (VMs), each with dedicated resources (e.g., CPU cores, RAM, and network bandwidth) and a guest operating system.

Container-based virtualization is an alternative to virtual machines that starts faster and consumes fewer resources.

A key feature of cloud computing is elasticity: application owners can provision or deprovision resources on demand, improving performance while reducing cost, in order to manage the unpredictable workloads inherent in Internet-based services.

Autoscaling is the process of dynamically acquiring and releasing resources, and can be classified as reactive or proactive.

The choice of autoscaling approach can affect quality parameters such as CPU utilization, memory usage, and response time.

While many methods have been proposed for hypervisor-based autoscaling solutions, work on container-based solutions is still at an early stage.

Many existing reactive autoscaling solutions use threshold-based rules defined on infrastructure-level metrics such as CPU utilization and memory usage. These solutions are easy to implement, but choosing the right threshold values is difficult because the workload fluctuates continuously with user behavior. Many methods using time-series data analysis have been proposed for proactive solutions, but only a few use machine learning algorithms to design the auto scaler.

KR Patent Publication No. 10-2021-0066697

To solve the problems of the prior art described above, the present invention proposes a Bi-LSTM based web application workload prediction method and apparatus that can increase resource estimation accuracy and handle workload bursts well.

To achieve the above object, according to an embodiment of the present invention, there is provided an apparatus for predicting a web application workload using bidirectional long short-term memory, comprising: a processor; and a memory connected to the processor, wherein the memory stores program instructions executed by the processor to: output a first hidden state by inputting, in the forward direction into a first LSTM (Long Short-Term Memory), an input sequence that is time-series data on application metrics, including the application throughput and average response time requested by users, and on pod information of all currently running nodes; output a second hidden state by inputting the input sequence in the reverse direction into a second LSTM; calculate, through an auto scaler, a future HTTP request workload using the first hidden state and the second hidden state; and calculate, through the auto scaler, the number of pods required for provisioning or deprovisioning based on the future HTTP request workload.

A cooling down time (CDT) may be set in the auto scaler, applied after each scaling decision in which the required number of pods is calculated.

When the required number of pods is smaller than the current number of pods, the auto scaler may remove surplus pods according to a resource removal strategy (RSS).

The auto scaler may include an adaptation manager service that calculates the future HTTP request workload by taking as input an output sequence obtained by combining the first hidden state and the second hidden state.

The Kubernetes engine of the Kubernetes master node may change the number of pods upon receiving a command from the adaptation manager service.

According to another aspect of the present invention, there is provided a system for predicting a web application workload using bidirectional long short-term memory, comprising: an application metric collector that collects application metrics including the application throughput and average response time requested by users; a monitoring server that aggregates time-series data, including the application metrics and pod information of all currently running nodes retrieved from a master node, and stores the time-series data in a time-series database; and a proactive auto scaler including a first LSTM that outputs a first hidden state by receiving an input sequence corresponding to the time-series data in the forward direction, a second LSTM that outputs a second hidden state by receiving the input sequence in the reverse direction, and an adaptation manager service that calculates a future HTTP request workload using the first hidden state and the second hidden state and calculates the number of pods required for provisioning or deprovisioning based on the future HTTP request workload.

According to still another aspect of the present invention, there is provided a method for predicting a web application workload using bidirectional long short-term memory, performed by an apparatus including a processor and a memory, the method comprising: outputting a first hidden state by inputting, in the forward direction into a first LSTM (Long Short-Term Memory), an input sequence that is time-series data on application metrics, including the application throughput and average response time requested by users, and on pod information of all currently running nodes; outputting a second hidden state by inputting the input sequence in the reverse direction into a second LSTM; calculating, through an auto scaler, a future HTTP request workload using the first hidden state and the second hidden state; and calculating, through the auto scaler, the number of pods required for provisioning or deprovisioning based on the future HTTP request workload.

According to the present invention, the prediction accuracy exceeds that of the statistical method ARIMA and of a standard feed-forward LSTM, and a prediction speed faster than that of ARIMA is achieved.

FIG. 1 is a diagram illustrating the architecture of a workload prediction system according to a preferred embodiment of the present invention.
FIG. 2 is a diagram illustrating the architecture of the auto scaler according to this embodiment, showing each phase of the MAPE loop.
FIG. 3 is an algorithm showing the scaling strategy applied after the predictor according to this embodiment provides the number of HTTP requests for the next interval.
FIG. 4 is a diagram illustrating a bidirectional LSTM network structure according to an embodiment of the present invention.

Since the present invention may be modified in various ways and may have various embodiments, specific embodiments are illustrated in the drawings and described in detail.

However, this is not intended to limit the present invention to specific embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

This embodiment uses a recurrent neural network with bidirectional long short-term memory (Bi-LSTM) to predict future HTTP requests.

In general, an LSTM receives time-series data in the forward direction. Bi-LSTM is an extension of the LSTM model in which two LSTMs are applied to the input data. The first LSTM is trained on the time-series data in the forward direction, and the second LSTM is trained on the time-series data in the reverse direction.

In the Bi-LSTM architecture according to this embodiment, two LSTM networks are stacked on top of each other and process the input in both directions: the first from the past to the future, and the second from the future to the past. Unlike a forward LSTM, the backward LSTM helps preserve information from the future.

Combining the two hidden states preserves information from both the past and the future. Applying the LSTM twice improves the learning of long-term dependencies and therefore the accuracy of the model.

FIG. 1 is a diagram illustrating the architecture of a workload prediction system according to a preferred embodiment of the present invention.

The workload prediction system according to this embodiment is based on a MAPE (monitor-analyze-plan-execute) loop.

Referring to FIG. 1, the system according to this embodiment may include a load balancer 100, an application metric collector 102, a monitoring server 104, a Kubernetes master (K8S Master) 106, and a plurality of Kubernetes minions (K8S Minion #1 to #n) 108-n.

The load balancer 100 provides support for containerized applications. It acts as a gateway that receives incoming TCP/HTTP requests from end users and distributes them across the plurality of Kubernetes minions 108-n.

This embodiment uses HAProxy, a fast and reliable open-source technology for load balancing and proxying TCP/HTTP-based applications.

HAProxy is located outside the Kubernetes cluster and, as shown in FIG. 1, is configured to load-balance all HTTP requests across all minion nodes of the Kubernetes cluster via their external NodeIP and NodePort.

The application metric collector 102 collects application metrics, including the application throughput and average response time requested by users, and transmits the collected information to the monitoring server 104.

The monitoring server 104 receives the monitored information from the application metric collector 102 and sends it to the time-series database 110.

Time-series data is a series of data points indexed in chronological order.

These metrics can be used for decision making; the collected values are exposed by an API service and can be accessed by the proactive custom auto scaler 112.

In Kubernetes, each cluster consists of one master and a plurality of minion nodes.

In practice, a cluster may have multiple master nodes so that, if one master node fails, another takes over and the system remains stable.

The master node controls the cluster with four components: etcd, kube-scheduler, kube-controller-manager, and kube-apiserver.

etcd stores all configuration data for the cluster.

kube-scheduler looks for created pods that have not yet been scheduled and assigns them to nodes. It ranks the nodes, taking constraints and available resources into account, and assigns each pod to an appropriate node.

kube-controller-manager monitors whether the cluster is running in the desired state. For example, if an application is running with five pods and one pod goes down or is lost after some time, kube-controller-manager creates a new replica for the missing pod.

kube-apiserver validates and manages cluster components, including pods, services, and replication controllers. Every change to the cluster state must go through kube-apiserver, which manages the minion nodes via kubelet.

The minions 108 host pods and run them according to the instructions of the master 106.

kubelet is an agent that runs on each node of the cluster; it is directed by the kube-apiserver of the master 106 and keeps the containers running properly.

kube-proxy controls the network rules on each node. These rules allow network communication to pods from networks inside or outside the cluster.

Kubernetes is one of the most popular orchestration platforms for containerized applications. In this embodiment, Docker, the container engine most widely used with Kubernetes, is selected for its light weight and portability.

NodeIP is the external IP address of a minion node, and this IP address is accessible from the Internet.

NodePort is an open port on every minion node in the cluster and exposes the application on each minion node. The value of NodePort lies in a predefined range from 30000 to 32767.
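The NodeIP and NodePort rule above can be illustrated with a minimal Python sketch. The function names and the example IP address are hypothetical and not part of the embodiment; only the 30000 to 32767 range comes from the text.

```python
# NodePort range as stated in the description above.
NODE_PORT_MIN = 30000
NODE_PORT_MAX = 32767


def is_valid_node_port(port: int) -> bool:
    """Return True if `port` lies in the NodePort range (inclusive)."""
    return NODE_PORT_MIN <= port <= NODE_PORT_MAX


def backend_address(node_ip: str, node_port: int) -> str:
    """Build the NodeIP:NodePort address a load balancer would forward to.

    Hypothetical helper for illustration; raises ValueError when the port
    is outside the NodePort range.
    """
    if not is_valid_node_port(node_port):
        raise ValueError(
            f"NodePort {node_port} is outside {NODE_PORT_MIN}-{NODE_PORT_MAX}"
        )
    return f"{node_ip}:{node_port}"
```

For example, `backend_address("192.0.2.10", 31000)` yields the address string that an external load balancer such as HAProxy could list as one backend server.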

As described above, the workload prediction system according to this embodiment is based on the MAPE loop, and the auto scaler 112 at its core includes a Bi-LSTM prediction service (analysis phase) and an adaptation manager service (planning phase).

FIG. 2 is a diagram illustrating the architecture of the auto scaler according to this embodiment, showing each phase of the MAPE loop.

Each phase of the MAPE loop according to this embodiment is described in detail below.

(1) Monitor phase

The monitoring server 104 according to this embodiment continuously receives various types of data for the analysis and planning phases.

The monitoring server 104 can collect data from two sources.

The monitoring server 104 receives networking data from the load balancer 100, such as the number of HTTP requests per second, through the application metric collector 102, and pod information of all currently running nodes, such as the number of pod replicas, retrieved from the master node (kube-apiserver).

After receiving this information, the monitoring server 104 aggregates it and sends it to the time-series database 110.

In this embodiment, the time-series database 110 is added to keep all collected data as a historical record, which can be used to train the workload prediction model and improve its prediction accuracy.

(2) Analysis phase - Bi-LSTM prediction service

In the analysis phase, the Bi-LSTM prediction service periodically obtains the most recently collected metric data, with window size w, through Prometheus's RESTful API.
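A minimal sketch of how such a request might be formed is shown below. It only builds the URL for the Prometheus HTTP API range query endpoint (`/api/v1/query_range` with `query`, `start`, `end`, and `step` parameters) and sends nothing. The base URL, the PromQL expression, the metric name `http_requests_total`, and the 60-second step are assumptions for illustration, not values given in this document.

```python
from urllib.parse import urlencode


def build_range_query_url(base_url, promql, end_ts, window, step_s=60):
    """Build a Prometheus range-query URL covering the latest `window` samples.

    base_url, promql and step_s are illustrative assumptions. The start
    time is chosen so that stepping from start to end (inclusive) yields
    exactly `window` sample timestamps.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    start_ts = end_ts - (window - 1) * step_s
    params = {
        "query": promql,
        "start": start_ts,
        "end": end_ts,
        "step": f"{step_s}s",
    }
    return f"{base_url}/api/v1/query_range?{urlencode(params)}"


# Hypothetical usage: the 10 most recent one-minute samples.
url = build_range_query_url(
    "http://prometheus.example:9090",
    "rate(http_requests_total[1m])",
    end_ts=1700000600,
    window=10,
)
```

Deriving the start time from the window size keeps the number of returned points tied to w, so the prediction service always receives an input of the length the model expects.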

Next, the trained model uses the latest w data points to predict the next data point in the sequence. In this embodiment, the Bi-LSTM neural network model is used to predict the workload of future HTTP requests.
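The windowing described here can be sketched in plain Python as follows. The function names are hypothetical, and the pairing of each window with the value that follows it is the usual supervised framing for one-step-ahead forecasting, assumed here rather than stated in the text.

```python
def latest_window(series, w):
    """Return the w most recent observations as the model input."""
    if w <= 0:
        raise ValueError("window size must be positive")
    if len(series) < w:
        raise ValueError("not enough observations for the requested window")
    return list(series[-w:])


def make_training_pairs(series, w):
    """Slide a window of size w over the history.

    Each pair is (w consecutive observations, the observation that follows),
    i.e. the input sequence and the next value the model should predict.
    """
    pairs = []
    for i in range(len(series) - w):
        pairs.append((list(series[i:i + w]), series[i + w]))
    return pairs


# Hypothetical requests-per-minute history.
history = [10, 20, 30, 40, 50]
```

With this history and w = 3, the training pairs are `([10, 20, 30], 40)` and `([20, 30, 40], 50)`, and the input for the next prediction is `[30, 40, 50]`.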

In general, LSTM is a type of deep RNN used in various fields such as sentiment analysis, speech recognition, and time-series analysis.

Unlike a feed-forward neural network, an RNN takes both the current input and the output of the previous time step as input, and its loops allow it to retain historical information. However, conventional RNNs have difficulty learning long-term dependencies because of the vanishing gradient problem.

In an RNN, gradients can shrink or grow exponentially as they are propagated through time. LSTM, a special form of RNN, was proposed to solve this, but LSTM is limited in that it processes information in only one direction and therefore ignores continuous changes in the data.

Bi-LSTM is an extension of the LSTM model in which two LSTMs are applied to the input data.

The first LSTM is trained on the time-series data in the forward direction, and the second LSTM is trained on the time-series data in the reverse direction.

That is, the first LSTM receives the time-series data in forward order, from the past toward the present, and the second LSTM receives the time-series data in reverse order, from the present toward the past.

The Bi-LSTM architecture consists of two LSTM networks stacked on top of each other that process the input in both directions. The first direction is from the past to the future, and the second is from the future to the past. The difference from a forward LSTM is that running backward helps preserve information from the future. By combining the two hidden states, Bi-LSTM can preserve information from both the past and the future. Applying the LSTM twice improves the learning of long-term dependencies and consequently the accuracy of the model.

The Bi-LSTM is expressed by Equations 1 to 3 below.

$$\overrightarrow{h}_t = f\left(W_{x\overrightarrow{h}}\, x_t + W_{\overrightarrow{h}\overrightarrow{h}}\, \overrightarrow{h}_{t-1} + b_{\overrightarrow{h}}\right) \quad (1)$$

$$\overleftarrow{h}_t = f\left(W_{x\overleftarrow{h}}\, x_t + W_{\overleftarrow{h}\overleftarrow{h}}\, \overleftarrow{h}_{t+1} + b_{\overleftarrow{h}}\right) \quad (2)$$

$$y_t = W_{\overrightarrow{h}y}\, \overrightarrow{h}_t + W_{\overleftarrow{h}y}\, \overleftarrow{h}_t + b_y \quad (3)$$

Here, the $W$ terms are different weight matrices. $f$ denotes the activation function (e.g., hyperbolic tangent or logistic sigmoid), and $b_{\overrightarrow{h}}$, $b_{\overleftarrow{h}}$, and $b_y$ are the forward, backward, and output layer biases, respectively.

Given an input sequence $x = (x_1, \dots, x_T)$, the Bi-LSTM computes the forward hidden states $\overrightarrow{h}$ by iterating the forward layer from $t = 1$ to $T$, computes the backward hidden states $\overleftarrow{h}$ by iterating the backward layer from $t = T$ to $1$, and computes the output sequence $y = (y_1, \dots, y_T)$ by updating the output layer.

At the $t$-th time step, the two parallel hidden layers, forward and backward, process $x_t$ as input.

The hidden states $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are the outputs of the forward and backward hidden layers, respectively, at time step $t$.

The output value $y_t$ depends on the sum of the hidden states $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$, as described in Equation 3.
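The forward pass, backward pass, and output combination described for Equations 1 to 3 can be sketched in a few lines of Python. This is a deliberately simplified illustration: it uses scalar weights and a plain recurrent update in place of the gated LSTM cell, and the parameter names in `params` are hypothetical.

```python
import math


def bidirectional_pass(xs, params, f=math.tanh):
    """Run a forward and a backward recurrence over xs and combine them.

    Simplification (assumption): scalar weights and a plain recurrent
    update stand in for the LSTM cell. Returns the forward hidden states,
    the backward hidden states, and the outputs, one value per time step.
    """
    T = len(xs)
    h_fwd = [0.0] * T
    h_bwd = [0.0] * T

    prev = 0.0
    for t in range(T):  # forward layer: first step to last step
        prev = f(params["w_xf"] * xs[t] + params["w_ff"] * prev + params["b_f"])
        h_fwd[t] = prev

    nxt = 0.0
    for t in reversed(range(T)):  # backward layer: last step to first step
        nxt = f(params["w_xb"] * xs[t] + params["w_bb"] * nxt + params["b_b"])
        h_bwd[t] = nxt

    # Output layer: weighted sum of both hidden states plus a bias.
    ys = [
        params["w_fy"] * hf + params["w_by"] * hb + params["b_y"]
        for hf, hb in zip(h_fwd, h_bwd)
    ]
    return h_fwd, h_bwd, ys
```

Because the backward recurrence starts at the end of the sequence, the output at each step combines a summary of everything before it with a summary of everything after it, which is the property the description attributes to Bi-LSTM.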

(3) Planning phase - Adaption Manager Service

In the planning phase, the adaptation manager service calculates the number of pods required for provisioning or deprovisioning based on the future HTTP request workload predicted in the previous phase. It is designed to scale pods so as to satisfy the future workload.

FIG. 3 is an algorithm showing the scaling strategy applied after the predictor according to this embodiment provides the number of HTTP requests for the next interval.

Because of the oscillation problem, the auto scaler 112 may frequently perform opposite actions within a short period, wasting resources and cost. To address this, a cooling down time (CDT) of 60 seconds is set after each scaling decision, which is fine-grained for a container scenario.
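One way to picture the cooling down time is a small gate that blocks further scaling decisions until the CDT has elapsed. The sketch below is an assumption about how such a gate could be written; the class and method names are hypothetical, and only the 60-second value comes from the text.

```python
class CooldownGate:
    """Blocks new scaling actions until the cooling down time has elapsed."""

    def __init__(self, cooldown_s: float = 60.0):
        self.cooldown_s = cooldown_s
        self._last_scale_ts = None  # timestamp of the last scaling action

    def can_scale(self, now: float) -> bool:
        """True if no action has been recorded yet or the CDT has passed."""
        if self._last_scale_ts is None:
            return True
        return now - self._last_scale_ts >= self.cooldown_s

    def record_scale(self, now: float) -> None:
        """Remember when a scaling action was executed."""
        self._last_scale_ts = now
```

Passing the current time in explicitly, rather than reading a clock inside the class, keeps the gate easy to test and lets the caller use whichever time source the control loop already has.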

Two quantities are defined in advance: the maximum workload that a pod can process in one minute, and the minimum number of pods that must be maintained. The system cannot scale in below this minimum number of pods.

While the system is running, the number of pods required for the next interval is calculated every minute and compared with the current number of pods.

If the required number of pods exceeds the current number of pods, a scale-out command is triggered, and the scaler increases the number of pods to meet the resource demand in the near future.

However, if the required number of pods is smaller than the current number of pods, the scaler removes a number of surplus pods according to the resource removal strategy (RSS), stabilizing the system while remaining able to handle the workload that arrives in the next interval.

In this embodiment, the required number of pods is first updated with the max function, which selects the higher of the required number of pods and the minimum number of pods.

This is necessary because the system cannot scale in below the minimum number of pods.

The number of surplus pods under the resource removal strategy is calculated as shown in Equation 4.

The required number of pods is then updated by subtracting the number of surplus pods obtained in the previous step from the current number of pods.

In this way, only part of the surplus resources is removed rather than all of them.

Finally, the scale-in command is executed with the finally updated required number of pods.

Using the resource removal strategy not only stabilizes the system under a low workload, but also lets it adapt more quickly to workload bursts, because less time is spent creating and updating pods.
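The scaling strategy described above can be sketched as a single planning function. Several details are assumptions made for illustration: all names are hypothetical; the required pod count is taken as the predicted workload divided by the per-pod capacity, rounded up; and, since Equation 4 is not reproduced in this text, the surplus is assumed to be a fixed fraction `removal_ratio` of the difference between the current and required pod counts, rounded down.

```python
import math


def plan_pods(predicted_workload, workload_per_pod, pods_current,
              pods_min, removal_ratio):
    """Return (action, target_pods) for the next interval.

    Assumptions: ceil rounding for the required pod count, and
    surplus = floor((current - required) * removal_ratio) as a stand-in
    for the document's Equation 4.
    """
    if workload_per_pod <= 0:
        raise ValueError("workload_per_pod must be positive")

    pods_required = math.ceil(predicted_workload / workload_per_pod)

    if pods_required > pods_current:
        return "scale_out", pods_required

    if pods_required < pods_current:
        # Never plan below the minimum number of pods.
        pods_required = max(pods_required, pods_min)
        # Remove only part of the surplus, not all of it.
        pods_surplus = math.floor((pods_current - pods_required) * removal_ratio)
        target = pods_current - pods_surplus
        if target < pods_current:
            return "scale_in", target

    return "none", pods_current
```

For instance, with 10 running pods, a minimum of 3, a predicted need of 2 pods, and an assumed ratio of 0.5, the function keeps 7 pods: the requirement is first raised to the minimum of 3, and only half of the 7 surplus pods (rounded down to 3) are removed.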

(4) Execution phase

In this final phase of the MAPE loop, the Kubernetes engine (kube-apiserver) receives a command from the adaptation manager service of the planning phase and changes the number of pod replicas.
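As a rough illustration of what such a command could look like, the sketch below builds, without sending, a request against the Kubernetes scale subresource. It assumes the application is managed as a Deployment, which this document does not state; the namespace and deployment names are hypothetical.

```python
import json


def build_scale_request(namespace: str, deployment: str, replicas: int):
    """Build a PATCH request that sets the replica count of a Deployment.

    Assumption: the workload is a Deployment. Returns (method, path,
    headers, body); sending it to kube-apiserver with credentials is
    left to the caller.
    """
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    path = (
        f"/apis/apps/v1/namespaces/{namespace}"
        f"/deployments/{deployment}/scale"
    )
    headers = {"Content-Type": "application/merge-patch+json"}
    body = json.dumps({"spec": {"replicas": replicas}})
    return "PATCH", path, headers, body


# Hypothetical usage: ask for 7 replicas of "web-app" in "default".
method, path, headers, body = build_scale_request("default", "web-app", 7)
```

Keeping request construction separate from transport means the planning logic can be checked without a running cluster.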

FIG. 4 is a diagram illustrating a bidirectional LSTM network structure according to an embodiment of the present invention.

Referring to FIG. 4, the Bi-LSTM model according to this embodiment includes two layers that run in the forward and backward directions.

Each hidden layer includes an input layer with 10 neural cells for 10 time steps, and each neural cell has 30 hidden units.

The final outputs of the hidden layers in both the forward and backward directions are concatenated and used as the input to a dense layer.

The dense layer is a fully connected layer.

That is, each neuron in the dense layer receives input from all neurons in the previous layer, and the size of the dense layer's output is determined by the specified number of neurons.

In this embodiment, the dense layer is used to output the prediction result.

Since the model according to this embodiment is a nonlinear regression model, the ReLU activation function is used in the hidden layers.
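To give a sense of the size of such a model, the sketch below counts trainable parameters for the structure described above. It assumes a standard LSTM cell with four gates and one bias vector per gate, one input feature per time step, and a single output neuron in the dense layer; none of these three assumptions is stated in this document.

```python
def lstm_param_count(n_inputs: int, n_hidden: int) -> int:
    """Parameters of one standard LSTM layer (4 gates, one bias per gate)."""
    per_gate = n_inputs * n_hidden + n_hidden * n_hidden + n_hidden
    return 4 * per_gate


def bilstm_regressor_param_count(n_inputs: int, n_hidden: int,
                                 n_outputs: int = 1) -> int:
    """Two LSTMs (forward and backward) followed by one dense layer.

    The dense layer sees the concatenation of both final hidden states,
    so its input width is 2 * n_hidden.
    """
    recurrent = 2 * lstm_param_count(n_inputs, n_hidden)
    dense = (2 * n_hidden) * n_outputs + n_outputs
    return recurrent + dense


# 30 hidden units per direction, assumed 1 input feature and 1 output.
total = bilstm_regressor_param_count(n_inputs=1, n_hidden=30)
```

Under these assumptions each direction has 4 × (1·30 + 30·30 + 30) = 3840 parameters, the dense layer has 60 + 1 = 61, and the total is 7741. The number of time steps (10) does not affect the count, because the same cell weights are reused at every step.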

Although the present invention has been described above with reference to preferred embodiments, those of ordinary skill in the art will understand that the invention may be modified and changed in various ways without departing from the spirit and scope of the invention as set forth in the claims below.

Claims

An apparatus for predicting a web application workload using bidirectional long short-term memory, comprising:
a processor; and
a memory connected to the processor,
wherein the memory stores program instructions executed by the processor to:
output a first hidden state by inputting, in the forward direction into a first LSTM (Long Short-Term Memory), an input sequence that is time-series data on application metrics, including the application throughput and average response time requested by users, and on pod information of all currently running nodes;
output a second hidden state by inputting the input sequence in the reverse direction into a second LSTM;
calculate, through an auto scaler, a future HTTP request workload using the first hidden state and the second hidden state; and
calculate, through the auto scaler, the number of pods required for provisioning or deprovisioning based on the future HTTP request workload.

The apparatus according to claim 1,
wherein a cooling down time (CDT) is set in the auto scaler, applied after each scaling decision in which the required number of pods is calculated.

The apparatus according to claim 1,
wherein the auto scaler removes surplus pods according to a resource removal strategy (RSS) when the required number of pods is smaller than the current number of pods.

The apparatus according to claim 1,
wherein the auto scaler includes an adaptation manager service that calculates the future HTTP request workload by taking as input an output sequence obtained by combining the first hidden state and the second hidden state.

The apparatus according to claim 4,
wherein the Kubernetes engine of the Kubernetes master node changes the number of pods upon receiving a command from the adaptation manager service.

A system for predicting a web application workload using bidirectional long short-term memory, comprising:
an application metric collector that collects application metrics including the application throughput and average response time requested by users;
a monitoring server that aggregates time-series data, including the application metrics and pod information of all currently running nodes retrieved from a master node, and stores the time-series data in a time-series database; and
a proactive auto scaler including a first LSTM that outputs a first hidden state by receiving an input sequence corresponding to the time-series data in the forward direction, a second LSTM that outputs a second hidden state by receiving the input sequence in the reverse direction, and an adaptation manager service that calculates a future HTTP request workload using the first hidden state and the second hidden state and calculates the number of pods required for provisioning or deprovisioning based on the future HTTP request workload.

A method for predicting a web application workload using bidirectional long short-term memory, performed by an apparatus including a processor and a memory, the method comprising:
outputting a first hidden state by inputting, in the forward direction into a first LSTM (Long Short-Term Memory), an input sequence that is time-series data on application metrics, including the application throughput and average response time requested by users, and on pod information of all currently running nodes;
outputting a second hidden state by inputting the input sequence in the reverse direction into a second LSTM;
calculating, through an auto scaler, a future HTTP request workload using the first hidden state and the second hidden state; and
calculating, through the auto scaler, the number of pods required for provisioning or deprovisioning based on the future HTTP request workload.