KR20220071895A

KR20220071895A - Method for auto scaling, apparatus and system thereof

Info

Publication number: KR20220071895A
Application number: KR1020210142448A
Authority: KR
Inventors: 홍원기; 유재형; 이도영
Original assignee: 포항공과대학교 산학협력단
Priority date: 2020-11-24
Filing date: 2021-10-25
Publication date: 2022-05-31

Abstract

An auto-scaling method periodically performs scale-in/out of the virtual network function (VNF) instances that form a service function chain (SFC) by means of a deep Q-network (DQN). The method includes: defining the status of the tiers constituting the SFC as a reinforcement learning state, receiving that state as the input of the DQN, and outputting as an action either the physical server on which to perform scale-in/out or a decision to maintain the current VNF instances; and, when performing scale-in/out, selecting the SFC tier that requires scaling and applying the scaling to that tier.

Description

Auto-scaling method, apparatus and system

The present invention relates to an auto-scaling method, apparatus, and system, and more particularly to an auto-scaling method, apparatus, and system based on deep Q-networks (DQN) for scale-in/out of service function chaining (SFC) having a multi-tier structure.

Reinforcement learning is a machine learning method in which an agent, through trial and error, finds the optimal policy that determines which action to take in the current state of a given environment. The optimal policy lets the agent receive the maximum cumulative reward from the actions it takes in each state. Because reinforcement learning can learn from input values and reward values alone, even when no correct-answer (labeled) data is provided for the inputs, it is well suited to finding management policies in dynamically changing network environments. In particular, reinforcement learning can be applied to life-cycle management of virtual network functions (VNFs) in a complex network function virtualization (NFV) environment composed of numerous virtual networks and resources.

Among VNF life-cycle management technologies, auto-scaling dynamically allocates the resources of VNF instances (the virtual machines or containers in which a VNF runs) in response to traffic changes. There are two kinds of auto-scaling: scale-in/out, which decreases or increases the number of instances, and scale-up/down, which adjusts the computing resources (CPU, memory, etc.) allocated to an instance. In auto-scaling, it is important to dynamically allocate appropriate resources to VNF instances so that service requirements are met.

In general, an NFV environment applies a series of network functions to traffic through service function chaining (SFC). Auto-scaling of VNF instances in an NFV environment therefore needs to consider not just a single VNF type but also the types and numbers of the VNFs constituting the SFC (that is, the situation of each tier of the SFC). Auto-scaling in an NFV environment can thus be defined as the problem of auto-scaling an SFC. To apply reinforcement learning to SFC auto-scaling, the state, action, and reward must be defined so that the SFC tier to be scaled is selected and scaling appropriate to the situation (adding or removing VNF instances) is performed.

Network function virtualization (NFV) separates the hardware and software that make up a network and provides virtualized network functions in a general-purpose cloud computing environment. That is, the functions of physical network devices are implemented in software and executed using the virtual machines or containers in which VNFs run, together with virtual storage and virtual networks. NFV reduces capital and operating expenditure on network equipment and enables rapid response to service demands and traffic changes. Because of these advantages, NFV is used, together with software-defined networking (SDN), as a core technology of 5G networks.

Machine learning refers to methods by which computer software learns a given environment on its own, without human help, to solve problems. It is broadly divided into supervised learning, unsupervised learning, and reinforcement learning. Among these, reinforcement learning can be used when input values are given but no correct-answer values exist and only a reward value is given instead. Reinforcement learning therefore has the advantage that it can learn without prior knowledge of the environment, and it can be used to solve Markov decision process (MDP) problems, in which a specific action is selected sequentially in the current state.

Unlike conventional operating environments built on hardware-based network devices and middleboxes, an NFV environment is operated on virtualized servers, virtual networks, and virtual storage, so life-cycle management becomes very complex. Virtual servers are created or migrated depending on traffic or failure conditions, and the virtual network configuration changes frequently as a result, which calls for very complex operation and management functions. In particular, once telecommunication operators and large data centers replace all middleboxes with NFV, tens of thousands or more virtual servers will change location at any time, and the number of virtual server resources and the virtual network bandwidth will increase or decrease with traffic. Because such dynamic changes occur in real time, operation and management based on human judgment is expected to reach its limits.

One way to address the operation and management problems of complex network environments is to automate network operation by introducing machine learning. This presupposes learning the network state through machine learning, which requires a large amount of training data. Networks do offer large volumes of collectable data, such as resource information and traffic information for the devices that make up the network, but applying machine learning to it has limits. For example, there is a shortage of standardized or labeled data suited to machine learning techniques, and existing collection methods provide data in forms that are difficult for machine learning to process. In particular, the protocols and input/output data used by current hardware-based communication equipment differ in both principle and data structure, which makes them unsuitable for machine learning. Data collection and preprocessing functions that yield data in a machine-learning-ready form are therefore required. Moreover, an NFV environment requires research on state monitoring and analysis, because virtual resources, network state information, and traffic information must be collected in addition to physical resources.

Most existing network management studies that use machine learning assume that a large amount of training data exists, and mainly apply supervised and unsupervised learning to traffic classification, anomaly detection, and intrusion detection. However, supervised and unsupervised learning, which need large amounts of training data, have difficulty responding quickly to a dynamically changing network environment. Reinforcement learning, in contrast, can determine an optimal management policy by immediately using data collected from the network environment, through the definition of state, action, and reward values. Yet most studies that try to use reinforcement learning for network management automation have concentrated on implementing network management functions, for example developing SDN applications in a limited simulation environment such as Mininet or proposing a framework architecture, rather than on the reinforcement learning model itself. To apply reinforcement learning effectively to VNF life-cycle management, research is therefore needed on how the state, action, and reward values should be defined.

A VNF life-cycle management function must respond to a specific event or service request according to standardized and automated procedures. Open-source software (e.g., OpenStack) that provides some VNF life-cycle management functions, including scaling, already exists for the convenience of administrators. To scale an SFC, however, the administrator must manually adjust the number of VNF instances and reconfigure the SFC. This makes it difficult to apply scaling flexibly in response to dynamically changing network conditions and leads to inefficient network management.

To solve the above problems, an object of the present invention is to define the problem of auto-scaling an SFC based on reinforcement learning, to propose a reward model for it, and to implement a method of selecting the SFC tier to which scaling is applied. The factors used in the reward model are the average response time of traffic passing through the SFC, the distribution of the VNF instances constituting the SFC, and the ratio of the number of physical servers used to place VNF instances to the total number of available servers. Selecting the SFC tier to scale (that is, which type of VNF instance to scale) takes into account the average resource usage (CPU, memory) of the VNF instances in each tier and the number of physical servers used to place that tier's VNF instances. The present invention uses deep Q-networks (DQN), one of the reinforcement learning algorithms, and decides which scale-in/out to perform on the SFC based on the state information of each tier constituting the SFC. It also decides on which physical server the scaling is performed.

To achieve the above object, an auto-scaling method, apparatus, and system according to an embodiment of the present invention aim at auto-scaling that periodically performs scale-in/out of the VNF instances constituting an SFC through deep Q-networks (DQN), one of the reinforcement learning algorithms. The proposed method defines the status of the tiers constituting the SFC as the reinforcement learning state and feeds it to the DQN as input; the DQN then outputs, as an action, either the physical server on which to perform scale-in/out or a decision to maintain the current VNF instances. When scale-in/out is performed, the SFC tier that requires scaling is selected and the scaling is applied to that tier. For stable DQN training, the agent uses a Q-network, a Target Q-network, and a Replay Memory.

Like ordinary Q-learning, DQN iteratively learns the Q-value, a measure that predicts the reward obtainable by taking an action in a given state. The learned Q-values serve as the policy that decides which action to take in a given state (for example, selecting the action with the largest Q-value in that state). Through training, DQN updates the network parameters of the Q-network, which outputs the action to take in a given state, and the trained Q-network is used as the optimal policy for performing the optimal scaling action. The DQN of the present invention creates a Q-network and a Target Q-network and learns the network parameters so as to minimize the value of the loss function of Equation 1. Equation 1 is the loss function generally used for training a DQN, and it takes the data stored in the Replay Memory as training data. The Q-network is trained to reduce the difference between the maximum Q-value obtainable from the Target Q-network and the Q-value of the Q-network, each evaluated with its own network parameters. After the Q-network has been trained a certain number of times, its network parameters are copied to the Target Q-network. The copy is periodic because, if the Target Q-network were also updated at every training step, the Q-network would fail to train properly and would diverge.
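
For reference, the loss commonly used to train a DQN is usually written in the form below. The symbols are the conventional ones and are assumptions here, not a reproduction of Equation 1 as filed: D is the Replay Memory, (s, a, r, s') is a stored transition, γ is the discount factor, θ denotes the Q-network parameters, and θ⁻ denotes the Target Q-network parameters.

```latex
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim D}
\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

Under this form, the first two terms inside the square come from the Target Q-network and the last term from the Q-network, which matches the description of reducing the difference between the two.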

Because DQN training must reflect the reward value r for the action taken, the reward model is defined as in Equation 2. The response-time term of Equation 2 is the time taken to send traffic through the scaled SFC and receive a response. The DQN-based auto-scaling method uses the traffic response time as the service level objective (SLO). Response times measured over an SFC path can vary widely, so using the raw measurement would cause large variation in the reward value as well. The present invention therefore does not use the measured response time directly; instead, it converts the response time into a ratio relative to a predefined SLO and reflects that ratio in the reward. For example, if the SLO is set to 50 ms and the measured response time is 25 ms, a value of 0.5 is reflected.
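
The SLO normalization described above can be sketched in a few lines of Python. The function name `slo_ratio` and its parameter names are illustrative assumptions; only the ratio itself and the 50 ms / 25 ms example come from the text.

```python
def slo_ratio(response_time_ms, slo_ms):
    """Measured response time expressed as a fraction of the predefined SLO."""
    if slo_ms <= 0:
        raise ValueError("slo_ms must be positive")
    return response_time_ms / slo_ms


# Example from the description: SLO of 50 ms, measured response time of 25 ms.
ratio = slo_ratio(25, 50)  # 0.5
```

A ratio above 1.0 would mean the measured response time exceeds the SLO.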

Besides the traffic response time, Equation 2 reflects two further terms in the reward r: a server-ratio term and a distribution term. The server-ratio term is the ratio of the number of physical servers on which the VNF instances constituting the SFC are placed to the total number of physical servers available in the NFV environment. The distribution term indicates how the VNF instances constituting the SFC are distributed. It is calculated by multiplying together, over each physical server on which VNF instances are placed, the ratio of the number of VNF instances running on that server to the total number of VNF instances constituting the SFC. The distribution term is large when the VNF instances are concentrated on a small number of physical servers and small when they are spread over many servers. The VNF instances in each tier of the SFC receive traffic distributed by a load balancer, so when the VNF instances constituting the SFC are spread over many physical servers, the traffic is spread over those servers as well. Consequently, if the VNF instances of each tier are widely dispersed, the packet delivery time and response time of traffic subject to the same SFC may vary widely.
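
The two placement-related quantities described above can be illustrated with a small Python sketch. It follows the verbal definitions only (a ratio of server counts, and a product of per-server instance shares); the function names and the list-based input format are assumptions, and how Equation 2 combines these values with weights is not reproduced here.

```python
from math import prod


def server_ratio(used_servers, total_servers):
    """Share of the available physical servers that host the SFC's VNF instances."""
    if total_servers <= 0:
        raise ValueError("total_servers must be positive")
    return used_servers / total_servers


def distribution(instances_per_server):
    """Product of each hosting server's share of the SFC's VNF instances."""
    counts = [c for c in instances_per_server if c > 0]  # only servers that host instances
    if not counts:
        raise ValueError("at least one server must host an instance")
    total = sum(counts)
    return prod(c / total for c in counts)


# Four instances on one server, on two servers, and on four servers.
dense = distribution([4])            # 1.0
split = distribution([2, 2])         # 0.25
spread = distribution([1, 1, 1, 1])  # 0.00390625
```

The values shrink quickly as instances spread out, which is consistent with the statement that dense placement yields a high value and dispersed placement a small one.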

The reward model of Equation 2 proposed in the present invention feeds the ratio of physical servers in use and the distribution value of the VNF instances, each adjusted by its own weight, into an exponential function, and takes the response time of traffic flowing through the SFC into account in the reward r. The reward calculated by Equation 2 is therefore large when the response time of traffic passing through the SFC updated by the scaling action is short and the VNF instances constituting the SFC are concentrated on a small number of physical servers.

Because the auto-scaling method of the present invention targets a multi-tier SFC, a specific tier must be selected for scaling. When the agent determines that scaling is needed in the current state, the tier to scale is selected by Equation 3. Equation 3 calculates a score for each tier, and the tier with the highest score is selected as the tier to scale. The score of each tier is the product of a feasibility indicator and the result of a suitability function. The feasibility indicator corrects the score by taking the value 0 if scaling is impossible within the tier and 1 if it is possible. The suitability function indicates how suitable each tier is for scale-in/out. It draws on a resource-usage value, defined from the tier's CPU usage and memory usage, each adjusted by its own weight, and on a server-ratio value, which considers the ratio between the number of physical servers on which the VNF instances currently constituting the SFC are placed and the number of physical servers on which the VNF instances of the selected tier are placed.

Equation 3 uses the resource-usage and server-ratio values defined above together with an exponential function to express as a score how suitable each tier is for scaling. For scale-in, Equation 3 gives a high score to a tier whose resource usage is low and whose VNF instances are spread over several physical servers. For scale-out, it gives a high score to a tier whose resource usage is high and whose VNF instances are concentrated on a small number of physical servers. The purpose is to remove unnecessary VNF instances from a tier whose instances are dispersed in the case of scale-in, and to add available VNF instances to a tier whose instances are concentrated in the case of scale-out.
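
The tier selection can be sketched as follows. This is a hypothetical stand-in, not Equation 3: the exact functional form is not reproduced in the text, so the sketch only preserves the stated properties (a 0/1 feasibility factor, weighted CPU and memory usage, a server ratio, an exponential, and selection of the highest score). The dictionary keys, default weights, and the expressions inside `exp` are all assumptions.

```python
from math import exp


def tier_score(tier, sfc_servers, mode, w_cpu=0.5, w_mem=0.5):
    """Hypothetical suitability score for one tier (not the patent's Equation 3)."""
    if not tier["can_scale"]:
        return 0.0  # feasibility indicator of 0 removes the tier from consideration
    usage = w_cpu * tier["cpu"] + w_mem * tier["mem"]  # weighted usage, each in [0, 1]
    spread = tier["servers"] / sfc_servers             # tier's share of the SFC's servers
    if mode == "out":
        return exp(usage - spread)  # busy, concentrated tiers score high
    if mode == "in":
        return exp(spread - usage)  # idle, dispersed tiers score high
    raise ValueError("mode must be 'in' or 'out'")


def select_tier(tiers, sfc_servers, mode):
    """Name of the highest-scoring tier, or None if no tier can be scaled."""
    if not tiers:
        return None
    best = max(tiers, key=lambda name: tier_score(tiers[name], sfc_servers, mode))
    if tier_score(tiers[best], sfc_servers, mode) == 0.0:
        return None
    return best


# Illustrative three-tier SFC placed on 4 physical servers in total.
tiers = {
    "firewall": {"cpu": 0.9, "mem": 0.8, "servers": 1, "can_scale": True},
    "ids": {"cpu": 0.2, "mem": 0.3, "servers": 3, "can_scale": True},
    "nat": {"cpu": 0.95, "mem": 0.95, "servers": 1, "can_scale": False},
}
scale_out_choice = select_tier(tiers, 4, "out")  # "firewall"
scale_in_choice = select_tier(tiers, 4, "in")    # "ids"
```

In this example the `nat` tier is the busiest but is never chosen, because its feasibility factor is 0.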

The present invention assumes that a monitoring function exists for obtaining the data used to define the reward. SFC data, VNF instance placement data, and physical server data can be obtained with the monitoring tools provided by the NFV environment in which the VNFs run (for example, Ceilometer in the case of OpenStack). The resource utilization of each VNF instance can be obtained by installing collectd, an open-source monitoring agent, monitoring the instance periodically, and storing the results in a time-series database.

The performance of the present invention can be verified by showing that auto-scaling performed with the proposed method outperforms a threshold-based auto-scaling method. The SLO violation ratio of the auto-scaled SFC can be measured and used as the performance metric.
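
A minimal sketch of the SLO violation ratio is shown below. It assumes that a violation means a measured response time strictly greater than the SLO; the text does not define the term, and the function and parameter names are illustrative.

```python
def slo_violation_ratio(response_times_ms, slo_ms):
    """Fraction of measured response times that exceed the SLO (assumed definition)."""
    if not response_times_ms:
        raise ValueError("at least one measurement is required")
    violations = sum(1 for t in response_times_ms if t > slo_ms)
    return violations / len(response_times_ms)


# Four measurements against a 50 ms SLO: 60 ms and 80 ms are violations.
ratio = slo_violation_ratio([20, 60, 45, 80], 50)  # 0.5
```

Comparing this ratio for the DQN-based method and a threshold-based method over the same traffic would give the kind of evaluation described above.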

An embodiment of the present invention provides a method of performing auto-scaling using DQN, one of the reinforcement learning algorithms, instead of having a person manually decide and configure the scaling of an SFC.

The result of the present invention can be implemented as a module and operated in a real NFV environment (for example, OpenStack). Once the SFC to be auto-scaled is specified, the module periodically takes the status information of each tier constituting that SFC as the state and determines the scaling action.

This method makes SFC auto-scaling more convenient. Because it considers the performance of the SFC (traffic response time), the distribution of the VNF instances constituting the SFC, the number of physical servers, and so on, it can achieve more stable packet processing through the SFC than arbitrarily performed scaling.

FIG. 1 shows an example of an SFC with a multi-tier structure that is the auto-scaling target of an auto-scaling apparatus according to an embodiment of the present invention.
FIG. 2 schematically illustrates how each component operates when the auto-scaling problem is solved by the method proposed in the present invention.
FIG. 3 schematically illustrates the process of outputting a scaling action through the DQN when the status information of each tier of the SFC is given as the state.
FIG. 4 illustrates the process of performing auto-scaling using the DQN in the auto-scaling apparatus according to an embodiment of the present invention.
FIG. 5 is a block diagram of the auto-scaling apparatus according to an embodiment of the present invention.

Because the present invention can be modified in various ways and can have various embodiments, specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to specific embodiments; it should be understood to include all modifications, equivalents, and substitutes that fall within the spirit and technical scope of the present invention. In describing the drawings, like reference numerals are used for like elements.

Terms such as first, second, A, and B may be used to describe various elements, but the elements should not be limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present invention, a first element may be termed a second element, and similarly a second element may be termed a first element. The term "and/or" includes any combination of a plurality of related listed items or any one of a plurality of related listed items.

When an element is described as being "connected" or "coupled" to another element, it may be directly connected or coupled to that other element, but it should be understood that intervening elements may be present. In contrast, when an element is described as being "directly connected" or "directly coupled" to another element, it should be understood that no intervening elements are present.

The terms used in the present application are used only to describe specific embodiments and are not intended to limit the present invention. A singular expression includes the plural unless the context clearly indicates otherwise. In the present application, terms such as "comprise" or "have" are intended to specify the presence of the features, numbers, steps, operations, elements, parts, or combinations thereof described in the specification, and should be understood not to preclude the presence or addition of one or more other features, numbers, steps, operations, elements, parts, or combinations thereof.

Unless defined otherwise, all terms used herein, including technical and scientific terms, have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. Terms such as those defined in commonly used dictionaries should be interpreted as having meanings consistent with their meanings in the context of the related art, and are not to be interpreted in an idealized or excessively formal sense unless expressly so defined in the present application.

본 발명은 강화학습 기술을 활용하여 NFV 환경 내 SFC의 오토 스케일링을 수행하는 방법에 관한 것이다. 제안하는 방법은 SFC를 통과하는 트래픽의 변동으로 인해 발생할 수 있는 SFC의 과부하, 성능 저하 등의 문제에 대응하기 위하여, SFC를 구성하는 VNF가 동작하는 가상 머신(VM, Virtual Machine) 또는 컨테이너(Container)를 스케일링 하는 강화학습 문제로 정의한다. 오토 스케일링 문제를 해결하기 위해 강화학습 알고리즘 중 하나인 심층 Q 네트워크(Deep Q-networks, DQN)을 이용하며, 오토 스케일링으로 갱신된 SFC를 통해 측정되는 트래픽의 응답 시간(Response time)과 SFC를 구성하는 VNF 인스턴스들의 분포 상태, 가용 물리 서버 대비 VNF 인스턴스를 배치하는데 사용한 물리 서버 개수의 비를 보상 값으로 고려한다.The present invention relates to a method of performing auto-scaling of SFC in an NFV environment by using reinforcement learning technology. The proposed method responds to problems such as overload and performance degradation of SFC that may occur due to fluctuations in traffic passing through the SFC. ) is defined as a scaling reinforcement learning problem. To solve the auto-scaling problem, Deep Q-networks (DQN), one of the reinforcement learning algorithms, is used, and the response time and SFC of traffic measured through the SFC updated by auto-scaling are configured. The distribution status of VNF instances to be used and the ratio of the number of physical servers used to deploy VNF instances to available physical servers are considered as compensation values.

본 발명은 강화학습 알고리즘 중 하나인 심층 Q 네트워크(Deep Q-networks, DQN)를 통해 SFC를 구성하는 VNF 인스턴스들의 스케일 인아웃(Scale-in/out)을 주기적으로 수행하는 오토 스케일링을 목표로 한다. 제안하는 방법은 DQN을 통해 SFC를 구성하는 계층(Tier)들의 상황(Status)을 강화학습의 상태(State)로 정의하여 입력 값으로 받아들인 후, 어떤 물리 서버에서 스케일 인아웃(Scale-in/out)을 수행할지 또는 현재 VNF 인스턴스들을 유지(Maintain)할지를 행동으로 출력한다. 또한, 스케일 인아웃(Scale-in/out)을 수행할 때, 스케일링이 필요한 SFC의 계층을 선택하여 해당 계층에 스케일링을 적용한다. 에이전트에서는 DQN의 안정적인 학습을 위해 Q-network및 Target Q-network, Replay Memory를 활용한다.The present invention aims at auto-scaling by periodically performing scale-in/out of VNF instances constituting SFC through a deep Q-networks (DQN), one of the reinforcement learning algorithms. The proposed method defines the status of the layers constituting the SFC through DQN as the state of reinforcement learning, accepts it as an input value, and then scales in/out in a certain physical server. ) or whether to maintain the current VNF instances as an action. In addition, when performing scale-in/out, a layer of an SFC requiring scaling is selected and scaling is applied to the corresponding layer. The agent utilizes Q-network, Target Q-network, and Replay Memory for stable learning of DQN.

DQN은 일반적인 Q-learning과 마찬가지로, 특정 상태에서 행동을 수행할 때 얻을 수 있는 보상을 예측하는 지표인 Q-value를 반복적으로 학습한다. 학습된 Q-value는 특정 상태에서 어떤 행동을 수행할지 결정하는 정책으로 사용한다(예를 들어, 특정 상태에서 Q-value가 가장 큰 행동을 선택). DQN은 학습을 통해 특정 상태에서 수행할 행동을 출력하는 Q-network의 네트워크 파라미터를 갱신하는데, 학습된 Q-network는 최적의 스케일링 행동을 수행하는 최적 정책으로 사용된다. Like general Q-learning, DQN repeatedly learns the Q-value, an index that predicts the reward that can be obtained when performing an action in a specific state. The learned Q-value is used as a policy to decide which action to perform in a specific state (for example, select the action with the largest Q-value in a specific state). DQN updates the network parameters of the Q-network that outputs the action to be performed in a specific state through learning, and the learned Q-network is used as an optimal policy to perform the optimal scaling action.
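The rule of selecting the action with the largest Q-value can be sketched as follows. This is a minimal illustration only; the action names and the Q-values are hypothetical and do not come from this document.

```python
def greedy_action(q_values):
    """Return the index of the action with the largest Q-value."""
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Hypothetical action set and Q-values for one state (illustration only).
ACTIONS = ["scale-out", "scale-in", "maintain"]
q = [0.12, -0.40, 0.35]
best = ACTIONS[greedy_action(q)]
```

In practice the Q-values would be the outputs of the trained Q-network for the current state rather than a fixed list.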

(Equation 1)

The DQN of the present invention creates a Q-network and a Target Q-network and learns the network parameters so as to minimize the value of the loss function of Equation 1. Equation 1 is the loss function commonly used for training a DQN, and the data stored in the Replay Memory is input as training data. The Q-network is trained so as to reduce the difference between the maximum Q-value obtainable from the Target Q-network (with its own network parameters) and the Q-value of the Q-network (with its network parameters). In addition, after the Q-network has been trained a certain number of times, its network parameters are copied to the Target Q-network; if the Target Q-network were also updated at every training step, the Q-network would fail to learn properly and would diverge.
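For reference, the loss commonly used to train a DQN has the standard form below. The notation is the conventional one and is an assumption here: θ for the Q-network parameters, θ⁻ for the Target Q-network parameters, γ for the discount factor, and D for the Replay Memory. Equation 1 of this document may use different symbols or additional terms.

```latex
L(\theta) = \mathbb{E}_{(s,a,r,s') \sim D}
\left[ \left( r + \gamma \max_{a'} Q(s', a'; \theta^{-}) - Q(s, a; \theta) \right)^{2} \right]
```

The first two terms inside the square form the target computed with the Target Q-network, and the last term is the current estimate of the Q-network, so minimizing the loss reduces the difference described above.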

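(Equation 2)

In Equation 2, the physical servers considered are those on which at least one VNF instance is deployed.

The response time in Equation 2 is the time taken to send traffic through the scaled SFC and receive a response. The measured response time is converted to a ratio of the predefined service level objective (SLO) before it is reflected in the reward; for example, a measured response time of 25 ms is reflected as 0.5.

In addition to the traffic response time, Equation 2 reflects two further terms in the reward r. The first is the ratio of the number of physical servers on which the VNF instances constituting the SFC are deployed to the total number of physical servers available in the NFV environment. The second indicates the distribution of the VNF instances constituting the SFC and is calculated by multiplying together, for each physical server on which VNF instances are deployed, the ratio of the number of VNF instances running on that server to the total number of VNF instances constituting the SFC.

The reward model of Equation 2 proposed in the present invention reflects these two terms through an exponential function, each corrected by its own weight, and also takes the response time of the traffic flowing through the SFC into account in the reward r.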
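The three quantities that the reward of Equation 2 is described as using can be computed as in the sketch below. It does not reproduce Equation 2 itself, whose exact form (exponential function and weights) is not recoverable here. The function names are hypothetical, and the 50 ms SLO in the example is an assumption chosen so that a 25 ms response time maps to 0.5.

```python
from math import prod

def slo_ratio(response_ms, slo_ms):
    """Measured response time expressed as a ratio of the predefined SLO."""
    return response_ms / slo_ms

def server_usage_ratio(instances_per_server):
    """Servers hosting at least one VNF instance divided by all available servers."""
    used = sum(1 for n in instances_per_server if n > 0)
    return used / len(instances_per_server)

def instance_distribution(instances_per_server):
    """Product, over servers hosting instances, of (instances on server / total instances)."""
    total = sum(instances_per_server)
    return prod(n / total for n in instances_per_server if n > 0)

# Hypothetical placement: 10 available servers, 4 VNF instances on 3 of them.
per_server = [2, 1, 1, 0, 0, 0, 0, 0, 0, 0]
ratios = (slo_ratio(25.0, 50.0),
          server_usage_ratio(per_server),
          instance_distribution(per_server))
```

How these values are weighted and combined into the reward r is defined by Equation 2 and is deliberately left out of the sketch.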

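(Equation 3)

When scale-in/out is to be performed, the tier to which scaling is applied is selected using Equation 3, which calculates a score for each tier from the terms defined above. The score is calculated as the product of a resource-usage term and the result of a function. The resource-usage term is defined from the resource usage of the tier, including its memory usage, and each usage value is corrected by its own weight.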
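Selecting the tier with the highest score can be sketched as below. The score function is a hypothetical stand-in (a weighted sum of average CPU and memory usage with assumed weights), not Equation 3; only the final step of picking the highest-scoring tier follows the text.

```python
def usage_score(tier, w_cpu=0.5, w_mem=0.5):
    """Hypothetical stand-in for the per-tier score (NOT Equation 3)."""
    return w_cpu * tier["cpu"] + w_mem * tier["mem"]

def select_tier(tiers, score_fn=usage_score):
    """Return the tier with the highest score."""
    return max(tiers, key=score_fn)

# Hypothetical tier status values for the two-tier SFC of FIG. 1.
tiers = [
    {"name": "firewall", "cpu": 0.82, "mem": 0.40},
    {"name": "ids", "cpu": 0.35, "mem": 0.30},
]
chosen = select_tier(tiers)
```

Passing the score function as a parameter keeps the selection step independent of how the score is actually defined.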

Hereinafter, preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a configuration diagram illustrating an example of an SFC 100 having a multi-tier structure that is subject to auto-scaling.

Referring to FIG. 1, the DQN-based auto-scaling method proposed in the present invention is defined as a scale-in/out problem of adjusting the number of VNF instances in each tier 110, 120 constituting the SFC 100. Since an SFC may consist of several tiers, the method also includes selecting the tier to which scaling is applied. The multi-tier SFC 100 of FIG. 1 is a two-tier (110, 120) SFC composed of two types of VNF, a Firewall 111 and an IDS 121; when auto-scaling is required, it must be decided whether to scale the Firewall tier 110 or the IDS tier 120. In the DQN-based auto-scaling problem, the status of each tier constituting the SFC is defined as the state of the reinforcement learning problem; the DQN receives this state as input and outputs, as an action, the scaling to be performed in that state. After scaling is performed, a reward is assigned in consideration of the traffic response time measured on the updated SFC, the distribution of the VNF instances constituting the SFC, and the number of physical servers used to deploy the instances.

FIG. 2 is a configuration diagram schematically illustrating how each component operates when the auto-scaling problem is solved by the method proposed in the present invention.

Referring to FIG. 2, the DQN of the present invention uses two deep networks in the agent 200, the Q-network 210 and the Target Q-network 220, for stable learning. This prevents the Q-value from diverging instead of converging to the optimal value during learning. In addition, the reward received when a scaling action is performed in a specific state, the next state, and whether the scaling succeeded are stored in the Replay Memory 400 as a tuple of the state, the action, the reward, the next state, and a success flag. The success flag indicates whether the selected scaling action was performed normally. For example, if a VNF instance cannot be added because the physical server has no available resources, the scale-out fails and the flag is set to 0; on success it is set to 1. Training on mini-batches sampled from the data stored in the Replay Memory 400 prevents incorrect network parameters from being learned because of correlation between consecutive data.
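A Replay Memory holding the five-element records described above, with mini-batch sampling, might look like the following sketch. The class name, field names, and fixed capacity are assumptions made for illustration.

```python
import random
from collections import deque, namedtuple

# One stored record: state, action, reward, next state, success flag (1 or 0).
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "success"]
)

class ReplayMemory:
    """Fixed-capacity buffer; the oldest records are dropped when it is full."""

    def __init__(self, capacity):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, success):
        self.buffer.append(Transition(state, action, reward, next_state, success))

    def sample(self, batch_size):
        # Uniform random sampling breaks the correlation between
        # consecutively collected transitions.
        return random.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

A failed scale-out would be stored with `success=0`, so the training step can distinguish it from an action that actually changed the SFC.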

The present invention provides a method of performing auto-scaling of the SFC 100 in the NFV environment 300 with deep Q-networks (DQN) 200, one of the reinforcement learning algorithms. Once the SFC 100 to which auto-scaling is to be applied is determined, the proposed method periodically receives the status information of each tier 110, 120 constituting the SFC 100 as input and outputs which scaling action should be performed at which location (for example, a physical server). If the action to be performed is scale-in/out, an appropriate SFC tier 110, 120 to which scaling is applied is selected. Along with the method of performing auto-scaling using DQN, the present invention also presents a way to implement it as an actual system in an OpenStack environment 300.

FIG. 3 is a schematic diagram illustrating the process of outputting a scaling action through the DQN when the status information of each tier of the SFC is given as the state.

Referring to FIG. 3, the process of outputting the scaling action to be performed when the status of each tier of the SFC (Tier Status) is given as the state of the DQN is illustrated. The scaling actions are Scale-out, which adds a VNF instance; Scale-in, which removes a VNF instance; and Maintain, which keeps the current VNF instances. For scale-in/out, the location at which a VNF instance is added or removed is also considered. The status information of each tier consists of five items: the average CPU usage, the average memory usage, and the average number of disk operations of the VNF instances in the tier, the number of VNF instances in the tier, and the VNF instance distribution. CPU and memory are included in the tier status because, when these resources are insufficient, processing of packets 10 is delayed or packets are lost, so they strongly affect packet 10 processing performance. The number of disk operations is the number of disk reads and writes; when memory is overused, swapping occurs and the measured number of disk operations can be high. Swapping stores on disk part of the data that would otherwise be held in memory, and because disk operations are slower than memory operations, it creates a bottleneck that indirectly affects packet 10 processing performance. The remaining items of the tier status are the number of VNF instances belonging to the tier and the VNF instance distribution. The VNF instance distribution is the number of physical servers on which VNF instances are actually deployed divided by the total number of available physical servers on which VNF instances can be created in the NFV environment. For example, if there are 10 available servers and VNF instances of the current tier are deployed on 3 of them, the distribution value is 0.3.
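The five-item tier status and the distribution value from the example can be assembled as in this sketch. The function names, the ordering of the items, and the concatenation of the tiers into one state vector are assumptions; the document only lists which items make up the status.

```python
def distribution_value(servers_hosting_tier, total_available_servers):
    """Servers hosting the tier's VNF instances divided by all available servers."""
    return servers_hosting_tier / total_available_servers

def tier_status(avg_cpu, avg_mem, avg_disk_ops, n_instances,
                servers_hosting_tier, total_available_servers):
    """Return the five status items of one tier as a flat list of floats."""
    return [
        float(avg_cpu),
        float(avg_mem),
        float(avg_disk_ops),
        float(n_instances),
        distribution_value(servers_hosting_tier, total_available_servers),
    ]

# Hypothetical measurements for a two-tier SFC (Firewall tier, IDS tier).
firewall = tier_status(0.62, 0.48, 120.0, 4, 3, 10)
ids = tier_status(0.35, 0.30, 40.0, 2, 2, 10)
state = firewall + ids  # assumed layout: tiers concatenated in chain order
```

With 10 available servers and 3 of them hosting the tier, the last item of `firewall` is the 0.3 given in the example above.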

Rather than having a person manually decide and configure the scaling of an SFC, the present invention presents a method of performing auto-scaling using DQN, one of the reinforcement learning algorithms. The result of the present invention is implemented as a module that can operate in an actual NFV environment (for example, OpenStack); once the SFC to which auto-scaling is to be applied is specified, the module periodically receives the status information of each tier constituting that SFC as the state and determines the scaling action. This method makes auto-scaling of an SFC more convenient, and because it considers the performance of the SFC (traffic response time), the distribution of the VNF instances constituting the SFC, the number of physical servers, and so on, it can achieve better stability of packet 10 processing through the SFC than arbitrary scaling.

FIG. 4 is a flowchart illustrating a process of performing auto-scaling using DQN.

Referring to FIG. 4, the sequence in which the auto-scaling function is performed when DQN-based auto-scaling is requested is shown (S401). The result of the present invention is implemented as a module that can operate in an actual NFV environment (e.g., OpenStack). When the auto-scaling module receives a request message containing the name of the SFC to which auto-scaling is to be applied and the parameters required to perform auto-scaling, it starts an auto-scaling process. The auto-scaling process runs as a thread so that several auto-scaling processes can run concurrently.
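Running one auto-scaling process per request as a thread can be sketched as follows. The request format (an SFC name plus a parameter dictionary with a `duration` key) and the worker body are hypothetical placeholders; the actual module interface is not specified here.

```python
import threading

def autoscale_worker(sfc_name, params, results):
    """Placeholder for one auto-scaling process (steps S402 to S423)."""
    results[sfc_name] = "finished after duration={}".format(params["duration"])

def handle_requests(requests):
    """Start one thread per (sfc_name, params) request and wait for all of them."""
    results = {}
    threads = [
        threading.Thread(target=autoscale_worker, args=(name, params, results))
        for name, params in requests
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Two hypothetical requests handled concurrently.
done = handle_requests([("sfc-a", {"duration": 60}), ("sfc-b", {"duration": 120})])
```

Each worker writes only to its own key of the shared dictionary, so no additional locking is needed in this sketch.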

Once the process starts, it requests and receives from the monitoring module the data required for auto-scaling, such as the data of the SFC to which auto-scaling is to be applied and the physical server information (S402).

Then, the hyperparameter values of the DQN are set (S403), and the Q-network 210 and the Target Q-network 220 are created (S404).

Next, the Replay Memory 400 is created (S405). If a training dataset to be loaded into the Replay Memory 400 already exists as a file (S406), the data is read and stored in the Replay Memory 400 (S407). Once the Replay Memory 400 has been created, auto-scaling proper begins.

First, the status of each tier (Tier Status) in the SFC subject to auto-scaling is retrieved, and the current state is converted into a tensor, the form that can be input to the DQN (S408).

The state converted into a tensor is input to the DQN, which outputs an action indicating which scaling is to be performed on which physical server (S409).

If the output is scale-in/out (S410), the tier to which scaling is applied is selected (S411).

Once the tier has been determined, the scaling action selected by the DQN is performed (S412).
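Since the action output in S409 identifies both a scaling type and a physical server, one possible way to decode a single action index is sketched below. The encoding is an assumption, not taken from this document: with P servers, indices 0 to P-1 mean scale-out on that server, P to 2P-1 mean scale-in, and 2P means maintain.

```python
def decode_action(index, num_servers):
    """Map a DQN output index to (scaling type, server index or None)."""
    if not 0 <= index <= 2 * num_servers:
        raise ValueError("action index out of range")
    if index == 2 * num_servers:
        return ("maintain", None)
    if index < num_servers:
        return ("scale-out", index)
    return ("scale-in", index - num_servers)

# With 3 physical servers the assumed action space has 2 * 3 + 1 = 7 entries.
action = decode_action(4, 3)
```

Under this encoding the tier selection of S411 is only needed when the decoded type is not "maintain".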

After the action is performed, the agent 200 calculates the reward value resulting from the scaling (S413) and updates the SFC to reflect the newly added or removed VNF instance (S414).

After the SFC update is complete, the status of each tier in the SFC is retrieved again and the new state is converted into a tensor (S415).

After the new state has been converted into a tensor, a record consisting of the current state, the action, the reward, the new state, and whether the action succeeded is stored in the Replay Memory 400 (S416).

When at least a minimum number N of records has accumulated in the Replay Memory 400 (S417), a mini-batch is sampled from it to train the Q-network 210 (S418).

Because the DQN of the present invention consists of the Q-network 210 and the Target Q-network 220, each time the scaling action has been repeated a certain number of times (S419), the learned network parameters of the Q-network 210 are copied to the Target Q-network 220 to update it (S420).
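The periodic copy of S419 and S420 can be sketched independently of any deep learning framework. Here the parameters are plain dictionaries and the function name and `sync_every` parameter are hypothetical; in a real implementation the copy would act on the network weights.

```python
import copy

def maybe_sync_target(step, sync_every, q_params, target_params):
    """Return the parameters the Target Q-network should hold after `step`."""
    if step % sync_every == 0:
        # Deep copy so later Q-network updates do not leak into the target.
        return copy.deepcopy(q_params)
    return target_params

q_params = {"w": [1.0, 2.0]}
target_params = {"w": [0.0, 0.0]}
target_params = maybe_sync_target(5, 10, q_params, target_params)   # unchanged
target_params = maybe_sync_target(10, 10, q_params, target_params)  # copied
```

Between two copies the target stays fixed, which is what keeps the training target stable.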

Each time a scaling action has been performed, the auto-scaling module calculates how long the auto-scaling process has been running (S421). If the actual running time exceeds the predefined execution time limit (Duration) (S422), the auto-scaling process is terminated (S423). If time remains before the process expires, the current state of the SFC is calculated again (S408) and the scaling procedure is repeated.
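The control flow of steps S408 to S423 can be summarized in the skeleton below. It is a sketch under stated assumptions: `env` and `agent` are hypothetical objects with the methods shown, tier selection is folded into `env.apply`, the Replay Memory is a plain list, and `min_samples` must be at least `batch_size`.

```python
import random
import time

def run_autoscaling(env, agent, memory, duration_s, min_samples,
                    batch_size, sync_every, clock=time.monotonic):
    """Repeat the scaling cycle until the run time exceeds duration_s."""
    start = clock()
    step = 0
    while True:
        state = env.observe()                     # S408: tier status as state
        action = agent.act(state)                 # S409: DQN outputs an action
        success = env.apply(action)               # S410-S412, S414: scale, update SFC
        reward = env.reward()                     # S413: reward of the scaling
        next_state = env.observe()                # S415: new state
        memory.append((state, action, reward, next_state, success))  # S416
        if len(memory) >= min_samples:            # S417
            agent.train(random.sample(memory, batch_size))           # S418
        step += 1
        if step % sync_every == 0:                # S419
            agent.sync_target()                   # S420
        if clock() - start > duration_s:          # S421-S422
            return step                           # S423
```

The `clock` parameter exists only so that the time limit can be exercised without waiting in real time.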

FIG. 5 is a block diagram of an auto-scaling apparatus 1000 according to an embodiment of the present invention.

Referring to FIG. 5, the auto-scaling apparatus 1000 according to an embodiment of the present invention may include a processor 1100, a memory 1200, a transceiver 1300, an input interface device 1400, an output interface device 1500, a storage device 1600, and a bus 1700.

The auto-scaling apparatus 1000 of the present invention includes a processor 1100 and a memory 1200 storing at least one instruction executed by the processor 1100, wherein the at least one instruction is configured to cause the processor 1100 to perform the auto-scaling method described above.

The processor 1100 may be a central processing unit (CPU), a graphics processing unit (GPU), or a dedicated processor on which the methods according to embodiments of the present invention are performed.

Each of the memory 1200 and the storage device 1600 may be configured as at least one of a volatile storage medium and a non-volatile storage medium. For example, the memory 1200 may be configured as at least one of a read-only memory (ROM) and a random access memory (RAM).

The auto-scaling apparatus 1000 may also include a transceiver 1300 that performs communication through a wireless network.

The auto-scaling apparatus 1000 may further include an input interface device 1400, an output interface device 1500, a storage device 1600, and the like.

The components included in the auto-scaling apparatus 1000 may be connected by a bus 1700 to communicate with one another.

Examples of the auto-scaling apparatus 1000 of the present invention include a communication-capable desktop computer, laptop computer, notebook, smartphone, tablet PC, mobile phone, smart watch, smart glasses, e-book reader, portable multimedia player (PMP), portable game console, navigation device, digital camera, digital multimedia broadcasting (DMB) player, digital audio recorder, digital audio player, digital video recorder, digital video player, and personal digital assistant (PDA).

The operations of the method according to an embodiment of the present invention can be implemented as a computer-readable program or code on a computer-readable recording medium. The computer-readable recording medium includes all types of recording devices in which information readable by a computer system is stored. The computer-readable recording medium may also be distributed over computer systems connected by a network, so that the computer-readable program or code is stored and executed in a distributed manner.

The computer-readable recording medium may also include a hardware device specially configured to store and execute program instructions, such as a ROM, a RAM, or a flash memory. The program instructions may include not only machine code such as that produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like.

Although some aspects of the present invention have been described in the context of an apparatus, they may also represent a description of the corresponding method, where a block or apparatus corresponds to a method step or a feature of a method step. Similarly, aspects described in the context of a method may also be represented by a corresponding block or item or by a feature of a corresponding apparatus. Some or all of the method steps may be performed by (or using) a hardware device such as a microprocessor, a programmable computer, or an electronic circuit. In some embodiments, one or more of the most important method steps may be performed by such a device.

In embodiments, a programmable logic device (for example, a field programmable gate array) may be used to perform some or all of the functions of the methods described herein. In embodiments, the field programmable gate array may operate in conjunction with a microprocessor to perform one of the methods described herein. In general, the methods are preferably performed by some hardware device.

Although the present invention has been described above with reference to preferred embodiments, those skilled in the art will understand that the present invention may be modified and changed in various ways without departing from the spirit and scope of the present invention as set forth in the claims below.

Claims

An auto-scaling system comprising:
a network function virtualization server that implements a network function virtualization (NFV) environment built on OpenStack; and
a deep Q-networks (DQN) device that performs auto-scaling of an SFC in the NFV environment using deep Q-networks (DQN), one of the reinforcement learning algorithms.

The auto-scaling system according to claim 1,
wherein the deep Q-networks (DQN) device comprises a Q-network, a Target Q-network, and a Replay Memory for solving the auto-scaling problem of service function chaining (SFC).

The auto-scaling system according to claim 1,
wherein the system receives the tier status information of each tier constituting the service function chaining (SFC) as input and determines the location at which scaling is to be performed.

The auto-scaling system according to claim 1,
wherein the system defines, through Equation 2, a model of the reward value obtained when the scaling action output for a specific state by the deep Q-networks (DQN) device is performed, according to the average traffic response time relative to the service level objective (SLO) value of the service function chaining (SFC), the distribution of virtual network function (VNF) instances, and the number of physical servers on which the VNF instances constituting the SFC are deployed, and
defines the reward value in consideration of whether the VNF instances constituting the SFC are densely deployed on a small number of physical servers and of the performance (traffic response time) of the SFC.
(Equation 2)
where each physical server in Equation 2 is a physical server on which at least one VNF instance is deployed.

The auto-scaling system according to claim 4,
wherein the system defines the reward value so that it is calculated effectively in consideration of whether the VNF instances constituting the SFC are densely deployed on a small number of physical servers and of the traffic response time, which is the performance of the SFC.

The auto-scaling system according to claim 1,
wherein the system selects, through Equation 3, the SFC tier to be auto-scaled.
(Equation 3)

An auto-scaling method of periodically performing scale-in/out of the virtual network function (VNF) instances constituting service function chaining (SFC) through a deep Q-networks (DQN) device, the method comprising:
defining the status of the tiers constituting the SFC as a reinforcement learning state, receiving the state as an input value through the DQN device, and then outputting, as an action, the physical server on which to perform scale-in/out or a decision to maintain the current VNF instances; and
when performing scale-in/out, selecting the SFC tier that requires scaling and applying scaling to that tier.

The auto-scaling method according to claim 7,
wherein an agent uses a Q-network, a Target Q-network, and a Replay Memory for stable training of the DQN device.

The auto-scaling method according to claim 7,
wherein the DQN device, as in Q-learning, iteratively learns a Q-value, a measure that predicts the reward obtainable by performing an action in a specific state, and
the learned Q-value is used as a policy for determining which action to perform in a specific state.

The auto-scaling method according to claim 7,
wherein the DQN device updates, through training, the network parameters of the Q-network, which outputs the action to be performed in a specific state, and
the trained Q-network is used as an optimal policy for performing the optimal scaling action.

The auto-scaling method according to claim 7,
wherein the DQN device creates a Q-network and a Target Q-network and learns the network parameters so as to minimize the value of the loss function of Equation 1,
Equation 1 is the loss function commonly used for training a DQN, and the data stored in the Replay Memory is input as training data,
the Q-network is trained so as to reduce the difference between the maximum Q-value obtainable from the Target Q-network and the Q-value of the Q-network, and
when the Q-network has been trained at least a certain number of times, the network parameters of the Q-network are copied to the Target Q-network.
(Equation 1)

The method according to claim 7, wherein the method comprises:
Defined by the reward model of Equation 2 in which the reward value (r) for the action performed by the deep Q-networks (DQN) device in the learning process is reflected,
Auto-scaling method.
(Equation 2)

only,

is a physical server on which at least one VNF instance is deployed,

The method of claim 12, wherein the method comprises:
In Equation 2, the response-time term denotes the time taken to send traffic through the scaled service function chaining (SFC) and receive a response,
the DQN-based auto-scaling method uses the traffic response time as the service level objective (SLO), and
the measured response time is converted to a ratio of the predefined SLO and reflected in the reward value,
Auto-scaling method.
(Equation 2)

where the physical server referred to in Equation 2 is a server on which at least one VNF instance is deployed,

The method of claim 13, wherein the method comprises:

The reward model of Equation 2 reflects two quantities in the reward (r), namely a physical-server ratio and a distribution value,
wherein the physical-server ratio is the ratio of the number of physical servers on which the virtual network function (VNF) instances constituting the service function chaining (SFC) are deployed to the total number of physical servers available in the NFV environment, and
the distribution value indicates the distribution of the VNF instances constituting the SFC and is obtained by multiplying together, for each physical server on which VNF instances are deployed, the ratio of the number of VNF instances running on that server to the total number of VNF instances constituting the SFC,
Auto-scaling method.
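
The quantities that feed the reward can be computed as in the following minimal sketch. It covers only the individual ratios named in the claims and does not combine them, since the exact form of Equation 2 is not reproduced here. The function names and the example placement are assumptions.

```python
def slo_ratio(response_time, slo):
    """Measured response time expressed as a ratio of the predefined SLO."""
    if slo <= 0:
        raise ValueError("SLO must be positive")
    return response_time / slo


def server_ratio(instances_per_server, total_servers):
    """Servers hosting at least one VNF instance, relative to all servers."""
    used = sum(1 for n in instances_per_server if n > 0)
    return used / total_servers


def distribution_value(instances_per_server):
    """Product, over servers in use, of (instances on server / all instances)."""
    deployed = [n for n in instances_per_server if n > 0]
    total = sum(deployed)
    value = 1.0
    for n in deployed:
        value *= n / total
    return value


# Hypothetical placement: four servers, four VNF instances on three of them.
placement = [2, 1, 0, 1]
```

With this placement, three of four servers are in use, and the distribution value is the product of 2/4, 1/4 and 1/4.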

The method of claim 12, wherein the method comprises:
The reward model of Equation 2 corrects its terms with an exponential function and weights before reflecting them, and takes into account in the reward (r) the response time of the traffic flowing through the service function chaining (SFC),
Auto-scaling method.

The method according to claim 7, wherein the method comprises:
When the agent determines that scaling is necessary in the current state, the tier to which scaling is applied is selected by Equation 3,
Auto-scaling method.
(Equation 3)

The method according to claim 7, wherein the method comprises:
Equation 3 calculates a score for each tier and selects the tier with the highest score as the tier to scale,
the score for each tier is calculated as the product of an indicator value and the value of a suitability function,
where the indicator corrects the score by assigning 0 if scaling is not possible within the tier and 1 if it is possible, and
the suitability function indicates the degree to which each tier is suitable for scale-in/out,
Auto-scaling method.
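
The tier (layer) selection described in this claim can be sketched as an indicator multiplied by a suitability score, followed by an arg-max. The suitability function used in the example is a stand-in and is not Equation 3; the tier names, dictionary keys, and function names are likewise hypothetical.

```python
def tier_scores(tiers, suitability):
    """Score each tier as indicator (0 or 1) times its suitability value."""
    return {
        name: (1 if info["can_scale"] else 0) * suitability(info)
        for name, info in tiers.items()
    }


def select_tier(tiers, suitability):
    """Return the scalable tier with the highest score, or None if none can scale."""
    scores = tier_scores(tiers, suitability)
    scalable = [name for name, info in tiers.items() if info["can_scale"]]
    if not scalable:
        return None
    return max(scalable, key=lambda name: scores[name])


# Stand-in suitability: an equal-weight mix of CPU and memory usage.
def example_suitability(info):
    return 0.5 * info["cpu"] + 0.5 * info["mem"]


example_tiers = {
    "firewall": {"can_scale": True, "cpu": 0.9, "mem": 0.5},
    "ids": {"can_scale": False, "cpu": 0.95, "mem": 0.9},
    "nat": {"can_scale": True, "cpu": 0.3, "mem": 0.2},
}
```

In this example the busiest tier cannot be scaled, so its indicator is 0 and the next most loaded scalable tier is chosen instead.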

The method according to claim 17, wherein the method comprises:
The tier to which scaling is applied is selected based on the CPU usage and the memory usage of each tier, each corrected by its own weight, together with a term that takes into account the ratio of the number of physical servers, and
Equation 3 uses these terms and exponential functions to express as a score whether each tier is suitable for scaling,
Auto-scaling method.

The method according to claim 7, wherein the method comprises:
A monitoring function is assumed to exist for obtaining the data used to define the reward,
service function chaining (SFC) data, virtual network function (VNF) instance placement data, and physical server data are obtained using the monitoring tool provided by the NFV environment in which the VNFs operate, and
the resource utilization of each VNF instance is obtained by installing the open-source monitoring agent Collectd, monitoring the instance periodically, and storing the measurements in a time-series database,
Auto-scaling method.

The method according to claim 7, wherein the method comprises:
Performance is verified by showing that the auto-scaling performed outperforms a threshold-based auto-scaling method, and
the SLO violation rate of the auto-scaled service function chaining (SFC) is measured and used as the performance indicator,
Auto-scaling method.
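
One way to compute an SLO violation rate is sketched below. It assumes that a request violates the SLO when its response time exceeds the SLO threshold; this definition, the function name, and the sample values are assumptions and are not stated in the claims.

```python
def slo_violation_rate(response_times, slo):
    """Fraction of measured response times that exceed the SLO threshold."""
    if not response_times:
        return 0.0
    violations = sum(1 for t in response_times if t > slo)
    return violations / len(response_times)


# Hypothetical measurements in milliseconds against a 100 ms SLO.
rate = slo_violation_rate([80, 120, 95, 210], 100)
```

Computing the same rate for a threshold-based scaler on the same traffic gives a like-for-like comparison.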