KR20240077270A

KR20240077270A - Integrated monitoring system and method for multi-cluster

Info

Publication number: KR20240077270A
Application number: KR1020220159608A
Authority: KR
Inventors: 김명진; 권경민; 박진휘; 최수녕
Original assignee: 주식회사 이노그리드
Priority date: 2022-11-24
Filing date: 2022-11-24
Publication date: 2024-05-31

Abstract

An integrated monitoring system and method for multi-cluster environments are disclosed. According to one aspect of the present invention, the integrated monitoring system includes: a plurality of clusters, each having installed an agent that generates monitoring-related information and pushes it outward; a temporary storage in which the monitoring metrics pushed by the clusters are temporarily stored; and a monitoring server that deploys the agents to the clusters, collects the monitoring metrics, and manages the monitoring information in an integrated manner. The monitoring server includes: a metadata repository that stores the metadata needed to check the status of each deployed agent; a collector manager that uses the metadata to check whether each agent is operating normally and creates a collector for each normal agent; a monitoring information storage in which the monitoring metrics collected by each collector from the temporary storage are stored as monitoring information; and a metadata manager that, when a collector exists for which no monitoring metrics are collected, modifies the metadata of the corresponding abnormal agent and stores it in the metadata repository.

Description

Integrated monitoring system and method for multi-cluster

The present invention relates to an integrated monitoring system and method for multi-cluster environments.

Existing monitoring systems for Kubernetes multi-cluster environments have a limitation: when a multi-cluster is built, a monitoring system must be set up for each cluster and the monitoring systems must be linked to one another. In addition, public cloud service providers need to build separate software to link the distributed monitoring systems. For example, the open-source Prometheus is installed in each cluster and Thanos is installed on the central monitoring server, and separate integration software must then be developed and installed to consolidate the monitoring information from each cluster. In other words, a user of one public cloud service provider's monitoring system must build an additional monitoring system for the Kubernetes services of other cloud service providers (CSPs).

Korean Registered Patent No. 10-1987664, "Method for monitoring a plurality of clusters and applications on a cloud platform"

Accordingly, the present invention has been devised to solve the problems described above. Its purpose is to provide a flexible integrated monitoring system and method that, in a multi-cluster environment, is not dependent on any CSP and, being based on a centralized monitoring system, requires no separate software linkage.

The present invention also aims to provide an integrated monitoring system and method for multi-cluster environments that manage monitoring information in an integrated and stable manner, on a push basis, according to the situation of the clusters.

Other objects of the present invention will become clearer through the preferred embodiments described below.

According to one aspect of the present invention, there is provided an integrated monitoring system for multi-cluster environments, including: a plurality of clusters, each having installed an agent that generates monitoring-related information and pushes it outward; a temporary storage in which the monitoring metrics pushed by the clusters are temporarily stored; and a monitoring server that deploys the agents to the clusters, collects the monitoring metrics, and manages the monitoring information in an integrated manner. The monitoring server includes: a metadata repository that stores the metadata needed to check the status of each deployed agent; a collector manager that uses the metadata to check whether each agent is operating normally and creates a collector for each normal agent; a monitoring data storage in which the monitoring metrics collected by each collector from the temporary storage are stored as monitoring information; and a metadata manager that, when a collector exists for which no monitoring metrics are collected, modifies the metadata of the corresponding abnormal agent and stores it in the metadata repository.

Here, the agents, having received information about the agents of each cluster from the monitoring server, may share with one another status information that includes the network environment and available process information, and may set a push schedule for the monitoring-related information based on that status information.

In addition, the agent designated as the master among the agents may provide information about the push schedule to the monitoring server, and the monitoring server may determine which collectors to create, and in what order, according to the push schedule.

The master may also be determined dynamically based on the status information.

Each agent may also include the status information acquired from other agents in its own monitoring-related information and push it to the monitoring server.

The metadata manager may also refer to the status information when changing the metadata of an agent for which no monitoring metrics have been collected.

According to another aspect of the present invention, there is provided an integrated monitoring method performed in a monitoring server that deploys, to a plurality of clusters, agents that transmit monitoring-related information by push, together with a computer program that executes the method. The method includes: checking whether each agent is operating normally by using the metadata needed to check the status of each deployed agent, and creating a collector for each normal agent; collecting, by the collectors, the monitoring metrics held in a temporary storage in which the metrics pushed by the clusters are temporarily stored, and storing them as monitoring information; and, when a collector exists for which no monitoring metrics are collected, modifying and storing the metadata of the corresponding abnormal agent.

Here, the monitoring server may provide the agents of each cluster with information about one another, so that the agents installed in the clusters can share status information, including their respective network environments and available process information, and set a push schedule for the monitoring-related information.

In addition, the monitoring server may obtain information about the push schedule from the agent designated as the master among the agents, and may determine which collectors to create, and in what order, according to the push schedule.

Because each agent includes the status information acquired from other agents in its own monitoring-related information when pushing, the status information may also be referred to when changing the metadata of an agent for which no monitoring metrics have been collected.

Other aspects, features, and advantages beyond those described above will become apparent from the following drawings, claims, and detailed description of the invention.

According to the present invention, it is possible to provide a flexible integrated monitoring system and method that, in a multi-cluster environment, is not dependent on any CSP and, being based on a centralized monitoring system, requires no separate software linkage.

In addition, according to the present invention, scheduling adapted to each situation through interworking among the clusters enables more stable, push-based integrated management of monitoring information.

FIG. 1 is a diagram illustrating software installation for conventional multi-cluster integrated monitoring.
FIG. 2 is an exemplary diagram showing an integrated monitoring scheme for a multi-cluster using multiple CSPs according to an embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of an integrated monitoring system for a multi-cluster according to an embodiment of the present invention.
FIGS. 4 and 5 are a flowchart and a detailed exemplary diagram showing a schematic monitoring process in the integrated monitoring system according to embodiments of the present invention.
FIG. 6 is a flowchart showing a process, performed in a cluster, of setting a push schedule and pushing monitoring information according to an embodiment of the present invention.
FIG. 7 is a flowchart showing a multi-cluster monitoring process, performed in the monitoring server, that uses the push schedule and status information according to an embodiment of the present invention.

The present invention may be modified in various ways and may have various embodiments; specific embodiments are illustrated in the drawings and described in detail in the detailed description. This is not intended to limit the present invention to particular embodiments, and the invention should be understood to include all modifications, equivalents, and substitutes falling within its spirit and technical scope.

When a component is described as being "connected" or "coupled" to another component, it may be directly connected or coupled to that component, but intervening components may also be present. In contrast, when a component is described as being "directly connected" or "directly coupled" to another component, it should be understood that no intervening components are present.

Terms such as first and second may be used to describe various components, but the components are not limited by these terms. The terms are used only to distinguish one component from another. For example, terms such as a first threshold and a second threshold, described later, may be preset to threshold values that differ from one another or are partly the same; because referring to all of them simply as "threshold" could cause confusion, the ordinals first, second, and so on are added for convenience of distinction.

The terms used in this specification are used only to describe specific embodiments and are not intended to limit the present invention. Singular expressions include plural expressions unless the context clearly indicates otherwise. In this specification, terms such as "include" or "have" specify the presence of the stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

In addition, the components of an embodiment described with reference to a given drawing are not limited to that embodiment. They may be included in other embodiments within the scope in which the technical idea of the present invention is maintained, and, even where no separate description is given, a plurality of embodiments may be implemented again as a single integrated embodiment.

In the description with reference to the accompanying drawings, identical components are given identical or related reference numerals regardless of the figure number, and redundant descriptions of them are omitted. In describing the present invention, detailed descriptions of related known technologies are omitted where they could unnecessarily obscure the gist of the invention.

FIG. 1 is a diagram illustrating software installation for conventional multi-cluster integrated monitoring, and FIG. 2 is an exemplary diagram showing an integrated monitoring scheme for a multi-cluster using multiple CSPs according to an embodiment of the present invention.

First, referring to FIG. 1, building a conventional multi-cluster monitoring system with open-source software requires installing Prometheus as the agent in each cluster, installing Thanos on the monitoring server to collect monitoring information, and additionally installing separate software on the monitoring server to integrate the collected monitoring information.

In contrast, referring to FIG. 2, an embodiment of the present invention provides a flexible integrated monitoring system for a multi-cluster environment that is based on a centralized monitoring system and requires no separate software linkage.

Compared with Kubernetes monitoring based on a pull mechanism, the system supports large-scale monitoring based on a push mechanism, which is suited to multi-cluster environments. Unlike the monitoring solutions of public cloud service providers, it supports integrated monitoring of all Kubernetes clusters without depending on a CSP.

The integrated monitoring system for a multi-cluster of the present invention and its method of operation are described in more detail below.

FIG. 3 is a block diagram showing the configuration of an integrated monitoring system for a multi-cluster according to an embodiment of the present invention.

Referring to FIG. 3, the overall system includes a plurality of clusters (10-1, ..., 10-n; hereinafter collectively referred to as 10), a temporary storage 30, and a monitoring server 50. The monitoring server 50 includes a metadata repository 51, a metadata manager 52, a collector manager 53, and a monitoring data storage 54.

An agent provided by the monitoring server 50 is installed in each of the clusters 10. The agent generates monitoring information (for example, CPU usage, available storage capacity, network environment, memory usage, and running process information; these are familiar to those skilled in the art, so a detailed description is omitted) and transmits it by push over a communication network to the temporary storage.

The temporary storage 30 may be, for example, a Kafka broker. The cluster agent pushes the monitoring information to the Kafka broker. "Kafka broker" refers to the system generally called simply "Kafka": producers and consumers are separate applications, whereas the broker is Kafka itself. Thus, in phrases such as "setting up Kafka" or "delivering messages through Kafka", Kafka means the broker. The Kafka broker is familiar to those skilled in the art, so a more detailed description is omitted.
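As a minimal sketch of what an agent might hand to the Kafka broker, the following builds a topic name and a serialized message value for one push. The per-cluster topic naming, the payload fields, and the function name are assumptions for illustration; the source states only that agents push monitoring information to the broker. Sending the message would be done by a Kafka producer client, which is not shown here.

```python
import json
import time


def build_metric_message(cluster_id, metrics, now=None):
    """Return (topic, value) for one push of monitoring metrics.

    The topic naming scheme and payload fields are hypothetical; they are
    not specified in the source.
    """
    topic = f"monitoring.{cluster_id}"  # assumption: one topic per cluster
    payload = {
        "cluster_id": cluster_id,
        "timestamp": time.time() if now is None else now,
        "metrics": metrics,
    }
    # Kafka message values are bytes, so serialize the payload as UTF-8 JSON.
    return topic, json.dumps(payload, sort_keys=True).encode("utf-8")


topic, value = build_metric_message(
    "cluster-1", {"cpu_usage": 0.42, "memory_usage": 0.63}, now=1700000000.0
)
```

Keeping one topic per cluster is only one possible layout; it makes it straightforward for a collector to subscribe to exactly the agent it corresponds to.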

The monitoring server 50 deploys agents to the clusters 10, collects the monitoring metrics stored in the temporary storage 30, and stores and manages the monitoring information in an integrated manner.

The metadata repository 51 of the monitoring server 50 stores the metadata needed to check the status of each deployed agent. By referring to the metadata, the status of each agent (active or inactive, operating normally or abnormally, and so on) can be checked.

The metadata manager 52 manages the metadata of these agents. For example, when a collector exists for which no monitoring metrics are collected, it modifies the metadata of the corresponding abnormal agent and stores it in the metadata repository 51.

The collector manager 53 uses the metadata to check whether each agent is operating normally and creates a collector for each normal agent. Each created collector collects the monitoring metrics that the agent of its corresponding cluster 10 has pushed to the temporary storage 30.

The monitoring data storage 54 stores the monitoring information derived from the monitoring metrics that each collector has collected from the temporary storage.
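The relationship between the metadata and the collectors can be sketched as follows. This is an illustrative model only: the class and field names (AgentMeta, Collector, CollectorManager, reconcile) are hypothetical, and the metadata is reduced to a single healthy flag.

```python
from dataclasses import dataclass


@dataclass
class AgentMeta:
    cluster_id: str
    healthy: bool = True  # simplified stand-in for the agent's status metadata


@dataclass
class Collector:
    cluster_id: str  # the cluster (agent) whose metrics this collector reads


class CollectorManager:
    """Keeps exactly one collector per agent that the metadata marks healthy."""

    def __init__(self, meta_repo):
        self.meta_repo = meta_repo  # cluster_id -> AgentMeta
        self.collectors = {}        # cluster_id -> Collector

    def reconcile(self):
        for cluster_id, meta in self.meta_repo.items():
            if meta.healthy and cluster_id not in self.collectors:
                self.collectors[cluster_id] = Collector(cluster_id)
            elif not meta.healthy:
                # No collector is kept for an agent marked abnormal.
                self.collectors.pop(cluster_id, None)
        return sorted(self.collectors)


repo = {
    "cluster-1": AgentMeta("cluster-1"),
    "cluster-2": AgentMeta("cluster-2", healthy=False),
}
manager = CollectorManager(repo)
active = manager.reconcile()
```

Calling reconcile again after the metadata manager changes an agent's status would add or remove the matching collector.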

FIGS. 4 and 5 are a flowchart and a detailed exemplary diagram showing a schematic monitoring process in the integrated monitoring system according to embodiments of the present invention.

Referring to FIG. 4, the monitoring server 50 deploys an agent to each cluster 10 that it is to manage (S410).

Each cluster 10 then runs the installed agent to generate monitoring information periodically (S420), and the monitoring server 50 creates a collector for each deployed agent (S430). Initially, collectors may be created for all clusters 10. Thereafter, metadata for each cluster 10 is created and managed according to the monitoring information collected from the clusters 10, and whether and when to create a collector may be decided according to the status of each agent as indicated by the metadata.

When a cluster 10 transmits the monitoring information it has generated by push (S440), the collector created in the monitoring server 50 collects the monitoring information and stores it in the monitoring data storage 54 (S450).

If a collector exists for which no monitoring metrics are collected, the corresponding agent is treated as an abnormal agent and its metadata is modified accordingly (S460). Then, for example, after a certain time has elapsed or a certain number of collection rounds have been performed, a collector may be created for the abnormal agent as well to attempt collection of monitoring information; if collection succeeds, the metadata for that agent may be changed back to the normal state.

Referring to FIG. 5, which shows a detailed example of this process, the monitoring server 50 receives from the user the installation data needed for agent deployment, such as CSP information, the agent IP, and connection information, and deploys the agents accordingly.

It then stores the metadata needed to check agent status and passes the agent information in the metadata repository (the number of clusters) to the collector manager 53.

The collector manager 53 works out which collectors to create based on the agent information (collectors for abnormal agents are deleted). The collector manager 53 then creates the collectors and passes each one the Topic (agent information) to subscribe to.

Each created collector collects the monitoring metrics stored in the Kafka broker based on the Topic it was given. If data collection is abnormal five or more times, the agent status in the metadata is changed.

The normally collected monitoring metrics are stored in the monitoring data storage, and the metadata of abnormal agents is modified and stored in the metadata repository.
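The status change after repeated failed collections might be tracked as in the sketch below. The threshold of five comes from the description; counting consecutive failures and resetting on the first successful collection are assumptions, as are the class and method names.

```python
FAILURE_LIMIT = 5  # from the description: five or more abnormal collections


class AgentStatusTracker:
    """Tracks failed collections per agent and derives a status from them."""

    def __init__(self, failure_limit=FAILURE_LIMIT):
        self.failure_limit = failure_limit
        self.failures = {}  # cluster_id -> consecutive failed collections
        self.status = {}    # cluster_id -> "normal" or "abnormal"

    def record(self, cluster_id, collected):
        """Record one collection attempt and return the resulting status."""
        if collected:
            # Assumption: one successful collection restores the normal state.
            self.failures[cluster_id] = 0
            self.status[cluster_id] = "normal"
        else:
            count = self.failures.get(cluster_id, 0) + 1
            self.failures[cluster_id] = count
            if count >= self.failure_limit:
                self.status[cluster_id] = "abnormal"
            else:
                self.status.setdefault(cluster_id, "normal")
        return self.status[cluster_id]


tracker = AgentStatusTracker()
history = [tracker.record("cluster-2", collected=False) for _ in range(5)]
recovered = tracker.record("cluster-2", collected=True)
```

The recovery step corresponds to the retry described for FIG. 4, where a collector is created again for an abnormal agent and the metadata is set back to normal if metrics arrive.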

According to this embodiment, stable monitoring becomes possible because the monitoring server 50 centrally collects monitoring information by push through the agents it has deployed to each cluster 10 and manages the agent status of each cluster 10.

In addition, a Kubernetes cluster can be built for each CSP (Cloud Service Provider), and multi-cluster interworking and management software can be built.

The following describes a method of collecting monitoring information more stably through interworking among the agents of the clusters 10.

FIG. 6 is a flowchart showing a process, performed in a cluster, of setting a push schedule and pushing monitoring information according to an embodiment of the present invention, and FIG. 7 is a flowchart showing a multi-cluster monitoring process, performed in the monitoring server 50, that uses the push schedule and status information according to an embodiment of the present invention.

First, referring to FIG. 6, the cluster 10 acquires from the monitoring server 50 information about each agent of all the clusters 10 (for example, IP address, MAC address, communication method, and network environment) (S610).

The cluster 10 then communicates with the other clusters (that is, their agents), and they share with one another status information that includes the network environment, available process information, and the like (S620). In one example, the status information may include part of the monitoring information provided to the monitoring server 50. The shared status information can then be provided to the monitoring server 50 by another cluster. If a cluster 10 is in an abnormal state, it cannot provide its monitoring information to the monitoring server 50, but the monitoring server 50 can still check part of that monitoring information through the status information the cluster sent to other clusters beforehand.
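The status sharing in S620 can be sketched as an agent that remembers the latest status received from each peer and attaches those statuses to its own push. The class, method, and field names are hypothetical, and the transport between agents is left out.

```python
class Agent:
    """Illustrative agent that shares status with peers and relays theirs."""

    def __init__(self, cluster_id):
        self.cluster_id = cluster_id
        self.peer_states = {}  # peer cluster_id -> last status received

    def own_state(self, network_env, available_processes):
        # Status information as described: network environment and
        # available process information (field names are assumptions).
        return {
            "cluster_id": self.cluster_id,
            "network_env": network_env,
            "available_processes": available_processes,
        }

    def receive_state(self, state):
        # Keep only the most recent status per peer; ignore our own.
        if state["cluster_id"] != self.cluster_id:
            self.peer_states[state["cluster_id"]] = state

    def build_push(self, metrics):
        # Peer statuses travel with this agent's own monitoring information.
        return {
            "cluster_id": self.cluster_id,
            "metrics": metrics,
            "peer_states": dict(self.peer_states),
        }


a, b = Agent("cluster-a"), Agent("cluster-b")
a.receive_state(b.own_state(network_env="good", available_processes=12))
push = a.build_push({"cpu_usage": 0.31})
```

If cluster-b later becomes unreachable, the monitoring server would still see its last shared status inside the pushes from cluster-a.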

Based on the shared status information of each cluster 10, a push schedule for pushing the monitoring information is set. For example, the push period and the turn (or transmission time) of each cluster 10 may be set as the push schedule. In one example, one of the agents is designated as the master agent, and the master agent determines the push schedule of each agent based on the status information of each cluster 10. The master agent may be determined dynamically by referring to the status information of each agent; for example, the agent with a good network environment and the fewest running processes may be selected. This is only one example, and any agent may be dynamically designated as the master agent based on the status information according to various rules.
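One way to realize the example rule above (best network environment, then fewest running processes) is sketched below, together with a simple schedule that spreads the agents evenly over one push period. The numeric network score, the even spacing, the default period, and all names are assumptions; the source leaves the concrete rule open.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentState:
    cluster_id: str
    network_score: float    # hypothetical metric: higher means better network
    running_processes: int


def _rank(state):
    # Better network first, then fewer running processes, then id as tiebreak.
    return (-state.network_score, state.running_processes, state.cluster_id)


def elect_master(states):
    """Pick the master agent from the shared status information."""
    return min(states, key=_rank).cluster_id


def build_push_schedule(states, period_s=60.0):
    """Assign each agent a turn and a start offset within one push period."""
    ordered = sorted(states, key=_rank)
    if not ordered:
        return []
    slot = period_s / len(ordered)
    return [
        {"cluster_id": s.cluster_id, "order": i,
         "offset_s": i * slot, "period_s": period_s}
        for i, s in enumerate(ordered)
    ]


states = [
    AgentState("cluster-a", 0.9, 10),
    AgentState("cluster-b", 0.9, 5),
    AgentState("cluster-c", 0.5, 1),
]
master = elect_master(states)
schedule = build_push_schedule(states)
```

Because the ranking is recomputed from whatever status information is current, rerunning elect_master after a status change naturally yields a different master, which matches the dynamic designation described above.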

The push schedule that has been set is also shared with all agents, and each agent performs the push of monitoring information according to the push schedule (S640).

According to this embodiment, the clusters 10 set the push schedule dynamically according to their own status, so that monitoring information can be pushed more stably.

Information about the push schedule may also be provided to the monitoring server 50. In one example, the master agent pushes the push schedule information to the monitoring server 50 either separately from the monitoring information or together with it.

도 7을 참조하면, 모니터링 서버(50)는 푸쉬 스케줄을 수신하면(S710), 푸쉬 스케줄을 분석하여 생성할 콜렉터 및 생성 순서를 결정한다(S720).Referring to FIG. 7, when the monitoring server 50 receives the push schedule (S710), it analyzes the push schedule and determines the collector to be created and the creation order (S720).

이로 인해, 모니터링 서버(50)는 에이전트들의 메타정보뿐 아니라 에이전트들간의 통신에 의해 설정된 푸쉬 스케줄을 참조함으로써, 보다 정확히 필요한 콜렉터를 생성할 수 있으며, 또한 생성순서에 의해 보다 신속하고 정확한 모니터링 정보의 수집이 가능해진다.As a result, the monitoring server 50 can more accurately create the necessary collector by referring to the push schedule set by communication between agents as well as the meta information of the agents, and can also provide faster and more accurate monitoring information by the generation order. Collection becomes possible.

Collectors are created as determined, and monitoring information is collected (S730).
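Steps S710 to S730 can be sketched as a small planning function. This is an illustration under stated assumptions, not the disclosed implementation: the schedule layout (`offset_s` per agent), the metadata layout (a `state` field with the values `"normal"` and `"abnormal"`), and the function name `plan_collectors` are all hypothetical.

```python
def plan_collectors(push_schedule, agent_metadata):
    """Return agent IDs in the order in which their collectors should be created.

    push_schedule:  {agent_id: {"offset_s": float, ...}}  (assumed layout)
    agent_metadata: {agent_id: {"state": "normal" | "abnormal"}}  (assumed layout)
    """
    # Only agents that are marked normal and appear in the schedule get a collector.
    eligible = [
        agent_id
        for agent_id, meta in agent_metadata.items()
        if meta.get("state") == "normal" and agent_id in push_schedule
    ]
    # Create collectors in the order in which the agents are scheduled to push.
    return sorted(eligible, key=lambda a: (push_schedule[a]["offset_s"], a))
```

Creating collectors in push order means that a collector for the earliest-pushing agent exists before collectors for agents that push later, which is one plausible reading of why the creation order speeds up collection.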

The status information of each agent, collected together with the monitoring information, is then referred to when changing the metadata (S740). For example, if checking the status information shows that the communication environment of a given cluster is poor, the meta information of the agent for that cluster is changed to an abnormal state.
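The metadata update of step S740 might look like the following sketch. The specification does not say how a "poor" communication environment is detected, so the latency field, the `max_latency_ms` threshold, and the `state` values are assumptions chosen only to make the example concrete.

```python
def update_metadata(agent_metadata, reported_status, max_latency_ms=500.0):
    """Return a copy of the metadata with poorly connected agents marked abnormal.

    agent_metadata:  {agent_id: {"state": str, ...}}        (assumed layout)
    reported_status: {agent_id: {"latency_ms": float, ...}}  (assumed layout)
    """
    # Copy each entry so the caller's metadata is not modified in place.
    updated = {agent_id: dict(meta) for agent_id, meta in agent_metadata.items()}
    for agent_id, status in reported_status.items():
        # Hypothetical rule: latency above the threshold counts as a poor environment.
        if agent_id in updated and status["latency_ms"] > max_latency_ms:
            updated[agent_id]["state"] = "abnormal"
    return updated
```

Returning a new dictionary rather than mutating the input keeps the previous metadata available for comparison before it is written back to the meta repository.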

In addition, as described above, for a cluster from which monitoring information has not been collected, the status information collected from the agents of other clusters may be checked and part of it used as monitoring information.
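A minimal sketch of this fallback is shown below, assuming that each agent's pushed data includes the status it observed for its peers. The dictionary layouts, the `partial` and `reported_by` markers, and the choice of the first reporter in sorted order are hypothetical.

```python
def fill_from_peers(collected, peer_reports):
    """Substitute peer-reported status for clusters with no collected metrics.

    collected:    {cluster_id: metrics dict, or None if nothing was collected}
    peer_reports: {reporter_id: {cluster_id: status dict}}  (assumed layout)
    """
    result = dict(collected)
    for cluster_id, metrics in collected.items():
        if metrics is not None:
            continue  # this cluster pushed its own metrics
        for reporter_id in sorted(peer_reports):
            status = peer_reports[reporter_id].get(cluster_id)
            if status is not None:
                # Mark the entry so it is not mistaken for first-hand metrics.
                result[cluster_id] = {"partial": True, "reported_by": reporter_id, **status}
                break
    return result
```

Clusters with no first-hand metrics and no peer report stay as `None`, so the caller can still tell them apart from clusters covered by a peer.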

A computer program stored on a computer-readable medium may be provided to perform the integrated monitoring method for multi-cluster according to the present invention described above.

The integrated monitoring method for multi-cluster described above may also be implemented as computer-readable code on a computer-readable recording medium. The computer-readable recording medium includes any type of recording medium storing data that can be read by a computer system, for example, read-only memory (ROM), random access memory (RAM), magnetic tape, magnetic disk, flash memory, and optical data storage devices. The computer-readable recording medium may also be distributed across computer systems connected through a computer communication network, and stored and executed as code that is readable in a distributed manner.

Although the present invention has been described above with reference to preferred embodiments, those of ordinary skill in the art will understand that the present invention may be variously modified and changed without departing from the spirit and scope of the present invention as set forth in the claims below.

10: Cluster
30: Temporary storage
50: Monitoring server

Claims

An integrated monitoring system for multi-cluster, comprising:
a plurality of clusters in each of which an agent that generates monitoring-related information and pushes it to the outside is installed;
a temporary storage in which monitoring metrics transmitted by the clusters in a push manner are temporarily stored; and
a monitoring server that deploys the agents to the clusters, collects the monitoring metrics, and manages monitoring information in an integrated manner,
wherein the monitoring server comprises:
a meta repository for storing metadata required for checking the status of each deployed agent;
a collector manager that checks whether each agent is operating normally using the metadata and creates a collector corresponding to each normal agent;
a monitoring information storage in which each collector stores, as monitoring information, the monitoring metrics it collects from the temporary storage; and
a metadata manager that, when there is a collector for which monitoring metrics are not collected, modifies the metadata information about the abnormal agent and stores it in the meta repository.

In claim 1,
An integrated monitoring system for multi-cluster in which the agents, each having received information about the agents of each cluster from the monitoring server, share status information including network environment and available process information with one another, and set a push schedule for the monitoring-related information based on the status information.

In claim 2,
Among each agent, the agent set as the master provides information about the push schedule to the monitoring server,
An integrated monitoring system for multi-cluster where the monitoring server determines the collectors to be created and the creation order according to the push schedule.

In claim 3,
An integrated monitoring system for multi-cluster in which the master is dynamically determined based on the status information.

In claim 2,
An integrated monitoring system for a multi-cluster in which each agent includes the status information acquired from other agents as its own monitoring-related information and pushes it to the monitoring server.

In claim 5,
An integrated monitoring system for multi-cluster in which the metadata manager refers to the status information when changing the metadata information of an agent for which monitoring metrics have not been collected.

An integrated monitoring method for multi-cluster, performed in a monitoring server that deploys, to a plurality of clusters, agents that transmit monitoring-related information in a push manner, the method comprising the steps of:
checking whether each agent is operating normally, using metadata required for checking the status of each deployed agent, and creating a collector corresponding to each normal agent;
collecting, by each collector, the monitoring metrics stored in a temporary storage in which monitoring metrics transmitted by the clusters in a push manner are temporarily stored, and storing them as monitoring information; and
when there is a collector for which monitoring metrics are not collected, modifying and storing the metadata information about the abnormal agent.

In claim 7,
An integrated monitoring method for multi-cluster further including the step of providing the agents of each cluster with information about one another, so that the agents installed in the clusters share status information, including each agent's network environment and available process information, and set a push schedule for the monitoring-related information.

In claim 8,
An integrated monitoring method for a multi-cluster that acquires information about the push schedule from the agent set as the master among each agent, and determines the collector to be created and the creation order according to the push schedule.

In claim 8,
As each agent pushes the status information acquired from other agents as its own monitoring-related information,
An integrated monitoring method for multi-cluster that refers to the status information when changing metadata information for agents for which monitoring metrics have not been collected.

A computer program stored on a computer-readable medium for performing an integrated monitoring method for a multi-cluster, wherein the computer program causes a computer to perform the following steps, the steps comprising:
Confirming whether each agent is operating normally using metadata required to check the status of each deployed agent and creating each collector corresponding to a normal agent;
Collecting, by the collector, monitoring metrics stored in a temporary storage where monitoring metrics transmitted by clusters in a push manner are temporarily stored and storing them as monitoring information; and
Modifying and storing metadata information about the abnormal agent when there is a collector for which monitoring metrics are not collected.