KR102376349B1

KR102376349B1 - Apparatus and method for automatically solving network failures based on automatic packet analysis

Info

Publication number: KR102376349B1
Application number: KR1020210080081A
Authority: KR
Inventors: 김신규; 오현세
Original assignee: (주)소울시스템즈
Priority date: 2021-06-21
Filing date: 2021-06-21
Publication date: 2022-03-18
Also published as: WO2022270805A1

Abstract

The present invention provides an apparatus and method for automatically resolving network failures based on automatic packet analysis. The invention collects packet information on the network status, automatically determines from the analyzed data whether a specific area has failed, and automatically resolves the network failure. It provides a guide for identifying and resolving diverse and complex network issues (performance, failure, etc.) accurately and quickly with one click, allows any network operator to manage the network easily and conveniently, and provides a customized network management service tailored to user needs by interworking with various systems through its information collection and analysis functions.

Description

Apparatus and method for automatically solving network failures based on automatic packet analysis

The present invention relates to an intelligent network management system, and in particular to an apparatus and method for automatically resolving network failures based on automatic packet analysis. The apparatus and method collect packet information on the network status, automatically determine from the analyzed data whether a specific area has failed, and automatically resolve the network failure. They are thereby suited to providing a guide for accurate and quick cause identification and resolution of diverse and complex network issues (performance, failure, etc.) with a single click, enabling any network operator to manage the network easily and conveniently, and providing a customized network management service tailored to user needs by interworking with various systems through information collection and analysis functions.

In general, intelligent network technology collectively refers to the network and infrastructure technologies that will be commonly used for the intelligence-based Fourth Industrial Revolution and innovative growth. Specifically, it comprehensively includes SDN (Software-Defined Networking), NFV (Network Functions Virtualization), network intelligence technology, low-latency/time-deterministic network technology, quantum information and communication technology, network architecture technology, transport network technology, and wired and wireless access technology.

Network intelligence technology refers to technology that automatically performs functions such as end-to-end network (re)configuration, control, management, and orchestration by repeating a series of procedures such as automatic data collection and feedback for autonomous decision-making using artificial intelligence techniques such as machine learning.

The meaning of such intelligent networks is evolving over time, mainly leading to breakthroughs in computation and algorithms.

Prior art includes Korean Patent Registration No. 10-1998863, 'System for communication failure management and maintenance of network equipment', and Korean Patent Registration No. 10-2133001, 'Network management device, network management system and network management method'.

A network outage leads directly to business interruption, and delays in business processing caused by degraded network performance translate into direct losses for the organization. Respondents reported that the average loss from a single network outage reaches US$402,542 in the United States. (Source: The Rise of AIOps: How Data, Machine Learning, and AI Will Transform Performance Monitoring, Appdynamics News, 2018.12.17.) It is therefore necessary to minimize network outages.

The Uptime Institute, which evaluates network performance, has studied publicly reported network outages. According to its findings, the share of network failures among IT failures rose sharply from 19% in 2017 to 32% in 2018. A technology is therefore required that can quickly trace the cause of a network outage and propose a solution.

Conventional network management tools include the NMS (Network Management System), TMS (Traffic Management System), DPI (Data Packet Inspector), and packet analyzer.

However, an NMS, which centers on equipment and line monitoring, is limited in resolving intricately entangled network issues. A TMS, which manages network traffic, cannot support in-depth analysis of the payload. DPI and packet analyzers are very complex and difficult to use, and require a high degree of expertise.

KR 10-1998863 B1
KR 10-2133001 B1

Accordingly, the present invention has been proposed to solve the conventional problems described above. An object of the present invention is to provide an apparatus and method for automatically resolving network failures based on automatic packet analysis that collect packet information on the network status, automatically determine from the analyzed data whether a specific area has failed, and automatically resolve the network failure, and that can thereby provide a guide for accurate and quick cause identification and resolution of diverse and complex network issues (performance, failure, etc.) with a single click, enable any network operator to manage the network easily and conveniently, and provide a customized network management service tailored to user needs by interworking with various systems through information collection and analysis functions.

FIG. 1 is a conceptual diagram of an apparatus for automatically resolving network failures based on automatic packet analysis according to an embodiment of the present invention.

As shown in FIG. 1, the automatic network failure resolution device 110 of the intelligent network management system 100, which manages the network, comprises:
a control unit 120 that controls the automatic network failure resolution device 110 so that it collects packet information on the network status, automatically determines from the analyzed data whether a specific area has failed, and automatically resolves the network failure, thereby providing a guide for accurate and quick cause identification and resolution of diverse and complex network issues with a single click, enabling any network operator to manage the network easily and conveniently, and providing a customized network management service tailored to user needs by interworking with various systems through information collection and analysis functions;
a packet capture unit 130 that, under the control of the control unit 120, receives packet data from the network equipment 210 of the data center or the remote intelligent network management device 220 through a network interface card (NIC), generates the raw data needed to create information bundles by combining the received packet data into a single data stream, stores the raw data in a raw packet storage buffer, and thereby measures data including network packets, SNMP TRAP, and SYSLOG information of the data center 210 or the remote intelligent network management device 220;
an information bundle generation unit 140 that, under the control of the control unit 120, generates metadata for the packets collected by the packet capture unit 130, the metadata including packet confirmation time, packet size, session ID, MAC address, and TCP information, generates a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, and an event information bundle, and compresses and stores the data for each type of information bundle concurrently;
a performance indicator generation unit 150 that, under the control of the control unit 120, receives the information bundles generated by the information bundle generation unit 140 and generates the performance indicators needed for network management, the performance indicators comprising basic performance indicators and additional performance indicators;
a performance indicator analysis unit 160 that, under the control of the control unit 120, selects the types of information to use from the information bundles based on the performance indicators generated by the performance indicator generation unit 150 and then analyzes the network performance to generate a performance indicator analysis result; and
a network failure processing unit 170 that uses the analysis result of the performance indicator analysis unit 160 to automatically determine whether a specific area has failed and automatically resolves the network failure.

FIG. 2 is a conceptual diagram showing an example to which the present invention of FIG. 1 is applied.

As shown in FIG. 2, when the network failure processing unit 170 determines that a network failure has occurred, it either presents recommendations or controls the network itself. If it cannot control the network itself, or has not been authorized to do so, it presents the solution in the form of recommendations. If it can control the network itself, it connects to the specific device in the network over SSH and uses remote shell commands to change settings or reboot the device.

If traffic suddenly slows down when passing through a specific hop in the network, the network failure processing unit 170 judges that a packet loop is likely at that hop and delivers the recommendation 'check the cable wiring'. If the network service slows down at a specific time every day, it applies QoS to the specific location found through NetFlow: it connects to a separate QoS device or switch and uses the QoS function to adjust the total amount of traffic so that server requests from that location do not surge, thereby handling the network failure automatically. It also recommends adjusting the clients' network access times so that requests are spread across time periods, and recommends server and network expansion, including a proposal for the additional bandwidth required.
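The decision rules described for the network failure processing unit 170 can be illustrated with a minimal Python sketch. The names `Symptom`, `Action`, and `plan_actions` are hypothetical and do not appear in the patent; the sketch only models the choice between recommending and controlling, and does not perform any SSH or QoS operation.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Symptom(Enum):
    SLOW_AFTER_HOP = auto()      # traffic slows sharply after one hop
    SLOW_AT_FIXED_TIME = auto()  # service slows at the same time every day


@dataclass(frozen=True)
class Action:
    kind: str    # "recommend" (shown to the operator) or "control" (applied directly)
    detail: str


def plan_actions(symptom: Symptom, can_control: bool) -> list[Action]:
    """Map a diagnosed symptom to actions (illustrative rules only)."""
    if symptom is Symptom.SLOW_AFTER_HOP:
        # A packet loop is suspected; the text only describes a recommendation here.
        return [Action("recommend", "check the cable wiring")]

    # SLOW_AT_FIXED_TIME: limit traffic from the location found through NetFlow.
    # Without permission to control the network, fall back to a recommendation.
    qos_kind = "control" if can_control else "recommend"
    return [
        Action(qos_kind, "apply QoS traffic limit for the location found via NetFlow"),
        Action("recommend", "spread client access times across time periods"),
        Action("recommend", "expand server and network capacity"),
    ]
```

Keeping the plan as data separates the diagnosis from the execution path, so the same plan can be displayed as recommendations or handed to whatever component actually connects to the device.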

FIG. 3 is a flowchart showing a method for automatically resolving network failures based on automatic packet analysis according to an embodiment of the present invention.

As shown in FIG. 3, when the automatic network failure resolution device 110 of the intelligent network management system 100 performs automatic resolution of a network failure, the method comprises:
a packet capture step (ST1) in which the packet capture unit 130 samples NetFlow information for the network equipment 210 of the data center or receives packet data from the remote intelligent network management device 220 through a network interface card (NIC), generates the raw data needed to create information bundles by combining the received packet data into a single data stream, stores the raw data in a raw packet storage buffer, and thereby measures data including network packets, SNMP TRAP, and SYSLOG information of the data center 210 or the remote intelligent network management device 220;
an information bundle generation step (ST2) in which, after the packet capture step, the information bundle generation unit 140 generates metadata for the collected packets, the metadata including packet confirmation time, packet size, session ID, MAC address, and TCP information, generates a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, and an event information bundle, and compresses and stores the data for each type of information bundle concurrently;
a performance indicator generation step (ST3) in which, after the information bundle generation step, the performance indicator generation unit 150 receives the generated information bundles and generates the performance indicators needed for network management, the performance indicators comprising basic performance indicators and additional performance indicators;
a performance indicator analysis step (ST4) in which, after the performance indicator generation step, the performance indicator analysis unit 160 selects the types of information to use from the information bundles based on the generated performance indicators and then analyzes the network performance to generate a performance indicator analysis result; and
a network failure processing step (ST5) in which, after the performance indicator analysis step, the network failure processing unit 170 uses the analysis result to automatically determine whether a specific area has failed and automatically resolves the network failure.

FIG. 4 is a detailed flowchart of the network failure processing in FIG. 3.

As shown in FIG. 4, the network failure processing step comprises:
a failure processing determination step (ST11, ST12) of determining, from data obtained by analyzing the NetFlow information and packet information, whether a specific area has failed and which failure processing is to be performed; and
a failure processing execution step (ST13, ST14) of, when the failure processing determination step determines that a network failure has occurred, presenting recommendations or controlling the network directly: if the device cannot control the network itself or has not been authorized to do so, it presents the solution in the form of recommendations; if it can control the network itself, it connects to the specific device in the network over SSH, uses remote shell commands to change settings or reboot the device, and provides the result of the network failure processing.

The apparatus and method for automatically resolving network failures based on automatic packet analysis according to the present invention collect packet information on the network status, automatically determine from the analyzed data whether a specific area has failed, and automatically resolve the network failure. They thereby provide a guide for accurate and quick cause identification and resolution of diverse and complex network issues (performance, failure, etc.) with a single click, enable any network operator to manage the network easily and conveniently, and can provide a customized network management service tailored to user needs by interworking with various systems through information collection and analysis functions.

In addition, the present invention allows everything from information collection through analysis, diagnosis, and results to be operated on a single system (All-In-One), offers a choice of the system form best suited to the operating environment (Portable, Rack Mount, Rugged PC, Cloud, etc.), and can be used immediately without prior configuration (Zero Configuration).

In addition, the present invention is vendor-independent because its packet collection technology uses a general-purpose NIC (Network Interface Controller); it is unaffected by the user environment because it embeds an automatic L7 protocol classification engine; it can interwork with EMS, SIEM, NMS, and similar systems through a REST API; and it can provide customized services according to user requirements.

In addition, the present invention costs about one quarter of the price of foreign solutions for the same purpose while reducing MTTR (mean time to repair) by 1/5 or more. The prior art takes about 1 to 2 weeks from collecting system configuration information through analysis and report writing to resolving the problem. By contrast, the present invention can complete information collection, analysis, report writing, and problem resolution within about 2 to 3 days. (The total resolution time given here is a general empirical value and may vary with the nature of the problem.) The present invention shortens the advance preparation (configuration) time for collecting network information, the time to identify the cause of a problem through analysis, the time for corrective action and recovery, and the time to write the final report.

Furthermore, whereas the prior art requires network and solution operation experts for network management, the present invention can be operated even by a junior network engineer.

FIG. 1 is a conceptual diagram of an apparatus for automatically resolving network failures based on automatic packet analysis according to an embodiment of the present invention.
FIG. 2 is a conceptual diagram showing an example to which the present invention of FIG. 1 is applied.
FIG. 3 is a flowchart showing a method for automatically resolving network failures based on automatic packet analysis according to an embodiment of the present invention.
FIG. 4 is a detailed flowchart of the network failure processing in FIG. 3.

A preferred embodiment of the apparatus and method for automatically resolving network failures based on automatic packet analysis according to the present invention, configured as described above, is described in detail below with reference to the accompanying drawings. In the following description, detailed descriptions of related well-known functions or configurations are omitted where they would unnecessarily obscure the gist of the present invention. The terms used below are defined in consideration of their functions in the present invention and may vary according to the intention of users or operators, or precedent; the meaning of each term should therefore be interpreted on the basis of the content of this specification as a whole.

First, the present invention is intended to collect packet information on the network status, automatically determine from the analyzed data whether a specific area has failed, and automatically resolve the network failure, thereby providing a guide for accurate and quick cause identification and resolution of diverse and complex network issues (performance, failure, etc.) with a single click, enabling any network operator to manage the network easily and conveniently, and providing a customized network management service tailored to user needs by interworking with various systems through information collection and analysis functions.

The automatic network failure resolution device 110 of the intelligent network management system 100, which manages the network, may comprise a control unit 120, a packet capture unit 130, an information bundle generation unit 140, a performance indicator generation unit 150, a performance indicator analysis unit 160, and a network failure processing unit 170.

The control unit 120 controls the automatic network failure resolution device 110 so that it collects packet information on the network status, automatically determines from the analyzed data whether a specific area has failed, and automatically resolves the network failure, thereby providing a guide for accurate and quick cause identification and resolution of diverse and complex network issues with a single click, enabling any network operator to manage the network easily and conveniently, and providing a customized network management service tailored to user needs by interworking with various systems through information collection and analysis functions.

Under the control of the control unit 120, the packet capture unit 130 receives packet data from the network equipment 210 of the data center or the remote intelligent network management device 220 through a network interface card (NIC), generates the raw data needed to create information bundles by combining the received packet data into a single data stream, stores the raw data in a raw packet storage buffer, and thereby measures data including network packets, SNMP TRAP, and SYSLOG information of the data center 210 or the remote intelligent network management device 220.

Under the control of the control unit 120, the information bundle generation unit 140 generates metadata for the packets collected by the packet capture unit 130. The metadata includes packet confirmation time, packet size, session ID, MAC address, and TCP information. The unit generates a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, and an event information bundle, and compresses and stores the data for each type of information bundle concurrently.
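As an illustration of how per-packet metadata could feed the BPS and PPS information bundles, the following Python sketch aggregates packets into one-second buckets. It assumes that BPS means bits per second and PPS means packets per second, which the text does not define; `PacketMeta` and `per_second_rates` are hypothetical names, and only three of the listed metadata fields are modeled.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class PacketMeta(NamedTuple):
    timestamp: float  # packet confirmation time, in seconds
    size: int         # packet size, in bytes
    session_id: int   # session the packet belongs to


def per_second_rates(packets: Iterable[PacketMeta]) -> dict[int, tuple[int, int]]:
    """Return {second: (bits in that second, packets in that second)}."""
    bits: dict[int, int] = defaultdict(int)
    count: dict[int, int] = defaultdict(int)
    for p in packets:
        second = int(p.timestamp)  # one-second bucket the packet falls into
        bits[second] += p.size * 8
        count[second] += 1
    return {second: (bits[second], count[second]) for second in sorted(bits)}
```

For example, two packets of 100 and 50 bytes seen within second 0 contribute 1200 bits and a count of 2 to that bucket.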

Under the control of the control unit 120, the performance indicator generation unit 150 receives the information bundles generated by the information bundle generation unit 140 and generates the performance indicators needed for network management. The performance indicators comprise basic performance indicators and additional performance indicators.

Under the control of the control unit 120, the performance indicator analysis unit 160 selects the types of information to use from the information bundles based on the performance indicators generated by the performance indicator generation unit 150, and then analyzes the network performance to generate a performance indicator analysis result.

The network failure processing unit 170 uses the analysis result of the performance indicator analysis unit 160 to automatically determine whether a specific area has failed and automatically resolves the network failure.

The operation of the present invention is described in more detail below.

The packet capture unit 130 receives the information of the data center 210 as packets and manages the packet data as information bundles. When packets arrive at the NIC, the packet capture unit 130 distributes them across individual queues (hardware buffers) inside the NIC to balance the load on the NIC. Taking data out of the hardware buffers and processing it is left to the application program.

The packet capture unit 130 also has a predetermined number of queues created in the NIC hardware itself and allocates separate threads for reading the NIC's data, one per queue. A separate buffer into which the raw packets from the NIC's internal queues can be moved is created in advance. Whether packets have accumulated in a queue can be checked automatically or manually. With automatic checking, there is a delay between the system checking the queue and the program being notified of the result. The sequence is "system checks the queue → message is sent to the program → program processes the message → queue is processed", and the 'message is sent to the program' and 'program processes the message' steps are the source of the delay. The delay is therefore avoided by manual control based on an infinite loop: the program itself repeats "check the queue → process the queue → repeat".
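The polling approach described above can be sketched in Python as follows. This is an illustration only: `collections.deque` objects stand in for the NIC's hardware queues, the names `poll_queue` and `run_capture` are hypothetical, and a real capture path would read from the NIC driver rather than from in-memory queues. One thread is assigned to each queue, and each thread repeats "check the queue, process the queue" without waiting for any notification.

```python
import threading
from collections import deque


def poll_queue(queue: deque, sink: list, stop: threading.Event) -> None:
    """Busy-poll one queue: check, process, repeat, with no notification step."""
    while True:
        try:
            sink.append(queue.popleft())  # "process": move the packet out of the queue
        except IndexError:
            # Queue is empty right now. Exit only once shutdown was requested;
            # otherwise loop again immediately instead of sleeping or blocking.
            if stop.is_set():
                return


def run_capture(queues: list) -> list:
    """Drain pre-filled queues with one polling thread per queue."""
    stop = threading.Event()
    sinks = [[] for _ in queues]
    threads = [
        threading.Thread(target=poll_queue, args=(q, s, stop))
        for q, s in zip(queues, sinks)
    ]
    for t in threads:
        t.start()
    stop.set()  # in this demo nothing else produces packets, so request shutdown
    for t in threads:
        t.join()
    return sinks
```

The trade-off of this design is that each polling thread keeps a CPU core busy even when its queue is empty, in exchange for removing the notification latency.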

In addition, in the packet capture unit 130, all threads run at the same time to calculate and check the size of the data accumulated in each queue. A storage location in the buffer is then selected for each queue. If the size of the data to be stored exceeds the remaining space in the buffer, the buffer is replaced with a new, empty buffer. Once the storage locations have been assigned, the threads write their data to the single buffer simultaneously. In general, when several threads write to a single buffer at the same time, more than one thread may write to the same location; here, however, the regions being written do not overlap, so no such problem arises. Because the packet capture unit 130 assigns the storage locations in advance, no memory is wasted. The prior art either uses a single thread, which limits the write speed (to around 10 Gbps at most), or solves the speed problem with separate FPGA-based hardware, whereas the present invention has the advantage of achieving high-speed processing purely in software.
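A minimal sketch of the reserve-then-write scheme, assuming a `bytearray` as the shared buffer and one Python thread per queue. The class `CaptureBuffer`, the function `write_batch`, and the tiny `BUFFER_SIZE` are hypothetical names and values chosen for illustration; Python threads also do not copy memory truly in parallel, so this shows the offset bookkeeping rather than the throughput.

```python
import threading

BUFFER_SIZE = 64  # hypothetical; a real capture buffer would be far larger

class CaptureBuffer:
    def __init__(self, size=BUFFER_SIZE):
        self.size = size
        self.data = bytearray(size)
        self.used = 0
        self.retired = []  # contents of full buffers, kept for later processing

    def reserve(self, sizes):
        """Return one start offset per queue; swap in an empty buffer if the batch does not fit."""
        total = sum(sizes)
        if total > self.size:
            raise ValueError("batch is larger than one buffer")
        if total > self.size - self.used:
            self.retired.append(bytes(self.data[:self.used]))
            self.data = bytearray(self.size)
            self.used = 0
        offsets = []
        for n in sizes:
            offsets.append(self.used)
            self.used += n
        return offsets

def write_batch(buf, chunks):
    """Each thread copies its queue's bytes into its own pre-reserved region."""
    offsets = buf.reserve([len(c) for c in chunks])
    target = buf.data  # read after reserve(), which may have swapped buffers

    def copy(offset, chunk):
        target[offset:offset + len(chunk)] = chunk  # regions never overlap

    threads = [threading.Thread(target=copy, args=(o, c)) for o, c in zip(offsets, chunks)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return offsets
```

Because every region is fixed before any thread starts, the writers need no lock among themselves; only the reservation step is sequential.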

In addition, the information bundle generation unit 140 manages a buffer that stores a plurality of packets, and each packet includes an L2 header, an L3 header, an L4 header, and a packet body (body and payload).

The information bundle of the information bundle generation unit 140 has the structure 'first stored time, last stored time, information block 1, information block 2, information block 3, ..., information block n'. Each information block has the structure 'compressed size, actual size, compressed binary information data'. Fixed-length binary information data has the structure 'fixed-width data 1, fixed-width data 2, fixed-width data 3, fixed-width data 4, ..., fixed-width data n'. Variable-length binary information data has the structure 'fixed-width data 1 (including variable-length information), variable-length data 1, fixed-width data 2 (including variable-length information), variable-length data 2, fixed-width data 3 (including variable-length information), variable-length data 3, ..., fixed-width data n (including variable-length information), variable-length data n'.
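The bundle layout above can be illustrated with a short serialization sketch. The text does not give field widths, byte order, or a compression algorithm, so the 32-bit little-endian sizes, the 64-bit floating-point timestamps, and the use of zlib are all assumptions, as are the function names.

```python
import struct
import zlib

BUNDLE_HEADER = struct.Struct("<dd")  # first and last stored time (assumed: float seconds)
BLOCK_HEADER = struct.Struct("<II")   # compressed size, actual size (assumed: 32-bit each)

def pack_block(raw):
    """One information block: compressed size, actual size, compressed binary data."""
    compressed = zlib.compress(raw)
    return BLOCK_HEADER.pack(len(compressed), len(raw)) + compressed

def pack_bundle(first_time, last_time, raw_blocks):
    """One information bundle: two timestamps followed by blocks 1..n."""
    return BUNDLE_HEADER.pack(first_time, last_time) + b"".join(pack_block(b) for b in raw_blocks)

def unpack_bundle(blob):
    first_time, last_time = BUNDLE_HEADER.unpack_from(blob, 0)
    pos = BUNDLE_HEADER.size
    blocks = []
    while pos < len(blob):
        comp_size, raw_size = BLOCK_HEADER.unpack_from(blob, pos)
        pos += BLOCK_HEADER.size
        raw = zlib.decompress(blob[pos:pos + comp_size])
        if len(raw) != raw_size:
            raise ValueError("actual size does not match block header")
        blocks.append(raw)
        pos += comp_size
    return first_time, last_time, blocks
```

Storing the compressed size ahead of the data lets a reader skip from block to block without decompressing, and the actual size allows the output buffer to be sized in advance.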

The information bundle generation unit 140 generates metadata for each individual packet. The metadata includes the packet observation time, packet size, session ID, MAC address, and various TCP-specific information.

The information bundles generated by the information bundle generation unit 140 include a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, an event information bundle, and the like.

The session information bundle stores the session ID, client IP/port, server IP/port, L4 protocol, and L7 protocol.

The BPS information bundle stores the session ID, the transmission time (in seconds), the amount of data sent per second from the client to the server, and the amount of data sent per second from the server to the client.

The PPS information bundle stores the session ID, the transmission time (in seconds), the number of packets sent per second from the client to the server, and the number of packets sent per second from the server to the client.
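As an illustration of how the BPS and PPS bundle records could be derived from per-packet metadata, the sketch below groups packets by session and by whole second. The tuple layout, the direction labels `"c2s"` and `"s2c"`, and the function name are assumptions; data size is counted in bytes here, and a conversion to bits would be a one-line change.

```python
from collections import defaultdict

def per_second_counters(packets):
    """packets: iterable of (timestamp_seconds, session_id, direction, size_bytes).

    direction is "c2s" (client to server) or "s2c" (server to client).
    Returns two dicts keyed by (session_id, second): data size and packet count.
    """
    size_per_sec = defaultdict(lambda: {"c2s": 0, "s2c": 0})
    pkts_per_sec = defaultdict(lambda: {"c2s": 0, "s2c": 0})
    for ts, session_id, direction, size in packets:
        key = (session_id, int(ts))  # truncate to one-second buckets
        size_per_sec[key][direction] += size
        pkts_per_sec[key][direction] += 1
    return dict(size_per_sec), dict(pkts_per_sec)

sample = [
    (10.2, "s1", "c2s", 100),
    (10.9, "s1", "c2s", 50),
    (10.5, "s1", "s2c", 400),
    (11.0, "s1", "c2s", 60),
]
size_by_second, packets_by_second = per_second_counters(sample)
```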

The RTT (Round Trip Time) information bundle stores the session ID, the transmission delay from the client to the server, and the transmission delay from the server to the client.

The timeout information bundle stores the full session information and the time of occurrence.

The TCP information bundle stores the following: TCP SYN (the time at which each TCP SYN occurred and its session information), TCP RST (the time at which each TCP RST occurred and its session information), TCP DUP ACK (the time at which each TCP DUP ACK occurred and its session information), TCP packet retransmission (the time at which each retransmission occurred and its session information), and other TCP problems (the time of occurrence and the problem type: TCP Zero Window, Port Reused, or Out of Order).

The Remarks information bundle stores HTTP request/response headers, DNS queries and their response results, SMTP email sender and recipient IDs, and FTP/IMAP/POP3 error details.

The event information bundle stores information on events raised when a value rises above or falls below a user-defined threshold, or when it exceeds a user-defined rate of change.

The performance indicator generation unit 150 generates the basic performance indicators, which include BPS, PPS, latency, and timeout.

It also generates one or more additional performance indicators from among the following: the number of flows created per time period and per IP, TCP performance indicators, the list of IPs providing TCP-based services, the list of IPs providing UDP-based services, the MAC address for each IP, data usage per port number, and performance indicators per L7 protocol.

The TCP performance indicators include TCP RST, TCP Zero Window, TCP DUP ACK, TCP retransmission, TCP port reuse, and TCP out-of-order packets. The per-L7-protocol performance indicators may include analysis by DNS query result, HTTP connection status, and measurement of the data volume transferred per SMTP sender/recipient.

The performance indicator analysis unit 160 determines which kind of analysis is to be performed on the performance indicator in question: BPS-based analysis, PPS-based analysis, timeout-based analysis, TCP RST-based analysis, TCP Zero Window analysis, TCP DUP ACK analysis, TCP retransmission analysis, TCP port reuse analysis, TCP out-of-order analysis, HTTP error status analysis, or analysis of an additional performance indicator.

For BPS-based analysis, the performance indicator analysis unit 160 reports 'traffic surge' if traffic is at or above 85% of the total available bandwidth, 'sustained excessive traffic' if the surge persists for 60 seconds or longer, 'traffic concentrated on a specific IP' if 50% or more of the total traffic is concentrated on a single IP, and 'suspected network failure' if the traffic in use is below 2% of the total available bandwidth.
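The four BPS rules can be written down directly. The sketch below is an assumption-laden illustration, not the patented implementation: it assumes one total-throughput sample per second (oldest first), evaluates the rules against the most recent sample, and uses hypothetical names throughout.

```python
def analyze_bps(samples_bps, capacity_bps, per_ip_bps):
    """Apply the 85% / 60 s / 50% / 2% rules to the latest one-second sample.

    samples_bps: total throughput per second, oldest first (assumed sampling).
    per_ip_bps: throughput per IP for the latest second.
    """
    findings = []
    current = samples_bps[-1]
    if current * 100 >= 85 * capacity_bps:
        findings.append("traffic surge")
        # Count how many consecutive trailing seconds stayed at or above 85%.
        streak = 0
        for value in reversed(samples_bps):
            if value * 100 < 85 * capacity_bps:
                break
            streak += 1
        if streak >= 60:
            findings.append("sustained excessive traffic")
    total = sum(per_ip_bps.values())
    if total > 0 and max(per_ip_bps.values()) * 100 >= 50 * total:
        findings.append("traffic concentrated on a specific IP")
    if current * 100 < 2 * capacity_bps:
        findings.append("suspected network failure")
    return findings
```

Integer comparisons of the form `value * 100 >= 85 * capacity` are used instead of floating-point percentages so that values exactly on a threshold are classified predictably.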

For PPS-based analysis, if broadcast packets account for 70% or more of all packets, the result is 'high bandwidth occupancy due to a sharp increase in broadcast packets'; if non-IP packets account for 50% or more of all packets, the result is 'unknown packets occupying a large share of the bandwidth'.
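A minimal sketch of the two PPS rules, assuming that per-interval packet counts are already available. The function and parameter names are hypothetical.

```python
def analyze_pps(total_packets, broadcast_packets, non_ip_packets):
    """Classify one measurement interval using the 70% and 50% share rules."""
    findings = []
    if total_packets == 0:
        return findings  # no traffic, so no share can be computed
    if broadcast_packets * 100 >= 70 * total_packets:
        findings.append("high bandwidth occupancy due to a sharp increase in broadcast packets")
    if non_ip_packets * 100 >= 50 * total_packets:
        findings.append("unknown packets occupying a large share of the bandwidth")
    return findings
```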

For timeout-based analysis, if timeouts occur for 20 or more IPs per second during a user-specified period, the result is 'suspected service outage due to a network interface shutdown or equipment power failure'; if timeouts occur simultaneously for at least 10 but fewer than 20 IPs per second during the user-specified period, the result is 'suspected service interruption due to a faulty cable or GBIC (Gigabit Interface Converter)'.
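The timeout rule can be sketched as below. One reading of "during a user-specified period" is that the per-second count must hold for every second of the period; that interpretation is an assumption, and the text does not settle it. The input is assumed to be the number of distinct IPs with a timeout in each second of the period, and all names are hypothetical.

```python
def analyze_timeouts(timed_out_ips_per_second):
    """Return a diagnosis based on the lowest per-second count in the period."""
    if not timed_out_ips_per_second:
        return None
    floor = min(timed_out_ips_per_second)  # assumed: the rule must hold every second
    if floor >= 20:
        return "suspected service outage due to a network interface shutdown or equipment power failure"
    if floor >= 10:
        return "suspected service interruption due to a faulty cable or GBIC"
    return None
```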

For TCP RST-based analysis, if the same server sends RST 10 or more times per second, the result is 'requests are arriving at a destination port that does not exist on the server, or connections are being attempted to a port whose connection has already been closed'; if the same client sends RST 5 or more times per second, the result is 'the application is closing connections with Reset instead of FIN'; and if the same client or server generates RST 3 to 4 times per second, the result is 'either the server or the client is terminating without notifying the other side'.
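A sketch of the RST rules as a single function. The text leaves some ranges uncovered (for example, a server sending 5 to 9 RSTs per second), and this sketch returns `None` for them rather than guessing; the role labels and names are hypothetical.

```python
def analyze_rst(role, rst_per_second):
    """role is "server" or "client"; rst_per_second is the RST rate from that host."""
    if role == "server" and rst_per_second >= 10:
        return "requests to a nonexistent destination port or to an already closed connection"
    if role == "client" and rst_per_second >= 5:
        return "application closes connections with Reset instead of FIN"
    if 3 <= rst_per_second <= 4:
        return "one side terminates without notifying the other"
    return None  # no rule in the text covers this rate
```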

For TCP Zero Window analysis, if the TCP Zero Window condition occurs 10 or more times per second for an IP, the result is 'suspected zero-window generation caused by a fault in security equipment such as a firewall or IPS, or in a WAN accelerator'.

For TCP DUP ACK analysis, if DUP ACKs occur 60 or more times per second from a specific IP, the result is 'network congestion (collision)'.

For TCP retransmission analysis, if TCP retransmissions occur 1000 or more times per second from a specific IP, the result is 'suspected loop in a redundant section'.

For TCP port reuse analysis, if TCP port reuse is observed 3 or more times per second, the result is 'suspected exhaustion of client-side local ports and server connections lingering in the time wait state'.

For TCP out-of-order analysis, if out-of-order packets occur 3 or more times per second, the result is 'suspected TCP segment loss due to packet loss or similar causes'.
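The five per-second TCP thresholds above (Zero Window, DUP ACK, retransmission, port reuse, out-of-order) lend themselves to a small rule table. The following is a sketch under the assumption that per-IP event rates have already been computed; the metric keys and the function name are hypothetical.

```python
# metric -> (minimum events per second, diagnosis), thresholds taken from the text
TCP_RULES = {
    "zero_window": (10, "suspected zero-window generation by security equipment or a WAN accelerator"),
    "dup_ack": (60, "network congestion"),
    "retransmission": (1000, "suspected loop in a redundant section"),
    "port_reuse": (3, "suspected client local port exhaustion and server time wait state"),
    "out_of_order": (3, "suspected TCP segment loss due to packet loss"),
}

def analyze_tcp_events(events_per_second):
    """events_per_second: dict of metric name -> observed rate for one IP."""
    findings = []
    for metric, (threshold, diagnosis) in TCP_RULES.items():
        if events_per_second.get(metric, 0) >= threshold:
            findings.append((metric, diagnosis))
    return findings
```

Keeping the thresholds in data rather than in branches would also make it simpler to add user-defined indicators later, which the description allows for.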

For HTTP error status analysis, if the status code is HTTP 4XX and the same symptom is found on fewer than 10 IPs, it is treated as a 'user input problem'; if the status code is HTTP 5XX, or is HTTP 4XX and the same symptom is found on 10 or more IPs, it is treated as 'a problem in the server or client code'.
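The HTTP rule reduces to a short decision function. This sketch assumes the number of distinct IPs showing the same status code has already been counted; the names are hypothetical.

```python
def analyze_http_errors(status_code, affected_ip_count):
    """Classify an HTTP error by status class and by how many IPs show it."""
    if 500 <= status_code <= 599:
        return "server or client code problem"
    if 400 <= status_code <= 499:
        if affected_ip_count < 10:
            return "user input problem"
        return "server or client code problem"
    return None  # not an error status covered by the rule
```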

For analysis of an additional performance indicator, performance indicators added through the system settings or by the user are included in the analysis.

The external device may be the network equipment 210 of the data center, the remote intelligent network management device 220, or the like. The network equipment 210 of the data center connects to the physical network (Physical NW) by tapping or port mirroring and measures data. The network equipment 210 of the data center may also be a virtualized environment that includes a virtual switch (vSwitch). The remote intelligent network management device 220 may be a device installed in a remote office.

In addition, the intelligent network management system to which the present invention is applied can perform automatic diagnosis of the network using the information bundles. The automatic diagnosis defines the diagnostic items, measures the state of the diagnostic target, reports its symptoms, gives the probable causes, gives a remedy for each probable cause, and provides the analysis results. The targets of automatic diagnosis include performance, usage, and UDP, TCP, or HTTP errors.

Automatic diagnosis of the network comprises the following functions: network status, network usage and performance, failures/events, application services, automatic diagnosis, L2 to L7 analysis, statistics and trend analysis, and event processing. Automatic diagnosis of the network is carried out through these functions.

Automatic diagnosis of network status determines the state of the network equipment by analyzing SNMP Trap information and also analyzes Syslog data.

Automatic diagnosis of network usage and performance covers BPS (Bits Per Second), PPS (Packets Per Second), latency, and timeout.

Automatic diagnosis of failures/events covers UDP Flag, TCP Resets, TCP Zero Windows, TCP Reuse, TCP Duplicate ACKs, and TCP Retransmission. It also covers HTTP 4XX and HTTP 5XX.

Automatic diagnosis of application services performs automatic recognition of application services and detailed payload analysis. It thereby covers HTTP, DNS, SMTP, POP3, IMAP, and FTP.

Automatic diagnosis for identifying causes and proposing solutions covers TCP Retransmission, Hop Low, Microburst, RTT (Round Trip Time), TCP Reset, TCP Zero Windows, TCP DUP ACKs, and Timeout.

Automatic diagnosis by L2 to L7 analysis performs MAC usage analysis at Layer 2, Hop Account analysis at Layer 3, per-port analysis (by source and by destination) at Layer 4, and application service analysis at Layer 7.

Automatic diagnosis by statistics and trend analysis covers the performance indicators (BPS, PPS, Latency, Timeout), TCP-related items, HTTP errors, Layer 7 analysis, and flow trends.

Automatic diagnosis of events covers setting and controlling thresholds per performance item, generating alarms and assigning alarm grades, searching and viewing alarms by grade, and the Syslog Server (Remote) and SNMP Trap Server. Alarms/events provide real-time network status monitoring and notification services.

Accordingly, the automatic network failure resolution device 110 of the intelligent network management system 100 collects network information from the various network equipment 210, such as switches and routers, using NetFlow with 5-minute sampling, and receives packet information from the remote intelligent network management device 220 to perform packet analysis.

NetFlow is network information provided by switches and routers. It can collect data for each hop, but collection loads both the equipment and the network bandwidth at the same time, so in practice most deployments collect samples at 5-minute intervals.

By its nature, NetFlow can obtain only limited kinds of data. That is, it can receive only data at L3 or lower levels (L1, L2).

The automatic network failure resolution device 110 of the intelligent network management system 100 of the present invention is mirroring-based, so it places no load on network bandwidth. However, it can obtain information only about the segment on which packets are collected, and therefore cannot obtain information for each hop (for example, the segment between one particular switch and another).

The automatic network failure resolution device 110 of the intelligent network management system 100 can analyze all data from L2 through L7 in detail by means of DPI (Deep Packet Inspection).

The NetFlow of the data center's network equipment 210 and the remote intelligent network management device 220 to which the present invention is applied are therefore complementary components in terms of the breadth (width) and depth of the network data they collect.

The automatic network failure resolution device 110 can determine whether a failure exists in a specific area from the data collected and analyzed through the NetFlow of the network equipment 210 and through the remote intelligent network management device 220 to which the present invention is applied.

When a failure is determined to exist, the network failure processing unit 170 may therefore present a recommendation or control the network itself. If it cannot control the network itself, or has not been authorized to do so, it presents the solution in the form of a recommendation. If it can control the network itself, it can connect to a specific device and change its settings, reboot it, and so on. In most such cases it connects over SSH and controls the device with remote shell commands.
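The choice between recommending and acting can be sketched as a small planning function. Nothing is executed here: the function only returns either recommendation text or an argument list of the form `ssh <host> <command>` for the OpenSSH client. The function name, the flags, the host, and the sample command are all hypothetical.

```python
def plan_action(is_failure, can_control, is_authorized, recommendation,
                host=None, remote_command=None):
    """Decide what the failure processing step should do, without doing it."""
    if not is_failure:
        return ("none", None)
    if can_control and is_authorized and host and remote_command:
        # Argument list for the OpenSSH client: ssh <destination> <command>
        return ("execute", ["ssh", host, remote_command])
    # Cannot control the device, or not authorized: fall back to a recommendation.
    return ("recommend", recommendation)

# Usage with made-up values:
plan = plan_action(True, True, True, "check the cable wiring",
                   host="admin@switch-01.example", remote_command="reload")
```

Returning a plan rather than running it keeps the decision testable and leaves room for an approval step before any device is changed or rebooted.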

The network failure processing unit 170 also handles the case in which, on a very busy network, speed drops suddenly when traffic passes through a specific hop. In this case, the packet volume observed through NetFlow may be at a normal level, and the BPS reported by NetFlow may also be normal. Meanwhile, the automatic network failure resolution device 110 detects hundreds to thousands of TCP DUP ACKs and retransmissions per second from the IPs associated with that hop. It can therefore determine that a packet loop is likely at that hop. Packet loops are almost always caused by incorrect cabling. Because this cannot be resolved by changing settings, a recommendation such as 'check the cable wiring' is issued. Here, a 'very busy network' means that, at the time of measurement, a component of at least one of the devices involved in the communication is overloaded. The devices involved in the communication are the client computer, the server computer, and all network equipment connecting the client and the server (switches, routers, various security appliances, and so on). The components of a device are its CPU, RAM, disk, and network ports.

The network failure processing unit 170 can also deal with the case in which a network service slows down at a particular time every day. In this case, both the NetFlow of the data center's network equipment 210 and the remote intelligent network management device 220 report that traffic to the server responsible for the network service surges during that time period. NetFlow can confirm that traffic to the network service is concentrated at a specific location. The remote intelligent network management device 220 can confirm that the response delay from the server becomes very long only during that time period. In addition, the person in charge of the server may report that CPU and disk usage spike only during that time period. Various solutions can be proposed or applied to this situation. That is, QoS can be applied to the specific location identified by NetFlow: the unit connects to a separate QoS device or switch and uses its QoS function to adjust the total amount of traffic so that server requests from that location do not surge, thereby handling the network failure automatically. It can also recommend adjusting the clients' network access times so that requests are spread across time periods. It can also recommend expanding the server and the network, including a proposal for the additional bandwidth required.

In the packet capture step (ST1), when the automatic network failure resolution device 110 of the intelligent network management system 100 performs automatic resolution of a network failure, the packet capture unit 130 samples NetFlow information from the network equipment 210 of the data center, or receives packet data from the remote intelligent network management device 220 through a network interface card (NIC). It combines the received packet data into a single data stream to produce the raw data needed for creating information bundles, and stores the raw data in a raw packet storage buffer. In this way it measures data of the data center 210 or the remote intelligent network management device 220, including network packets, SNMP TRAP, and SYSLOG information.

In the information bundle generation step (ST2), after the packet capture step, the information bundle generation unit 140 generates metadata for the collected packets. The metadata includes the packet observation time, packet size, session ID, MAC address, and TCP information. The unit generates a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, and an event information bundle, and compresses and stores the data for each type of information bundle concurrently.

In the performance indicator generation step (ST3), after the information bundle generation step, the performance indicator generation unit 150 receives the generated information bundles and generates the performance indicators needed for network management, namely the basic performance indicators and the additional performance indicators.

In the performance indicator analysis step (ST4), after the performance indicator generation step, the performance indicator analysis unit 160 selects, on the basis of the generated performance indicators, the kinds of information to use from the information bundles, and then analyzes the performance of the network to produce the performance indicator analysis results.

In the network failure processing step (ST5), after the performance indicator analysis step, the network failure processing unit 170 uses the analysis results to determine automatically whether a failure exists in a specific area and to resolve the network failure automatically.

In the failure processing determination steps (ST11, ST12), whether a failure exists in a specific area is determined from the data obtained by analyzing the NetFlow information and the packet information, and the kind of failure processing to be performed is determined.

In the failure processing execution steps (ST13, ST14), if a network failure was determined in the failure processing determination steps, a recommendation is presented or the network is controlled directly. If the network cannot be controlled directly, or authorization has not been given, the solution is presented in the form of a recommendation. If the network can be controlled directly, a specific device on the network is accessed over SSH using remote shell commands to change its settings or reboot it. The result of the network failure processing is then provided.

As described above, the present invention collects packet information on the network status, automatically determines from the analyzed data whether a failure exists in a specific area, and automatically resolves network failures. It thereby provides, with a single click, a guide for identifying and resolving the causes of various and complex network issues (performance, failures, and so on) accurately and quickly, enables any network operator to manage the network easily and conveniently, and, by interworking with various systems through its information collection and analysis functions, provides a customized network management service tailored to user requirements.

Although the present invention has been described in detail above with reference to embodiments, it is not necessarily limited to these embodiments, and various modifications may be made without departing from its technical spirit. The embodiments disclosed herein are therefore intended to explain, not to limit, the technical spirit of the present invention, and the scope of that technical spirit is not limited by these embodiments. The scope of protection of the present invention should be construed according to the claims, and all technical ideas within an equivalent scope should be construed as falling within the scope of rights of the present invention.

100: intelligent network management system
110: network failure automatic resolution device
120: control unit
130: packet capture unit
140: information bundle generation unit
150: performance indicator generation unit
160: performance indicator analysis unit
170: network failure handling unit
210: network equipment in the data center
220: remote intelligent network management device

Claims

An apparatus for automatically solving network failures based on automatic packet analysis, in an intelligent network management system that manages a network, the apparatus comprising:
a control unit that controls the apparatus to collect packet information on the network status, automatically determine from the analyzed data whether a specific area has failed, and automatically resolve the network failure, thereby providing a one-click guide for accurately and quickly identifying and resolving the causes of varied and complex network issues, enabling any network operator to manage the network easily and conveniently, and providing a customized network management service tailored to user needs by interworking with various systems through information collection and analysis functions;
a packet capture unit that, under the control of the control unit, receives packet data from network equipment in a data center or from a remote intelligent network management device through a network interface card, combines the received packet data into a single data stream to generate the raw data required for creating information bundles, stores the raw data in a raw packet storage buffer, and measures data including network packets, SNMP TRAP, and SYSLOG information of the data center or the remote intelligent network management device;
an information bundle generation unit that, under the control of the control unit, generates metadata for the packets collected by the packet capture unit, the metadata including packet confirmation time, packet size, session ID, MAC address, and TCP information, and creates a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, and an event information bundle;
a performance indicator generation unit that, under the control of the control unit, receives the information bundles generated by the information bundle generation unit and generates the performance indicators required for network management, the performance indicators including basic performance indicators and additional performance indicators;
a performance indicator analysis unit that, under the control of the control unit, selects the type of information to be used from the information bundles based on the performance indicators generated by the performance indicator generation unit, and then analyzes network performance to generate a performance indicator analysis result; and
a network failure processing unit that uses the analysis result of the performance indicator analysis unit to automatically determine whether a specific area has failed and automatically resolves the network failure,
wherein the network failure processing unit, when a network failure is determined, presents a recommendation or controls the network itself; presents the solution in the form of a recommendation if it cannot control the network itself or the action is not authorized; and, if it can control the network itself, connects over SSH to a specific device on the network using remote shell commands and changes its configuration or reboots it, and
wherein the network failure processing unit, if the speed drops sharply when passing through a specific hop in the network, determines that a packet loop is suspected at that hop and sends the recommendation 'check the cable wiring'; and, if the network service is slow at a specific time every day, applies QoS to the specific location identified through NetFlow by connecting to a separate QoS device or switch and using its QoS function to adjust the total volume of server requests from that location so that they do not cause congestion, thereby automatically handling the network failure.
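
The two diagnostic rules recited at the end of the claim above (a sharp slowdown at one hop suggests a packet loop, and a slowdown recurring at the same time every day calls for QoS on the location identified through NetFlow) could be sketched as below. This is only an illustration: the function names, the `jump_factor` and `min_days` thresholds, and the input shapes are assumptions, since the claim does not specify how "suddenly slows down" or "every day" are measured.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple


def find_slow_hop(hop_rtts_ms: List[float], jump_factor: float = 5.0) -> Optional[int]:
    """Return the index of the first hop whose RTT exceeds jump_factor times the previous hop's RTT."""
    for i in range(1, len(hop_rtts_ms)):
        prev, cur = hop_rtts_ms[i - 1], hop_rtts_ms[i]
        if prev > 0 and cur > jump_factor * prev:
            return i
    return None


def recurring_slow_hour(slow_events: List[Tuple[str, int]], min_days: int = 3) -> Optional[int]:
    """slow_events holds (date, hour) pairs; return the hour seen on at least min_days distinct days."""
    days_per_hour = Counter(hour for _, hour in set(slow_events))
    if not days_per_hour:
        return None
    hour, days = days_per_hour.most_common(1)[0]
    return hour if days >= min_days else None


def diagnose(
    hop_rtts_ms: List[float],
    slow_events: List[Tuple[str, int]],
    top_talker: Optional[str] = None,  # location identified from NetFlow data (assumed input)
) -> Dict[str, object]:
    hop = find_slow_hop(hop_rtts_ms)
    if hop is not None:
        # Suspected packet loop at this hop: recommend only, no automatic control.
        return {"type": "recommend", "hop": hop, "message": "check the cable wiring"}
    hour = recurring_slow_hour(slow_events)
    if hour is not None and top_talker is not None:
        # Daily recurring slowdown: request QoS for the location found via NetFlow.
        return {"type": "apply_qos", "hour": hour, "target": top_talker}
    return {"type": "none"}
```

The sketch only produces a decision; actually applying QoS on a separate QoS device or switch is device-specific and is left out.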

delete

A method for automatically solving network failures based on automatic packet analysis, performed when the network failure automatic resolution device of an intelligent network management system automatically resolves a network failure, the method comprising:
a packet capture step in which the packet capture unit samples NetFlow information from network equipment in a data center or receives packet data from a remote intelligent network management device through a network interface card, combines the received packet data into a single data stream to generate the raw data required for creating information bundles, stores the raw data in a raw packet storage buffer, and measures data including network packets, SNMP TRAP, and SYSLOG information of the data center or the remote intelligent network management device;
an information bundle generation step in which, after the packet capture step, the information bundle generation unit generates metadata for the collected packets, the metadata including packet confirmation time, packet size, session ID, MAC address, and TCP information, creates a session information bundle, a BPS information bundle, a PPS information bundle, an RTT information bundle, a timeout information bundle, a TCP information bundle, a Remarks information bundle, and an event information bundle, and at the same time compresses and stores the data for each type of information bundle;
a performance indicator generation step in which, after the information bundle generation step, the performance indicator generation unit receives the generated information bundles and generates the performance indicators required for network management, the performance indicators including basic performance indicators and additional performance indicators;
a performance indicator analysis step in which the performance indicator analysis unit selects the type of information to be used from the information bundles based on the generated performance indicators, and then analyzes network performance to generate a performance indicator analysis result; and
a network failure processing step in which, after the performance indicator analysis step, the network failure processing unit uses the analysis result to automatically determine whether a specific area has failed and automatically resolves the network failure,
wherein the network failure processing step comprises:
a failure processing determination step of determining, from the data analyzed from the NetFlow information and the packet information, whether a specific area has failed and which failure processing is to be performed; and
a failure processing execution step of, if the failure processing determination step determines that the network has failed, presenting a recommendation or controlling the network itself, and, when the network can be controlled directly, connecting over SSH to a specific device on the network using remote shell commands, changing its configuration or rebooting it, and providing the result of the network failure handling.
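
As a rough illustration of the information bundle generation step, the sketch below derives per-second BPS and PPS bundles from packet metadata. It assumes that BPS means bits per second and PPS means packets per second, and that bundles are keyed by whole-second buckets; the `PacketMeta` structure and `build_rate_bundles` function are hypothetical names, not taken from the patent.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass
class PacketMeta:
    """Subset of the metadata fields named in the claim (layout is assumed)."""
    timestamp: float  # packet confirmation time, in seconds
    size: int         # packet size, in bytes
    session_id: str


def build_rate_bundles(packets: Iterable[PacketMeta]) -> Dict[str, Dict[int, int]]:
    """Aggregate packets into per-second BPS (bits) and PPS (packet count) bundles."""
    bps: Dict[int, int] = defaultdict(int)
    pps: Dict[int, int] = defaultdict(int)
    for p in packets:
        second = int(p.timestamp)   # one-second bucket
        bps[second] += p.size * 8   # bytes to bits
        pps[second] += 1
    return {"BPS": dict(bps), "PPS": dict(pps)}
```

The remaining bundles (session, RTT, timeout, TCP, Remarks, event) would need fields and rules that this section does not define, so they are not sketched here.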

delete